Guidewire
Site Reliability Engineer III
1 month ago
Take a purist SRE approach to shared multi-tenant infrastructure for resilient, microservice-based, containerized SaaS systems, in addition to customer-centric application environments
Oversee and automate the team’s growing presence in AWS
Contribute to core infrastructure systems development with features, bug fixes, and reliability improvements
Platform reliability engineering of a complex single sign-on SAML/OAuth-based central authentication platform
Creatively build and develop tooling to aid in driving 24x7x365 follow-the-sun operations of critical production systems
Automate deployment tasks for core product and infrastructure tools and maintain automation infrastructure
Create system documentation and training materials to empower and educate our fellow team members
Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure
Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks and issues
Enhance platform observability by helping create a self-healing approach to platform reliability
Collaborate with engineering teams, providing product feedback and, where necessary, contributing code to the product
REQUIRED SKILLS AND EXPERIENCE
Education and Work Experience:
Bachelor’s Degree in Computer Science or related field.
Software engineering and task automation skills with Bash, Python, and/or Go are a must.
Familiarity with the Agile software development lifecycle.
Deep background with Linux systems and engineering.
Highly experienced with engineering and automating on Amazon Web Services (AWS).
Experience supporting web applications running on Java / Apache / Tomcat in a live production environment.
Prior experience with IaC tools like Terraform/Terragrunt/Terraspace.
Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for gate promotions.
Production-at-scale support background in a heavily microservice-based world.
Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking).
Strong understanding of single sign-on, SAML, and OAuth (bonus for hands-on experience with Okta).
Seasoned expertise around certificate technology and basic concepts of encryption.
Experience working with Relational Databases such as Aurora Postgres and/or Oracle RDS.
Advanced exposure to application development, web UI (design and development), JSON, and application architecture.
Strong experience using observability tools (logging/APM) such as Datadog, CloudWatch, and PagerDuty.
Familiarity with event store/stream-processing technologies like Kafka or AWS SQS.
Understanding of Open Application Model systems such as KubeVela or Crossplane.
Personal Qualities and Soft Skills:
You greatly prefer writing code to clicking through a GUI.
You enjoy teaching, being a mentor to others, and working across boundaries.
Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving.
Strong analytical mind with a penchant for process development and enhancement.
A highly positive, can-do attitude and a desire to be a team player.
Great communication skills and ability to explain complex technical concepts to a varied audience.
Strong follow-through and work ethic; you consistently keep and meet commitments.
Other Requirements:
Ability to read, write, and speak English.
We provide 24x7 support to our customers, so we expect you to take turns with your teammates being on call for weekend production emergencies or providing rotating weekend operational support.
Travel – Expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings.