Site Reliability Engineer III

1 month ago


Old Toronto, Canada Guidewire Full time
ESSENTIAL DUTIES AND RESPONSIBILITIES

Take a purist SRE approach to shared multi-tenant infrastructure for resilient, microservice-based, containerized SaaS systems, in addition to customer-centric application environments

Oversee and automate the team’s growing presence in AWS

Contribute to core infrastructure systems development with features, bug fixes, reliability improvements, etc.

Platform reliability engineering of a complex single sign-on SAML/OAuth-based central authentication platform

Creatively build and develop tooling to aid in driving 24x7x365 follow-the-sun operations of critical production systems

Automate deployment tasks for core product and infrastructure tools and maintain automation infrastructure

Create system documentation and training materials to empower and educate our fellow team members

Build and maintain observability tooling, metrics, and dashboarding for a global platform product infrastructure

Improve our incident management lifecycle to identify, mitigate, and learn from reliability risks and issues

Enhance platform observability by helping create a self-healing approach to platform reliability

Collaborate with engineering teams, providing product feedback and, where necessary, contributing code to the product

REQUIRED SKILLS AND EXPERIENCE

Education and Work Experience:

Bachelor’s Degree in Computer Science or related field.

Software engineering and task automation skills with Bash, Python, and/or Go are a must.

Familiarity with the Agile software development lifecycle.

Deep background with Linux systems and engineering.

Highly experienced with engineering and automating on Amazon Web Services (AWS).

Experience supporting web applications running on Java / Apache / Tomcat in a live production environment.

Prior experience with IaC tools like Terraform/Terragrunt/Terraspace.

Prior experience with DevOps/GitOps tools (Git, Bitbucket, Flux CD, TeamCity) for gate promotions.

Background supporting production at scale in a heavily microservice-based environment.

Hands-on engineering and ops expertise in containerization (Docker, Helm, Kubernetes/EKS, CNI and Ingress networking).

Strong understanding of Single Sign-On, SAML, and OAuth (hands-on experience with Okta is a bonus).

Seasoned expertise around certificate technology and basic concepts of encryption.

Experience working with Relational Databases such as Aurora Postgres and/or Oracle RDS.

Advanced exposure to application development, web UI (design and development), JSON, application architecture.

Extensive experience using observability tools (logging/APM) like Datadog, CloudWatch, and PagerDuty.

Familiarity with event store/stream-processing technologies like Kafka or AWS SQS.

Understanding of Open Application Model systems such as KubeVela or Crossplane.

Personal Qualities and Soft Skills:

You greatly prefer writing code to clicking through a GUI.

You enjoy teaching, being a mentor to others, and working across boundaries.

Outstanding troubleshooting skills; ability to think critically and display an aptitude for problem solving.

Strong analytical mind with a penchant for process development and enhancement.

A highly positive, can-do attitude and a desire to be a team player.

Great communication skills and ability to explain complex technical concepts to a varied audience.

Strong follow-through, a strong work ethic, and a consistent record of keeping and meeting commitments.

Other Requirements:

Ability to read, write, and speak English.

We provide 24x7 support to our customers, so we expect you to take turns with your teammates being on-call for weekend production emergencies or to provide rotating weekend operational support.

Travel – Expect occasional travel (less than 5%) to other Guidewire offices for training and team meetings.

