Staff Site Reliability Engineer

2 weeks ago


Toronto, Ontario, Canada | Index Exchange | Full time

About the Role:

We are seeking a highly skilled Staff Site Reliability Engineer to own and develop our on-premise and hybrid cloud environments, with a focus on low-latency performance on Kubernetes platforms that support a robust developer experience framework.

The ideal candidate will have a deep technical understanding of on-premise and hybrid cloud architectures and a proven track record of managing SRE teams in a global setting.

Key Responsibilities:

  • Drive initiatives that produce positive outcomes across divisions.
  • Act as a technical leader on projects, architecting designs that meet business needs and align with the existing architectural vision.
  • Collaborate with engineering teams and lead initiatives cross-functionally to architect innovative solutions that enhance our observability capabilities.
  • Drive operational excellence through proactive monitoring, automation, and the development of robust incident management processes.
  • Implement SRE best practices in the software development life cycle, including designing scalable and resilient systems.
  • Lead incident response efforts, ensuring rapid resolution and post-incident analysis to prevent recurrence.
  • Develop and maintain meaningful performance metrics and reporting mechanisms to track the health and reliability of our systems.

Requirements:

  • Proven experience (6+ years) in SRE roles, with a focus on low-latency, global-scale environments built on upstream Kubernetes.
  • Strong software engineering skills, including proficiency in programming languages such as Golang, Python, or Perl.
  • Excellent understanding of on-premise and hybrid cloud architectures.
  • Exceptional leadership and team-building skills with a track record of developing high-performing teams.
  • Expertise in incident management, root cause analysis, and post-incident reviews.
  • Strong analytical and problem-solving abilities.
  • Extensive experience with industry-standard SRE tools and technologies within the CNCF portfolio, such as ArgoCD, Cilium, Rook, OPA, and Jaeger.
  • Significant experience with configuration management tools such as Ansible, Puppet, or Salt.
  • Strong background in working with observability stack components such as ELK, Prometheus, Mimir, and OpenTelemetry.

About Us:

Index Exchange is a leading advertising exchange that operates at a global scale. We are dedicated to building a safe and transparent marketplace that provides a trusted experience for consumers.

As a Staff Site Reliability Engineer, you will be part of a tight-knit global team that is committed to innovation, integrity, and customer relationships. You will have the opportunity to work with cutting-edge technologies and collaborate with cross-functional teams to drive positive outcomes across divisions.

Why Work with Us:

  • Comprehensive health, dental, and vision plans for you and your dependents.
  • Paid time off, health days, and personal obligation days plus flexible work schedules.
  • Competitive retirement matching plans.
  • Equity packages.
  • Generous parental leave available to birthing, non-birthing, and adoptive parents.
  • Annual well-being allowance plus fitness discounts and group wellness activities.
  • Commuter benefits and discounts, where available.
  • Employee assistance program.
  • Mental health first aid program that provides an in-the-moment point of contact and reassurance.
  • One day of volunteer time off per year and a donation-matching program.
  • Bi-weekly town halls and regular community-led team events.
  • Multiple resources and programming to support continuous learning.



