Staff Site Reliability Engineer

4 months ago

Toronto, Canada | Index Exchange | Full time

We shaped the earliest forms of ad tech, and we’re looking for the technical expertise to help shape its future. Our customers have unique problems that can only be solved at internet scale, and that’s where the technical skills of our team make a real difference.

Our exchange handles over 350 billion requests every day (for comparison, Google serves an estimated 9 billion searches a day), all running in our own global data centers. Every member of our technology team has an enormous amount of autonomy in building and managing the systems that support our growing scale. Through the transparency of our technology, our dedication to innovation and integrity, and our long-standing customer relationships, we lead through change.

What’s it like to work at Index?

We have more than 550 Indexers around the globe dedicated to building a safe and transparent marketplace that provides a trusted experience for consumers.

Index is an exciting and fast-paced place to work. We’re built on our values of change, support, learning and teaching, trust, and intention. We pride ourselves on our independence and openness, not only in our technology, but in our teams, too. Our diverse and inclusive culture celebrates how we can leverage our unique differences to help drive Index forward.

Our culture of success is truly supportive and collaborative. By working together across teams, we continually invest in the people and technology needed to solve the industry’s most complex problems. As we extend the promise of ad tech to every channel, we’re looking for talented engineers to help move Index, and the industry, forward.

Are you ready to join the programmatic evolution?

Index Exchange funds the open web. Content and journalism across the internet are funded through advertising, and we are the engine that helps make that happen transparently, safely, and efficiently. Running hundreds of billions of auctions per day, each within milliseconds, requires a deep understanding of the exchange and the ecosystem we operate in.

Our business grows significantly every year and is poised to grow even faster. Our people and our platforms are the foundation and enabler of that growth. We are substantially expanding our technology teams and are looking for technologists with a passion for high-performance software development and a drive to deliver software products and platforms that empower industries at global scale.

About the Role:

We are seeking an experienced Staff Engineer with a strong background in Site Reliability Engineering (SRE) to own and develop on-premise and hybrid cloud environments, with a focus on optimizing low-latency performance on Kubernetes platforms that support a robust developer experience framework. The ideal candidate will have a deep technical understanding of on-premise and hybrid cloud environments and a proven track record of managing SRE teams in a global setting.

Index’s scale spans the globe: our transactions happen 24x7 in our global data centers, and every second, millions of requests are evaluated across our exchange. To achieve our mission, global efficiency and reliability are essential, because every millisecond quite literally counts in our business.

Here’s What You’ll Be Doing

Vision: Develop a deep understanding of Index, its products, and its processes, and stay informed on the latest industry developments, whether product or technology changes. Drive initiatives that produce positive outcomes across divisions.
Project Management: Act as a technical leader on projects, architecting designs that meet the needs of the business and align with the existing architectural vision. Collaborate with subject matter experts and a network of peers to ensure on-time, high-quality delivery.
Technical Leadership: Using a deep understanding of on-premise and hybrid cloud environments, collaborate with engineering teams and lead cross-functional initiatives to architect innovative solutions that enhance our observability capabilities.
Operational Excellence: Drive operational excellence through proactive monitoring, automation, and the development of robust incident management processes.
Software Engineering Skills: Collaborate with software engineering teams to implement SRE best practices throughout the software development life cycle, including designing scalable and resilient systems.
Incident Management: Lead incident response efforts, ensuring rapid resolution and post-incident analysis to prevent recurrence. Maintain incident reports and contribute to continuous improvement.
Reporting and Metrics: Develop and maintain meaningful performance metrics and reporting mechanisms to track the health and reliability of our systems. Use data-driven insights to guide decision-making and triage.
Global Scale: Manage SRE operations at global scale, accounting for regional nuances and ensuring consistent, reliable service delivery across geographies.

Here's What You Need

Proven experience (6+ years) in SRE roles, with a focus on low-latency, global-scale environments built on upstream Kubernetes.
Strong software engineering skills, including proficiency in programming languages such as Golang, Python, or Perl.
Excellent understanding of on-premise and hybrid cloud architectures.
Exceptional leadership and team-building skills, with a track record of developing high-performing teams and at least 3 years of experience in a leadership role.
Expertise in incident management, root cause analysis, and post-incident reviews.
Strong analytical and problem-solving abilities.
Extensive experience with industry-standard SRE tools and technologies within the CNCF portfolio, such as ArgoCD, Cilium, Rook, OPA, and Jaeger.
Significant experience with configuration management tools such as Ansible, Puppet, or Salt.
Strong background in working with observability stack components such as ELK, Prometheus, Mimir, and OpenTelemetry.
Excellent communication skills, with the ability to collaborate effectively with cross-functional teams.

Why You’ll Love Working Here:

Comprehensive health, dental, and vision plans for you and your dependents
Paid time off, health days, and personal obligation days, plus flexible work schedules
Competitive retirement matching plans
Equity packages
Generous parental leave available to birthing, non-birthing, and adoptive parents
Annual well-being allowance plus fitness discounts and group wellness activities
Commuter benefits and discounts, where available
Employee assistance program
Mental health first aid program that provides an in-the-moment point of contact and reassurance
One day of volunteer time off per year and a donation-matching program
Bi-weekly town halls and regular community-led team events
Multiple resources and programming to support continuous learning
A workplace that supports a diverse, equitable, and inclusive environment

Notification

Index Exchange is aware that there have been recent scams directed toward candidates regarding job interviews and offers.

Please be vigilant and do not accept interview requests, job offers, or other hiring-related documents from anyone other than our dedicated recruitment team, who use email addresses on the @indexexchange.com domain. Our interview process consists of several steps, including phone screens and video interviews. We do not conduct interviews via email questionnaire, and we never request money at any point in the process.

We remain dedicated to resolving this matter, and we appreciate your support.

Equal employment opportunity

At Index Exchange, we believe that successful products are built by teams just as diverse as the audiences that use them. As such, we are committed to equal employment opportunity. We celebrate diversity of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or expression, and veteran status. Additionally, we recognize that diversity runs deeper than any status or classification: diversity is the human experience. Index welcomes everyone who shows grit, passion, and humility.

Accessibility for applicants with disabilities

Index Exchange welcomes and encourages individuals with disabilities to apply to work with us.

If you require an accommodation, please share the details of your request, along with any information on how we can assist you, with the hiring recruiter when they contact you. Index Exchange will make reasonable efforts to meet accommodation requests throughout the recruitment process.

Index Everywhere, Index Anywhere

Our corporate headquarters are in Toronto, with major offices in New York, Montreal, Kitchener, London, San Francisco, and many other global cities. As a major global advertising exchange, we are committed to operating as a tightly knit global team and embracing and empowering talent wherever our colleagues may be.


