Staff Site Reliability Engineer

1 day ago

Toronto, Ontario, Canada Achievers Full time

Our Site Reliability Engineering team sits at the intersection of software engineering and operations, building reliable, scalable cloud systems that our teams and customers can trust.
As Staff Site Reliability Engineer, you'll play a critical role in the management and advancement of our global infrastructure. You'll draw on approximately 15 years of technical expertise, particularly in the evolution of high-concurrency distributed systems and the orchestration of hyper-scale cloud environments, to architect our GCP/GKE environment and lead the integration of AI-driven workflows. This includes using bots, automated PR remediation, and intelligent alerting so that our platform scales efficiently and reliably.

Why you'll love this role:

Lead high-impact initiatives that shape how millions of people experience work around the world.
Bring your unique perspective to complex and challenging projects - apply your expertise in architecture, influence technical direction, and mentor fellow team members.
Join a close-knit, no-ego, high-performing team that solves meaningful problems and celebrates successes together.
Work alongside an experienced leadership team that is genuinely invested in your career growth.
Thrive in a fast-paced, high-growth environment where innovation is encouraged and your voice truly matters.

How you'll shape our cloud infrastructure:

Architectural Leadership: Lead the design and ongoing evolution of our global, high-availability infrastructure, focusing on Google Cloud Platform (GCP) and Kubernetes (GKE).
AI & Automation Strategy: Identify repetitive operational tasks and implement AI-integrated workflows, such as Slack or Teams bots for incident triage, AI-augmented alerting, and automated PR generation to address infrastructure drift.
Cross-Functional Influence: Collaborate with Product, Engineering, and Leadership teams to identify systemic risks, manage complex changes, and define the long-term reliability roadmap.
Infrastructure-as-Code (IaC): Establish and exemplify best practices for Terraform and CI/CD pipelines, empowering development teams to deploy code rapidly and securely.
System Resiliency: Lead high-level initiatives in disaster recovery, multi-region networking, and the design of zero-trust security architectures.
Technical Mentorship: Guide design reviews and promote best practices, enhancing the technical skills and capabilities of the entire SRE organization.

Experience we feel will set you up for success:

The 15-Year Lens: Possess extensive systems engineering experience, with in-depth knowledge of Linux kernels, network protocols (TCP/IP, BGP, DNS), and cloud-native architecture.
GCP Expertise: Demonstrated, hands-on experience in architecting and managing production workloads on Google Cloud Platform and GKE.
AI/Workflow Automation: Practical experience or a strong vision for integrating AI tools and LLMs to automate SRE tasks, documentation, or incident response.
Code Proficiency: Advanced skills in Python or Go, with the ability to develop sophisticated internal tools and automation frameworks.
Observability Mastery: Expert understanding of observability frameworks (such as New Relic, Prometheus, Grafana) to enable data-driven decision-making.
Database Foundations: Deep knowledge of managing relational and document databases (MySQL, MongoDB) at scale.
Communication: Exceptional ability to clearly convey complex technical infrastructure challenges as actionable business insights to non-technical stakeholders.

The Achievers Mindset

Disruptive Innovator: Set industry trends by applying emerging technologies like AI to address longstanding infrastructure challenges.
Self-Starter: Maintain a mindset of continuous improvement, always seeking opportunities to automate processes.
Culture of Success: Believe that platform reliability is fundamental to both employee success and customer trust.

Bonus Points

Hands-on experience with Service Mesh (Istio) and advanced GCP Networking features, such as Interconnect and Shared VPC.
A proven history of migrating legacy automation systems to modern, AI-augmented CI/CD workflows.

Why Achievers is a Great Place to Work
At Achievers, we believe recognition is a powerful driver of connection. With more than 4.3 million users across 190 countries, our employee recognition and rewards platform empowers organizations to build cultures where people feel seen and valued, every day. We're a team of passionate, thoughtful builders who care deeply about our product, our customers, and each other. Visit to see how we're inspiring recognition everywhere.
Our Approach to Total Rewards
$124,000 - $170,000 reflects the salary range for this role, depending on experience, skills, and market data. We're committed to providing a fair and competitive offer based on what you bring to the team. Each A-Player's compensation is reviewed at least annually against performance and impact in role. We want you to see your path to growth, understand your impact, and feel valued every step of the way.
Benefits and Perks for permanent full-time employees:

Rewards for your impact through our Recognition and Rewards program
Health Benefits and Life Insurance Coverage beginning on your first day
Parental Leave Top-up
Employer-matched RRSP contributions
Flexible Vacation to recharge, so you can bring your best
Employee and Family Assistance Program offering mental health, legal, and financial counselling
Supported professional development and career growth (LinkedIn Learning, mentorship)
Employee-Led Employee Resource Groups that celebrate our diversity
Regular events designed to build connection, belonging, and well-being
Hybrid flexibility, with time in our beautiful Liberty Village, Toronto office

This posting is for a current vacancy on our team.
Achievers is proud to be an equal opportunity employer committed to building a diverse, inclusive workplace where everyone can do their best work. We encourage qualified candidates from all backgrounds and experiences to apply.
Achievers is committed to ensuring an inclusive and accessible recruitment process for all candidates. If you require any accommodations for your interview, such as assistive technology, wheelchair accessibility, or alternative formats of materials, please let us know. We are happy to make necessary arrangements to support your needs. We may use artificial intelligence (AI) tools to support parts of our hiring process, such as reviewing applications or analyzing resumes. These tools help our recruitment team but never replace decisions made by real people. We believe in a human-first approach to hiring, where your experience, personality, and potential are recognized by people, not algorithms, and where final hiring decisions are made by humans. If you would like more information about how your data is processed, please contact us.

Staff Site Reliability Engineer

1 day ago

Toronto, Ontario, Canada Okta Full time

Get to know Okta: Okta is The World's Identity Company. We free everyone to safely use any technology, anywhere, on any device or app. Our flexible and neutral products, Okta Platform and Auth0 Platform, provide secure access, authentication, and automation, placing identity at the core of business security and growth. At Okta, we celebrate a variety of...
Staff Site Reliability Engineer

18 minutes ago

Toronto, Ontario, Canada Achievers Full time

Our Site Reliability Engineering team sits at the intersection of software engineering and operations, building reliable, scalable cloud systems that our teams and customers can trust. As Staff Site Reliability Engineer, you'll play a critical role in the management and advancement of our global infrastructure. You'll leverage approximately 15 years of...
Staff Site Reliability Engineer

31 minutes ago

Toronto, Ontario, Canada Achievers Full time US$124,000 - US$170,000

Our Site Reliability Engineering team sits at the intersection of software engineering and operations, building reliable, scalable cloud systems that our teams and customers can trust. As Staff Site Reliability Engineer, you'll play a critical role in the management and advancement of our global infrastructure. You'll leverage approximately 15 years of...
Staff, Site Reliability Engineer

28 minutes ago

Toronto, Ontario, Canada RBC Full time

Job Description: What is the opportunity? We are seeking a Staff, Site Reliability Engineer - Observability (Global Security) to own the resilience and "see-ability" of our mission-critical Identity and Access Management (IAM) platform. Your primary mission will be to design, build, and scale an end-to-end observability stack that provides deep, actionable...
Staff Site Reliability Engineer

38 minutes ago

Toronto, Ontario, Canada Confluent Full time

We're not just building better tech. We're rewriting how data moves and what the world can do with it. With Confluent, data doesn't sit still. Our platform puts information in motion, streaming in near real-time so companies can react faster, build smarter, and deliver experiences as dynamic as the world around them. It takes a certain kind of person to join...
Site Reliability Engineer

20 minutes ago

Toronto, Ontario, Canada Compass Digital Full time

Join Compass Digital as an Intermediate Site Reliability Engineer and help power the future of hospitality tech. You'll design, build, and automate cloud-native systems that are reliable, observable, and scalable, working with AWS, Go, TypeScript, serverless, containers, and cutting-edge DevOps tools. WHO WE ARE: Compass Digital is an organization that drives...
Site-Reliability Engineer

31 minutes ago

Toronto, Ontario, Canada Aarorn Technologies Inc Full time

Job Title: Site-Reliability Engineer (SRE). Location: Toronto, ON (3x onsite a week). Employment Type: Contract. Job Description: We are seeking a highly skilled Site Reliability Engineer (SRE) to enhance the reliability, performance, and efficiency of mission-critical batch workloads within Capital Markets Technology. In this role, you will serve as the technical...
Site Reliability Engineer

21 minutes ago

Toronto, Ontario, Canada Scotiabank Full time

Requisition ID: 244027. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive the...
Site Reliability Engineer

6 minutes ago

Toronto, Ontario, Canada Scotiabank Full time

Requisition ID: 244026. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive the...
