Lead Site Reliability Engineer

2 weeks ago

Canada East Coast, Masabi, Full time, $120,000 - $180,000 per year

Introducing Masabi

At Masabi, we're driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank card to travel.

Our Justride platform is used in over 250 locations globally, including some of the largest cities in the world. With our industry-first mobile ticketing SDK, we've partnered with large players in the transport space, including Uber, Moovit and Transit.

Your own journey is important to us too. Choosing a role here means joining a network of innovators from all walks of life; a group of passionate individuals who consistently deliver. Here, you'll find the tools you need to build the career you want. Whether you're taking the direct route or trying a new path, we'll support you no matter what.

The Role_
We're looking for a Lead Site Reliability Engineer to join our platform team, someone who's confident working hands-on with infrastructure, but also ready to shape how we scale and operate as a global team.

You'll take ownership of key systems, lead cross-functional work, and help evolve the way we build for performance, reliability, and security. This role is ideal for those who enjoy solving complex problems, improving systems through automation, and supporting others as they grow. It's a chance to have both technical depth and meaningful influence, while staying close to the work that matters.

Location_
This role is remote and open to candidates based in Canada (East Coast time zone).

What You'll Be Doing_

Build and automate reliable systems

Lead design discussions and make key architectural decisions for reliability, scalability, and performance
Establish SRE standards and best practices (IaC patterns, CI/CD maturity, observability, etc.) across teams
Design and manage infrastructure using Terraform and CloudFormation
Build and evolve CI/CD pipelines that support fast, safe, and frequent deployments
Automate manual tasks to reduce operational load and enable faster delivery
Help expand our infrastructure globally, scaling up new environments with care

Improve visibility, scale and performance

Define and maintain SLIs, SLOs, and alerting strategies aligned with user experience
Implement monitoring solutions that give us clear, early signals during incidents
Lead capacity planning and performance tuning as our systems and teams grow
Identify opportunities to improve architecture for resilience and cost-effectiveness

Own reliability and incident response

Lead or contribute to incident response, root cause analysis, and post-incident reviews
Design and maintain disaster recovery and failover strategies
Partner with compliance and security teams to meet frameworks like SOC 2 and PCI

Support others and share your knowledge

Collaborate with engineers, architects, and product teams to embed SRE practices from the start and define long-term platform reliability strategy
Mentor others in areas like observability, incident readiness, and infrastructure-as-code
Document systems and processes clearly to support learning and long-term success
Take part in the on-call rotation, which is shared across the team and paid on top of salary

About You_

You're an experienced SRE who combines technical depth with curiosity, care, and a desire to make things better for the platform, the team, and the people using our systems.

You've worked in SRE, platform, or DevOps roles where reliability was business-critical (24/7)
You have proven experience designing and evolving production-grade systems for scale and resilience
You're comfortable designing and operating in AWS, with strong knowledge of cloud architecture, networking and security (VPC design, IAM, least privilege)
You have hands-on experience with Terraform, infrastructure automation, and CI/CD systems
You've led or contributed to high-impact projects involving observability, performance, incident command and/or reliability (distributed tracing, log correlation, metrics maturity, etc.)
You communicate clearly and drive cross-functional reliability improvements in distributed, async-first teams
You enjoy helping others grow and value a kind, collaborative engineering culture
You take pride in doing things the right way, but you're pragmatic and focused on impact

Nice To Have_

Familiarity with PCI DSS v4 or similar compliance standards
Experience with container orchestration
AWS certifications

Our Tech Stack_

Our platform is JVM-based and cloud-native, running on AWS. The SRE team works across both modern infrastructure and legacy systems as we continue to scale globally.

We use a range of proven tools to support performance, reliability, and speed of delivery:

Monitoring & Observability: Grafana, Prometheus, CloudWatch, Pingdom, Kibana
Infrastructure as Code: Terraform, CloudFormation
CI/CD & Automation: GitLab CI, Rundeck
Configuration Management & Logging: Puppet, Confluent Cloud

Some of Our Benefits_

20 days of vacation per year (in addition to public holidays), plus the option to buy an additional 5 days of vacation each year. On top of this, our office is shut every year between Christmas and New Year, totalling a whopping 28+ days of vacation
Private Healthcare and Life Insurance via TriNet
Menopause support
Choice of a workstation
Training allowance of up to CAD$1000 per year
CAD$325 per year to spend on your home office
Ability to work for up to 3 months per year from any country in the world
Enhanced family leave
Fun and collaborative environment with a focus on making a difference in the world

Careers at Masabi are for people going places - driven by a mission to make transit fair and accessible for all.

We are a network of innovators from all walks of life, passionate about making a difference. At Masabi, we operate with openness and trust, creating an environment where everyone feels empowered to bring their whole, authentic selves to work.

Whoever you are, just be yourself.
We welcome applications from underrepresented backgrounds and encourage you to share your pronouns at any stage. Together, we simplify journeys, remove barriers, and improve daily life for millions.

Why Join Masabi?

Driven by Purpose – We believe in journeys made simple. The work isn't always easy, but the best things never are.
Encouraged to Accelerate – Masabi is going places and our people are in the driving seat. Whether you're taking the direct route or exploring new paths, we support your journey.
Advancing with Empathy – We put people first and foster a culture of learning, not blame. No matter your cargo, we share the load.

We're already powering journeys - are you ready to join us?
