Lead Site Reliability Engineer

3 weeks ago

Canada Masabi Full time

Introducing Masabi

At Masabi, we’re driving the fare payment revolution, powering the journeys of millions all over the world. We build fare collection platforms that allow riders to seamlessly buy and present tickets for public transport either on their mobile phones, from a ticket machine, or even by tapping their bank card to travel.

The Role

We’re looking for a Lead Site Reliability Engineer to join our platform team, someone who’s confident working hands-on with infrastructure, but also ready to shape how we scale and operate as a global team. You’ll take ownership of key systems, lead cross-functional work, and help evolve the way we build for performance, reliability, and security. This role is ideal for those who enjoy solving complex problems, improving systems through automation, and supporting others as they grow. It’s a chance to have both technical depth and meaningful influence, while staying close to the work that matters.

Location

This role is available in a remote model to candidates based in Canada (East Coast time zone).

What You’ll Be Doing

Build and automate reliable systems
Lead design discussions and make key architectural decisions for reliability, scalability, and performance.
Establish SRE standards and best practices (IaC patterns, CI/CD maturity, observability, etc.) across teams.
Design and manage infrastructure using Terraform and CloudFormation.
Build and evolve CI/CD pipelines that support fast, safe, and frequent deployments.
Automate manual tasks to reduce operational load and enable faster delivery.
Help expand our infrastructure globally, scaling up new environments with care.

Improve visibility, scale and performance
Define and maintain SLIs, SLOs, and alerting strategies aligned with user experience.
Implement monitoring solutions that give us clear, early signals during incidents.
Lead capacity planning and performance tuning as our systems and teams grow.
Identify opportunities to improve architecture for resilience and cost-effectiveness.

Own reliability and incident response
Lead or contribute to incident response, root cause analysis, and post-incident reviews.
Design and maintain disaster recovery and failover strategies.
Partner with compliance and security teams to meet frameworks like SOC 2 and PCI.

Support others and share your knowledge
Collaborate with engineers, architects, and product teams to embed SRE practices from the start and define long-term platform reliability strategy.
Mentor others in areas like observability, incident readiness, and infrastructure-as-code.
Document systems and processes clearly to support learning and long-term success.
Take part in the on-call rotation, which is shared with the team and paid on top of salary.

About You

You’re an experienced SRE who combines technical depth with curiosity, care, and a desire to make things better for the platform, the team, and the people using our systems.
You’ve worked in SRE, platform, or DevOps roles where reliability was business-critical (24/7).
You have proven experience designing and evolving production-grade systems for scale and resilience.
You’re comfortable designing and operating in AWS, with strong knowledge of cloud architecture, networking and security (VPC design, IAM, least privilege).
You have hands-on experience with Terraform, infrastructure automation, and CI/CD systems.
You’ve led or contributed to high-impact projects involving observability, performance, incident command and/or reliability (distributed tracing, log correlation, metrics maturity, etc.).
You communicate clearly and drive cross-functional reliability improvements in distributed, async-first teams.
You enjoy helping others grow and value a kind, collaborative engineering culture.
You take pride in doing things the right way, but you’re pragmatic and focused on impact.

Nice To Have

Familiarity with PCI DSS v4 or similar compliance standards.
Experience with container orchestration.
AWS certifications.

Our platform is JVM-based and cloud-native, running on AWS. The SRE team works across both modern infrastructure and legacy systems as we continue to scale globally.

Tools we use

Monitoring & Observability: Grafana, Prometheus, CloudWatch, Pingdom, Kibana.
Infrastructure as Code: Terraform, CloudFormation.
Configuration Management & Logging: Puppet, Confluent Cloud.

Why Join Masabi?

Driven by Purpose – We believe in journeys made simple. The work isn’t always easy, but the best things never are.
Encouraged to Accelerate – Masabi is going places and our people are in the driving seat. Whether you’re taking the direct route or exploring new paths, we support your journey.
Advancing with Empathy – We put people first and foster a culture of learning, not blame. No matter your cargo, we share the load.

Seniority level: Mid-Senior level
Employment type: Full-time
Industries: IT Services and IT Consulting, Software Development, and Urban Transit Services

Site Reliability Engineer

2 days ago

Canada : Ontario : Toronto Scotiabank Global Site Full time

Requisition ID: 244026. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
Site Reliability Engineer

1 day ago

Canada : Ontario : Toronto Scotiabank Global Site Full time

Requisition ID: 244027. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
Site Reliability Engineer

2 weeks ago

Canada SPECTRAFORCE Full time

Job Title: DevOps/Site Reliability Engineer Duration: 12+ months Locations: Ontario, Toronto, Vancouver, Montreal (100% remote) Core hours of the position: somewhat flexible, but able to attend meetings and collaborate with team members between 8 am Pacific and 3 pm Pacific. Team members are located in Pacific, Mountain, Central, and East time zones Top 3...
Site Reliability Engineer

2 weeks ago

Canada SPECTRAFORCE Full time

Job Title: DevOps/Site Reliability Engineer Duration: 12+ months Core hours of the position: somewhat flexible, but able to attend meetings and collaborate with team members between 8 am Pacific and 3 pm Pacific. Team members are located in Pacific, Mountain, Central, and East time zones Top 3 items to see on resumes 5+ years of experience in DevOps, Site...
Site Reliability Engineer

2 weeks ago

Canada SPECTRAFORCE Full time

Job Title: DevOps/Site Reliability Engineer Duration: 12+ months Locations: Ontario, Toronto, Vancouver, Montreal (100% remote) Core hours of the position: somewhat flexible, but able to attend meetings and collaborate with team members between 8 am Pacific and 3 pm Pacific. Team members are located in Pacific, Mountain, Central, and East time zones Top...
Senior Site Reliability Engineer

4 weeks ago

Canada Thinkific Full time

Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific. We’re looking for a Senior Site Reliability Engineer...
Site Reliability Engineer

3 weeks ago

Canada Blue Signal Search Full time

Site Reliability Engineer Location: Remote, Canada Our client is a fast-growing provider of AI-driven edge-computing platforms that keep industrial operations safe, smart, and always on. Their distributed hardware and software suite processes high-volume video and sensor data at the edge, delivering real-time insight for customers who cannot afford downtime....
Site Reliability Engineer

2 weeks ago

Canada Blue Signal Search Full time

Site Reliability Engineer Location: Remote, Canada Our client is a fast-growing provider of AI-driven edge-computing platforms that keep industrial operations safe, smart, and always on. Their distributed hardware and software suite processes high-volume video and sensor data at the edge, delivering real-time insight for customers who cannot afford downtime....
