Site Reliability Engineer

1 week ago


Toronto Montreal Calgary Vancouver Edmonton Old Toronto Ottawa Mississauga Quebec Winnipeg Halifax Saskatoon Burnaby Hamilton Surrey Victoria London Halton Hills Regina Markham Brampton Vaughan Kelowna Laval Southwestern Ontario R, Canada Tecsys Inc. Full time

Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our conveniently located offices and collaborative workspaces, provide our team with the freedom and flexibility to work in the way that makes our employees most productive. About us Tecsys is a fast-growing innovator offering supply chain solutions to industry leading healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. We work with industry leaders to transform their supply chains through technology. If you thrive on tackling interesting challenges with continuous learning opportunities, then Tescys could be a good fit for you About the Role We are looking for a Site Reliability Engineer to join our Network and Security Operations Center (NOC), a team at the heart of platform reliability for mission-critical SaaS environments. You will help maintain, optimize, and ensure the reliability and performance of the systems that power our cloud infrastructure across AWS and Kubernetes, with a strong focus on automation, observability, and continuous improvement. This role blends reliability engineering with incident command, giving you real ownership over uptime, performance, and innovation. You will be part of a highly skilled team that values creative problem-solving, operational excellence, and continuous improvement through automation and resilience engineering. Your responsibilities Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews. Innovate relentlessly: Identify pain points, propose creative solutions, and drive initiatives that simplify, scale, and strengthen the platform. Maintain services once they are live by measuring and monitoring availability, latency and overall system health. Own observability: Enhance and expand monitoring and alerting using Datadog; define SLOs/SLIs and create actionable dashboards that drive reliability outcomes. Drive automation: Develop and improve internal tooling, IaC frameworks, and pipelines (Terraform, GitLab CI/CD) to reduce manual intervention and enable self-healing systems. Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity. Be on‑call. Practice sustainable incident response and blameless postmortems. Lead post‑incident reviews (RCAs) and identify long‑term fixes that improve stability, reliability, and developer experience. Implement monitoring, Logging, alerting, and SLA Reporting. Create and maintain technical documentation. Implement, maintain and mature SRE best practices. Lead incidents: Act as Incident Commander for Incidents; coordinate cross‑team response, manage communications, and ensure rapid service restoration. Provide support for our planning and deployment teams to enable stability, predictability, and scale in our continued growth. Collaborate with members of the Platform Engineering team to implement and support far‑reaching strategic efforts, provide constructive feedback, and foster a collaborative environment. Work cross‑functionally with internal teams and vendors to manage our growth around the globe, with a strong focus on maintaining the high level of performance, availability, and reliability for our users. 5+ years in Site Reliability, Cloud, or DevOps Engineering, ideally in SaaS or large‑scale production environments. Experience designing and deploying large scale systems, multi‑vendor platforms and globally distributed infrastructure. Proven experience managing cloud infrastructure in AWS (multi‑account, VPC, EC2, EKS) and Kubernetes at scale. Strong hands‑on experience with IaC and automation (Terraform, Ansible, or similar). Familiarity with CI/CD pipelines and release automation (GitLab preferred, Jenkins acceptable). Deep understanding of monitoring and observability using Datadog (or equivalent), including metric design, log pipelines, alerting, and dashboards. Experience with incident management, on‑call participation, escalation, and structured postmortems. Scripting skills in Python, Bash, Java or equivalent for automation and diagnostics. Curiosity, ownership, and a bias for action; you see a problem, you solve it, and you share the lessons learned. Experience with Fedramp (The Federal Risk and Authorization Management Program) compliance is a strong asset. Basic knowledge of Java‑ or .Net‑based development required. Strong English communication skills, both written and spoken, are essential for effective correspondence with customers, business partners and colleagues beyond the province of Quebec. Additional requirements: Escalation on‑call rotation Occasional travel (quarterly offsites, conferences – less than 10%) At Tecsys, we are committed to fostering a diverse and inclusive workplace where all employees feel valued, respected, and empowered. We believe that diversity drives innovation and strengthens our ability to deliver exceptional solutions. We welcome and encourage applicants from all backgrounds, experiences, and perspectives to join our team. Tecsys is an equal opportunity employer. Accommodation is available for applicants selected for an interview. NB: if you are applying to this position, you must be a Canadian Citizen or a Permanent Resident of Canada, OR, have a valid Canadian work permit. #J-18808-Ljbffr



  • Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Orion Innovation Full time

    Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote (Working EST hours) Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical role in managing...


  • Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Surrey, Victoria, London, Halton Hills, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Tecsys Inc. Full time

    Get AI-powered advice on this job and more exclusive features.Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end....


  • Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Orion Innovation Full time

    Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote [Working EST hours] Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...


  • Ottawa, Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Canonical Full time

    OverviewJoin to apply for the Site Reliability Engineer role at Canonical.Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our...

  • Site Reliability

    1 week ago


    Winnipeg, Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Halifax, Saskatoon, Burnaby, Hamilton, Surrey, Victoria, London, Halton Hills, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Canonical Full time

    Join to apply for the Site Reliability / Gitops Engineer role at Canonical1 day ago Be among the first 25 applicantsJoin to apply for the Site Reliability / Gitops Engineer role at CanonicalGet AI-powered advice on this job and more exclusive features.Canonical is a leading provider of open source software and operating systems to the global enterprise and...


  • Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Targeted Talent Full time

    OverviewWe are looking for an experienced Senior Site Reliability Engineer for our client. This is a permanent position that is remote to start with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used. Experience with coding/software development, along with Site Reliability will be the key...


  • Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Orion Innovation Full time

    We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands‑on mastery of Linux systems, advanced networking concepts (specifically IPSec),...


  • Edmonton, Toronto, Montreal, Calgary, Vancouver, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Surrey, Victoria, London, Halton Hills, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Canonical Full time

    OverviewJoin to apply for the Senior Site Reliability Engineer role at Canonical.Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. We operate...


  • Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada TekRek Full time

    This range is provided by TekRek. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Base pay range CA$90.00/hr - CA$120.00/hr Senior Site Reliability Engineer – Distributed Systems, Kubernetes, AWS/GCP The Company TekRek has partnered with a fast‑scaling AI infrastructure company building one of the...


  • Vancouver, Toronto, Montreal, Calgary, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Chainlink Labs Full time

    Join to apply for the Senior Site Reliability Engineer role at Chainlink LabsJoin to apply for the Senior Site Reliability Engineer role at Chainlink LabsGet AI-powered advice on this job and more exclusive features.About UsChainlink Labs is the primary contributing developer of Chainlink, the decentralized computing platform powering the verifiable web....