Site Reliability Engineer

3 weeks ago


Toronto, Ontario, Canada Tecsys Inc. Full time

Having recognized the advantages of remote work, including its benefits for employee morale and productivity and the positive effect of reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we have invested provide a fantastic foundation for this. Our digital-first work environment, together with our conveniently located offices and collaborative workspaces, provides our team with the freedom and flexibility to work in the way that makes them most productive.

About Us
Tecsys is a fast-growing innovator offering supply chain solutions to organizations ranging from industry-leading healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. We work with industry leaders to transform their supply chains through technology. If you thrive on tackling interesting challenges and value continuous learning opportunities, Tecsys could be a good fit for you.

About The Role
We are looking for a Site Reliability Engineer to join our Network and Security Operations Center (NOC), a team at the heart of platform reliability for mission-critical SaaS environments. You will help maintain, optimize, and ensure the reliability and performance of the systems that power our cloud infrastructure across AWS and Kubernetes, with a strong focus on automation, observability, and continuous improvement. This role blends reliability engineering with incident command, giving you real ownership over uptime, performance, and innovation. You will be part of a highly skilled team that values creative problem-solving, operational excellence, and improvement through automation and resilience engineering.

Your Responsibilities

  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
  • Innovate relentlessly: Identify pain points, propose creative solutions, and drive initiatives that simplify, scale, and strengthen the platform
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health
  • Own observability: Enhance and expand monitoring and alerting using Datadog; define SLOs/SLIs and create actionable dashboards that drive reliability outcomes
  • Drive automation: Develop and improve internal tooling, IaC frameworks, and pipelines (Terraform, GitLab CI/CD) to reduce manual intervention and enable self-healing systems
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity
  • Participate in the on-call rotation
  • Practice sustainable incident response and blameless postmortems. Lead post-incident reviews (RCAs) and identify long-term fixes that improve stability, reliability, and developer experience
  • Implement monitoring, logging, alerting, and SLA reporting
  • Create and maintain technical documentation
  • Implement, maintain and mature SRE best practices
  • Lead incidents: Act as Incident Commander during incidents; coordinate cross-team response, manage communications, and ensure rapid service restoration
  • Provide support for our planning and deployment teams to enable stability, predictability, and scale in our continued growth
  • Collaborate with members of the Platform Engineering team to implement and support far-reaching strategic efforts, provide constructive feedback, and foster a collaborative environment
  • Work cross-functionally with internal teams and vendors to manage our growth around the globe, with a strong focus on maintaining a high level of performance, availability, and reliability for our users

Requirements

  • 5+ years in Site Reliability, Cloud, or DevOps Engineering, ideally in SaaS or large-scale production environments
  • Experience designing and deploying large-scale systems, multi-vendor platforms, and globally distributed infrastructure
  • Proven experience managing cloud infrastructure in AWS (multi-account, VPC, EC2, EKS) and Kubernetes at scale
  • Strong hands-on experience with IaC and automation (Terraform, Ansible, or similar)
  • Familiarity with CI/CD pipelines and release automation (GitLab preferred, Jenkins acceptable)
  • Deep understanding of monitoring and observability using Datadog (or equivalent), including metric design, log pipelines, alerting, and dashboards
  • Experience with incident management, on-call participation, escalation, and structured postmortems
  • Scripting skills in Python, Bash, Java or equivalent for automation and diagnostics
  • Curiosity, ownership, and a bias for action; you see a problem, you solve it, and you share the lessons learned
  • Experience with FedRAMP (Federal Risk and Authorization Management Program) compliance is a strong asset
  • Basic knowledge of Java- or .NET-based development is required
  • Strong English communication skills, both written and spoken, are essential for effective correspondence with customers, business partners and colleagues beyond the province of Quebec

Additional requirements:

  • Participation in an escalation on-call rotation
  • Occasional travel (quarterly offsites and conferences; less than 10%)

At Tecsys, we are committed to fostering a diverse and inclusive workplace where all employees feel valued, respected, and empowered. We believe that diversity drives innovation and strengthens our ability to deliver exceptional solutions. We welcome and encourage applicants from all backgrounds, experiences, and perspectives to join our team.

Tecsys is an equal opportunity employer. Accommodation is available for applicants selected for an interview.
NB: To apply for this position, you must be a Canadian citizen or a permanent resident of Canada, or hold a valid Canadian work permit.