Site Reliability Engineer

3 weeks ago


Toronto, Ontario, Canada Tecsys Inc. Full time
About the Role

We are looking for an exceptional Site Reliability Engineer to join our Network and Security Operations Center team. As a key member of our team, you will be responsible for ensuring the reliability and uptime of our platform and applications.

Key Responsibilities:
  • Collaborate with Engineering teams to support services through system design consulting, software development, capacity planning, and launch reviews.
  • Maintain services once live by measuring and monitoring availability, latency, and overall system health.
  • Develop tools and automation on top of Azure and AWS to reduce manual intervention.
  • Scale systems sustainably through automation and evolve systems to improve reliability and velocity.
  • Be on-call and practice sustainable incident response and blameless postmortems.
  • Implement automated solutions for continuous integration and delivery (CI/CD).
  • Implement monitoring, logging, alerting, and SLA reporting.
  • Implement service monitoring dashboards displaying key metrics.
  • Create and maintain technical documentation.
  • Apply SRE best practices.
  • Take command of high-severity incidents and facilitate their resolution.

Requirements:
  • Bachelor's degree in computer science or related technical discipline.
  • At least 5 years' experience in systems engineering, with demonstrable technical experience in new platform development, orchestration, product ownership, and iterative design and deployment.
  • Experience designing and deploying large-scale systems, multi-vendor platforms, and globally distributed infrastructure.
  • Strong knowledge of system design, high-performance computing, file, block, and storage technologies, and integration of compute, storage, and network technologies.
  • High-level understanding of full-stack automation, with examples of projects executed using it.
  • Ability to self-organize, collaborate, and manage efforts with peers and teams across responsibility areas, languages, geographies, and time zones.
  • A curious self-starter who is not afraid to ask questions and challenge the way things are done today.
  • Ability to see a problem or opportunity, take ownership, and act on it independently.
  • Knowledge of Datadog, Rapid7 Insight, AWS, Azure, Java, .NET, and GitLab, along with experience at a SaaS company, is preferred.


  • Toronto, Ontario, Canada Royal Bank of Canada Full time

    Royal Bank of Canada is seeking a highly skilled Site Reliability Engineering (SRE) leader to join our team in Toronto, Canada. As an SRE leader, you will be responsible for leading the development and implementation of SRE solutions that improve the reliability and performance of our applications. The ideal candidate will have 5+ years of experience as a...


  • Toronto, Ontario, Canada PointsBet Canada Full time

    Site Reliability Engineer. About the Role: As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient...


  • Toronto, Ontario, Canada Compunnel Inc. Full time

    Compunnel Inc. is a leading provider of innovative technology solutions. We are seeking an experienced Site Reliability Engineering Lead to join our team in Toronto, Canada. The estimated salary for this position is $170,000 per year, considering the location and industry standards. About the Job: This role is perfect for someone who is passionate about driving...


  • Toronto, Ontario, Canada Index Exchange Full time

    About the Role: We are seeking a highly skilled Staff Site Reliability Engineer to own and develop on-premise and hybrid cloud environments, focusing on low-latency performance on Kubernetes platforms supporting a robust developer experience framework. The ideal candidate will have a deep technical understanding of on-premise and hybrid cloud architectures and...


  • Toronto, Ontario, Canada Index Exchange Full time

    About Index Exchange: We have a rich history of shaping the earliest forms of ad tech, and we're now looking for talented engineers to help drive its future. Our customers face unique challenges that require technical expertise at internet scale. Our infrastructure handles over 450 billion requests daily, all running in our own global data centers. We provide...


  • Toronto, Ontario, Canada Thomson Reuters Full time

    We are seeking an experienced Senior SRE to join our Shared Capabilities, Service Reliability and Operation team in Toronto. As a Cloud Native Site Reliability Engineer, you will be responsible for implementing site reliability engineering and DevOps best practices, building and maintaining monitoring for all aspects of infrastructure, micro-services, usage...


  • Toronto, Ontario, Canada Sentry Full time

    About Sentry: We're on a mission to help developers write better software faster, so we can get back to enjoying technology. With more than $217 million in funding and 100,000+ organizations that believe we're on to something, we're building performance and error monitoring tools that help companies like Disney, Microsoft, and Atlassian spend less time fixing...


  • Toronto, Ontario, Canada Compunnel Inc. Full time

    At Compunnel Inc., we are looking for a talented Senior Site Reliability Engineer/DevOps to join our team. This is a challenging opportunity to work with the latest tools and technologies to drive forward automation, observability, and CI/CD. The ideal candidate is passionate about driving an SRE DevSecOps mindset and culture in a fast-paced...


  • Toronto, Ontario, Canada Royal Bank of Canada Full time

    Job Summary: We are seeking a talented Site Reliability Engineer to join our Digital team at Royal Bank of Canada. This is an exciting opportunity to accelerate our cloud native initiatives and make a difference in the industry. Job Description: We are looking for an individual who embodies leadership, mentorship, and decision-making qualities. As a Site...


  • Toronto, Ontario, Canada Criteo Full time

    About the Role: This is a challenging opportunity for an experienced engineer to join Criteo's PRE team as a Site Reliability Engineer. The role involves working closely with product engineering to improve the reliability of our apps, systems, and pipelines, assessing where optimization is needed most, and telling stories with meaningful monitoring. Key...


  • Toronto, Ontario, Canada mccainfood Full time

    Job Summary: We are seeking a highly skilled Global Site Reliability Engineer to join our team. As a key member of our organization, you will be responsible for ensuring the reliability, performance, and scalability of our global communication services.


  • Toronto, Ontario, Canada Royal Bank of Canada Full time

    Job Summary: Royal Bank of Canada is seeking an experienced professional to lead our Site Reliability Engineering (SRE) efforts for our US Cash Management Technology. This is a unique opportunity to shape the future technology landscape of the company, delivering key business values and implementing strategic components across all RBC functions defined in our...


  • Toronto, Ontario, Canada Estée Lauder Companies Full time

    Reliability Engineering Manager Role: We are seeking a highly skilled Reliability Engineering Manager to join our team at Estée Lauder Companies. As a key member of the Plant Management Team, you will be responsible for leading maintenance and reliability processes to achieve operational excellence. The ideal candidate will have a strong background in plant...


  • Toronto, Ontario, Canada Criteo Full time

    About the Role: Criteo is seeking a talented Site Reliability Engineer to join our PRE team. What You'll Do: As a Site Reliability Engineer, you'll work closely with product engineering to improve the reliability of our apps, systems, and pipelines. You'll assess where optimization is needed most and tell stories with meaningful monitoring. How You'll Make an...


  • Toronto, Ontario, Canada Index Exchange Full time

    About Index Exchange: We are shaping the future of ad tech and seeking an experienced Senior Site Reliability Engineering Manager to lead our SRE team. As a key member of our technical leadership, you will be responsible for building and managing a high-performing SRE team, fostering a culture of innovation, collaboration, and accountability. You will provide...


  • Toronto, Ontario, Canada SGS Full time

    Job Summary: The Site Reliability Engineer will play a critical part in ensuring the reliability, supportability, scalability, and performance of our .NET stack applications built with ASP.NET MVC, Angular, and Web API. Key Responsibilities: Partner with developers and product operations teams to understand application requirements and translate them into...


  • Toronto, Ontario, Canada Compunnel Inc. Full time

    About Compunnel Inc.: Compunnel Inc. is a fast-paced and dynamic company seeking a skilled Site Reliability Engineer/DevOps Expert to join our team. Job Description: We are looking for an experienced professional who can drive the SRE DevSecOps mindset and culture in our organization. The ideal candidate will have a strong passion for driving automation,...


  • Toronto, Ontario, Canada Lorven Technologies Full time

    We are seeking a skilled Site Reliability Engineer to support our long-term project in a hybrid environment. The successful candidate will have strong expertise in Azure and OpenShift, as well as experience with Dynatrace/ELK/Splunk for monitoring and observability. Key Responsibilities: Develop SRE solutions (monitoring and alerting, machine learning anomaly...


  • Toronto, Ontario, Canada Teranet Inc. Full time

    About Teranet: Teranet is a leading innovator in electronic services and solutions, operating one of the most advanced and secure registration systems worldwide. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our DevOps team. The ideal candidate will possess strong software engineering principles and infrastructure expertise to...


  • Toronto, Ontario, Canada Thomson Reuters Full time

    About the Role: In this opportunity as a Senior Site Reliability Engineer, you will: Identify options for problem resolution and initiate action. Engage others as appropriate and escalate as required. Liaise with various application development and content teams, customer service teams, and other software and hardware support teams. Proactively monitor production...