Senior Site Reliability Engineer
21 hours ago
Location: Remote

About the Role
We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands-on mastery of Linux systems, advanced networking concepts (specifically IPSec), and modern container orchestration. You will be instrumental in bridging development velocity with operational stability, creating self-healing systems, and eliminating toil across the organization.

What You Will Do
Engineering Reliability: Design, implement, and maintain highly available, scalable, and secure production infrastructure using Infrastructure as Code (IaC) principles.
System & Tooling Development: Lead the development of new internal services, automation tools, and critical performance enhancements, leveraging Go for high-performance systems and Ruby for platform-specific tooling and automation.
Networking & Security: Configure, manage, and troubleshoot advanced network connectivity, with a focus on IPSec tunnels and complex Linux networking for secure data transfer between environments.
Observability: Build and optimize comprehensive monitoring and alerting systems using Prometheus and Grafana to define and meet critical Service Level Objectives (SLOs).
Container Management: Own the deployment, lifecycle, and configuration of containerized applications using Helm to standardize releases across Kubernetes clusters.
Automation: Drastically reduce operational toil by writing sophisticated scripts in Python and advanced Shell Scripting/PowerShell for operational tasks, CI/CD pipelines, and automated incident response.
Incident Response: Serve as a technical escalation point in on-call rotations, leading the effort to conduct root cause analyses (RCAs) and implementing preventative measures to ensure issues never recur.
GitOps: Utilize Git workflows for all infrastructure and application configuration changes, promoting a systematic and auditable approach to system management.

Required Skills & Experience

Deep Technical Expertise (Must Haves)
Expert Proficiency in Ruby (5+ years): Demonstrated ability to write, debug, and optimize complex services and high-level automation/frameworks in Ruby.
Expert Proficiency in Go (Golang) (3+ years): Proven experience building production-grade infrastructure tools, microservices, or platform components in Go.
Networking Security (IPSec): Deep, hands-on experience configuring, troubleshooting, and securing VPNs/site-to-site connectivity using IPSec in a production environment.
Container Orchestration: Expertise with Helm for packaging and deploying applications on Kubernetes (K8s).
Observability: Extensive experience setting up, administering, and utilizing the Prometheus time-series database and Grafana for data visualization and dashboard creation.
Operating Systems: Mastery of the Linux operating system and its internals.
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.

Core SRE Skills - Preferred
Automation & Scripting: Strong proficiency in Python and robust Shell Scripting (Bash/Zsh) or PowerShell for system automation and administration.
Infrastructure as Code & Version Control: Excellent knowledge of Git for source control and significant experience with IaC tools (e.g., Terraform, Ansible) for infrastructure automation.
Troubleshooting: Elite-level debugging and performance tuning skills across the entire stack (kernel, network, application, database).
-
Senior Site Reliability Engineer
1 week ago
Ontario, Canada Orion Innovation Full time
Senior Site Reliability Engineer Location: Remote About the Role We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands-on mastery of...
-
Site Reliability Engineer
1 week ago
Ontario, Canada Orion Innovation Full time
Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote [Working EST hours] Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...
-
Site Reliability Engineer
2 days ago
(s): Canada : Ontario : Toronto Scotiabank Global Site Full time US$80,000 - US$140,000 per year
Requisition ID: 244027
Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
-
Site Reliability Engineer
2 days ago
(s): Canada : Ontario : Toronto Scotiabank Global Site Full time $105,000 - $170,000 per year
Requisition ID: 244026
Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
-
Senior Site Reliability Engineer
2 weeks ago
Southwestern Ontario, Canada Canonical Full time
Join to apply for the Senior Site Reliability Engineer role at Canonical. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our...
-
Senior Site Reliability Engineer
3 weeks ago
Southwestern Ontario, Canada Canonical Full time
Join to apply for the Senior Site Reliability Engineer role at Canonical. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation and IoT. Our customers...
-
Senior Site Reliability Engineer
4 weeks ago
Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Orion Innovation Full time
Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote [Working EST hours] Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...
-
Senior Site Reliability Engineer
4 weeks ago
Southwestern Ontario, Canada CARTA Full time
The Company You’ll Join: Carta develops purpose-built software that transforms traditional accounting into a powerful growth engine. Carta’s world-class fund administration platform supports nearly 7,000 funds and SPVs, and represents nearly $130B in assets under management in venture capital and private equity. Trusted by more...
-
Southwestern Ontario, Canada SAP Full time
Senior Site Reliability Engineering Specialist Join to apply for the Senior Site Reliability Engineering Specialist role at SAP. Hybrid Work Arrangement: This is a hybrid role based out of Waterloo. Hybrid is 3 days a week onsite and 2 days a week remote. We help the world run better. At SAP, we keep it simple: you bring your best to us, and we'll bring out the...
-
Southwestern Ontario, Canada SAP Full time
Senior Site Reliability Engineering Specialist Join to apply for the Senior Site Reliability Engineering Specialist role at SAP. Hybrid Work Arrangement: This is a hybrid role based out of Waterloo. Hybrid is 3 days a week onsite and 2 days a week remote. We help the world run better. At SAP, we keep it simple: you bring your best to us, and we'll bring out...