Director, Site Reliability Engineering
3 weeks ago
Helping SaaS companies scale Engineering teams.

We are seeking an accomplished Director of Site Reliability Engineering (SRE) to lead the reliability, scalability, and performance initiatives across multiple enterprise technology domains, including AML, Risk, Finance, Corporate Treasury, and Human Resources systems. This role is ideal for a technical leader with a software engineering mindset who thrives on cross-functional collaboration and operational excellence.

As the Director of SRE, you’ll guide the strategy and execution of reliability practices across critical enterprise platforms. You will ensure continuous system availability, drive automation, enhance observability, and lead incident management processes to maintain the highest levels of uptime and stability.

Key Responsibilities

Leadership: Partner with engineering, infrastructure, and business technology teams to establish reliability standards and best practices. Lead major incident response efforts, ensuring timely resolution and clear communication with stakeholders. Serve as a reliability advocate across teams, driving alignment on priorities and technical direction without direct line management. Participate in change management and governance processes, providing technical insight and assessing potential risks.

System Reliability and Monitoring: Collaborate with infrastructure and cloud engineering teams to build advanced monitoring systems and proactive issue detection. Establish alerting frameworks and escalation paths to ensure swift action during critical events. Analyze performance metrics and system telemetry to identify patterns, eliminate bottlenecks, and optimize scalability.

Incident Response and Postmortem Analysis: Lead coordinated responses to major production incidents to minimize downtime and impact. Conduct post-incident reviews, documenting root causes and implementing long-term preventative measures.

Automation and Infrastructure-as-Code: Champion automation across provisioning, deployment, scaling, and recovery processes. Partner with development teams to embed reliability into CI/CD pipelines and infrastructure workflows.

Capacity Planning and Optimization: Collaborate with application and infrastructure leaders to forecast demand and design resilient, scalable architectures. Ensure cost efficiency and performance optimization across hybrid and cloud environments.

Collaboration and Communication: Communicate effectively across technical and business audiences, ensuring transparency and alignment on system health, reliability goals, and incident learnings.

Qualifications

Proven experience in Site Reliability Engineering, DevOps, or Production Engineering leadership.
Strong background in both on-premises and cloud infrastructure (AWS, Azure, or GCP).
Expertise in Infrastructure-as-Code (Terraform, CloudFormation, etc.) and automation tools.
Hands-on experience with monitoring and observability stacks (Datadog, Prometheus, Grafana, Splunk, etc.).
Demonstrated ability to manage high-impact incidents and drive lasting process improvements.
Excellent communication and influence skills across engineering, operations, and business stakeholders.

Toronto, Ontario, Canada | Salary: CA$80,000.00 - CA$120,000.00
-
Systems Reliability Engineer
1 week ago
Canada : Ontario : Toronto | Scotiabank Global Site | Full time | $120,000 - $180,000 per year | Requisition ID: 239640
Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. The Role: As a member of the Systems Reliability Engineering team, the System Reliability Engineer will collaborate closely with Engineering and development teams, peers, and business partners to continuously improve the stability,...
-
Senior Site Reliability Engineer
3 weeks ago
Canada | Glia Technologies, Inc. | Full time
Our award-winning technology powers conversations with customers for some of the world’s largest enterprises. We believe that combining the human touch with technology is the best way to create amazing customer experiences. When human abilities such as problem-solving, creative thinking and relationship building are enhanced with technology... magical...
-
Site Reliability Engineer
3 weeks ago
Canada | Orion Innovation | Full time
Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher. Location: Canada - Remote (Working EST hours). Job Type: Full-time.
About the Role: Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical role in managing...
-
Senior Site Reliability Engineer
2 days ago
Canada | Thinkific | Full time
Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific. We’re looking for a Senior Site Reliability Engineer...
-
Senior Site Reliability Engineer
3 weeks ago
Canada | Orion Innovation | Full time
Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher. Location: Canada - Remote (Working EST hours). Job Type: Full-time.
About the Role: Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...
-
Senior Site Reliability Engineer
3 weeks ago
Canada | Akamai Technologies | Full time
Join Akamai Technologies as we build a reliable, secure, and scalable Internet. We are looking for a Senior Site Reliability Engineer to help us solve complex performance and reliability challenges.
Job Description: Are you passionate about cutting‑edge technology and ready to tackle some of the Internet’s most difficult...
-
Senior Site Reliability Engineer
6 hours ago
Canada (Remote) | Glia | Full time | $120,000 - $180,000 per year
About Glia: Glia is the leading AI customer service solution for banks and credit unions. Our platform unifies AI and human agents across every voice and digital conversation through our proprietary ChannelLess Architecture. With AI for All, organizations overcome the tradeoff between efficiency and experience by using AI to automate conversations and elevate...
-
Senior Site Reliability Engineer
3 weeks ago
Canada | Targeted Talent | Full time
Overview: We are looking for an experienced Senior Site Reliability Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used. Experience with coding/software development, along with Site Reliability will be the...
-
Senior Site Reliability Engineer
2 days ago
Canada | DuckDuckGo | Full time
Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...
-
Senior Site Reliability Engineer
2 days ago
Canada | TextNow | Full time
This range is provided by TextNow. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: CA$113,400.00/yr - CA$162,000.00/yr.
We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that's because we're made up of...