Senior Site Reliability Engineer
3 weeks ago
Our award-winning technology powers conversations with customers for some of the world's largest enterprises. We believe that combining the human touch with technology is the best way to create amazing customer experiences. When human abilities such as problem-solving, creative thinking, and relationship building are enhanced with technology, magical moments happen.

The Team
You'll be joining our dedicated Infrastructure Team, which is responsible for the reliability, scalability, and performance of Glia's cloud-native core infrastructure serving our conversational AI. Our team focuses on operational excellence and proactive problem-solving to ensure our systems are always available and performing optimally.
All SREs on the team report to a dedicated Engineering Manager. Our work is driven by Objectives and Key Results (OKRs), defined quarterly in collaboration with the Director of Engineering. All projects are planned, led, and executed by our engineers.
Our SRE team is located primarily in Vancouver and Toronto and works in the Pacific Time zone (PT). We are optimized for remote collaboration and welcome candidates from anywhere in Canada.

The Work
As a Senior Site Reliability Engineer, your primary focus will be on the health and performance of our production services. Responsibilities will include:
- Defining, measuring, and reporting on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services.
- Partnering with development teams to establish error budgets and the operational consequences of consuming them.
- Writing software to automate production operations, eliminating manual toil and improving system resilience.
- Leading the incident response process for complex outages, including conducting blameless postmortems to drive systemic improvements.
- Engineering and improving deployment systems and CI/CD pipelines to increase release velocity while maintaining production stability.
- Conducting deep dives into system performance, engaging in capacity planning, and performing production readiness reviews.
- Developing and maintaining operational runbooks and incident response playbooks.
- Participating in a periodic on-call rotation as an escalation point for critical service interruptions.

Our stack includes:
- Persistence: Amazon Aurora Serverless for Postgres, RabbitMQ
- Cache: Amazon ElastiCache for Valkey
- Monitoring & Observability: Datadog, with a focus on dashboards and alerts for system health
- CI/CD: GitHub Actions, ArgoCD, Jenkins, Helm, with a focus on automation and pipeline optimization
- Infrastructure as Code: Terraform

Additionally, our Engineering teams use:
- Backend: Python, Elixir, Node.js, and Ruby
- Native mobile SDKs: Java and Swift

Candidate Requirements
- 5+ years of relevant experience in Site Reliability Engineering or a closely related discipline (e.g., DevOps, Platform Engineering, Infrastructure).
- Deep, practical understanding of Site Reliability Engineering (SRE) principles (SLOs, error budgets, toil reduction).
- Demonstrable experience analyzing and troubleshooting large-scale distributed systems.
- Expert-level proficiency with AWS and Kubernetes (EKS), particularly in the areas of observability, networking, and auto-scaling.
- Strong software development skills in a language such as Python or Go, used to build operational tools, services, or automation.
- Experience with modern observability platforms (e.g., Datadog, Prometheus) and a deep understanding of metrics, logging, and tracing.
- Expertise in designing and operating robust CI/CD pipelines for a microservices architecture (e.g., using ArgoCD, GitHub Actions, Helm).
- A systematic, data-driven approach to problem-solving and root cause analysis.

We are insatiably curious and hungry for knowledge here at Glia. Even if you don't meet all the requirements exactly, we encourage you to apply as long as you are passionate about mastering your craft and developing your skills.
Glia is an equal-opportunity employer. Glia does not discriminate against any employee or applicant because of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), or any other basis protected by law.
-
Senior Site Reliability Engineer
2 days ago
Canada · Thinkific · Full time
Are you an experienced Site Reliability Engineer looking for a new challenge? We're looking for a Senior Site Reliability Engineer to join us at Thinkific.
-
Senior Site Reliability Engineer
3 weeks ago
Canada · Akamai Technologies · Full time
Join Akamai Technologies as we build a reliable, secure, and scalable Internet. We are looking for a Senior Site Reliability Engineer to help us solve complex performance and reliability challenges.
Job Description
Are you passionate about cutting-edge technology and ready to tackle some of the Internet's most difficult...
-
Senior Site Reliability Engineer
2 days ago
Canada · DuckDuckGo · Full time
Who We Are
Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...
-
Senior Site Reliability Engineer
3 weeks ago
Canada · Orion Innovation · Full time
Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher
Location: Canada, Remote (working EST hours)
Job Type: Full-time
About the Role
Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...
-
Senior Site Reliability Engineer
3 weeks ago
Canada · Targeted Talent · Full time
Overview
We are looking for an experienced Senior Site Reliability Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used. Experience with coding/software development, along with Site Reliability, will be the...
-
Senior Site Reliability Engineer
2 days ago
Canada · TextNow · Full time
Base pay range: CA$113,400.00/yr to CA$162,000.00/yr
This range is provided by TextNow. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects, and that's because we're made up of...
-
Senior Site Reliability Engineer
3 weeks ago
Canada · Orion Innovation · Full time
We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands-on mastery of Linux systems, advanced networking concepts (specifically IPsec),...
-
Site Reliability Engineer
3 weeks ago
Canada · Orion Innovation · Full time
Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher
Location: Canada, Remote (working EST hours)
Job Type: Full-time
About the Role
Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical role in managing...
-
Senior Site Reliability Engineer
3 weeks ago
Canada · TekRek · Full time
Base pay range: CA$90.00/hr to CA$120.00/hr
This range is provided by TekRek. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more.
Senior Site Reliability Engineer: Distributed Systems, Kubernetes, AWS/GCP
The Company
TekRek has partnered with a fast-scaling AI infrastructure company building one of the...
-
Senior Site Reliability Engineer
3 weeks ago
Canada · D-Wave · Full time
D-Wave (NYSE: QBTS) is a leader in the development and delivery of quantum computing systems, software, and services. We are the world's first commercial supplier of quantum computers, and the only company building both annealing and gate-model quantum computers. Our mission is...