Senior Site Reliability Engineer

3 weeks ago


Canada D-Wave Full time

D‑Wave (NYSE: QBTS) is a leader in the development and delivery of quantum computing systems, software, and services. We are the world’s first commercial supplier of quantum computers, and the only company building both annealing and gate‑model quantum computers. Our mission is to help customers realize the value of quantum today. Our quantum computers provide sub‑second response times and can be deployed on‑premises or accessed through our quantum cloud service, which offers 99.9% availability. More than 100 organizations trust D‑Wave with their toughest computational challenges, having submitted over 200 million problems to our quantum systems to date.

About The Role

We are seeking a talented and experienced Senior Site Reliability Engineer (SRE) to join our DevOps team. As a key member, you will be responsible for the reliability of our SaaS product, the research laboratory, and the infrastructure supporting our production quantum computers worldwide. You will play a critical role in ensuring the reliability, scalability, and performance of our company’s systems and infrastructure.

What You’ll Do

• Refine, refactor, and evolve monitoring systems and related tools covering workloads in AWS, GCP, on‑premises, and remote field systems worldwide.
• Work with teams including software and hardware engineering, processor development, cryogenics, and customer support to elicit requirements, collect and store metrics, analyze trends, and provide dashboards and other tooling to enable observability across the organization.
• Own alerting with other SREs to support infrastructure and on‑call management systems, ensuring alerting is reliable and scalable.
• Work closely with the DevOps and Test Engineering teams to enable instrumenting builds and deploys to ensure reliability through every step of the software development lifecycle.

About You

• 4+ years of experience operating and troubleshooting SaaS/PaaS applications and environments on a major cloud platform – AWS and GCP preferred – including platform‑specific monitoring technologies like CloudWatch and Stackdriver.
• 4+ years of experience with high‑level SRE work including incident management, process design, managing on‑call rotations (with PagerDuty), and cross‑training new and existing employees.
• Experience with on‑premises compute, including servers, storage, power, virtualization, and networking equipment, and specifically using SNMP to monitor networked devices.
• 4+ years of experience with AOS/Elasticsearch/Loki or similar log management tools.
• Experience with time‑series databases like Prometheus/InfluxDB, document stores like MongoDB, and classic relational databases like PostgreSQL, AWS Redshift, etc.
• Proficiency in InfluxQL and PromQL.
• Significant expertise supporting and integrating analytics and monitoring systems such as ELK, Grafana, Prometheus, Zabbix, LibreNMS, Intermapper, etc.
• At least two years of programming experience in Python, Go, Bash, Ruby, or equivalent.
• Degree in Computing Science, Engineering, or equivalent education and experience.
• Excellent oral and written communication skills – you like to document your work.

Bonus Points

• 3+ years specific experience with Elasticsearch / AWS OpenSearch, Fluent, Grafana Cloud.
• Experience with Kubernetes monitoring.
• Experience with producing synthetic metrics and instrumenting existing applications and platforms to extract metrics for analysis.
• Experience with OpenTelemetry.
• Proven record of cross‑training and evangelizing observability as a critical aspect of all systems.

A D‑Waver's DNA

• We look at the future and say "why not"; we see possibilities where others see problems or routines.
• We show the way ahead and are committed to achieving ambitious goals.
• We practice straight talk and listen generously to each other with empathy.
• We value different opinions and points of view.
• We ensure that we connect outside as well as inside to learn from others and inspire each other.
• We hold ourselves accountable for delivering results.
• We make decisions and take responsibility so that we can act and support each other.
• As leaders, we motivate and engage our teams to achieve more than they originally thought possible, by developing our teams and creating the conditions for people to grow and empower themselves through enabling and coaching.

Compensation Philosophy

We believe providing D‑Wavers with company ownership, competitive pay, and a range of meaningful benefits is the start of creating a culture where people want to give the best they’ve got, not because they’re simply making money, but because they’ve fallen in love with our vision, mission, values, and team.

Inclusion

We celebrate diverse perspectives to drive innovation in our pursuit. Our employees range from distinguished domain experts with decades of experience in their respective fields to bright and motivated graduates eager to make their mark. Our diverse and innovative team will make you feel appreciated and supported, and will empower your career growth at D‑Wave.

Fine Print

No 3rd party candidates will be accepted.

EEO Statement

It is D‑Wave Systems Inc. policy to provide equal employment opportunity (EEO) to all persons regardless of race, color, religion, sex, national origin, age, sexual orientation, gender identity, genetic information, physical or mental disability, protected veteran status, or any other characteristic protected by federal, state/provincial, or local law.

Base Pay Range

• 124,364 – 185,545 USD (Remote, United States)
• 124,364 – 185,545 CAD (Remote, Canada)

Seniority Level: Mid‑Senior level
Employment Type: Full‑time
Job Function: Engineering and Information Technology; IT Services and IT Consulting



  • Canada Thinkific Full time

    Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific. We’re looking for a Senior Site Reliability Engineer...


  • Canada Akamai Technologies Full time

    Senior Site Reliability Engineer Join Akamai Technologies as we build a reliable, secure, and scalable Internet. We are looking for a Senior Site Reliability Engineer to help us solve complex performance and reliability challenges. Job Description Are you passionate about cutting‑edge technology and ready to tackle some of the Internet’s most difficult...


  • Canada DuckDuckGo Full time

    6 days ago. Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...


  • Canada Orion Innovation Full time

    Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote (Working EST hours) Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...


  • Canada Targeted Talent Full time

    Overview: We are looking for an experienced Senior Site Reliability Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used. Experience with coding/software development, along with Site Reliability will be the...


  • Canada TextNow Full time

    This range is provided by TextNow. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: CA$113,400.00/yr - CA$162,000.00/yr. We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that's because we're made up of...


  • Canada Orion Innovation Full time

    We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands‑on mastery of Linux systems, advanced networking concepts (specifically IPSec),...


  • Canada Orion Innovation Full time

    Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote (Working EST hours) Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical role in managing...


  • Canada TekRek Full time

    This range is provided by TekRek. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Base pay range CA$90.00/hr - CA$120.00/hr Senior Site Reliability Engineer – Distributed Systems, Kubernetes, AWS/GCP The Company TekRek has partnered with a fast‑scaling AI infrastructure company building one of the...


  • Canada Bitcomplete Full time

    Join us as a Senior Site Reliability Engineer to help us run an industry-scale GPU cluster via Kubernetes. Together with senior members of our team, you will combine your strong understanding of system scaling and security practices with your cloud-native expertise to stand up and maintain Kubernetes clusters from scratch. Your role will also be pivotal in...