Senior Site Reliability Engineering Specialist

3 days ago


Waterloo, Canada SAP Full time

We help the world run better

At SAP, we keep it simple: you bring your best to us, and we'll bring out the best in you. We're builders touching over 20 industries and 80% of global commerce, and we need your unique talents to help shape what's next. The work is challenging – but it matters. You'll find a place where you can be yourself, prioritize your wellbeing, and truly belong. What's in it for you? Constant learning, skill growth, great benefits, and a team that wants you to grow and succeed.

This is a hybrid role based out of Waterloo. Hybrid is 3 days a week onsite and 2 days a week remote.

As a Senior Site Reliability Engineer in Supply Chain Management (SCM) – Make & Deliver, you will ensure that SAP Digital Manufacturing and SAP Logistics Management operate reliably and efficiently at scale. These solutions support critical manufacturing and logistics processes worldwide, built on SAP BTP, Kubernetes, and multicloud environments.

In this role, you act as an Enablement Advocate within the organization: partnering with development teams to review architecture for resiliency, enforce reliability guardrails, and integrate observability and performance standards into the design process. Beyond operational excellence, you will also help develop and integrate AIOps tools for smarter monitoring and automated remediation, ensuring reliability is embedded across the lifecycle. You’ll contribute to incident response for high-severity events and drive automation that reduces complexity, enabling teams to deliver services that meet reliability goals by default.

WHAT YOU’LL DO

  • Define and maintain SLIs/SLOs for critical services; apply error budgets to guide release decisions.
  • Collaborate with development teams to embed resiliency patterns and reliability guardrails into architecture and code.
  • Contribute to incident response for high-severity events; support root cause analysis and post-incident improvements.
  • Establish and evolve observability standards (logging, metrics, tracing) and build actionable dashboards and alerts.
  • Drive performance and scalability improvements through load testing, capacity planning, and CI/CD performance gates.
  • Automate operational tasks using Infrastructure as Code (Terraform/Helm), pipelines, and scripts to reduce toil.
  • Advance AIOps capabilities for anomaly detection, smarter alerting, and faster remediation.
  • Partner across teams to provide guidance, reviews, and golden paths for reliability by default.

TECH YOU’LL USE (DAY TO DAY)

  • Cloud & Platform: Kubernetes, Docker, SAP BTP, AWS/Azure services.
  • Automation & Development: CI/CD pipelines (GitHub Actions / Azure DevOps), Infrastructure as Code (Terraform/Helm), scripting, and integration into dev workflows.
  • Observability: Logging, metrics, tracing tools; Dynatrace, Kibana/Elastic, Prometheus, OpenTelemetry.
  • Data & Messaging: Confluent Kafka, SAP HANA.
  • Performance Testing: Load and stress testing tools (e.g., JMeter, k6).
  • Languages: TypeScript, Python, Bash, Java.

WHAT YOU’LL BRING

  • 6-10+ years in SRE, DevOps, or production operations for distributed systems.
  • Proven experience with incident response and root cause analysis for high-severity events.
  • Strong skills in observability, performance engineering, and automation.
  • Hands-on expertise in Kubernetes cluster management and troubleshooting.
  • Ability to model load, run stress tests, analyze bottlenecks, and plan capacity.
  • Proficiency in CI/CD and Infrastructure as Code, with ability to influence development practices.
  • Excellent collaboration and communication skills to partner with development and product teams.

NICE TO HAVE

  • Familiarity with AIOps concepts (AI‑driven anomaly detection, predictive alerting, automated remediation).
  • Hands-on experience with LLM agent frameworks (e.g., LangGraph or similar) for automation or reliability tooling.
  • Certifications in Kubernetes, SAP BTP, or Dynatrace.
  • Experience with the manufacturing domain.

EDUCATION & WORK STYLE

  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
  • Curious, proactive, and data‑driven; comfortable mentoring and promoting best practices.
  • Travel: Occasional (0–10%) for team workshops or cross‑site collaboration.
  • On‑call: Participation in a healthy rotation with continuous improvement focus.

Bring out your best

SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively. Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management. As a cloud company with two hundred million users and more than one hundred thousand employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. Whether connecting global industries, people, or platforms, we help ensure every challenge gets the solution it deserves. At SAP, you can bring out your best.



  • Waterloo, Canada OpenText Full time

    Overview OpenText is a global leader in information management, where innovation, creativity, and collaboration are the key components of our corporate culture. As a member of our team, you will have the opportunity to partner with the most highly regarded companies in the world, tackle complex issues, and contribute to projects that shape the future of...


  • Waterloo, Canada Canonical Full time

    Overview Site Reliability Engineer role at Canonical. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the...


  • Waterloo, Ontario, Canada Canonical - Jobs Full time US$120,000 - US$180,000 per year

    Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...


  • Waterloo, Canada Canonical Full time

    Overview Site Reliability Engineer role at Canonical. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the...


  • Waterloo, Canada GHD Full time

    Senior Civil Engineer - Site Remediation, Earthworks. 1 day ago. Who are we looking for? We are looking for a...


  • Site Reliability

    1 week ago


    Waterloo, Ontario, Canada Canonical - Jobs Full time $80,000 - $120,000 per year

    Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is very widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and silicon providers,...