Senior Site Reliability Engineering Specialist
3 days ago
We help the world run better
At SAP, we keep it simple: you bring your best to us, and we'll bring out the best in you. We're builders touching over 20 industries and 80% of global commerce, and we need your unique talents to help shape what's next. The work is challenging – but it matters. You'll find a place where you can be yourself, prioritize your wellbeing, and truly belong. What's in it for you? Constant learning, skill growth, great benefits, and a team that wants you to grow and succeed.
This is a hybrid role based out of Waterloo. Hybrid is 3 days a week onsite and 2 days a week remote.
As a Senior Site Reliability Engineer in Supply Chain Management (SCM) – Make & Deliver, you will ensure that SAP Digital Manufacturing and SAP Logistics Management operate reliably and efficiently at scale. These solutions support critical manufacturing and logistics processes worldwide, built on SAP BTP, Kubernetes, and multicloud environments. In this role, you act as an Enablement Advocate within the organization: partnering with development teams to review architecture for resiliency, enforce reliability guardrails, and integrate observability and performance standards into the design process. Beyond operational excellence, you will also help develop and integrate AIOps tools for smarter monitoring and automated remediation, ensuring reliability is embedded across the lifecycle. You'll contribute to incident response for high severity events and drive automation that reduces complexity, enabling teams to deliver services that meet reliability goals by default.
WHAT YOU'LL DO
Define and maintain SLIs/SLOs for critical services; apply error budgets to guide release decisions.
Collaborate with development teams to embed resiliency patterns and reliability guardrails into architecture and code.
Contribute to incident response for high-severity events; support root cause analysis and post-incident improvements.
Establish and evolve observability standards (logging, metrics, tracing) and build actionable dashboards and alerts.
Drive performance and scalability improvements through load testing, capacity planning, and CI/CD performance gates.
Automate operational tasks using Infrastructure as Code (Terraform/Helm), pipelines, and scripts to reduce toil.
Advance AIOps capabilities for anomaly detection, smarter alerting, and faster remediation.
Partner across teams to provide guidance, reviews, and golden paths for reliability by default.
TECH YOU'LL USE (DAY TO DAY)
Cloud & Platform: Kubernetes, Docker, SAP BTP, AWS/Azure services.
Automation & Development: CI/CD pipelines (GitHub Actions / Azure DevOps), Infrastructure as Code (Terraform/Helm), scripting, and integration into dev workflows.
Observability: Logging, metrics, and tracing tools; Dynatrace, Kibana/Elastic, Prometheus, OpenTelemetry.
Data & Messaging: Confluent Kafka, SAP HANA.
Performance Testing: Load and stress testing tools (e.g., JMeter, k6).
Languages: TypeScript, Python, Bash, Java.
WHAT YOU'LL BRING
6-10+ years in SRE, DevOps, or production operations for distributed systems.
Proven experience with incident response and root cause analysis for high-severity events.
Strong skills in observability, performance engineering, and automation.
Hands-on expertise in Kubernetes cluster management and troubleshooting.
Ability to model load, run stress tests, analyze bottlenecks, and plan capacity.
Proficiency in CI/CD and Infrastructure as Code, with the ability to influence development practices.
Excellent collaboration and communication skills to partner with development and product teams.
NICE TO HAVE
Familiarity with AIOps concepts (AI‑driven anomaly detection, predictive alerting, automated remediation).
Hands-on experience with LLM agent frameworks (e.g., LangGraph or similar) for automation or reliability tooling.
Certifications in Kubernetes, SAP BTP, or Dynatrace.
Experience with the manufacturing domain.
EDUCATION & WORK STYLE
Bachelor's degree in Computer Science, Engineering, or equivalent experience.
Curious, proactive, and data‑driven; comfortable mentoring and promoting best practices.
Travel: Occasional (0–10%) for team workshops or cross‑site collaboration.
On‑call: Participation in a healthy rotation with a continuous improvement focus.
Bring out your best
SAP innovations help more than four hundred thousand customers worldwide work together more efficiently and use business insight more effectively. Originally known for leadership in enterprise resource planning (ERP) software, SAP has evolved to become a market leader in end-to-end business application software and related services for database, analytics, intelligent technologies, and experience management. As a cloud company with two hundred million users and more than one hundred thousand employees worldwide, we are purpose-driven and future-focused, with a highly collaborative team ethic and commitment to personal development. Whether connecting global industries, people, or platforms, we help ensure every challenge gets the solution it deserves. At SAP, you can bring out your best.
We win with inclusion
SAP's culture of inclusion, focus on health and well-being, and flexible working models help ensure that everyone – regardless of background – feels included and can run at their best. At SAP, we believe we are made stronger by the unique capabilities and qualities that each person brings to our company, and we invest in our employees to inspire confidence and help everyone realize their full potential. We ultimately believe in unleashing all talent and creating a better world.
SAP is committed to the values of Equal Employment Opportunity and provides accessibility accommodations to applicants with physical and/or mental disabilities. If you are interested in applying for employment with SAP and are in need of accommodation or special assistance to navigate our website or to complete your application, please send an e-mail with your request to Recruiting Operations Team:
For SAP employees: Only permanent roles are eligible for the SAP Employee Referral Program, according to the eligibility rules set in the SAP Referral Policy. Specific conditions may apply for roles in Vocational Training.
Qualified applicants will receive consideration for employment without regard to their age, race, religion, national origin, ethnicity, gender (including pregnancy, childbirth, et al), sexual orientation, gender identity or expression, protected veteran status, or disability, in compliance with applicable federal, state, and local legal requirements.
SAP believes the value of pay transparency contributes towards an honest and supportive culture and is a significant step toward demonstrating SAP's commitment to pay equity. SAP provides the annualized compensation range inclusive of base salary and variable incentive target for the career level applicable to the posted role. The targeted combined range for this position is 102, ,300 (CAD) CAD. The actual amount to be offered to the successful candidate will be within that range, dependent upon the key aspects of each case which may include education, skills, experience, scope of the role, location, etc. as determined through the selection process. Any SAP variable incentive includes a targeted dollar amount, and any actual payout amount is dependent on company and personal performance. A summary of benefits and eligibility requirements can be found by clicking this link:
Due to the nature of the role, which involves global interactions with SAP entities, as well as with employees and stakeholders in Canada, functional proficiency in English is required for positions based in Quebec.
AI Usage in the Recruitment Process
For information on the responsible use of AI in our recruitment process, please refer to our Guidelines for Ethical Usage of AI in the Recruiting Process.
Please note that any violation of these guidelines may result in disqualification from the hiring process.
Requisition ID: | Work Area: Software-Design and Development | Expected Travel: 0 - 10% | Career Status: Professional | Employment Type: Regular Full Time | Additional Locations: #LI-Hybrid