Site Reliability Engineer

4 weeks ago

Toronto, Canada Capgemini Full time

Overview
Site Reliability Engineer (contract) at Capgemini. We are seeking a Site Reliability Engineer to provide hands-on SRE support, ensuring application reliability, incident resolution, automation, and compliance across enterprise systems. This role requires advanced knowledge of SRE practices, production support, automation, and monitoring technologies. You will be responsible for incident and problem management, infrastructure maintenance, continuous improvement, and developing automated solutions that enhance system availability and performance.

Key Responsibilities
- Deliver 24x7 SRE support, including incident management, problem management, root cause analysis (RCA), monitoring, alerting, and infrastructure maintenance.
- Track, audit, and implement improvements across technical work streams.
- Act as subject-matter expert (SME) for supported applications, documenting core components and infrastructure.
- Serve as an escalation point in the on-call rotation, supporting maintenance, release deployments, and scheduled work.
- Lead incident and problem management processes, owning RCA action items.
- Drive continuous improvement in productivity, tooling, monitoring, and technical standards.
- Manage technology currency (server patching, certificate renewal, compliance), with a focus on automation opportunities.
- Design and develop SRE solutions such as monitoring/alerting, anomaly detection, self-healing, and reliability testing.
- Apply design thinking and agile practices in collaboration with SREs, Scrum Masters, and Incident Leads.
- Simplify development by building reusable solutions that replace manual tasks.
- Perform production support, including off-hours support and rotational on-call (with overtime, lieu time, and allowance).
- Ensure application availability and uptime in line with service-level objectives (SLOs).
- Support cross-team initiatives, consulting on products and enterprise-wide solutions.
- Stay current on emerging technologies and deliver knowledge-sharing demos.
Technical Profile
- Bachelor’s degree in Computer Science, Engineering, Mathematics, Physics, or a related technical field, or equivalent practical experience.
- 4–5 years of relevant SRE/DevOps experience.
- Strong proficiency in:
  - Python, YAML, and shell scripting
  - Microsoft Azure and Linux environments
  - Monitoring and alerting tools (Dynatrace, Prometheus, Splunk, Elastic, Azure Monitor, PagerDuty, Moog)
  - Automation tools (Ansible, Azure Automation, Catchpoint)
  - Messaging/streaming platforms (MQ, Kafka)
  - Chaos engineering principles and practices
- Experience performing production support, including off-hours and on-call.

Functional Profile
- Strong background in incident and problem management, with the ability to lead RCA processes.
- Ability to act as SME and maintain detailed technical documentation.
- Experience working in agile teams and driving continuous improvement.
- Focus on reliability, compliance, and automation adoption.
- Adept at collaborating with enterprise-wide teams on cross-functional solutions.
- Commitment to continuous learning and applying emerging technologies.

Skills Summary
- Core Expertise: Site Reliability Engineering, Incident & Problem Management, Root Cause Analysis, Continuous Improvement, Compliance & Technology Currency
- Languages & Frameworks: Python, YAML, Shell scripting
- Reactive & Event-Driven Tools: MQ, Kafka
- Cloud & Containerization: Microsoft Azure, Linux
- Database & Messaging: Messaging middleware (MQ, Kafka)
- DevOps & CI/CD: Ansible, Azure Automation, Catchpoint
- Other Tools & Technologies: Dynatrace, Prometheus, Splunk, Elastic, Azure Monitor, PagerDuty, Moog, Chaos Engineering practices

Compensation & Benefits
The pay range that the employer in good faith reasonably expects to pay for this position is $52.32/hour to $81.75/hour. Our benefits include medical, dental, vision, and retirement coverage. Applications will be accepted on an ongoing basis.
About Capgemini & Equal Opportunity
Tundra Technical Solutions is among North America’s leading providers of staffing and consulting services. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or any other applicable legally protected characteristic. Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.

Site Reliability Engineer

1 day ago

Toronto, Ontario, Canada Scotiabank Global Site Full time $105,000 - $170,000 per year

Requisition ID: 244026. Join a purpose-driven, winning team committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
Site Reliability Engineer

23 hours ago

Toronto, Ontario, Canada Scotiabank Global Site Full time US$80,000 - US$140,000 per year

Requisition ID: 244027. Join a purpose-driven, winning team committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
Site Reliability Engineer

3 days ago

Toronto, Ontario, Canada Procom Full time $80,000 - $120,000 per year

Site Reliability Engineer (SRE): On behalf of our banking client, Procom is seeking a Site Reliability Engineer (SRE) for a 12-month contract role. This is a hybrid position, 3 days a week onsite at our client's Montréal, Quebec office. Site Reliability Engineer - Job Description: The Site Reliability Engineer is...
Site Reliability Engineer

4 weeks ago

Toronto, Canada Full time

Job Title: Site Reliability Engineer (Python & Cloud)
Location: Toronto, ON
Duration: 6 months with high possibility of extension
Skills Required: Digital: Python; Digital: Google Cloud; Digital: Site Reliability Engineering (SRE)
Job Description: Gather and analyze metrics from operating systems as well as applications to assist in performance tuning and fault...
Site Reliability Engineer

4 days ago

Toronto, Canada Maneva Full time

About Maneva: Maneva builds and deploys edge AI solutions that power real-time intelligence for industrial environments. Our systems run on distributed edge compute devices (NVIDIA Jetson platforms), integrate with local network cameras, PLCs, sensors, and other on-premises equipment, and communicate securely with cloud services via client- or site-based VPNs....
Site Reliability Engineer

3 days ago

Toronto, Canada Maneva Full time

About Maneva: Maneva builds and deploys edge AI solutions that power real-time intelligence for industrial environments. Our systems run on distributed edge compute devices (NVIDIA Jetson platforms), integrate with local network cameras, PLCs, sensors, and other on-premises equipment, and communicate securely with cloud services via client- or site-based VPNs....
Site Reliability Engineer

2 days ago

Toronto, Canada Maneva Full time

About Maneva: Maneva builds and deploys edge AI solutions that power real-time intelligence for industrial environments. Our systems run on distributed edge compute devices (NVIDIA Jetson platforms), integrate with local network cameras, PLCs, sensors, and other on-premises equipment, and communicate securely with cloud services via client- or site-based VPNs....
Site Reliability Engineer

4 weeks ago

Toronto, Canada Aarorn Technologies Inc Full time

Overview
Job Title: Site Reliability Engineer
Location: Toronto, ON (hybrid, 4 days onsite per week)
Employment Type: Contract Opportunity
Interview Type: Face-to-face (onsite interview only)
Base pay range: CA$45.00/hr to CA$55.00/hr
This opportunity is with Aarorn Technologies Inc. Your actual pay will be based on your skills and experience; talk with your...

Site Reliability Engineer