Senior Site Reliability Engineer

3 weeks ago

Mississauga, Canada RBC Full time

Overview

Job Description

RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance line of business. This team plays a critical role in ensuring the seamless operations of digital services for internal and external stakeholders.

As a Senior Site Reliability Engineer, you will bring an engineering mindset of bold ambition, curiosity and outcome focus to ensuring the performance and reliability of our systems. This role requires collaboration with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation. You will develop, implement, and support SRE solutions for applications supported by RBC Insurance Technology, leveraging tools such as Elasticsearch, Ansible, GitHub Actions, Moogsoft, PagerDuty, Dynatrace and scripting languages to build and maintain robust automation and SRE tooling.

What will you do?

Set vision for SRE product base (monitoring, alerting, machine learning anomaly detection, self-healing, reliability testing).
Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.
Implement and manage automation processes with Ansible and GitHub Actions to streamline operational tasks.
Develop and maintain custom tooling and automation scripts in Bash, Python, and PowerShell to enhance operational efficiency and system reliability.
Work closely with development teams to understand code changes and their impact on production, ensuring releases meet reliability standards.
Contribute to the definition and tracking of SLIs, SLOs, and other critical metrics, refining alerting and monitoring strategies.
Document and maintain runbooks to facilitate quick incident resolution and reduce MTTR.
Create and refine custom tooling and automation scripts to support infrastructure scalability and reliability needs.
Guide the technical direction for future deployments, advocating for reliability and performance improvements.
Mentor team members in monitoring and alerting strategies based on SLIs/SLOs.
Act as portfolio SME: understand and document common components, core functionalities, and infrastructure of supported applications.
Lead incident management and problem management for applications in scope, including RCA action items.
Drive transformation by continually seeking opportunities to automate existing processes.
Debug production issues across services and levels of the stack and provide primary operational support.
Perform production support duties, including off-hours support as part of an on-call rotation.

Must-have

4+ years of SRE or Systems Engineering experience with a proven record in technical leadership.
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
Expertise in infrastructure-as-code and configuration management, particularly Ansible.
Advanced scripting capabilities in Bash, Python, PowerShell, or similar.
In-depth knowledge of tools such as Elasticsearch, Ansible, GitHub, OpenShift, Kubernetes, Dynatrace, Kafka, and their role in system reliability.
Knowledge of creating, maintaining, and alerting on SLIs, SLOs, and other reliability metrics.

Nice-to-have

Insurance industry experience.
Hands-on experience with SRE tools (Azure Automation, Catchpoint, Prometheus, Splunk, Grafana).
Familiarity with containerization technologies such as Docker.
Hands-on experience with DevOps CI/CD tools, e.g. Jenkins, Artifactory and Vault.

Soft Skills

Excellent communication skills to foster collaboration across departments.
A resilient problem-solving approach, capable of leading during high-stress incidents.
Strategic thinking and analytical skills, focusing on reliable and performant systems.
Organizational skills to manage multiple priorities in a fast-paced environment.

What’s in it for you?

We thrive on the challenge to be our best, with progressive thinking to grow, and work together to deliver trusted advice. We care about each other, reaching our potential, making a difference to our communities, and achieving mutual success.

A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
Leaders who support your development through coaching and opportunities
Ability to make a difference and lasting impact
Work in a dynamic, collaborative, progressive, and high-performing team
A world-class training program in financial services
Flexible work/life balance options
Opportunities to do challenging work

Additional Job Details

Address: Meadowvale Business Park, 6880 Financial Dr, Mississauga, Canada
City: Mississauga
Country: Canada
Work hours/week: 37.5
Employment Type: Full time
Platform: TECHNOLOGY AND OPERATIONS
Job Type: Regular
Pay Type: Salaried
Posted Date:
Application Deadline:
Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above

Inclusion and Equal Opportunity Employment

At RBC, we believe an inclusive workplace that has diverse perspectives is core to our growth. We strive to deliver a workplace based on respect, belonging and opportunity for all.

Join our Talent Community

Stay in-the-know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips and Recruitment events that matter to you. Expand your limits and create a new future together at RBC.

Site Reliability Engineer

3 weeks ago

Mississauga, Canada J&M Group Full time

Join to apply for the Site Reliability Engineer role at J&M Group. Requirements include hands-on experience with technologies such as Nifi, Kubernetes, Elasticsearch, Kafka, basic understanding of LINUX and UNIX servers, Shell scripting, good SQL experience, and basic knowledge of Java, Python, Groovy. A basic understanding of the Capital Market is also...
Senior Site Reliability Engineer

7 hours ago

Mississauga, Canada Canonical Full time

Senior Site Reliability Engineer Join Canonical as a Senior Site Reliability Engineer and help move the world to open source. Canonical is a leading provider of open source software and operating systems. Our platform, Ubuntu, is widely used in public cloud, data science, AI, engineering innovation and IoT, and is trusted by leading cloud and silicon...
Site Reliability Engineer

7 hours ago

Mississauga, Canada J&M Group Full time

Join to apply for the Site Reliability Engineer role at J&M Group. Requirements include hands-on experience with technologies such as Nifi, Kubernetes, Elasticsearch, Kafka, basic understanding of LINUX and UNIX servers, Shell scripting, good SQL experience, and basic knowledge of Java, Python, Groovy. A basic understanding of the Capital Market is also...
Site Reliability Engineer

4 weeks ago

Mississauga, Canada Canonical Full time

Site Reliability Engineer at Canonical Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading...
Site Reliability Engineer

7 hours ago

Mississauga, Canada Canonical Full time

Site Reliability Engineer at Canonical Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading...
Site Reliability

7 hours ago

Mississauga, Canada Canonical Full time

Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world’s...
Senior Site Reliability Administrator

6 hours ago

Mississauga, Canada OpenText Full time

OpenText - The Information Company. OpenText is a global leader in information management, where innovation, creativity, and collaboration are the key components of our corporate culture. As a member of our team, you will have the opportunity to partner with the most highly regarded companies in the world, tackle complex issues, and contribute to projects...
Senior SRE: Reliability, Automation

3 weeks ago

Mississauga, Canada RBC Full time

A leading financial services provider in Canada is seeking a Senior Site Reliability Engineer for their Insurance Technology team. This role focuses on ensuring system performance and reliability through collaboration, automation, and technical leadership. Candidates should have a bachelor's degree and 4+ years of relevant experience, showcasing expertise in...
