Senior Site Reliability Engineer

5 days ago

Toronto, Ontario, Canada RBC Full time

Job Summary

Job Description

What is the Opportunity?

RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance line of business. With a unique blend of technical expertise and industry-specific knowledge, this team plays a critical role in ensuring the seamless operations of digital services that cater to both the business's internal and external stakeholders.

As a Senior Site Reliability Engineer, you will bring the engineering mindset of bold ambition, curiosity and outcome focus to ensuring the performance and reliability of our systems. This role calls for a dynamic individual who excels in a collaborative environment, interacting with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation. This role will be responsible for the development, implementation, and support of Site Reliability Engineering (SRE) solutions for applications supported by RBC Insurance Technology. You'll leverage your proficiency in Elasticsearch, Ansible, GitHub Actions, Moogsoft, PagerDuty, Dynatrace and scripting languages to build and maintain robust automation and SRE tooling.

What will you do?

Set the vision for the SRE product base (monitoring, alerting, machine learning anomaly detection, self-healing, reliability testing).
Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.
Implement and manage automation processes with Ansible and GitHub Actions to streamline operational tasks.
Develop and maintain custom tooling and automation scripts in languages like Bash, Python, and PowerShell to enhance operational efficiency and system reliability.
Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards.
Actively contribute to the definition and tracking of SLIs, SLOs, and other critical metrics, refining our alerting and monitoring strategies accordingly.
Document and maintain comprehensive runbooks, facilitating quick resolution of incidents and reducing mean time to recovery (MTTR).
Create and refine custom tooling and automation scripts using languages such as Bash, Python, and PowerShell, supporting the infrastructure's scalability and reliability needs.
Guide the technical direction for future deployments, advocating for reliability and performance improvements based on industry trends and company objectives.
Mentor team members in building out robust monitoring and alerting strategies based on well-defined SLIs and SLOs.
Act as the portfolio Subject Matter Expert (SME): understand and document the common components, core functionality, and infrastructure of supported applications.
Lead incident management and problem management for in-scope applications, and own the fulfillment of root cause analysis (RCA) action items.
Drive transformation by continuously looking for ways to automate existing processes.
Debug production issues across services and levels of the stack and provide primary operational support.
Perform production support duties, including off-hours support as part of an on-call rotation.

Must-have:

4+ years of SRE or Systems Engineering experience with a proven record in technical leadership.
Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
Expertise in infrastructure-as-code and configuration management, particularly Ansible.
Advanced scripting capabilities in Bash, Python, PowerShell, or other similar languages.
In-depth knowledge of tools such as Elasticsearch, Ansible, GitHub, OpenShift, Kubernetes, Dynatrace, Kafka, and their role in system reliability.
Knowledge of creating, maintaining, and alerting on SLIs, SLOs, and other reliability metrics.

Nice-to-have:

Insurance industry experience.
In-depth, hands-on experience with a variety of SRE tools (Azure Automation, Catchpoint, Prometheus, Splunk, Grafana).
Familiarity with containerization technologies such as Docker.
Hands-on experience with DevOps CI/CD tools such as Jenkins, Artifactory, and Vault.

Soft Skills:

Excellent communication skills to foster collaboration across departments.
A resilient problem-solving approach, capable of leading the charge during high-stress incidents.
Strategic thinking and analytical prowess, with a focus on delivering reliable and performant systems.
Organizational skills to manage multiple priorities in a fast-paced environment.

RBC is committed to supporting flexible work arrangements when and where available. Details to be discussed with Hiring Manager.

What's in it for you?

We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.

A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable.
Leaders who support your development through coaching and managing opportunities.
Ability to make a difference and lasting impact.
Work in a dynamic, collaborative, progressive, and high-performing team.
A world-class training program in financial services.
Flexible work/life balance options.
Opportunities to do challenging work.

Job Skills

Agile Methodology, Application Infrastructure, Group Problem Solving, IT Automation, IT Monitoring, Operations Support, Production Support, Software Development Life Cycle (SDLC), Software Engineering, Software Product Technical Knowledge, System Applications, Systems Software.

Additional Job Details

Address: MEADOWVALE BUSINESS PARK, 6880 FINANCIAL DR, MISSISSAUGA

City: MISSISSAUGA

Country: Canada

Work hours/week: 37.5

Employment Type: Full time

Platform: TECHNOLOGY AND OPERATIONS

Job Type: Regular

Pay Type: Salaried

Posted Date: 2025-03-07

Application Deadline: 2025-03-31

Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above.

Inclusion and Equal Opportunity Employment

At RBC, we embrace diversity and inclusion for innovation and growth. We are committed to building inclusive teams and an equitable workplace for our employees to bring their true selves to work. We are taking actions to tackle issues of inequity and systemic bias to support our diverse talent, clients and communities.

We also strive to provide an accessible candidate experience for our prospective employees with different abilities. Please let us know if you need any accommodations during the recruitment process.

Join our Talent Community

Stay in-the-know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips and Recruitment events that matter to you.

Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well-being of our clients and communities at jobs.rbc.com.

