Senior Site Reliability Engineer

2 days ago


Mississauga, Canada RBC Full time

What is the Opportunity?

RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance line of business. With a unique blend of technical expertise and industry‑specific knowledge, this team plays a critical role in ensuring the seamless operations of digital services that cater to both the business's internal and external stakeholders.

Job Description

As a Senior Site Reliability Engineer, you will bring an engineering mindset of bold ambition, curiosity and outcome focus to ensuring the performance and reliability of our systems. This role calls for a dynamic individual who excels in a collaborative environment, interacting with cross‑functional teams to establish best practices for observability, monitoring, logging, alerting and automation. You will be responsible for the development, implementation and support of SRE solutions for applications supported by RBC Insurance Technology, leveraging proficiency in Elasticsearch, Ansible, GitHub Actions, Moogsoft, PagerDuty, Dynatrace and scripting languages to build and maintain robust automation and SRE tooling.

What will you do?

  • Set vision for SRE product base (monitoring, alerting, machine learning anomaly detection, self‑healing, reliability testing).
  • Lead cross‑functional collaborations to define and implement best practices for monitoring, logging and incident response, driving a proactive stance on system health.
  • Implement and manage automation processes with Ansible and GitHub Actions to streamline operational tasks.
  • Develop and maintain custom tooling and automation scripts in languages like Bash, Python and PowerShell to enhance operational efficiency and system reliability.
  • Work closely with development teams to understand code changes and their impact on the production environment, ensuring that new releases meet our reliability standards.
  • Actively contribute to the definition and tracking of SLIs, SLOs and other critical metrics, refining our alerting and monitoring strategies accordingly.
  • Document and maintain comprehensive runbooks, facilitating quick resolution of incidents and reducing mean time to recovery (MTTR).
  • Guide the technical direction for future deployments, advocating for reliability and performance improvements based on industry trends and company objectives.
  • Mentor team members in building out robust monitoring and alerting strategies based on well‑defined SLIs and SLOs.
  • Act as portfolio SME – understand and document common components, core functionalities and infrastructure of supported applications.
  • Lead in incident management and problem management for applications in scope, overseeing RCA action items fulfillment and ownership.
  • Drive transformation by continuously looking for ways to automate existing processes.
  • Debug production issues across services and levels of the stack and provide primary operational support.
  • Perform production support role, including off‑hours support as part of an on‑call rotation.

Must‑Have

  • 4+ years of SRE or Systems Engineering experience with a proven record in technical leadership.
  • Bachelor’s degree in Computer Science, Engineering or a related field, or equivalent experience.
  • Expertise in infrastructure‑as‑code and configuration management, particularly Ansible.
  • Advanced scripting capabilities in Bash, Python, PowerShell or other similar languages.
  • In‑depth knowledge of tools such as Elasticsearch, Ansible, GitHub, OpenShift, Kubernetes, Dynatrace, Kafka and their role in system reliability.
  • Knowledge of creating, maintaining and alerting on SLIs, SLOs and other reliability metrics.

Nice‑to‑Have

  • Insurance industry experience.
  • Hands‑on experience in a variety of SRE tools (Azure Automation, Catchpoint, Prometheus, Splunk, Grafana).
  • Familiarity with containerisation technologies such as Docker.
  • Hands‑on experience with DevOps CI‑CD tools e.g. Jenkins, Artifactory and Vault.

Soft Skills

  • Excellent communication skills to foster collaboration across departments.
  • A resilient problem‑solving approach, capable of leading the charge during high‑stress incidents.
  • Strategic thinking and analytical prowess, with a focus on delivering reliable and performant systems.
  • Organisational skills to manage multiple priorities in a fast‑paced environment.

What’s in it for you?

  • A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions and stock where applicable.
  • Leaders who support your development through coaching and managing opportunities.
  • Ability to make a difference and lasting impact.
  • Work in a dynamic, collaborative, progressive and high‑performing team.
  • A world‑class training programme in financial services.
  • Flexible work/life balance options.
  • Opportunities to do challenging work.

Job Skills

Agile Methodology, Application Infrastructure, Group Problem Solving, IT Automation, IT Monitoring, Operations Support, Production Support, Software Development Life Cycle (SDLC), Software Engineering, Software Product Technical Knowledge, System Applications, Systems Software

Additional Job Details

Address: MEADOWVALE BUSINESS PARK, 6880 FINANCIAL DR, MISSISSAUGA
City: Mississauga
Country: Canada
Work Hours/Week: 37.5
Employment Type: Full time
Platform: Technology and Operations
Job Type: Regular
Pay Type: Salaried
Posted Date:
Application Deadline:
(Applications accepted until 11:59 PM on the day prior to the application deadline)

Inclusion and Equal Opportunity Employment

At RBC, we believe an inclusive workplace that has diverse perspectives is core to our continued growth as one of the largest and most successful banks in the world.
Maintaining a workplace where our employees feel supported to perform at their best, effectively collaborate, drive innovation and grow professionally helps to bring our Purpose to life and create value for our clients and communities. RBC strives to deliver this through policies and programmes intended to foster a workplace based on respect, belonging and opportunity for all.

Join our Talent Community

Stay in‑the‑know about great career opportunities at RBC. Sign up and receive customised information on our latest jobs, career tips and recruitment events that matter to you. Expand your limits and create a new future together at RBC. Find out how we use our passion and drive to enhance the well‑being of our clients and communities at jobs.rbc.com



  • Mississauga, Canada J&M Group Full time

    Site Reliability Engineer role at J&M Group. Requirements include hands-on experience with technologies such as NiFi, Kubernetes, Elasticsearch and Kafka; a basic understanding of Linux and UNIX servers; shell scripting; good SQL experience; and basic knowledge of Java, Python and Groovy. A basic understanding of the Capital Market is also...


  • Mississauga, Canada Royal Bank of Canada Full time

    Job Description: What is the Opportunity? RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance...


  • Mississauga, Canada Groupe Compass Quebec ltée. Full time

    Site Reliability Engineer role at Groupe Compass Quebec ltée. 6 days ago. Join an award-winning culture. We have been recognized for being a Great Place to Work, in addition to being selected as a FORTUNE Global 500...


  • Mississauga, Canada RBC Full time

    A leading financial institution in Mississauga is seeking a Senior Site Reliability Engineer to join their Insurance Technology Platform Support team. The successful candidate will ensure the performance and reliability of IT applications, collaborating with cross-functional teams to implement best practices for monitoring and incident management. With a...


  • Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada Orion Innovation Full time

    Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote [Working EST hours] Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...


  • Mississauga, Canada Canonical Full time

    Senior Site Reliability Engineer role at Canonical. Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in enterprise initiatives across public cloud, data science, AI, engineering, and IoT. We recruit on a global basis...