Senior Site Reliability Engineer
3 days ago
Overview

Job Description – RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance line of business. This team plays a critical role in ensuring the seamless operations of digital services for internal and external stakeholders.

As a Senior Site Reliability Engineer, you will bring an engineering mindset of bold ambition, curiosity and outcome focus to ensuring the performance and reliability of our systems. This role requires collaboration with cross-functional teams to establish best practices for observability, monitoring, logging, alerting, and automation. You will develop, implement, and support SRE solutions for applications supported by RBC Insurance Technology, leveraging tools such as Elasticsearch, Ansible, GitHub Actions, Moogsoft, PagerDuty, Dynatrace and scripting languages to build and maintain robust automation and SRE tooling.

What will you do?

Set vision for SRE product base (monitoring, alerting, machine learning anomaly detection, self-healing, reliability testing).
Lead cross-functional collaborations to define and implement best practices for monitoring, logging, and incident response, driving a proactive stance on system health.
Implement and manage automation processes with Ansible and GitHub Actions to streamline operational tasks.
Develop and maintain custom tooling and automation scripts in Bash, Python, and PowerShell to enhance operational efficiency and system reliability.
Work closely with development teams to understand code changes and their impact on production, ensuring releases meet reliability standards.
Contribute to the definition and tracking of SLIs, SLOs, and other critical metrics, refining alerting and monitoring strategies.
Document and maintain runbooks to facilitate quick incident resolution and reduce MTTR.
Create and refine custom tooling and automation scripts to support infrastructure scalability and reliability needs.
Guide the technical direction for future deployments, advocating for reliability and performance improvements.
Mentor team members in monitoring and alerting strategies based on SLIs/SLOs.
Act as portfolio SME – understand and document common components, core functionalities, and infrastructure of supported applications.
Lead incident management and problem management for applications in scope, including RCA action items.
Drive transformation by continually seeking opportunities to automate existing processes.
Debug production issues across services and levels of the stack and provide primary operational support.
Perform production support duties, including off-hours support as part of an on-call rotation.

Must-have

4+ years of SRE or Systems Engineering experience with a proven record in technical leadership.
Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.
Expertise in infrastructure-as-code and configuration management, particularly Ansible.
Advanced scripting capabilities in Bash, Python, PowerShell, or similar.
In-depth knowledge of tools such as Elasticsearch, Ansible, GitHub, OpenShift, Kubernetes, Dynatrace, Kafka, and their role in system reliability.
Knowledge of creating, maintaining, and alerting on SLIs, SLOs, and other reliability metrics.

Nice-to-have

Insurance industry experience.
Hands-on experience with SRE tools (Azure Automation, Catchpoint, Prometheus, Splunk, Grafana).
Familiarity with containerization technologies such as Docker.
Hands-on experience with DevOps CI/CD tools e.g. Jenkins, Artifactory and Vault.

Soft Skills

Excellent communication skills to foster collaboration across departments.
A resilient problem-solving approach, capable of leading during high-stress incidents.
Strategic thinking and analytical skills, focusing on reliable and performant systems.
Organizational skills to manage multiple priorities in a fast-paced environment.

What’s in it for you?

We thrive on the challenge to be our best, with progressive thinking to grow, and work together to deliver trusted advice. We care about each other, reaching our potential, making a difference to our communities, and achieving mutual success.

A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
Leaders who support your development through coaching and opportunities
Ability to make a difference and lasting impact
Work in a dynamic, collaborative, progressive, and high-performing team
A world-class training program in financial services
Flexible work/life balance options
Opportunities to do challenging work

Additional Job Details

Address: Meadowvale Business Park, 6880 Financial Dr, Mississauga, Canada
City: Mississauga
Country: Canada
Work hours/week: 37.5
Employment Type: Full time
Platform: TECHNOLOGY AND OPERATIONS
Job Type: Regular
Pay Type: Salaried
Posted Date:
Application Deadline:

Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above

Inclusion and Equal Opportunity Employment

At RBC, we believe an inclusive workplace that has diverse perspectives is core to our growth. We strive to deliver a workplace based on respect, belonging and opportunity for all.

Join our Talent Community

Stay in-the-know about great career opportunities at RBC. Sign up and get customized info on our latest jobs, career tips and Recruitment events that matter to you. Expand your limits and create a new future together at RBC.
-
Site Reliability Engineer
3 days ago
Mississauga, Canada J&M Group Full time
Join to apply for the Site Reliability Engineer role at J&M Group. Requirements include hands-on experience with technologies such as NiFi, Kubernetes, Elasticsearch, Kafka, basic understanding of Linux and UNIX servers, Shell scripting, good SQL experience, and basic knowledge of Java, Python, Groovy. A basic understanding of the Capital Market is also...
-
Senior Site Reliability Engineer
2 weeks ago
Mississauga, Canada Canonical Full time
Join Canonical as a Senior Site Reliability Engineer and help move the world to open source. Canonical is a leading provider of open source software and operating systems. Our platform, Ubuntu, is widely used in public cloud, data science, AI, engineering innovation and IoT, and is trusted by leading cloud and silicon...
-
Senior Site Reliability Engineer
4 weeks ago
Mississauga, Canada Canonical Full time
Join Canonical as a Senior Site Reliability Engineer and help move the world to open source. Canonical is a leading provider of open source software and operating systems. Our platform, Ubuntu, is widely used in public cloud, data science, AI, engineering innovation and IoT, and is trusted by leading cloud and silicon...
-
Site Reliability Engineer
5 days ago
Mississauga, Canada J&M Group Full time
Join to apply for the Site Reliability Engineer role at J&M Group. Requirements include hands-on experience with technologies such as NiFi, Kubernetes, Elasticsearch, Kafka, basic understanding of Linux and UNIX servers, Shell scripting, good SQL experience, and basic knowledge of Java, Python, Groovy. A basic understanding of the Capital Market is also...
-
Senior Site Reliability Engineer
3 days ago
Mississauga, Canada RBC Full time
Overview: Job Description – RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and resilience of IT applications used in the insurance line of...
-
Site Reliability Engineer
2 weeks ago
Mississauga, Canada Groupe Compass Quebec ltée. Full time
Join to apply for the Site Reliability Engineer role at Groupe Compass Quebec ltée. As a Reliability Engineer you will work in focus areas such as observability, release automation, incident and problem response improvements, security, code quality, patch management and SRE advocacy. You will have the opportunity to use the latest cloud and open-source...
-
Site Reliability Engineer
2 weeks ago
Mississauga, Canada COMPASS GROUP CANADA Full time
You might not know our name, but you know where we are. That’s because Compass Group Canada is part of a global foodservice and support services company that’s the 6th largest employer in the world, with 625,000 employees. You’ll find us in schools, colleges, hospitals, office buildings, senior living communities, tourist attractions, sports venues,...
-
Site Reliability Engineer
3 weeks ago
Mississauga, Canada COMPASS GROUP CANADA Full time
You might not know our name, but you know where we are. That’s because Compass Group Canada is part of a global foodservice and support services company that’s the 6th largest employer in the world, with 625,000 employees. You’ll find us in schools, colleges, hospitals, office buildings, senior living communities, tourist attractions, sports venues,...
-
Site Reliability Engineer
4 weeks ago
Mississauga, Canada COMPASS GROUP CANADA Full time
You might not know our name, but you know where we are. That’s because Compass Group Canada is part of a global foodservice and support services company that’s the 6th largest employer in the world, with 625,000 employees. You’ll find us in schools, colleges, hospitals, office buildings, senior living communities, tourist attractions, sports venues,...