Lead Site Reliability Engineer

2 weeks ago


Toronto ON MW A, Canada RBC Full time

Job Description

What is the opportunity?

Join our Commercial, Core Banking and Payments Technology (CCBPT) team as a Lead Site Reliability Engineer, where you'll play a key role in supporting our cloud and distributed environments for the SRE & Production Operations team. This exciting opportunity will challenge you to work with cutting-edge technologies, including AI and emerging innovations, and collaborate closely with development teams to deliver embedded SRE solutions. As a vital link between QE, DevOps, Development, Infrastructure, and Support teams, you'll leverage your strong technical skills to solve complex problems and drive success across multiple components and technologies. If you're passionate about tackling new challenges and developing innovative solutions, we invite you to join our team and take your career to the next level.

What will you do?

  • Manage a team of SREs
  • Automate, automate and automate – Identify, design, and write automation procedures using AI, Ansible and other relevant technologies
  • Support applications running on multiple platforms including OpenShift and distributed systems
  • Design and implement Chaos Engineering experiments and Disaster Recovery procedures to test and validate system resilience and reliability
  • Establish and monitor SLOs and supporting SLIs for various applications
  • Develop and establish observability strategies for applications
  • Build and implement monitoring and alerting, anomaly detection, self-healing and reliability testing for applications in scope
  • Provide leadership and technical support for developers and DevOps engineers
  • Support incident management and problem management for applications in scope, and own the fulfillment of RCA action items
  • Be an escalation point in the on-call rotation, and support our maintenance, scheduled work, support, and release deployment requirements

What do you need to succeed?

Must-have

  • 7+ years of experience as a Site Reliability Engineer
  • A Bachelor's degree in Computer Science or related technical field or equivalent practical experience
  • Strong working knowledge of Kubernetes and Cloud, with experience in and understanding of CI/CD pipelines and DevOps / Agile Methodology
  • Advanced knowledge of the following SRE practices and technologies: Shell scripting, OpenShift, Linux, Dynatrace, PagerDuty, Moogsoft, Splunk, Elastic, Ansible, Grafana, Chaos Engineering, MQ, Kafka, Windows Servers, MS SQL Server, Mainframe technologies
  • Ability to perform a production support role, including off-hours support
  • Effective negotiation and stakeholder management skills
  • Excellent communication skills

Nice-to-have

  • Strong knowledge of AI and building AI-based solutions
  • Knowledge of deploying and supporting distributed applications
  • In-depth hands-on experience in a variety of SRE tools (Ansible, Catchpoint)
  • Experience working as an SRE within the Financial Industry

What's in it for you?

We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.

  • A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable
  • Leaders who support your development through coaching and managing opportunities
  • Work in a dynamic, collaborative, progressive, and high-performing team
  • Opportunities to do challenging work in AI and emerging technologies
  • Opportunities to take on progressively greater accountabilities
  • Access to a variety of job opportunities across business and geographies

Job Skills

Agile Methodology, Automation, Cloud Management, Cloud Software, Dynatrace Administration, Dynatrace APM, Group Problem Solving, IT Automation, IT Systems Integration, Mainframe Technologies, Microsoft Cloud, Microsoft Windows, Organizational Leadership, Product Services, Red Hat OpenShift, Software Development Life Cycle (SDLC), SRE Observability, System Applications, System Integration Testing (SIT), Systems Software

Additional Job Details

Address:

RBC WATERPARK PLACE, 88 QUEENS QUAY W, TORONTO

City:

Toronto

Country:

Canada

Work hours/week:

37.5

Employment Type:

Full time

Platform:

TECHNOLOGY AND OPERATIONS

Job Type:

Regular

Pay Type:

Salaried

Posted Date:

Application Deadline:

Note: Applications will be accepted until 11:59 PM on the day prior to the application deadline date above

Inclusion and Equal Opportunity Employment

At RBC, we believe an inclusive workplace that has diverse perspectives is core to our continued growth as one of the largest and most successful banks in the world. Maintaining a workplace where our employees feel supported to perform at their best, effectively collaborate, drive innovation, and grow professionally helps to bring our Purpose to life and create value for our clients and communities. RBC strives to deliver this through policies and programs intended to foster a workplace based on respect, belonging and opportunity for all.

RBC is presently inviting candidates to apply for this existing vacancy. Applying to this posting allows you to express your interest in this current career opportunity at RBC. Qualified applicants may be contacted to review their resume in more detail.


