Director Site Reliability Engineering

3 weeks ago


Toronto, Canada BMO Full time
Application Deadline:

04/29/2024

Address:
33 Dundas Street West

This role is Hybrid (1-2 days per week in the office)

The Director - Site Reliability Engineering will lead a team that will work with application teams, infrastructure teams, and business partners to continuously improve the stability, reliability and efficiency of Finance and Enterprise Risk Management systems.

Responsibilities:
  • Work in collaboration with Application Engineering, Quality, Product and Data Engineering teams to champion SRE/DevOps culture and practices.
  • Develop and collaborate with a team of Reliability Engineers who work closely with software development, Quality, Product and Data Engineering teams as champions of SRE/DevOps culture and practices.
  • Champion SRE principles to reduce Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR) for issues while maintaining Service Level Objectives (SLOs), Service Level Agreements (SLAs), and Service Level Indicators (SLIs); enable observability across end-to-end components; and apply Chaos Engineering practices to build highly available, resilient, and reliable applications and infrastructure that are ready for production.
  • Contribute to management of Service Level Objectives with senior development and business leads.
  • Lead initiatives to continuously refine our plan, build and deploy practices for improved stability, reliability, efficiency, repeatability and security. You'll create plans, collaborate with other SREs and DevOps team members, and coordinate activity with development and business leads to increase service levels, lower costs, and support delivery velocity objectives.
  • Work with application teams to implement, improve and coach service management best practices that improve overall service delivery.
  • Contribute to the prioritization of reliability features and to the design, development and delivery of effective tooling, alerts, and automated responses that identify and address reliability risks.
  • Contribute to proactive technical communication of reliability, stability and efficiency results (based on Service Level Objectives), service health (via dashboards), and key reliability risks and issues to senior business and technology stakeholders.
  • Manage a team of Site Reliability Engineers who support Finance and ERPM applications and services.
  • Ensure solutions are automated where possible while improving operational efficiency, reducing operating risk, delivering quality services and optimizing cost.
  • Mentor and coach others within the assigned area and transfer subject matter expertise to other Site Reliability Engineers where appropriate.
  • Regularly connect work to BMO's purpose, set inspirational goals, define clear expected outcomes, and ensure clear accountability for follow-through.
  • Build interdependent teams that collaborate across functional and operating groups to create the highest value for all stakeholders.
Qualifications:
  • 15+ years of work experience in technology (specializing in SRE, DevOps, DevSecOps and cloud computing).
  • Proven experience managing technology platforms of large scale and complexity.
  • Understanding of the functional aspects and technical behavior of the underlying operating system, development environment, and deployment practices.
  • Strong analytical mindset and good communication skills.
  • Expertise in the SRE approach and emerging SRE/Chaos Engineering practices (SLAs, SLIs, SLOs, MTTI, MTTR).
  • Hands-on experience with DevOps CI/CD tools, e.g., GitHub, Jenkins, Ansible, UrbanCode Deploy.
  • Hands-on experience with Docker and/or Kubernetes.
  • Hands-on experience with Agile methodologies, e.g., Scrum, Kanban.
  • Experience with ITSM tools (ServiceNow is a plus), with a strong understanding of SRE and service management principles.
  • Drive alignment with, and improvement of, broader Enterprise services.
  • Apply SRE techniques to DevOps and Compute Services.
  • Lead hands-on automation and the elimination of manual toil.
  • Coach application teams on how to leverage DevOps offerings and help drive productivity gains.
  • Partner on or lead new tool adoption, and recommend process improvements.

Grade:
9
Job Category:
People Manager / Gestionnaire
We're here to help

At BMO we are driven by a shared Purpose: Boldly Grow the Good in business and life. It calls on us to create lasting, positive change for our customers, our communities and our people. By working together, innovating and pushing boundaries, we transform lives and businesses, and power economic growth around the world.

As a member of the BMO team you are valued, respected and heard, and you have more ways to grow and make an impact. We strive to help you make an impact from day one - for yourself and our customers. We'll support you with the tools and resources you need to reach new milestones, as you help our customers reach theirs. From in-depth training and coaching, to manager support and network-building opportunities, we'll help you gain valuable experience, and broaden your skillset.

BMO is committed to an inclusive, equitable and accessible workplace. By learning from each other's differences, we gain strength through our people and our perspectives. Accommodations are available on request for candidates taking part in all aspects of the selection process. To request accommodation, please contact your recruiter.
