Manager, Site Reliability Engineering

1 month ago


Toronto, Canada The Home Depot Canada Full time

With a career at The Home Depot, you can be yourself and also be part of something bigger.
 

Position Overview:
The Manager, SRE will lead a team of Site Reliability Engineers responsible for the reliability, performance, and operational support of our eCommerce systems, with a focus on Google Cloud Platform (GCP) environments. This role requires a strong background in reliability reviews, performance engineering practices, production engineering, and operational support, with an emphasis on DevOps principles and GCP expertise.
Responsibilities:

Leadership & Management:
• Lead and mentor a team of Site Reliability Engineers
• Foster a culture of continuous improvement and innovation
• Collaborate with cross-functional teams to align SRE practices with business objectives

Reliability & Performance:
• Conduct reliability reviews to identify areas for improvement, and implement solutions that enhance system reliability, particularly in GCP environments
• Implement and promote performance engineering practices to ensure optimal system performance on GCP
• Develop and maintain service level objectives (SLOs) and error budgets

Production Engineering & Operational Support:
• Oversee production engineering efforts to ensure systems are designed for operational excellence and reliability, leveraging GCP services and best practices
• Manage incident response and post-incident reviews to minimize downtime and improve system resilience
• Implement monitoring, alerting, and observability solutions to proactively identify and address issues
• Develop and maintain runbooks and playbooks for common operational tasks
• Coordinate with security teams to ensure compliance with security policies and best practices

DevOps & Continuous Improvement:
• Drive DevOps initiatives to improve collaboration between development and operations teams, with a focus on GCP-native tools and services
• Implement and maintain CI/CD pipelines to streamline deployment processes in GCP environments
• Identify and implement automation opportunities to reduce manual tasks and improve efficiency
• Promote the use of Infrastructure as Code (IaC) to manage and provision cloud resources
• Continuously evaluate and integrate new tools and technologies to enhance DevOps practices

Release Management:
• Implement and maintain release management best practices to minimize disruptions and maximize system stability
• Collaborate with DevOps teams to integrate release management into CI/CD pipelines
• Oversee release schedules, ensuring minimal impact on business operations
• Ensure a rigorous release readiness process is in place, including reviews and post-release retrospectives
• Maintain a release calendar and communicate release plans to stakeholders

Strategic Planning:
• Create and maintain a strategic roadmap for SRE initiatives, aligned with business goals and technological advancements
• Refine and standardize Standard Operating Procedures (SOPs) to enhance operational efficiency and consistency
• Address customer pain points by developing and implementing solutions that improve user experience and system reliability
• Engage with stakeholders to understand their needs and incorporate their feedback into strategic planning and execution
• Monitor industry trends and best practices to keep the SRE team at the forefront of technology


Experience:

• Bachelor’s degree in Computer Science, Engineering, or a related field
• Strong problem-solving and analytical abilities
• Excellent communication and collaboration skills
• 4-6 years of relevant work experience, including significant experience with GCP
• Extensive experience with cloud infrastructure and with GCP services and architecture
• Proven track record of managing and optimizing large-scale systems on GCP
• Proven ability to communicate clearly and effectively with individuals at all levels of the organization
• Ability to maintain relationships and negotiate with vendors
• Ability to operate in, and leverage resources within, a matrixed environment
• Ability to analyze and present data to support ideas

  • Toronto, Canada CB Canada Full time

    Site Reliability Engineer. On behalf of our client in the banking sector, PROCOM is looking for a Site Reliability Engineer. Job Description: Azure cloud; Jira and Confluence; CI/CD; experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...


  • Old Toronto, Canada Lorien Full time

    Hybrid - Manchester. We are currently working with a leading gambling company dedicated to providing exceptional gaming experiences. They are looking for an experienced Site Reliability Engineer with a strong skill set in system reliability to join their world-class technology team. This role is ideal for someone who has 4+ years of experience within the...


  • Toronto, Canada KPMG Canada Full time

    Overview: At KPMG, you'll join a team of diverse and dedicated problem solvers, connected by a common cause: turning insight into opportunity for clients and communities around the world. The OPS Site Reliability Engineer will be a focal role owning and ensuring the fluent operations of Managed Service...


  • Old Toronto, Canada TD Bank Full time

    Site Reliability Engineer. Work Location: Canada. Hours: 37.5. Line of Business: Technology Solutions. Pay Details: We’re committed to providing fair and equitable compensation to all our colleagues. As a candidate, we encourage you to have an open dialogue with a member of


  • Old Toronto, Canada Street Context Full time

    Are you a Site Reliability Engineer who has a passion for building reliable, resilient and performant systems that scale? Do you command with a steady hand when incidents unfold? Are you motivated by team success? If so, continue reading… We are on a mission to build and strengthen our engineering teams to match the accelerating success of Street...


  • Old Toronto, Canada Street Context Full time

    Are you a Site Reliability Engineer who has a passion for building reliable, resilient and performant systems that scale? We are on a mission to build and strengthen our engineering teams to match the accelerating success of Street Context. We provide a premium Email, Analytics and Broker Relationship platform, purpose-built for capital markets and...


  • Toronto, Canada Northbridge Financial Corporation Full time

    What is it like to be a Senior Site Reliability Engineer at Northbridge Financial? The Senior Site Reliability Engineer oversees the creation and implementation of Service Level Objectives (SLOs). The Senior SRE handles service reliability solutions and processes of increasing complexity, and is responsible for mentoring and leading less experienced...


  • Toronto, Canada SGS Full time

    Job Description: The Site Reliability Engineer will play a critical part in ensuring the reliability, supportability, scalability, and performance of our .NET stack applications built with MVC, Angular, and Web API. Partner with developers and product operations teams to understand application requirements and translate them into operational practices...


  • Old Toronto, Canada Sentry Full time

    About the role The Site Reliability Engineering team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this by leveraging automation tools to automatically spin up and scale services to meet the traffic demands of 1,000,000+ developers.


  • Old Toronto, Canada Soda Full time

    Job Title: Site Reliability Engineer. Location: Poland (fully remote). Salary: 324K PLN or 27.3K monthly. Start: ASAP. Stack: AWS, Docker, Kubernetes, Terraform, Jenkins, Ansible, Linux, JavaScript, and Lambda. Are you a seasoned DevOps/SRE professional passionate about building high-performance, scalable systems? I am working with a Media/IT...


  • Toronto, Ontario, Canada Compunnel Inc. Full time

    Compunnel Inc. is a leading provider of innovative technology solutions. We are seeking an experienced Site Reliability Engineering Lead to join our team in Toronto, Canada. The estimated salary for this position is $170,000 per year, considering the location and industry standards. About the Job: This role is perfect for someone who is passionate about driving...


  • Old Toronto, Canada Thomson Reuters Full time

    (Canada) Site Reliability Engineer (Contract). Contract (9 months 4 days). Published 3 days ago. New Relic, Datadog. Site Reliability Engineer - in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze chronic...


  • Old Toronto, Canada Thomson Reuters Full time

    (Canada) Site Reliability Engineer (Contract). Contract (5 months 29 days). Published 8 months ago. CLOSED. GCP. Site Reliability Engineer - in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze chronic and...


  • Old Toronto, Canada Mastech Inc. Full time

    Mastech Digital is an IT Staffing and Digital Transformation Services company. Mastech Digital provides digital and mainstream technology staff as well as Digital Transformation Services for all American Corporations. We are currently seeking a Site Reliability Engineer (GCP) for our client in the Consulting domain. We value our professionals, providing...


  • Greater Toronto Area, Canada GlossGenius Full time

    About GlossGenius: GlossGenius is building an ecosystem enabling entrepreneurs to succeed. We empower small business owners to focus on being creators, not admins, by offering a range of business management tools including booking and scheduling, marketing, analytics, payment processing and much more. Over 75,000 small business owners have chosen to...


  • Toronto, Canada Tecsys Inc. Full time

    Having recognized the advantages of remote work, including its benefits for employee morale and productivity and the positive effect of reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...


  • Toronto, Canada mccainfood Full time

    Position Title: Site Reliability Engineer. Position Type: Regular - Full-Time. Position Location: Toronto HQ. Requisition ID: 32708. JOB PURPOSE: The Site Reliability Engineer (SRE) for Unified Communications will ensure the reliability, performance, and scalability of communication services and systems across global operations. This role...

