Manager, Site Reliability Engineering

3 days ago


Toronto Ontario CA, Ontario The Home Depot Canada Full time
With a career at The Home Depot, you can be yourself and also be part of something bigger.

Position Overview:
The Manager, SRE will lead a team of Site Reliability Engineers to ensure the reliability, performance, and operational support of our eCommerce systems, with a focus on Google Cloud Platform (GCP) environments. This role requires a strong background in reliability reviews, performance engineering practices, production engineering, and operational support, with emphasis on DevOps principles and GCP expertise.
Responsibilities:
Leadership & Management:
Lead and mentor a team of Site Reliability Engineers
Foster a culture of continuous improvement and innovation
Collaborate with cross-functional teams to align SRE practices with business objectives

Reliability & Performance:
Conduct reliability reviews to identify areas for improvement and implement solutions to enhance system reliability, particularly in GCP environments
Implement and promote performance engineering practices to ensure optimal system performance on GCP
Develop and maintain service level objectives (SLOs) and error budgets

Production Engineering & Operational Support:
Oversee production engineering efforts to ensure systems are designed for operational excellence and reliability, leveraging GCP services and best practices
Manage incident response and post-incident reviews to minimize downtime and improve system resilience
Implement monitoring, alerting, and observability solutions to proactively identify and address issues
Develop and maintain runbooks and playbooks for common operational tasks
Coordinate with security teams to ensure compliance with security policies and best practices

DevOps & Continuous Improvement:
Drive DevOps initiatives to improve collaboration between development and operations teams, with a focus on GCP-native tools and services
Implement and maintain CI/CD pipelines to streamline deployment processes in GCP environments
Identify and implement automation opportunities to reduce manual tasks and improve efficiency
Promote the use of Infrastructure as Code (IaC) to manage and provision cloud resources.
Continuously evaluate and integrate new tools and technologies to enhance DevOps practices

Release Management:
Implement and maintain release management best practices to minimize disruptions and maximize system stability
Collaborate with DevOps teams to integrate release management into CI/CD pipelines
Oversee release schedules, ensuring minimal impact on business operations
Ensure there is a rigorous release readiness process in place that includes reviews and post-release retrospectives
Maintain a release calendar and communicate release plans to stakeholders

Strategic Planning:
Create and maintain a strategic roadmap for SRE initiatives, aligning with business goals and technological advancements.
Refine and standardize Standard Operating Procedures (SOPs) to enhance operational efficiency and consistency.
Address customer pain points by developing and implementing solutions that improve user experience and system reliability.
Engage with stakeholders to understand their needs and incorporate feedback into strategic planning and execution
Monitor industry trends and best practices to ensure the SRE team remains at the forefront of technology.

Experience:
Bachelor’s degree in Computer Science, Engineering, or a related field
Strong problem-solving and analytical abilities
Excellent communication and collaboration skills
4-6 years of relevant work experience, including significant experience with GCP
Extensive experience with cloud infrastructure, GCP services and architecture
Proven track record of managing and optimizing large-scale systems on GCP
Proven ability to effectively communicate with individuals at all levels of the organization
Ability to maintain relationships and negotiate with vendors.
Ability to operate in and leverage resources in a matrixed environment.
Ability to analyze and present data to support ideas.
Ability to clearly communicate to all levels of the organization.
  • Site Engineer

    5 days ago


    Toronto, Ontario, C6A, Ontario, Canada FCC Construcción Full time

    Build Your Career at FCC Canada. FCC is an international reference in engineering and infrastructure. Fomento de Construcciones y Contratas (“FCC”), headquartered in Spain, is the parent company of one of the world’s leading infrastructure and citizen services groups. With more than a century of history, our business portfolio is highly diversified....


  • Toronto, Ontario, Canada Royal Bank of Canada Full time

    Royal Bank of Canada is seeking a highly skilled Site Reliability Engineering (SRE) leader to join our team in Toronto, Canada. As an SRE leader, you will be responsible for leading the development and implementation of SRE solutions that improve the reliability and performance of our applications. The ideal candidate will have 5+ years of experience as a...

  • Reliability Engineer

    1 month ago


    Toronto, Ontario, Ontario, Canada Major Recruitment Full time

    Reliability Engineer. ***Must be Canadian Citizen or Permanent Resident requiring no sponsorship*** My client has a shared vision for greatness. We manufacture some of North America’s most popular tissue brands - Cashmere®, Purex®, Scotties®, SpongeTowels®, Bonterra®, White Cloud®, as well as products for use away from home. We are leaders in our...


  • Toronto, Ontario, Canada Peter Lucas Project Management Inc. Full time

    Job Overview: A leading project management company, Peter Lucas Project Management Inc., is seeking a skilled Reliability Engineering Specialist to join their team. This critical role involves developing and implementing asset maintenance strategies, conducting root cause analysis, creating risk mitigation plans, and optimizing preventative maintenance...


  • Toronto, Ontario, C6A, Ontario, Canada S.i. Systems Full time

    Our valued crown corporation client is seeking a Senior Site Reliability Engineer (SRE) to support the installation & configuration of Dynatrace to ensure seamless integration with existing systems and infrastructure! Initial 3-year contract in Ottawa, ON with strong possibility of extension to a total term of 4 years. 7.5 hours per day, Monday to Friday....

  • Site Engineer

    5 days ago


    Toronto, Ontario, C6A, Ontario, Canada Webuild Full time

    About Us: Webuild is an international construction company of civil engineering pioneers who have been at the forefront of the construction business for 120 years. We are a global player with Italian roots specializing in complex infrastructure: innovative and sustainable works that improve the lives of people. In over a century, we built some of the...


  • Toronto, Ontario, Canada Royal Bank of Canada Full time

    Job Summary: Royal Bank of Canada is seeking an experienced professional to lead our Site Reliability Engineering (SRE) efforts for our US Cash Management Technology. This is a unique opportunity to shape the future technology landscape of the company, delivering key business values and implementing strategic components across all RBC functions defined in our...


  • Toronto, Ontario, Canada Tecsys Inc. Full time

    About the Role: We are looking for an exceptional Site Reliability Engineer to join our Network and Security Operations Center team. As a key member of our team, you will be responsible for ensuring the reliability and uptime of our platform and applications. Key Responsibilities: Collaborate with Engineering teams to support services through system design...


  • Toronto, Ontario, Ontario, Canada Design Works Engineering Full time

    Hello and welcome to Design Works Engineering! We are a multidisciplinary engineering firm that includes civil engineering, structural engineering, mechanical engineering, electrical engineering, energy modelling, and fire protection design. Our diverse staff shares the same vision: to create great projects and even better relationships. Our team is a group of...


  • Toronto, Ontario, Canada Teranet Inc. Full time

    About Teranet: Teranet is a leading innovator in electronic services and solutions, operating one of the most advanced and secure registration systems worldwide. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our DevOps team. The ideal candidate will possess strong software engineering principles and infrastructure expertise to...

  • Reliability Engineer

    2 weeks ago


    Toronto, Ontario, Canada Disability Solutions Full time

    Job Summary: The position of the Reliability Engineer is to execute reliability, availability, and maintainability (RAM) analysis and engineering in support of Information & Technology solutions. The primary objective is to ensure that these solutions have attributes of high robustness, reliability, and availability.

  • Cloud Platform Lead

    1 month ago


    Toronto, Ontario, Canada Royal Bank of Canada Full time

    Role Overview: We are seeking a seasoned Cloud Platform Lead to spearhead the design and development of highly scalable, secure, and available architectures for cloud platforms. As a key member of our team, you will lead and coordinate a team of talented Site Reliability Engineers and Cloud Platform Engineers to drive innovation and excellence. About the...


  • Toronto, Ontario, Canada PointsBet Canada Full time

    About the Role: The ideal candidate will ensure the reliability, scalability, and performance of our product. This involves leading efforts in proactive monitoring, incident management, and automation, and collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient infrastructure, minimize downtime, and...

  • Field Engineer

    2 weeks ago


    Toronto, Ontario, C6A, Ontario, Canada Webuild Full time

    About Us: Webuild is an international construction company of civil engineering pioneers who have been at the forefront of the construction business for 120 years. We are a global player with Italian roots specializing in complex infrastructure: innovative and sustainable works that improve the lives of people. In over a century, we built some of the...


  • Toronto, Ontario, Canada Manager Full time

    We are seeking a reliable and motivated individual to join our team as a part-time cleaner for an Airbnb property. This is an entry-level position that offers a competitive hourly rate of $15. As a part-time cleaner, you will be responsible for maintaining the cleanliness and organization of our Airbnb space. If you are interested in this opportunity, please...

  • Data Engineer

    2 weeks ago


    Toronto, Ontario, C6A, Ontario, Canada Quarry Consulting Full time

    Title: Data Engineer. Location: Old Toronto, ON - 2/3 times a week on-site. Duration: Permanent role FT. Key Responsibilities: Design, develop, and maintain data pipelines for handling large volumes of data streams using Apache Kafka. Implement real-time data processing solutions using Apache Flink or Apache Spark. Build and maintain RESTful APIs using Spring Boot to...


  • Toronto, Ontario, Ontario, Canada Stathera, Inc. Full time

    Stathera is a fabless semiconductor company focused on providing cutting-edge MEMS-based timing solutions. With offices in Montreal, Toronto, and Boston, our team is re-architecting the traditional quartz-based timing industry with the introduction of state-of-the-art DualMode™ frequency technology. This breakthrough innovation is redefining what is...


  • Toronto, Ontario, Canada Royal Bank of Canada Full time

    Job Summary: We are seeking a highly motivated Technical Release Coordinator to join our Digital SRE Environment and Release team. This role offers the unique opportunity to work at the intersection of technology, reliability, and delivery, ensuring the smooth execution of technical projects that directly impact our digital infrastructure and release...


  • Toronto, Ontario, Canada henon Full time

    Are you looking for a challenging role that combines DevOps and Site Reliability Engineering skills? Henon is seeking a highly skilled Senior Software Reliability Engineer to join our team. Job Summary: We are building a relationship-first, tech-enabled financial services company founded to help Private Equity firms grow. As a key member of our engineering...