Director, Site Reliability Engineering

1 week ago


Mississauga, Ontario, Canada CEI Fleet Collision and Safety Full time

Location: Mississauga · Time type: Full time · Posted: Yesterday · Job requisition ID: R104373

Get started on an exciting career at Element

Element employees make a difference in the lives of others every day. We are re-defining the fleet management industry to be people first, then business – delivering on our promise of a superior client experience. This takes hard work and innovation, and we need more like-minded people on our team.

What We Need

We are looking for a Director, Site Reliability Engineering to join Element Fleet Management. As the largest pure-play fleet manager in the world, we provide unmatched products, services, and solutions to our clients.

At Element, employees play a critical role in delivering value to customers and ensuring an exceptional client experience. We are committed to the success of our clients, employees, and investors by fostering a culture where every employee can make a difference.

Are You:

  • An individual with strong customer focus, adaptability, and a proactive approach to problem-solving?
  • Someone with experience using data analytics to drive decision-making for system improvements and incident prevention?

As the Director, Site Reliability Engineering, you will lead and manage our SRE team, working closely with cross-functional teams to implement and refine SRE practices, minimize downtime, and drive automation for high efficiency. You will bring a mix of operational and engineering expertise to design robust systems, oversee incident management, monitor key metrics, and foster a culture of continuous improvement. Your work will directly impact our ability to deliver reliable, scalable services to our customers.

A Day in the Life

  • Team Leadership and Development: Hire, mentor, and develop a high-performing SRE team. Foster a culture of collaboration, continuous learning, and innovation. Provide ongoing training and development opportunities for team growth.
  • Incident Management and Response: Lead the team in incident response, coordinating with cross-functional stakeholders to ensure timely resolution. Conduct thorough post-mortems, identifying and implementing preventive measures.
  • Problem Management: Analyze and address underlying issues in applications and systems to prevent recurring incidents. Establish and maintain processes for identifying, tracking, and resolving long-term problems, promoting continual improvement.
  • Change Management and Release Engineering: Implement and oversee change management practices, ensuring safe and reliable releases. Work closely with development and QA teams to standardize and optimize deployment pipelines for maximum reliability and scalability.
  • Service Level Objectives (SLOs), Indicators (SLIs), and Agreements (SLAs): Establish, monitor, and enforce SLOs, SLIs, and SLAs that align with business requirements. Regularly review and update SLOs to reflect changing system needs and customer expectations.
  • Monitoring, Alerting, and Reporting: Build and maintain robust monitoring, logging, and alerting solutions for system health and application performance. Develop regular reports on reliability metrics and trends to identify areas for improvement.
  • Automation and Tooling: Drive the adoption of automation and self-healing systems to reduce manual intervention, improve efficiency, and minimize human error. Oversee the development of tools and frameworks to support automation in deployment, monitoring, and incident response.
  • Capacity Planning and Disaster Recovery: Conduct capacity planning and manage resources to ensure systems can handle current and future demands. Establish and maintain disaster recovery and business continuity plans for critical systems.
  • Audit and Compliance: Collaborate with internal and external audit teams to ensure that our production systems meet SOC 1, SOX, and other regulatory requirements. Oversee the creation of reports and documentation to support compliance and audit processes.
  • Vendor Management: Manage relationships with external vendors to ensure they meet performance and service level agreements. Work with vendors on troubleshooting, support, and continuous improvement initiatives.

Requirements

  • Bachelor's degree in computer science, engineering, or a related field; advanced degree preferred.
  • 10+ years of experience in IT operations, SRE, or a related field, with a strong record of managing high-availability systems in production environments.
  • In-depth knowledge of cloud infrastructure (AWS, Azure, or GCP), containerization (Docker, Kubernetes), and infrastructure as code (Terraform, Ansible).
  • Solid understanding of SRE principles and practices, including error budgets, service level objectives (SLOs), and service level indicators (SLIs).
  • Strong background in automation, CI/CD, and DevOps practices, with experience using tools such as Jenkins, GitLab CI/CD, or similar.
  • Experience with observability tools such as Prometheus, Grafana, the ELK Stack, Splunk, or Datadog, and the ability to design, implement, and interpret monitoring and alerting systems.
  • Proven ability to lead and manage incident response and post-incident analysis, with a focus on improving response times and reducing incident frequency.
  • Proficiency in scripting and programming languages such as Python, Go, or Bash, with an ability to build automation scripts and tooling.
  • Familiarity with SOC 1, SOX, and other regulatory compliance frameworks, and experience in maintaining audit and compliance documentation.
  • Strong project management skills with a focus on prioritization, resource planning, and risk assessment.

Nice-to-Have Skills

  • Google Cloud Professional Cloud DevOps Engineer, AWS Certified DevOps Engineer, or Certified Kubernetes Administrator (CKA) certification
  • ITIL, ITSM, or PMP certification
  • Familiarity with advanced SRE tools and practices such as chaos engineering, load testing, and synthetic monitoring
  • Experience managing third-party relationships to ensure vendors meet performance and service level expectations
  • Hands-on experience in coordinating with audit teams for compliance documentation and requirements

The hiring base salary range for this position is $162,700 - $223,700 annually. Actual compensation within this range will depend on the individual's knowledge, skills, experience, equity with other team members, and alignment with market data.

What's in it for You

  • A culture of innovation, empowerment, decision-making, and accountability
  • Comprehensive health and welfare benefits that serve the needs of you and your family and foster a culture of wellness (for qualified roles)
  • Additional benefits and amenities, including paid time-off programs (vacation, sick leave, and holidays) (for qualified roles)

Applicants will be required to undergo a background check only if and after a conditional offer of employment has been extended.

Element Fleet Management and its wholly owned subsidiaries are an equal opportunity employer committed to diversity, equity, inclusion, and belonging. We are pleased to consider all qualified applicants for employment without regard to race, color, religion, gender identity, age, sex, sexual orientation, disability, national origin, Aboriginal/Native American status, protected veterans' status, or any other legally protected factors. Disability-related accommodations during the application and interview process are available upon request. Should you require an accommodation with our hiring process, please send an email to talentacquisition@elementcorp.com or call (800) 665-9744.

  • Mississauga, Ontario, Canada Element Fleet Management Full time

    Get started on an exciting career at Element. Element employees make a difference in the lives of others every day. We are re-defining the fleet management industry to be people first, then business – delivering on our promise of a superior client experience. This takes hard work and innovation, and we need more like-minded people on our team. What We Need: We...


  • Mississauga, Ontario, Canada Scotiabank Full time

    Requisition ID: 219255. Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture. Purpose: At Scotiabank, we are executing towards a bold new strategy and vision for our Bank to be our client's most trusted financial partner, to drive sustainable, profitable growth and maximize total shareholder return. The Technology...


  • Mississauga, Ontario, Canada Moneris Full time

    Your Moneris Career - The Opportunity. We are looking for a Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will help ensure the reliability, performance, and scalability of our systems. You will work with development and operations teams to build and maintain robust infrastructure, automate processes, and improve overall system...


  • Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time

    Job Description: We are seeking an experienced Site Reliability Engineer to lead our DevOps team in optimizing our customer experience management platforms. This dynamic role involves guiding cross-functional teams to apply SRE principles and drive continuous improvement, leveraging technical expertise to identify potential issues and resolve complex...


  • Mississauga, Ontario, Canada PointClickCare Full time

    Job Description: As a Site Reliability Engineer (SRE), you will contribute technically to a team focused on applying software engineering practices to operations at scale. Your responsibilities will include monitoring and reporting on service level objectives for applications services, working with business and product owners to establish key performance...


  • Mississauga, Ontario, Canada CEI Fleet Collision and Safety Full time

    We are seeking a seasoned Site Director who can provide leadership and vision for our Site Reliability Engineering team. The ideal candidate will have a deep understanding of cloud infrastructure, containerization, and infrastructure as code, as well as experience with SRE principles and practices. As a Site Director, you will be responsible for leading and...


  • Mississauga, Ontario, Canada Royal Bank of Canada Full time

    Job Summary. Job Description: What is the Opportunity? RBC Insurance Technology is seeking to hire a Senior Site Reliability Engineer for its Insurance Technology Platform Support team. The Insurance Technology Platform Support Team is a specialized unit dedicated to ensuring the optimal performance, availability, and...


  • Mississauga, Ontario, Canada PointClickCare Full time

    PointClickCare is a leading North American healthcare technology platform enabling meaningful care collaboration and real‐time patient insights. For over 20 years, the company has been focused on realizing its vision: to help create a world in which providers and plans can confidently deliver frictionless care. Since its inception, PointClickCare has grown...


  • Mississauga, Ontario, Canada Thermo Fisher Scientific Inc. Full time

    Work Schedule: Other. Environmental Conditions: Office. Job Description. Business Title: Reliability Engineering - Co-Op. Issue Date: 31.MAY.2023. Revision #: 1. Summary: The main focus of this position is to provide support for the Engineering department. Main Job Duties: 1. Work closely with Reliability Engineering Team/Manager to develop and implement processes and...


  • Mississauga, Ontario, Canada Thermo Fisher Scientific Inc. Full time

    Work Schedule: Other. Environmental Conditions: Office. Job Description. Business Title: Reliability Engineering - Co-Op. Issue Date: 31.MAY.2023. Revision #: 1. Summary: The main focus of this position is to provide support for the Engineering department. Main Job Duties: Work closely with Reliability Engineering Team/Manager to develop and implement processes and procedures...


  • Mississauga, Ontario, Canada Index Exchange Full time

    We shaped the earliest forms of ad tech, and we're looking for the technical expertise to help shape its future. Our customers have unique problems that can only be solved at internet scale, and that's where the technical skills of our team make a real difference. Our exchange handles over 500 billion requests every day (for comparison Google serves an...


  • Mississauga, Ontario, Canada Thermo Fisher Scientific Inc. Full time

    Job DescriptionThe primary objective of this role is to provide support for the Engineering department. Key responsibilities include working closely with the Reliability Engineering Team/Manager to develop and implement processes and procedures necessary to establish a Reliability Centered Maintenance (RCM) culture in the site and improve overall equipment...


  • Mississauga, Ontario, Canada Index Exchange Full time

    Unlocking Ad Tech Innovation at Index Exchange. We are the pioneers of ad tech, shaping its future with cutting-edge technology and innovative solutions. As a global leader in programmatic advertising, we need skilled professionals to help us drive growth and success. Our exchange handles over 500 billion requests every day, making it one of the most complex...


  • Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time

    Are you an experienced Site Reliability Engineer with a passion for enhancing platform stability, reliability, and efficiency? We are growing at KUBRA, and we're looking for a skilled Team Lead, Site Reliability Engineer, where you will guide our DevOps team in optimizing our customer experience management platforms. In this dynamic role, you will work...


  • Mississauga, Ontario, Canada De Havilland Aircraft of Canada Limited Full time

    About the Company: De Havilland Aircraft of Canada Limited is a renowned name in the aerospace industry, recognized worldwide for its pioneering contributions to aviation and its unwavering commitment to quality, innovation, and reliability. With a rich history dating back to 1928, the company has evolved to become a comprehensive aerospace company with...


  • Mississauga, Ontario, Canada PointClickCare Full time

    PointClickCare is a leading North American healthcare technology platform enabling meaningful care collaboration and real-time patient insights. We offer a wealth of opportunities and a vibrant culture that empowers our employees to make a lasting impact on healthcare across North America. We are seeking an Intermediate Site Reliability Engineer with at least...