Site Reliability Engineer Lead

5 days ago


Mississauga, Ontario, Canada | KUBRA Data Transfer Ltd | Full time
About the Role

We are seeking a highly skilled Site Reliability Engineer Lead to join our team at KUBRA Data Transfer Ltd. As a key member of our DevOps team, you will be responsible for ensuring the stability, reliability, and efficiency of our customer experience management platforms.

Key Responsibilities
  • Infrastructure Optimization: Ensure that our infrastructure and applications perform within established Service Level Agreements (SLAs) and Service Level Objectives (SLOs).
  • Standardization and Best Practices: Maintain well-documented standards and best practices to ensure services are built for high availability and security.
  • Automation and Observability: Implement appropriate automation and observability to achieve a low and continuously improving mean time to recovery (MTTR) for service-impacting incidents.
  • Incident Management: Document any incidents thoroughly, along with corresponding problem records and corrective actions.
  • Architectural Review: Participate in the Architectural Review Process for new and existing services, ensuring compliance with high-availability, observability, security, and cost efficiency standards.
  • Governance and Compliance: Enhance governance processes to ensure all platform components meet current standards.
  • Root Cause Analysis: Lead root cause analysis for major incidents, communicating with senior stakeholders, driving problem-solving, and debugging using best-practice techniques.
  • Experimentation and Remediation: Design and conduct fault injection experiments to identify potential weak points in high-availability architecture and work with engineering teams to remediate findings.
  • Collaboration and Documentation: Collaborate with engineering teams to optimize infrastructure for security, resiliency, and cost targets based on collected feedback. Document processes and maintain records related to infrastructure procedures and strategies, ensuring appropriate alerts and support procedures are in place for quick incident remediation.
Requirements
  • Technical Expertise: Adept at solving complex technical challenges and devising effective solutions.
  • Attention to Detail: Meticulous attention to detail to ensure high standards of availability and security.
  • Teamwork and Communication: Strong interpersonal skills and the ability to work well within a team. Effective communicator, capable of explaining complex technical issues to both technical and non-technical audiences.
  • Leadership and Management: Proven leadership and team management skills.
  • Education and Experience: Bachelor's degree in Computer Science, Engineering, Information Technology, or equivalent experience. 5+ years of experience in site reliability engineering or a related field.
  • Technical Skills: Experience with systems programming languages, such as Go or Python, and shell scripting. Proficient with Terraform and infrastructure as code principles. Demonstrated proficiency in public cloud environments, particularly AWS. Hands-on experience with Kubernetes management within AWS EKS. Experience with CI/CD automation tools, such as CircleCI and ArgoCD. Experience with monitoring and logging using tools like Prometheus, Grafana, OpenTelemetry, CloudWatch, and Honeycomb.
About KUBRA Data Transfer Ltd

KUBRA Data Transfer Ltd is a fast-growing company that delivers customer communications solutions to some of the largest utility, insurance, and government entities across North America. We offer a casual work environment, competitive compensation, and a stellar benefits program.


