Site Reliability Engineer Leader
4 days ago
We are seeking a highly skilled Site Reliability Engineer Leader to join our team at Devopshunt. As a key member of our DevOps team, you will be responsible for ensuring the stability, reliability, and efficiency of our customer experience management platforms.
Key Responsibilities
- Ensure that infrastructure and applications perform within established Service Level Agreements (SLAs) and Service Level Objectives (SLOs).
- Maintain well-documented standards and best practices to ensure services are built for high availability and security.
- Implement appropriate automation and observability to achieve low and continuously improving mean time to recovery (MTTR) for service-impacting incidents.
- Document any incidents thoroughly, along with corresponding problem records and corrective actions.
- Participate in the Architectural Review Process for new and existing services, ensuring compliance with high-availability, observability, security, and cost efficiency standards.
- Enhance governance processes to ensure all platform components meet current standards.
- Lead root cause analysis for major incidents, communicating with senior stakeholders, driving problem-solving, and debugging using best practice techniques.
- Design and conduct fault injection experiments to identify potential weak points in high-availability architecture and work with engineering teams to remediate findings.
- Collaborate with engineering teams to optimize infrastructure for security, resiliency, and cost targets based on collected feedback.
- Document processes and maintain records related to infrastructure procedures and strategies, ensuring appropriate alerts and support procedures are in place for quick incident remediation.
Qualifications
- Bachelor's degree in Computer Science, Engineering, Information Technology, or equivalent experience.
- 5+ years of experience in site reliability engineering or a related field.
- Proven leadership and team management experience.
- Experience with programming languages such as Go or Python, and with shell scripting.
- Proficient with Terraform and infrastructure as code principles.
- Demonstrated proficiency in public cloud environments, particularly AWS.
- Hands-on experience with Kubernetes management within AWS EKS.
- Experience with CI/CD automation tools, such as CircleCI and ArgoCD.
- Experience with monitoring and logging using tools like Prometheus, Grafana, OpenTelemetry, CloudWatch, and Honeycomb.
- AWS and Kubernetes Certifications (Solutions Architect, SysOps Administrator, DevOps Engineer, CKA, CKS, CKAD, KCNA) are desirable.
Devopshunt is a fast-growing company that delivers customer communications solutions to some of the largest utility, insurance, and government entities across North America. We offer a casual work environment, competitive compensation, and a stellar benefits program. Our office is small enough to allow creative individuals to flourish, yet large enough to provide long-term stability.
We place a tremendous amount of responsibility on our team members to be productive, focused, and self-motivated. We are an equal opportunity employer dedicated to building an inclusive and diverse workforce. We will provide accommodations during the recruitment process upon request.
-
Site Reliability Engineer Lead
2 weeks ago
Mississauga, Ontario, Canada Devopshunt Full time
Job Overview: We are seeking a highly skilled Site Reliability Engineer Lead to join our team at Devopshunt. As a key member of our DevOps team, you will be responsible for ensuring the reliability, scalability, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and highly available cloud infrastructure...
-
Site Reliability Engineer Lead
6 hours ago
Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time
Job Title: Team Lead, Site Reliability. We are seeking an experienced Site Reliability Engineer to lead our DevOps team in optimizing our customer experience management platforms. As a Team Lead, Site Reliability, you will be responsible for guiding our team in enhancing platform stability, reliability, and efficiency. Key Responsibilities: Ensure infrastructure...
-
Site Reliability Engineer Lead
3 hours ago
Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time
Job Title: Team Lead, Site Reliability. We are seeking an experienced Site Reliability Engineer to lead our DevOps team in optimizing our customer experience management platforms. As a Team Lead, Site Reliability, you will be responsible for guiding our team in enhancing platform stability, reliability, and efficiency. Key Responsibilities: Ensure infrastructure...
-
Site Reliability Engineer
1 hour ago
Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our dynamic team at KUBRA Data Transfer Ltd. As a key member of our infrastructure team, you will play a critical role in ensuring the high availability, security, and performance of our cloud-based systems. Key Responsibilities: Design and implement scalable and secure cloud...
-
Site Reliability Engineer
4 hours ago
Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our dynamic team at KUBRA Data Transfer Ltd. As a key member of our infrastructure team, you will play a critical role in ensuring the high availability, security, and performance of our cloud-based systems. Key Responsibilities: Design and implement scalable and secure cloud...
-
Lead Site Reliability Engineer
1 month ago
Mississauga, Ontario, Canada Mimecast Full time
Senior Site Reliability Engineer. Contribute to the Development of Advanced Cloud-Scalable AI-Driven Security Solutions. Are you passionate about software security? Do you excel in implementing large-scale public cloud solutions? Are you eager to apply Machine Learning to tackle intricate challenges? This position could be the perfect fit for you. Our...
-
Site Reliability Engineer
7 days ago
Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at KUBRA Data Transfer Ltd. As a key member of our technical operations team, you will play a critical role in ensuring the high availability, security, and performance of our cloud-based infrastructure. Key Responsibilities: Design and implement scalable and secure cloud...
-
Site Reliability Engineer Team Lead
4 days ago
Mississauga, Ontario, Canada Devopshunt Full time
Job Title: Site Reliability Engineer Team Lead. We are seeking an experienced Site Reliability Engineer to lead our DevOps team in optimizing our customer experience management platforms. As a Site Reliability Engineer Team Lead, you will be responsible for guiding our team in enhancing platform stability, reliability, and efficiency. Key...
-
Site Reliability Engineer Lead
7 days ago
Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time
Job Title: Team Lead, Site Reliability. We are seeking an experienced Site Reliability Engineer to lead our DevOps team in optimizing our customer experience management platforms. Key Responsibilities: Ensure infrastructure and applications perform within established Service Level Agreements (SLA) and Service Level Objectives (SLO). Maintain well-documented...
-
Site Reliability Engineer Team Lead
3 days ago
Mississauga, Ontario, Canada Devopshunt Full time
Job Title: Site Reliability Engineer Team Lead. We are seeking an experienced Site Reliability Engineer to lead our DevOps team in optimizing our customer experience management platforms. Key Responsibilities: Identify potential issues and resolve complex problems. Lead technical and business discussions. Implement automation and observability to achieve low...
-
Site Reliability Engineer Team Lead
3 days ago
Mississauga, Ontario, Canada Devopshunt Full time
Job Title: Site Reliability Engineer Team Lead. We are seeking an experienced Site Reliability Engineer to lead our DevOps team in optimizing our customer experience management platforms. Key Responsibilities: Identify potential issues and resolve complex problems to ensure high-availability, observability, security, and cost efficiency standards. Implement...
-
Site Reliability Engineer Team Lead
4 days ago
Mississauga, Ontario, Canada Devopshunt Full time
Job Title: Site Reliability Engineer Team Lead. We are seeking an experienced Site Reliability Engineer to lead our DevOps team in optimizing our customer experience management platforms. Key Responsibilities: Guide the DevOps team in identifying potential issues, resolving complex problems, and leading technical and business discussions. Implement automation and...
-
Site Reliability Engineer Lead
2 weeks ago
Mississauga, Ontario, Canada KUBRA Data Transfer Ltd Full time
About the Role: We are seeking a highly skilled Site Reliability Engineer Lead to join our team at KUBRA Data Transfer Ltd. As a key member of our DevOps team, you will be responsible for ensuring the stability, reliability, and efficiency of our customer experience management platforms. Key Responsibilities. Infrastructure Optimization: Ensure that our...