Lead Site Reliability Engineer

4 weeks ago


Old Toronto, Ontario, Canada SoundHound Inc Full time

About SoundHound AI
SoundHound AI is dedicated to enabling seamless interactions between individuals and technology through natural language. Our innovative Voice AI solutions cater to diverse applications, including automotive systems and restaurant services, empowering brands to engage with their customers in meaningful ways.

Role Overview
This position offers an opportunity to join our Site Reliability Engineering (SRE) team, which builds the robust infrastructure that supports our cutting-edge technology. The SRE team plays a crucial role in delivering the global infrastructure that serves millions of users. We prioritize efficiency, and automation is integral to our development process.

Key Responsibilities:
  • Develop software and systems for managing and automating cloud infrastructure (Ansible, Terraform, Oracle Cloud).
  • Contribute to the creation of frameworks for application deployment, customization, and upgrades (Kubernetes, ArgoCD, GitLab, Jenkins).
  • Enhance observability, implement and track key performance indicators, and establish Service Level Objectives (SLOs) and Service Level Agreements (SLAs) (Prometheus, Grafana, ELK).
  • Collaborate with cross-functional teams to design and build reliable, scalable services.
  • Engage in the on-call rotation (follow-the-sun model).

Qualifications:
  • A minimum of 4 years of experience in a Site Reliability Engineer role or a similar capacity.
  • A Master's or Bachelor's degree in Computer Science or a related discipline.
  • Proven experience in managing and automating infrastructure within a cloud environment.
  • In-depth understanding of container orchestration platforms, particularly Kubernetes.
  • Technical proficiency with tools related to Observability, Infrastructure as Code, and Continuous Integration/Continuous Deployment (CI/CD).
  • Experience in troubleshooting complex Linux system and network issues.
  • Strong skills in writing high-quality Python code and utilizing GitOps methodologies.
  • A pragmatic, analytical, and solution-oriented mindset.

Additional Information
We value experience with microservices architecture in large-scale infrastructures that emphasize reliability and observability.

This role allows for flexible work arrangements, including virtual, hybrid, and in-office options. In addition to competitive compensation and equity, we offer comprehensive healthcare benefits, paid time off, and other perks. Our recruitment team will provide specific salary details based on location and experience.

Company Culture
At SoundHound AI, we are committed to fostering a values-driven environment that promotes support, transparency, resilience, agility, and excellence. Diversity, equity, inclusion, and belonging are fundamental to our identity as a company. Our mission to develop Voice AI solutions for a global audience necessitates a team enriched with diverse perspectives. Explore more about our philosophy, benefits, and culture at SoundHound Careers.

We prioritize creating an inclusive atmosphere where every individual can thrive and contribute their best work. SoundHound is dedicated to providing reasonable accommodations for individuals with disabilities throughout the interview process and in their roles.



  • Toronto, Ontario, Canada Lightspeed Restaurant Full time

    Lead Site Reliability Engineer at Lightspeed Restaurant: We are seeking a skilled Lead Site Reliability Engineer to become a vital part of our Lightspeed Restaurant team. Our mission is to create innovative software solutions that empower restaurants to enhance their operational efficiency and profitability. In the role of Lead Site Reliability Engineer, you...


  • Old Toronto, Ontario, Canada PagerDuty, Inc. Full time

    PagerDuty empowers diverse teams to perform essential tasks that drive business success through the PagerDuty Operations Cloud. We are in search of a Senior Site Reliability Engineer to become a vital member of our SRE-Platform team. In this capacity, you will play a crucial role in developing, sustaining, and scaling the Kubernetes infrastructure that...


  • Old Toronto, Ontario, Canada PagerDuty, Inc. Full time

    PagerDuty empowers diverse teams to execute essential tasks that drive business success through the PagerDuty Operations Cloud. We are looking for a Senior Site Reliability Engineer to become a vital member of our SRE-Platform team. In this capacity, you will play a significant role in developing, sustaining, and enhancing the Kubernetes infrastructure that...


  • Old Toronto, Ontario, Canada PagerDuty, Inc. Full time

    PagerDuty empowers diverse teams to drive essential operations that propel business growth through the PagerDuty Operations Cloud. We are in search of a Senior Site Reliability Engineer to become a vital member of our SRE-Platform team. In this capacity, you will play a crucial role in developing, sustaining, and enhancing the Kubernetes infrastructure that...


  • Toronto, Ontario, Canada Thomson Reuters Full time

    About the Role: This is an exciting opportunity to join our team as a Lead Site Reliability Engineer at Thomson Reuters. As a key member of our engineering team, you will be responsible for leading and mentoring a team of SREs, providing technical guidance, coaching, and support to foster a culture of collaboration, innovation, and continuous improvement. Key...


  • Toronto, Ontario, Canada Northbridge Financial Corporation Full time

    Overview of the Senior Site Reliability Engineer Role at Northbridge Financial Corporation: The Senior Site Reliability Engineer is responsible for the development and execution of Service Level Objectives (SLOs). This role involves managing complex service reliability solutions and processes, as well as mentoring and guiding junior SREs. Key...


  • Toronto, Ontario, Canada Northbridge Financial Corporation Full time

    Overview of the Senior Site Reliability Engineer Role at Northbridge Financial Corporation: The Senior Site Reliability Engineer is responsible for the establishment and execution of Service Level Objectives (SLOs). This role involves managing complex service reliability solutions and processes, while also providing mentorship and guidance to junior...


  • Old Toronto, Ontario, Canada Moneris Full time

    Your Moneris Career - The Opportunity: Moneris stands as a leader in payment processing, recognized as Canada's foremost provider and one of the largest in North America. Connect. Impact. Grow. Become part of one of Canada's esteemed employers and leave your mark at Moneris. The Senior Site Reliability Engineer at Moneris works in collaboration with various...


  • Old Toronto, Ontario, Canada Magic Leap - Multiple Locations Full time

    Transforming the Future of Computing: Magic Leap stands at the forefront of spatial computing, innovating advanced augmented reality solutions that integrate digital elements with the physical environment. As a leader in the next generation of computing platforms, our mixed reality devices open up new avenues for interaction and engagement with the world...


  • Toronto, Ontario, Canada Northbridge Financial Corporation Full time

    Overview of the Senior Site Reliability Engineer Role at Northbridge Financial Corporation: The Senior Site Reliability Engineer is responsible for the establishment and execution of Service Level Objectives (SLOs). This role involves managing service reliability solutions and processes of increasing intricacy, along with mentoring and guiding junior...


  • Old Toronto, Ontario, Canada SoundHound Inc Full time

    About SoundHound AI: At SoundHound AI, we envision a world where every individual can seamlessly interact with technology through natural conversation. Our innovative Voice AI solutions cater to various sectors, including automotive and food services, empowering brands to connect with their audiences in meaningful ways. Role Overview: We are seeking a...


  • Toronto, Ontario, Canada CIRCLE Full time

    About Circle: Circle is a pioneering financial technology firm positioned at the forefront of the evolving digital economy, where value can traverse globally, almost instantaneously, and at a lower cost compared to traditional settlement systems. This innovative layer of the internet unveils extraordinary opportunities for transactions, commerce, and...


  • Old Toronto, Ontario, Canada SoundHound Inc Full time

    About SoundHound AI: SoundHound AI is dedicated to enabling seamless interaction with technology through natural language. Our innovative Voice AI solutions cater to various industries, enhancing user experiences and brand engagement. Role Overview: As a vital member of our Site Reliability Engineering (SRE) team, you will be instrumental in developing robust...


  • Toronto, Ontario, Canada Northbridge Financial Corporation Full time

    Join Northbridge Financial Corporation as a Site Reliability Engineering Lead: The Site Reliability Engineering Lead is essential in maintaining the dependability, efficiency, and accessibility of our primary insurance systems. Collaborating closely with both application and infrastructure teams, your focus will be on preventing incidents, managing...


  • Toronto, Ontario, Canada Thomson Reuters Full time

    About the Role: This is an exciting opportunity to lead a team of Site Reliability Engineers (SREs) at Thomson Reuters, a leading provider of news, information, and technology solutions to professionals in the legal, tax, accounting, and compliance markets. Key Responsibilities: Team Leadership: Lead and mentor a team of SREs, providing technical guidance,...


  • Toronto, Ontario, Canada Relay Financial Full time

    About Relay Financial: At Relay, we are revolutionizing the way businesses manage their finances. Traditional banking has often hindered growth for business owners, and we are committed to changing that narrative. Our platform is designed to be an all-in-one, collaborative solution for money management, tailored specifically for small to medium-sized...


  • Old Toronto, Ontario, Canada Akamai Full time

    Are you passionate about technology and teamwork? If you enjoy collaborating with diverse teams to tackle intricate challenges, consider joining our esteemed Nameserver SRE team. The Nameserver SRE team plays a pivotal role in defining, measuring, and optimizing the key performance indicators of Akamai's nameserver platform. We adopt a comprehensive approach...


  • Toronto, Ontario, Canada Alliancesrcare Full time

    About the Role: At Alliancesrcare, we are transforming the landscape of financial services by offering a comprehensive platform for small to medium-sized businesses. We are in search of a Lead Site Reliability Engineer to become a pivotal member of our Trust team and contribute to the evolution of our services. Key Responsibilities: Oversee and manage production...