Senior Site Reliability Expert

6 days ago


Montreal administrative region, Canada Lightspeed Full time

Lightspeed is seeking a Senior Site Reliability Expert to join our Retail organization. Our SRE team is responsible for the design, build, and operation of Lightspeed’s infrastructure, and for creating the platform that empowers our product teams. The platform covers the full cycle of software delivery, from CI/CD pipelines to highly available, scalable production environments.

What You’ll Be Responsible For

  • Initiate and lead efforts to continuously improve our software delivery processes and practices, creating tools that accelerate product development across multi-location teams.
  • Develop and maintain robust, self-service platforms for building, deploying, and operating services, with a strong focus on automation, scalability, and cost efficiency.
  • Engage with people from different teams and actively contribute to improving the cost efficiency and reliability of different products.
  • Use your engineering expertise to influence strategic decisions and advocate for best practices across your business unit, including Infrastructure as Code, monitoring, high availability, disaster recovery, and security.
  • Take ownership of the preparation and maintenance of the team's technical documentation.
  • Lead incident response, diagnosis, and resolution for critical production issues, and drive blameless post-mortems.
  • Proactively monitor system performance and capacity, define and enforce Service Level Objectives (SLOs), and identify and eliminate toil by building robust automation and tooling.

What You’ll Be Bringing to the Team

  • You are a proven technical leader with a track record of successfully building platforms and tools that solve complex problems and enable other teams.
  • You have deep knowledge and practical skills in your domain, and you can effectively apply and communicate these concepts to others.
  • You have expert knowledge of major cloud platforms (AWS/GCP), extensive experience with container technologies (Docker, Kubernetes), and strong proficiency with Linux systems.
  • You have significant experience with deployment and build tools (e.g., CircleCI, Jenkins, ArgoCD), Infrastructure as Code practices (e.g., Terraform), and various datastores (e.g., MySQL, Postgres, Redis, DynamoDB).
  • You are skilled in shell scripting and can read and write code in at least one programming language (Python, Ruby, or Go).
  • You demonstrate strong leadership, critical thinking, and interpersonal skills, which you apply to solve complex problems and partner with other teams.
  • You are passionate about continuous learning and growth and are adept at mentoring team members to foster their development.

Who You Are

  • A problem-solver who isn't afraid to tackle complex challenges.
  • A team player and a "bar raiser."
  • You have a strong desire to learn, grow, and step out of your comfort zone.
  • You have great energy and passion for technology.
  • You can express yourself flawlessly in English.
  • You have strong interpersonal skills.

What's in it for You

Lightspeed offers a range of benefits to support your professional and personal life, including:

  • Flexible work culture, autonomy, and the possibility of remote work.
  • The opportunity to develop high-traffic products used globally.
  • Exposure to modern, proven technologies and the opportunity to expand your skill set.
  • Excellent growth opportunities in technical or people management roles.
  • A range of benefits and perks, including equity for all Lightspeeders.
  • A fast-paced, high-growth environment where you can become a valued part of a diverse and inclusive culture.

Additionally, Lightspeed provides benefits to help you stay healthy and happy, such as a mental health platform, counseling services, health and wellness benefits, a LinkedIn Learning license, and a volunteer day to give back to the community.

Lightspeed is a proud equal opportunity employer and we are committed to creating an inclusive and barrier-free workplace. Lightspeed welcomes and encourages applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process.



  • Montreal (administrative region), Canada Lightspeed Full time

    Lightspeed is seeking a Senior Site Reliability Expert to join our Retail organization. Our SRE team is responsible for the design, build, and operation of Lightspeed’s infrastructure, and for creating the platform that empowers our product teams. The platform covers the full cycle of software delivery, from CI/CD pipelines to highly available, scalable...


  • Montreal, Canada Lightspeed Full time

    Are you actively seeking a new opportunity, or simply exploring the market? Either way, you might have just found the right place! We’re looking for a Senior SRE to join our Lightspeed Retail group in North America, a team responsible for multiple POS systems infrastructure and developer experiences. The team is at the helm of providing a stable, reliable...


  • Montreal: Cloud

    2 days ago


    Montreal (administrative region), Canada High Tech Genesis Full time

    A leading tech company in Montreal is looking for a Site Reliability Engineer to provide expert support for network operations and cloud platforms. The ideal candidate will have over 3 years of experience in managing production systems, a strong understanding of networking, and hands-on experience in both Linux and Windows environments. Responsibilities...



  • Montreal (administrative region), Canada Canonical Full time

    Site Reliability Engineer Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and...

