Principal Site Reliability Engineer

2 weeks ago


Montréal, QC, Canada · Lightspeed Commerce · Full time

Hi there! Thanks for stopping by.

Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place.

We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and profitability of their business. You'll join a team that supports the group on cross-cutting concerns such as cloud infrastructure, reliability and incident management, data warehousing and analytics, cost transparency and efficiency, and much more. You will also support our growing Dev teams with the infrastructure and tools they need to keep scaling. You will build and support multi-region infrastructure and networks, and help run our products reliably, efficiently and securely by implementing, advising on and advocating for well-known DevOps principles.

What you’ll be doing:

  • Work closely with development teams to empower them with the necessary tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets.
  • Design, build and maintain robust infrastructure built upon GCP, leveraging cloud native technologies such as GKE, Cloud SQL, BigQuery, etc.
  • Develop and manage CI/CD pipelines for efficient deployment and release using a number of technologies (GitLab, GitHub, Helm, Terraform, etc.).
  • Drive the incident management process and conduct post-mortem analyses to prevent future outages.
  • Mentor junior SREs and developers, providing guidance on best practices in cloud architecture, data management, and software development.
  • Conduct system performance benchmarks and implement enhancements to improve system reliability and throughput.
  • Collaborate with cross-functional teams to identify, design, and implement internal process improvements in a cost-efficient manner.
  • Design and build robust, scalable, and highly available systems.
  • Build platform solutions and apply software engineering principles to improve the reliability of our software and accelerate software delivery.
  • Manage infrastructure change through infrastructure as code (IaC).
  • Be part of our on-call rotation.
  • Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices that improve product quality and team efficiency.

What you need to bring:

  • Bachelor’s degree in Computer Science or Engineering, or an equivalent level of real-world experience.
  • 9-10+ years of experience across site reliability engineering, systems administration, and/or software engineering.
  • Strong expertise in container orchestration platforms, specifically Kubernetes.
  • Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis).
  • Deep understanding of network protocols and IP networking, as well as experience with network troubleshooting.
  • Proficiency in programming languages such as Java, Python, Go, etc.
  • Proven track record of managing large-scale infrastructure in cloud environments, such as Google Cloud, AWS or Azure.
  • Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack).
  • Strong understanding of security best practices.
  • Exceptional problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues.
  • Excellent communication skills to effectively collaborate with cross-functional teams.
  • Strong leadership skills, capable of leading projects and influencing engineering decisions across the organization.

We know that people are more than what’s on their CV. If you’re unsure that you have the right profile for the role... hit the ‘Apply’ button and give it a try.

What’s in it for you?

Come live the Lightspeed experience...

  • Ability to do your job in a truly flexible environment;
  • Genuine career opportunities in a company that’s creating new jobs every day;
  • Work in a team big enough for growth but lean enough to make a real impact.

… and enjoy a range of benefits that’ll keep you happy, healthy and (not) hungry:

  • Lightspeed share scheme (we are all owners)
  • Lightspeed RSU program (we are all owners)
  • Unlimited paid time off policy
  • Flexible working policy
  • Health insurance
  • Health and wellness benefits
  • Paid leave assistance for new parents
  • LinkedIn Learning
  • Volunteer day