Principal Site Reliability Engineer
2 weeks ago
Hi there! Thanks for stopping by.
Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place.
We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and the profitability of their business. You'll join a team responsible for supporting the group in cross-cutting concerns, such as cloud infrastructure, reliability and incident management, data warehousing and analytics, cost transparency and efficiency, and much more. You will also support our growing Dev teams with the infrastructure and tools they need to continue scaling. You will build and support multi-region infrastructure and networks, and help run our products in a reliable, efficient and secure manner by implementing, advising on and advocating for well-known DevOps principles.
What you’ll be doing:
- Work closely with development teams to empower them with the necessary tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets.
- Design, build and maintain robust infrastructure built upon GCP, leveraging cloud native technologies such as GKE, Cloud SQL, BigQuery, etc.
- Develop and manage CI/CD pipelines for efficient deployment and release using a number of technologies (GitLab, GitHub, Helm, Terraform, etc.).
- Drive incident management process and conduct post-mortem analysis to prevent future outages.
- Mentor junior SREs and developers, providing guidance on best practices in cloud architecture, data management, and software development.
- Conduct system performance benchmarks and implement enhancements to improve system reliability and throughput.
- Collaborate with cross-functional teams to identify, design, and implement internal process improvements in a cost-efficient manner.
- Design and build robust, scalable, and highly available systems.
- Build platform solutions and apply software engineering principles to improve the reliability of our software and accelerate software delivery.
- Manage infrastructure change through infrastructure as code (IaC).
- Be part of our on-call rotation.
- Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices that improve product quality and team efficiency.
What you need to bring:
- Bachelor’s degree in Computer Science, Engineering, or a related field, or an equivalent level of real-world experience.
- 9-10+ years of experience across site reliability engineering, systems administration, and/or software engineering.
- Strong expertise in container orchestration platforms, specifically Kubernetes.
- Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis).
- Deep understanding of network protocols and IP networking, as well as experience with network troubleshooting.
- Proficiency in programming languages such as Java, Python, Go, etc.
- Proven track record of managing large-scale infrastructure in cloud environments, such as Google Cloud, AWS or Azure.
- Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack).
- Strong understanding of security best practices.
- Exceptional problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues.
- Excellent communication skills to effectively collaborate with cross-functional teams.
- Strong leadership skills, capable of leading projects and influencing engineering decisions across the organization.
We know that people are more than what’s on their CV. If you’re unsure whether you have the right profile for the role... hit the ‘Apply’ button and give it a try.
What’s in it for you?
Come live the Lightspeed experience...
- Ability to do your job in a truly flexible environment;
- Genuine career opportunities in a company that’s creating new jobs every day;
- Work in a team big enough for growth but lean enough to make a real impact.
… and enjoy a range of benefits that’ll keep you happy, healthy and (not) hungry:
- Lightspeed share scheme (we are all owners)
- Lightspeed RSU program (we are all owners)
- Unlimited paid time off policy
- Flexible working policy
- Health insurance
- Health and wellness benefits
- Paid leave assistance for new parents
- LinkedIn Learning
- Volunteer day
Montréal, QC, Canada · Lightspeed · Full time