Senior Site Reliability Expert
16 hours ago
Are you actively seeking a new opportunity, or simply exploring the market? Either way, you might have just found the right place.

We're looking for a Senior SRE to join our Lightspeed Retail group in North America, a team responsible for the infrastructure and developer experience of multiple POS systems. The team is at the helm of providing a stable, reliable and efficient system to our retailers.

Our team is also dedicated to designing, building, and operating the infrastructure that powers Lightspeed Retail. This platform supports the entire software delivery lifecycle, from CI/CD pipelines to highly available and scalable production environments.

NOTE: As a global company with employees and clients outside of Quebec, fluency in English as a working language is required for this position.

What you'll be responsible for:
As a member of the Site Reliability Expert team, you will:
Be an active member of the Retail Platform team, responsible for the observability, scalability and reliability of the Retail Platform.
Design and implement Kubernetes clusters for various use cases, ensuring scalability, reliability, and security.
Configure and manage Kubernetes clusters, including nodes, networking, and storage.
Perform updates to multi-platform Kubernetes clusters in critical production environments.
Act as both a subject matter expert and an incident lead during the incident response process.
Initiate and contribute to the continuous improvement of our software delivery processes and practices in a multi-location, multidisciplinary team, to empower and accelerate product development.
Obsess over reliability and help teams deliver reliable software.
Adhere to and advocate for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies.
Provide timely assistance and remediation during critical situations and production incidents to help resolve service problems (you will be on call for periods of time).

What you'll be bringing to the team:
A passion for scalability, reliability and observability, and a desire to share that passion with others in a positive, solutions-oriented way.
Comfort leading projects that require coordination and collaboration with other development teams to reach a common goal.
A desire to quickly grow your ability to champion process changes in pursuit of the SRE mandate.
A proven track record of driving optimization of cloud services, including but not limited to data pipelines, storage, databases, caching layers, cores, and memory.
An understanding of different types of SLAs/SLOs and different types of resource contracts, such as reserved instances and savings plans.
An analytical mindset: you live by the metrics, deeply understand data, and use it to drive technical decisions.
A good understanding of Agile development and continuous delivery best practices, software engineering tools, processes, methods and testing.
Primary ownership of customer-facing, zero-downtime production environments using the following toolsets:
  Major cloud platforms (Amazon Web Services, Google Cloud Platform, Azure)
  CI/CD pipelines (CircleCI, Jenkins, GitHub, ArgoCD, Helm)
  Containers (Docker, Kubernetes, EKS, AKS, GKE & Linux systems)
  Infrastructure as Code (Terraform)
  Programming or scripting languages (Bash, Python, Ruby, Java, Golang, etc.)

Who you are:
You are a problem solver who does not shy away from tackling complexity and thinking critically.
You have a strong will to learn, grow and get out of your comfort zone.
You have great energy and passion for technology.
You can express yourself flawlessly in English.
You have strong interpersonal skills.
You are a team player and a bar raiser.

What's in it for you:
Join a growing team and help us move to the next level.
Amazing benefits & perks, including equity for all Lightspeeders.
Constant development of both your skill set and business acumen, with limitless growth opportunities.
Lots of autonomy and a flexible work culture.
Innovation time to explore and learn at work.
Shaping the company by joining cultural & technical committees.
Tons of growth opportunities into technical or people management roles.
Opportunity to join a fast-paced, high-growth company.
Opportunity to learn, expand your skill set, forge wonderful relationships and make your mark within the diverse and inclusive Lightspeed family, a true Canadian tech success story.

...and enjoy a range of benefits that will keep you happy, healthy and (not) hungry:
Lightspeed equity scheme (we are all owners).
Flexible paid time off and remote work policies.
Health insurance.
Contributions to your pension plan (RRSP).
Health and wellness benefit of $500 per year.
Paid leave and assistance for new parents.
Mental health online platform and counseling & coaching services.
Training opportunities to grow your skills and career.
Volunteer day.
Fully stocked kitchen (hot and cold beverages, meals served).
Happy hours to build your relationships with colleagues after work.
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada
LanceSoft, Inc.
Full time
Application Support SME – SRE & Regulatory Financial Systems
Location: Montreal (day 1 onboarding onsite / in‑office presence required 3×/week)
Duration: 12+ months (extendable contract)
Years of experience: 0–2 (open to new grads)
We are seeking an Application Support SME (Subject Matter Expert) in Site Reliability Engineering and regulatory...
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada
LanceSoft, Inc.
Full time
Application Support SME – SRE & Regulatory Financial Systems
Location: Montreal (day 1 onboarding onsite / in‑office presence required 3×/week)
Duration: 12+ months (extendable contract)
Years of experience: 0–2 (open to new grads)
We are seeking an Application Support SME (Subject Matter Expert) in Site Reliability Engineering and regulatory...
-
Site Reliability Engineer
7 days ago
Montreal, Canada
ApTask
Full time
Looking for an intermediate candidate with 2 to 5 years' experience. The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for the client's ServiceNow SaaS implementation. Reporting to a Site Reliability...
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada
Bounce
Full time
About Bounce
Bounce is the new social way to pay: a smart and fun alternative to traditional money transfers. Canada’s first Venmo-style app, rail-agnostic, and designed to go viral (Hey, Bounce me!). Whether you're requesting rent, sending money for concert tickets, or splitting trip group...
-
Site Reliability Engineer
7 days ago
Montreal, Canada
Botpress
Full time
Help bring AI agents to companies worldwide. Over the next decade, autonomous agents will redefine how we work. Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic. Our product works today and at scale, across industries, regions, and limitless use...
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada
TMC Canada
Full time
Site Reliability Engineer (SRE), ServiceNow, Application Infrastructure
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for the ServiceNow SaaS implementation. The role reports to a Site Reliability Engineering & Operations Lead. This role...
-
Senior Site Reliability Engineer
4 weeks ago
Toronto, Montreal, Calgary, Vancouver, Edmonton, Old Toronto, Ottawa, Mississauga, Quebec, Winnipeg, Halifax, Saskatoon, Burnaby, Hamilton, Victoria, Surrey, Halton Hills, London, Regina, Markham, Brampton, Vaughan, Kelowna, Laval, Southwestern Ontario, R, Canada
Orion Innovation
Full time
We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands‑on mastery of Linux systems, advanced networking concepts (specifically IPSec),...