Principal Site Reliability Engineer

3 weeks ago


Ottawa ON, Canada Lightspeed Full time

Hi there! Thanks for stopping by. Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place!

We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and profitability of their business. You’ll join a team responsible for supporting the group in cross-cutting concerns such as cloud infrastructure, reliability and incident management, data warehousing and analytics, cost transparency and efficiency, and much more. You will also support our growing Dev teams with the infrastructure and tools they need to continue scaling. You will build and support multi-region infrastructure and networks, and help run our products in a reliable, efficient and secure manner by implementing, advising on and advocating for well-known DevOps principles.

What you’ll be doing:
  1. Work closely with development teams to empower them with the necessary tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets.
  2. Design, build and maintain robust infrastructure on GCP, leveraging cloud-native technologies such as GKE, Cloud SQL and BigQuery.
  3. Develop and manage CI/CD pipelines for efficient deployment and release using a number of technologies (GitLab, GitHub, Helm, Terraform, etc.).
  4. Drive the incident management process and conduct post-mortem analyses to prevent future outages.
  5. Mentor junior SREs and developers, providing guidance on best practices in cloud architecture, data management, and software development.
  6. Conduct system performance benchmarks and implement enhancements to improve system reliability and throughput.
  7. Collaborate with cross-functional teams to identify, design, and implement internal process improvements in a cost-efficient manner.
  8. Design and build robust, scalable, and highly available systems.
  9. Build platform solutions and apply software engineering principles to improve the reliability of our software and accelerate software delivery.
  10. Manage infrastructure change through infrastructure as code (IaC).
  11. Be part of our on-call rotation.
  12. Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices that improve product quality and team efficiency.

What you need to bring:

Bachelor’s degree in Computer Science or Engineering, or an equivalent level of real-world experience.

9-10+ years of experience across site reliability engineering, systems administration, and/or software engineering.

Strong expertise in container orchestration platforms, specifically Kubernetes.

Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis).

Deep understanding of network protocols and IP networking, as well as experience with network troubleshooting.

Proficiency in programming languages such as Java, Python, Go, etc.

Proven track record of managing large-scale infrastructure in cloud environments, such as Google Cloud, AWS or Azure.

Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack).

Strong understanding of security best practices.

Exceptional problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues.

Excellent communication skills to effectively collaborate with cross-functional teams.

Strong leadership skills, capable of leading projects and influencing engineering decisions across the organization.

We know that people are more than what’s on their CV. If you’re unsure that you have the right profile for the role... hit the ‘Apply’ button and give it a try.

What’s in it for you?

Come live the Lightspeed experience...

  • Ability to do your job in a truly flexible environment;
  • Genuine career opportunities in a company that’s creating new jobs every day;
  • Work in a team big enough for growth but lean enough to make a real impact.

... and enjoy a range of benefits that’ll keep you happy, healthy and (not) hungry:

  • Lightspeed share scheme (we are all owners)
  • Lightspeed RSU program (we are all owners)
  • Unlimited paid time off policy
  • Flexible working policy
  • Health insurance
  • Health and wellness benefits
  • Paid leave assistance for new parents
  • LinkedIn Learning
  • Volunteer day
