Manager, Site Reliability Engineering and DevOps

3 weeks ago


Ottawa, ON, Canada Lightspeed Commerce Full time

Hi there! Thanks for stopping by.
Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place!
We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and profitability of their business. You'll join a team responsible for supporting the group in cross-cutting concerns such as cloud infrastructure, reliability and incident management, data warehousing and analytics, cost transparency and efficiency, and much more. You will also support our growing development teams with the infrastructure and tools they need to continue scaling. You will build and support multi-region infrastructure and networks, and help run our products reliably, efficiently, and securely by implementing, advising on, and advocating for well-established DevOps principles.
What you’ll be doing:
Work closely with development teams to give them the tools and practices they need to monitor software health in production, define and measure reliability metrics (SLIs and SLOs), and manage error budgets.
Design, build, and maintain robust infrastructure on GCP, leveraging cloud-native technologies such as GKE, Cloud SQL, and BigQuery.
Develop and manage CI/CD pipelines for efficient deployment and release, using technologies such as GitLab, GitHub, Helm, and Terraform.
Drive the incident management process and conduct post-mortem analyses to prevent future outages.
Mentor junior SREs and developers, providing guidance on best practices in cloud architecture, data management, and software development.
Conduct system performance benchmarks and implement enhancements to improve system reliability and throughput.
Collaborate with cross-functional teams to identify, design, and implement internal process improvements in a cost-efficient manner.
Design and build robust, scalable, and highly available systems.
Build platform solutions and apply software engineering principles to improve the reliability of our software and accelerate software delivery.
Manage infrastructure changes through infrastructure as code (IaC).
Be part of our on-call rotation.
Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices that improve product quality and team efficiency.
What you need to bring:
~ Bachelor’s degree in Computer Science or Engineering, or an equivalent level of real-world experience.
~ 9-10+ years of experience across site reliability engineering, systems administration, and/or software engineering.
~ Strong expertise in container orchestration platforms, specifically Kubernetes.
~ Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis).
~ Deep understanding of network protocols and IP networking, as well as experience with network troubleshooting.
~ Proficiency in programming languages such as Java, Python, or Go.
~ Proven track record of managing large-scale infrastructure in cloud environments, such as Google Cloud, AWS or Azure.
~ Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack).
~ Strong understanding of security best practices.
~ Exceptional problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues.
~ Excellent communication skills to effectively collaborate with cross-functional teams.
~ Strong leadership skills, capable of leading projects and influencing engineering decisions across the organization.

We know that people are more than what’s on their CV. If you’re unsure whether you have the right profile for the role, hit the ‘Apply’ button and give it a try!
What’s in it for you?
Come live the Lightspeed experience...
Ability to do your job in a truly flexible environment;
Genuine career opportunities in a company that’s creating new jobs every day;
Work in a team big enough for growth but lean enough to make a real impact.
… and enjoy a range of benefits that’ll keep you happy, healthy and (not) hungry:
Lightspeed share scheme (we are all owners)
Lightspeed RSU program (we are all owners)
Unlimited paid time off policy
Flexible working policy
Health insurance
Health and wellness benefits
Paid leave assistance for new parents
LinkedIn Learning
Volunteer day



  • Toronto, ON, Canada Paymentus Full time

    Summary: Paymentus leads the North American marketplace in electronic bill payment solutions and is looking for high performers to join our development team building SaaS fintech solutions across a range of industries. You will contribute to a massively scalable data platform that is built on top of a world-class enterprise platform, supporting thousands of...


  • Waterloo, ON, Canada Hamilton Barnes Associates Limited Full time

    Are you ready to revolutionize automation and reliability in a dynamic tech environment? You'll have the opportunity to join a cutting-edge Automation Development Services team as a Site Reliability Engineer (SRE) working on the 'Platform as a Service' toolset. If you're passionate about enterprise Linux systems, networking, and automation, this role is...


  • Ajax, ON, Canada Gradient IT Full time

    We are looking for a passionate Site Reliability Engineer with a deep-rooted foundation in DevSecOps and Open Source Technology. The engineer should be passionate about automation and building highly scalable and available services in the cloud. You will help lead a team of engineers to build tooling, automation, and support Spinnaker on behalf of our...


  • Ottawa, ON, Canada Lightspeed Restaurant Full time

    Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place! We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and the profitability of their...


  • Mississauga, ON, Canada Mimecast Canada Limited Full time

    Senior Site Reliability Engineer. Location: Canada - Mississauga - Remote. Time type: Full time. Posted 4 days ago. Job requisition ID: R4613. Help Build the Next Generation of Cloud-Scalable AI-Based Security Products. Have a passion for software security? Excel...


  • Toronto, ON, Canada Hour Consulting Full time

    Our client, a fast-growing fintech startup, is on a mission to redefine how to protect user identity, providing users secure control over personal information through a privacy-compliant network. This approach creates higher customer interaction and sales conversions, while improving overall security for both customers and businesses. They are a...


  • Toronto, ON, Canada Behavox Full time

    Behavox is shaping the future for how businesses harness their most important raw material - data. Organize enterprise data into actionable information that protects and promotes the business growth of multinational companies around the world. From managing enterprise risk and compliance to maximizing revenue and value, our data operating platform presents...


  • Ottawa, Canada Lightspeed Full time

    Hi there! Thanks for stopping by. Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place! We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size...


  • Mississauga, ON, Canada Mimecast Full time

    Senior Site Reliability Engineer Help Build the Next Generation of Cloud-Scalable AI-Based Security Products Have a passion for software security? Excel at implementing public cloud at scale? Desire to apply Machine Learning to solve complex problems? This may well be the role for you. Our Communication and Collaboration Security products are cutting-edge...


  • Toronto, ON, Canada Tata Consultancy Services Full time

    TCS has been recognized as a Global Top Employer by the Top Employers Institute - one of only eight companies worldwide to have achieved this status. Our organizational structure is domain-led and designed to offer businesses a single window into industry-specific solutions. Our agile industry units have embedded capabilities to enable rapid responses that...


  • Toronto, ON, Canada Collage HR Full time

    Wise Publishing, Inc. is a digital publisher and technology company, but we’re much more than that; we’re a group of talented, passionate people who believe that consumers deserve the best possible information to help them make smart choices and get ahead. Our purpose is to empower everyone to live a richer life. Our core products are our widely read,...


  • Toronto, ON, Canada EQ Bank | Equitable Bank Full time

    Being a traditional bank just isn’t our thing. We are big believers in innovating the banking experience because we believe Canadians deserve better options, and we challenge ourselves and our teams to creatively transform what’s possible in banking. Our team is made up of inquisitive and agile minds that find smarter ways of doing things. Overall we...


  • Ottawa, ON, Canada Axiad Ids, Inc. Full time

    Axiad is looking for an experienced Senior DevOps Engineer to join our Cloud team. Do you have a passion for Continuous Integration (CI) and Delivery (CD) and cloud-first applications leveraging cloud-agnostic technology that runs on cloud platforms (AWS, GCP, Azure)? Do...

  • DevOps SRE Manager

    7 days ago


    Toronto, ON, Canada Actionstep Full time

    Actionstep is a pioneer in the development and sale of software-as-a-service (SaaS) products, specializing in the delivery of Legal Practice Management software. We are a fast-growing, dynamic business with a global customer base and team. Headquartered in Auckland, New Zealand, with team members in the United Kingdom, United States, Canada and Australia,...