Principal Site Reliability Engineer
3 weeks ago
Thanks for stopping by. Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place.
We’re looking for a Principal Site Reliability Engineer to join our NuORDER by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and the profitability of their business. You'll join a team responsible for supporting the group in cross-cutting concerns, such as cloud infrastructure, reliability and incident management, data warehousing and analytics, cost transparency and efficiency, and much more. You will also be supporting our growing Dev teams with the infrastructure and tools needed to continue scaling. You will build and support multi-region infrastructures and networks, and help run our products in a reliable, efficient and secure manner by implementing, advising on and advocating well-known DevOps principles.
What you’ll be doing:
- Work closely with development teams to empower them with the necessary tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets.
- Design, build and maintain robust infrastructure built upon GCP, leveraging cloud native technologies such as GKE, Cloud SQL, BigQuery, etc.
- Develop and manage CI/CD pipelines for efficient deployment and release using a number of technologies (GitLab, GitHub, Helm, Terraform, etc.).
- Drive incident management process and conduct post-mortem analysis to prevent future outages.
- Mentor junior SREs and developers, providing guidance on best practices in cloud architecture, data management, and software development.
- Conduct system performance benchmarks and implement enhancements to improve system reliability and throughput.
- Collaborate with cross-functional teams to identify, design, and implement internal process improvements in a cost-efficient manner.
- Design and build robust, scalable, and highly available systems.
- Build platform solutions and apply software engineering principles to improve the reliability of our software and accelerate software delivery.
- Manage infrastructure change through infrastructure as code (IaC).
- Be part of our on-call rotation.
- Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices that improve product quality and team efficiency.
Bachelor’s degree in Computer Science or Engineering, or an equivalent level of real-world experience. 9-10+ years of experience across site reliability engineering, systems administration, and/or software engineering. Strong expertise in container orchestration platforms, specifically Kubernetes. Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis). Deep understanding of network protocols and IP networking, as well as experience with network troubleshooting.
Proficiency in programming languages such as Java, Python, Go, etc. Proven track record of managing large-scale infrastructure in cloud environments, such as Google Cloud, AWS or Azure. Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack). Strong understanding of security best practices. Exceptional problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues. Excellent communication skills to effectively collaborate with cross-functional teams. Strong leadership skills, capable of leading projects and influencing engineering decisions across the organization.
We know that people are more than what’s on their CV. If you’re unsure that you have the right profile for the role... hit the ‘Apply’ button and give it a try.
What’s in it for you?
Come live the Lightspeed experience...
- Ability to do your job in a truly flexible environment;
- Genuine career opportunities in a company that’s creating new jobs every day;
- Work in a team big enough for growth but lean enough to make a real impact.
… and enjoy a range of benefits that’ll keep you happy, healthy and (not) hungry:
- Lightspeed share scheme (we are all owners)
- Lightspeed RSU program (we are all owners)
- Unlimited paid time off policy
- Flexible working policy
- Health insurance
- Health and wellness benefits
- Paid leave assistance for new parents
- LinkedIn Learning
- Volunteer day
-
Site Reliability Engineer
2 weeks ago
Old Toronto, Canada Autodesk Full time
Position Overview: Autodesk, the leading Design and Make Software Company, is looking for a Principal Site Reliability Engineer to join the Autodesk Platform Services Engineering team in Toronto, Canada. In this role, you will help build trusted services of APS (Autodesk Platform Services) measured by Service Level Objectives (SLOs) and Mean Time to Recovery...
-
Site Reliability Engineer
2 weeks ago
Old Toronto, Canada CB Canada Full time
Site Reliability Engineer
On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer.
Site Reliability Engineer – Job Description
- Azure cloud
- Jira and Confluence
- CI/CD
- Experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...
-
(Canada) Site Reliability Engineer
1 month ago
Old Toronto, Canada Thomson Reuters Full time
(Canada) Site Reliability Engineer (Contract)
Contract (9 months 4 days)
Published 3 days ago
New Relic, Datadog
Site Reliability Engineer - in the Service Management Organization
Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will...
-
Site Reliability Engineer
3 weeks ago
Old Toronto, Canada eTeam Full time
Remote Work
Duration: 4 months. Preference is to find candidates who are willing to be converted to full-time employees. The conversion decision will be made based on performance.
Job Description
Role Description:
- Defining and measuring reliability goals (SLIs, SLOs, and error budgets) for user journey.
- Designing for and implementing observability (ELK,...
-
Site Reliability Engineer
4 days ago
Toronto, ON, Canada Autodesk Full time
Position Overview: Autodesk, the leading Design and Make Software Company, is looking for a Principal Site Reliability Engineer to join the Autodesk Platform Services Engineering team in Toronto, Canada. In this role, you will help build trusted services of APS (Autodesk Platform Services) measured by Service Level Objectives (SLOs) and Mean Time to...
-
Site Reliability Engineer
1 month ago
Toronto, Canada CB Canada Full time
Site Reliability Engineer
On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer.
Site Reliability Engineer – Job Description
- Azure cloud
- Jira and Confluence
- CI/CD
- Experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...
-
Site Reliability Engineer
4 weeks ago
Toronto, Canada Autodesk Full time
Position Overview: Autodesk, the leading Design and Make Software Company, is looking for a Principal Site Reliability Engineer to join the Autodesk Platform Services Engineering team in Toronto, Canada. In this position, you will help build trusted services of APS (Autodesk Platform Services) as measured by Service Level Objectives (SLOs) and Mean...
-
Senior Site Reliability Engineer
1 day ago
Old Toronto, Canada Lloyds Banking Group Full time
Job Description - Senior Site Reliability Engineer
JOB TITLE: Senior Site Reliability Engineer (SRE)
LOCATION: Halifax, Leeds or Manchester
HOURS: Full-time
WORKING PATTERN: Our work style is hybrid, which involves spending at least two days per week, or 40% of our time, at one of our office sites.
Who are Lloyds Banking Group and where does this role sit?
If you...
-
Site Reliability Engineer
19 hours ago
Old Toronto, Canada Hour Consulting Full time
Our client, a fast-growing fintech startup, is on a mission to redefine how to protect user identity, providing users secure control over personal information through a privacy-compliant network. Their enterprise platform comprises three key pillars: strong authentication, user privacy and identity, and uses a combination of biometrics and...
-
Site Reliability Engineer
19 hours ago
Old Toronto, Canada Scotiabank Full time
Title: Site Reliability Engineer Requisition ID:...
-
Senior Site Reliability Engineer
19 hours ago
Old Toronto, Canada Manulife Insurance Malaysia Full time
Senior Site Reliability Engineer
Locations: Waterloo, Ontario; Toronto, Global Head Office (200 Bloor)
Time type: Full time
Posted on: Posted yesterday
Job requisition ID: JR24020202
We are a financial services provider that works to facilitate the...
-
Site Reliability Engineer
1 month ago
Old Toronto, Canada Nityo Infotech Full time
Job Responsibilities:
Objectives of this Role
- Run the IKP clusters by monitoring availability and taking a holistic view of system health
- Build tools and automation to manage platform infrastructure and services
- Improve reliability, quality, and time to upgrade cluster and service versions
- Measure and optimize system performance and resource utilization,...
-
Site Reliability Engineer
19 hours ago
Old Toronto, Canada The Voleon Group Full time
Voleon is a technology company that applies state-of-the-art machine learning techniques to real-world problems in finance. For more than 15 years, we have led our industry and worked at the frontier of applying machine learning to investment management. We have become a multi-billion-dollar asset manager, and we have ambitious goals for the future. ...
-
Senior Site Reliability Engineer
19 hours ago
Old Toronto, Canada Practice Better Full time
About us: Practice Better is a leading all-in-one practice management software solution transforming how health & wellness professionals run their practices and support their clients. The company serves 15,000+ customers in 70+ countries across the globe, and processes hundreds of millions annually in payments on behalf of customers. Over 65% of growth...
-
Site Reliability Engineer
2 days ago
Toronto, Canada Infotek Consulting Services Inc. Full time
Infotek Consulting is searching for a Site Reliability Engineer. This is a remote opportunity with some travel involved.
Job Description: Our EPM (Event and Performance Management) team is an availability, performance and reliability management discipline that supports the optimization of the operati
-
Site Reliability Engineer
3 weeks ago
Toronto, Ontario, Canada Zortech Solutions Full time
Hi, hope you are doing great. This is Priya Rajput from Zortech Solutions, reaching out about an exciting job opening. Kindly have a look at the job description and reply with your feedback. You can reach me by email or phone.
Role: Site Reliability Engineer
Location: Toronto, ON - Onsite
Duration: Full-time, permanent
Skills and Responsibilities:...
-
Site Reliability Engineer
5 days ago
Toronto, Canada OnX Canada Full time
OnX is looking for a Site Reliability Engineer for one of our clients in Toronto.
Client: Financial Services
Location: Toronto, mostly remote
Duration: 6 months with potential extension
Experience with JBoss middleware is very important.
Responsibilities: Following the senior technicians' plans to buil
-
Senior Site Reliability Engineer
19 hours ago
Old Toronto, Canada Sentry Full time
Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology. With more than $217 million in funding and 90,000 organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney,...
-
Senior Site Reliability Engineer
2 weeks ago
Old Toronto, Canada Zendesk Full time
Job Description: Zendesk is a service-first CRM company that builds powerful, customizable software designed to improve customer relations. At Zendesk, we encourage growth, innovation, and believe in giving back to the communities we call home. The ideal candidate will want to join a growing team. You have recent experience with full-stack cloud native...
-
Senior Site Reliability Engineer
1 week ago
Old Toronto, Canada Zendesk Full time
Job Description: Zendesk is a service-first CRM company that builds powerful, customizable software designed to improve customer relations. At Zendesk, we encourage growth, innovation, and believe in giving back to the communities we call home. The ideal candidate will want to join a growing team. You have recent experience with full-stack cloud native...