Principal Site Reliability Engineer
1 month ago
Hi there! Thanks for stopping by. Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place!
We’re looking for a Principal Site Reliability Engineer to join our NuOrder by Lightspeed team in North America. NuOrder by Lightspeed builds software solutions that help merchants grow the size and profitability of their business. You'll join a team responsible for supporting the group in cross-cutting concerns, such as cloud infrastructure, reliability and incident management, data warehousing and analytics, cost transparency and efficiency, and much more.
You will also be supporting our growing Dev teams with the infrastructure and tools needed to continue scaling. You will build and support multi-region infrastructures and networks and help run our products in a reliable, efficient, and secure manner by implementing, advising on, and advocating for well-known DevOps principles.
What you’ll be doing:
- Work closely with development teams to empower them with the necessary tools and practices for monitoring software health in production, defining and measuring reliability metrics (SLI, SLO), and managing error budgets.
- Design, build, and maintain robust infrastructure built upon GCP, leveraging cloud native technologies such as GKE, Cloud SQL, BigQuery, etc.
- Develop and manage CI/CD pipelines for efficient deployment and release using a number of technologies (GitLab, GitHub, Helm, Terraform, etc.).
- Drive incident management process and conduct post-mortem analysis to prevent future outages.
- Mentor junior SREs and developers, providing guidance on best practices in cloud architecture, data management, and software development.
- Conduct system performance benchmarks and implement enhancements to improve system reliability and throughput.
- Collaborate with cross-functional teams to identify, design, and implement internal process improvements in a cost-efficient manner.
- Design and build robust, scalable, and highly available systems.
- Build platform solutions and apply software engineering principles to improve the reliability of our software and accelerate software delivery.
- Manage infrastructure change through infrastructure as code (IaC).
- Be part of our on-call rotation.
- Stay current with industry trends and emerging technologies, advocating for the adoption of new technologies and practices that improve product quality and team efficiency.
- Bachelor’s degree in Computer Science or Engineering, or an equivalent level of real-world experience.
- 9-10+ years of experience across site reliability engineering, systems administration, and/or software engineering.
- Strong expertise in container orchestration platforms, specifically Kubernetes.
- Strong understanding of both relational (e.g., PostgreSQL, MySQL) and NoSQL databases (e.g., MongoDB, Cassandra, Redis).
- Deep understanding of network protocols and IP networking, as well as experience with network troubleshooting.
- Proficiency in programming languages such as Java, Python, Go, etc.
- Proven track record of managing large-scale infrastructure in cloud environments, such as Google Cloud, AWS, or Azure.
- Experience with monitoring tools (e.g., Prometheus, Grafana, Datadog) and logging solutions (e.g., ELK stack).
- Strong understanding of security best practices.
- Exceptional problem-solving skills and the ability to work under pressure to troubleshoot and resolve complex issues.
- Excellent communication skills to effectively collaborate with cross-functional teams.
- Strong leadership skills, capable of leading projects and influencing engineering decisions across the organization.
We know that people are more than what’s on their CV. If you’re unsure whether you have the right profile for the role, hit the ‘Apply’ button and give it a try.
What’s in it for you?
Come live the Lightspeed experience:
- Ability to do your job in a truly flexible environment.
- Genuine career opportunities in a company that’s creating new jobs every day.
- Work in a team big enough for growth but lean enough to make a real impact.
… and enjoy a range of benefits that’ll keep you happy, healthy, and (not) hungry:
- Lightspeed share scheme (we are all owners).
- Lightspeed RSU program (we are all owners).
- Unlimited paid time off policy.
- Flexible working policy.
- Health insurance.
- Health and wellness benefits.
- Paid leave assistance for new parents.
- LinkedIn Learning.
- Volunteer day.
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA Reperio Human Capital Full time
Site Reliability Engineer 100421 Location: Ireland/UK Salary: €70K+ Type: Permanent, Full-time We're seeking experienced Site Reliability Engineers who excel at ensuring the reliability and scalability of production systems, and possess extensive experience with monitoring and automation tools. Responsibilities: Ensure the reliability,...
-
Site Reliability Engineer
4 weeks ago
Old Toronto, Ontario, CA CB Canada Full time
Site Reliability Engineer On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description Azure cloud Jira and Confluence CICD Experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...
-
(Canada) Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA Thomson Reuters Full time
(Canada) Site Reliability Engineer (Contract) Contract (9 months 4 days) Published 3 days ago New Relic Data Dog Site Reliability Engineer - in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA TD Bank Full time
Site Reliability Engineer Work Location: Canada Hours: 37.5 Line of Business: Technology Solutions Pay Details: We’re committed to providing fair and equitable compensation to all our colleagues. As a candidate, we encourage you to have an open dialogue with a member of our HR Team and ask compensation-related questions, including pay...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA eTeam Full time
Remote Work Duration 4 months - Preference is to find candidates who are willing to be converted to full-time employees. The conversion decision will be made based on performance. Job Description Role Description: Defining and measuring reliability goals (SLIs, SLOs, and error budgets) for user journeys. Designing for and implementing observability (ELK,...
-
Site Reliability Engineer
4 weeks ago
Old Toronto, Ontario, CA Rogers Part time
Site Reliability Engineer Are you ready to take your career to new heights and be a part of a dynamic team at Rogers Sports & Media? We believe in creativity, innovation, and collaboration in everything we do, and we are looking for people who share this mindset to join us. With a monthly reach of 30 million Canadians, you can help shape the future of...
-
Site Reliability Engineer
3 weeks ago
Old Toronto, Ontario, CA Rogers Communications, Inc. Part time
Site Reliability Engineer Are you ready to take your career to new heights and be a part of a dynamic team at Rogers Sports & Media? We believe in creativity, innovation, and collaboration in everything we do, and we are looking for people who share this mindset to join us. With a monthly reach of 30 million Canadians, you can help shape the future of sports,...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA Vaco Full time
About the Company: Our client operates global markets and builds digital communities and analytic solutions and is looking to hire a Site Reliability Engineer. About the Opportunity: Stephen manages the infra group team, Windows, virtualization, IT infrastructure, etc. Works closely with Jeremy who is the hiring manager away for Pat leave. They are currently...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA The Voleon Group Full time
Voleon is a technology company that applies state-of-the-art machine learning techniques to real-world problems in finance. For more than 15 years, we have led our industry and worked at the frontier of applying machine learning to investment management. We have become a multi-billion-dollar asset manager, and we have ambitious goals for the future. Your...
-
Site Reliability Engineer
4 weeks ago
Old Toronto, Ontario, CA Rogers Communications Full time
Are you ready to take your career to new heights and be a part of a dynamic team at Rogers Sports & Media? We believe in creativity, innovation, and collaboration in everything we do, and we are looking for people who share this mindset to join us. With a monthly reach of 30 million Canadians, you can help shape the future of sports, news, e-commerce, and...
-
Site Reliability Engineer
3 weeks ago
Old Toronto, Ontario, CA Rogers Communications Full time
Are you ready to take your career to new heights and be a part of a dynamic team at Rogers Sports & Media? We believe in creativity, innovation, and collaboration in everything we do, and we are looking for people who share this mindset to join us. With a monthly reach of 30 million Canadians, you can help shape the future of sports, news,...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA Tecsys Inc. Full time
Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...
-
Staff Site Reliability Engineer
4 weeks ago
Old Toronto, Ontario, CA Lightspeed Full time
Hi there! Thanks for stopping by. Are you actively looking for a new opportunity? Or just checking the market? Well… you might just be in the right place! We’re looking for a Staff Site Reliability Engineer to join our NuOrder by Lightspeed team in North America. NuORDER by Lightspeed builds software solutions that help merchants grow the size and the...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA Nityo Infotech Full time
Job Responsibilities: Objectives of this Role: Run the IKP clusters by monitoring availability and taking a holistic view of system health. Build tools and automation to manage platform infrastructure and services. Improve reliability, quality, and time to upgrade cluster and service versions. Measure and optimize system performance and resource...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA PharmaLex Full time
Your Job: SRE at PharmaLex is the software engineering approach to production operations. 50% of your time will be spent building software to automate the manual work you do during the other 50%, when you will be providing operational support to the products you cover. SRE operates critical products 24/7/365 within agreed SLOs. Out-of-hours support via...
-
Site Reliability Engineer in Toronto, Canada
4 weeks ago
Old Toronto, Ontario, CA United Software Group Inc. - Canada Full time
Position: Site Reliability Engineer Location: Toronto, Canada Duration: Contract Job Description: 3+ years of experience Advanced knowledge of the following SRE practices and technologies Python, YAML, Shell scripting Azure, Linux Dynatrace, Prometheus, PagerDuty, Moog, Splunk, Elastic, Azure monitor Chaos Engineering MQ, Kafka Perform production support...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA Scotiabank Full time
Requisition ID: 197089. Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture. The Team We are looking for a developer to join our Digital Engineering Operations. The ideal candidate is passionate about designing and...
-
Site Reliability Engineer III
1 month ago
Old Toronto, Ontario, CA Guidewire Full time
ESSENTIAL DUTIES AND RESPONSIBILITIES Take a purist SRE approach to shared multi-tenant infrastructure for a resilient SaaS microservice-based containerized systems in addition to customer-centric application environments. Oversee and automate the team’s growing presence in AWS. Contribute to core infrastructure systems development with...
-
Site Reliability Engineer Toronto
1 month ago
Old Toronto, Ontario, CA Ascend Fundraising Solutions Full time
Founded in 2010, Ascend Fundraising Solutions provides online and in-venue fundraising platforms and solutions. Our innovative approach has been embraced by renowned non-profit organizations worldwide, including United Way, Vancouver Canucks Foundation, Canadian Olympic Foundation, Canadian Institute for the Blind, Kansas City Chiefs Foundation, Boston Red...
-
Site Reliability Engineer
1 month ago
Old Toronto, Ontario, CA Snaphunt Full time
The Offer Great Opportunity The Job You will be responsible for: Gathering and evaluating user feedback. Providing code documentation and other inputs to technical documents. Supporting continuous improvement by investigating alternatives and new technologies and presenting these for architectural review. Troubleshooting and debugging to optimise...