Reliability Engineering Specialist
3 weeks ago
About the Role
The ideal candidate will ensure the reliability, scalability, and performance of our product. This involves leading efforts in proactive monitoring, incident management, and automation, and collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient infrastructure, minimize downtime, and enhance operational efficiency to support business goals.
Key Responsibilities
- Implement intelligent monitoring solutions to proactively monitor system health and performance metrics using dashboards and alerts.
- Analyse system health and performance, identify and mitigate incidents, participate in on-call rotations, and respond to incidents quickly and effectively.
- Design, implement, and manage resilient systems and architectures, conducting regular performance and reliability reviews to identify areas for improvement.
- Reduce manual operational tasks by building automation in Shell, PowerShell, or Python, including solutions for scaling, monitoring/alerting, and auto-healing, to improve infrastructure scalability and reliability.
Requirements
- Bachelor's degree in computer science, engineering, or a related field.
- 2+ years of experience as a Site Reliability Engineer or in a similar role.
- Experience with cloud platforms and services (Azure, AWS, or GCP).
- Familiarity with container orchestration platforms (e.g., Kubernetes, Docker Swarm).
- Experience with monitoring/observability tools (e.g., Datadog, Prometheus, Grafana, Azure Monitor, Dynatrace).
- Experience with log analysis tools such as Splunk or Azure Log Analytics (using Kusto queries).
- Experience with Infrastructure as Code (IaC) tools like Terraform, ARM, etc.
- Experience with provisioning or configuration management tools like Ansible.
- Experience working with SQL, PostgreSQL, etc.
- Proficiency in scripting languages (Shell, PowerShell, Python, Go) for automation.
- Message Brokers: Azure Service Bus, RabbitMQ.
- Experience with Git.
- Incident management tools: PagerDuty, OpsGenie, Jira, etc.
- Experience with work management tools like Jira and Azure DevOps.
- Familiarity with cloud cost management tools and practices.
- Proficient in SRE best practices, including defining and maintaining SLIs, SLOs, and error budgets.
- Understanding of DevOps principles and practices.
- Experience with software frameworks (.NET, .NET Core), improving resiliency and observability.
Salary Range
$100,000 - $140,000 per annum, depending on experience.
Benefits
- Hybrid work arrangements.
- Fun downtown office on Queen St. West.
- Sabbatical Leave.
- Pet-friendly office.
- No meetings on Fridays.
- Paid volunteer days.
- Generous Vacation Time & Personal Days.
- Generous parental leave policy.
- Holiday shutdown.
- Group Retirement Savings Plan (with Employer matching).
- Culture Events (sporting events, concerts, happy hours, holiday parties, team outings).
- Incredible culture fostered by a highly collaborative and high-performing team.
- Professional development opportunities, working closely with the senior leadership team.
-
Reliability Engineering Specialist
2 weeks ago
Toronto, Ontario, Canada Disability Solutions Full time. Job Summary: We are seeking an experienced Reliability Engineering Specialist to join our Information & Technology team at Disability Solutions. The successful candidate will be responsible for executing RAM analysis and engineering, collaborating with industry teams and regulatory agencies. Main Responsibilities: Review and approve solution requirements for...
-
Manufacturing Reliability Specialist
3 weeks ago
Toronto, Ontario, Canada Major Recruitment Full time $72,000 - $110,000. Unlock Your Potential as a Manufacturing Reliability Specialist. Are you passionate about ensuring the smooth operation of manufacturing facilities? Do you have what it takes to drive efficiency and reliability in our fast-paced industry? We are seeking a highly skilled Manufacturing Reliability Specialist to join our team at Major Recruitment. In this exciting...
-
Reliability Engineer
7 days ago
Toronto, Ontario, Canada Disability Solutions Full time. Job Summary: The position of the Reliability Engineer is to execute reliability, availability, and maintainability (RAM) analysis and engineering in support of Information & Technology solutions. The primary objective is to ensure that these solutions have attributes of high robustness, reliability, and availability.
-
Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada Major Recruitment Full time. Reliability Engineer. ***Must be Canadian Citizen or Permanent Resident requiring no sponsorship*** My client has a shared vision for greatness. We manufacture some of North America’s most popular tissue brands - Cashmere®, Purex®, Scotties®, SpongeTowels®, Bonterra®, White Cloud®, as well as products for use away from home. We are leaders in our...
-
Reliability and Availability Specialist
2 months ago
Toronto, Ontario, Canada Disability Solutions Full time. About the Role: The Specialist, Reliability-Availability-Maintainability (RAM) is a critical position within our Information & Technology (I&T) team. This role is responsible for executing RAM analysis and engineering in support of I&T solutions. The primary objective is to ensure that these solutions have attributes of high robustness, reliability, and...
-
Reliability Systems Engineer
3 weeks ago
Toronto, Ontario, Canada Teranet Inc. Full time. About Teranet: Teranet is a leading innovator in electronic services and solutions, operating one of the most advanced and secure registration systems worldwide. Job Summary: We are seeking a highly skilled Site Reliability Engineer to join our DevOps team. The ideal candidate will possess strong software engineering principles and infrastructure expertise to...
-
Reliability Engineering Lead
3 weeks ago
Toronto, Ontario, Canada Royal Bank of Canada Full time. Job Summary: Royal Bank of Canada is seeking an experienced professional to lead our Site Reliability Engineering (SRE) efforts for our US Cash Management Technology. This is a unique opportunity to shape the future technology landscape of the company, delivering key business values and implementing strategic components across all RBC functions defined in our...
-
Senior Environmental Remediation Specialist
3 weeks ago
Toronto, Ontario, Canada Alexana Engineering Full time. We are a small and dynamic environmental engineering firm based in Toronto, seeking an experienced Senior Environmental Remediation Specialist. Our team is dedicated to delivering high-quality services, and we are looking for a professional who can bring their expertise to our projects. About the Role: This role offers a unique opportunity to work with a...
-
Site Reliability Engineering Leader
4 weeks ago
Toronto, Ontario, Canada Royal Bank of Canada Full time. Royal Bank of Canada is seeking a highly skilled Site Reliability Engineering (SRE) leader to join our team in Toronto, Canada. As an SRE leader, you will be responsible for leading the development and implementation of SRE solutions that improve the reliability and performance of our applications. The ideal candidate will have 5+ years of experience as a...
-
Senior Software Reliability Engineer
3 weeks ago
Toronto, Ontario, Canada henon Full time. Are you looking for a challenging role that combines DevOps and Site Reliability Engineering skills? Henon is seeking a highly skilled Senior Software Reliability Engineer to join our team. Job Summary: We are building a relationship-first, tech-enabled financial services company founded to help Private Equity firms grow. As a key member of our engineering...
-
Toronto, Ontario, Canada Capgemini Engineering Full time. Overview: At Capgemini Engineering, we are seeking a seasoned Senior Test Automation Specialist to join our team. This is an exceptional opportunity to leverage your expertise in designing and implementing comprehensive test automation strategies that drive quality and performance in software applications. About the Role: We are looking for a highly skilled...
-
Site Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada PointsBet Canada Full time. Site Reliability Engineer. About the Role: As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, automation, collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient...
-
Reliability Engineering Lead
4 weeks ago
Toronto, Ontario, Canada Compunnel Inc. Full time. About the Role: Compunnel Inc. is seeking an experienced Reliability Engineering Lead to join our team in Toronto, Canada. This role will be responsible for driving SRE and DevSecOps mindset and culture within the company. The ideal candidate will have a strong background in system reliability, observability, and automation. They will possess excellent...
-
Digital Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada Royal Bank of Canada Full time. Job Summary: We are seeking a highly motivated Technical Release Coordinator to join our Digital SRE Environment and Release team. This role offers the unique opportunity to work at the intersection of technology, reliability, and delivery, ensuring the smooth execution of technical projects that directly impact our digital infrastructure and release...
-
Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada PointsBet Canada Full time. About the Role: We are seeking a highly skilled Reliability Engineer to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and performance of our product. As a key member of our engineering team, you will lead efforts in proactive monitoring, incident management, automation, and collaboration across teams to...
-
Reliability Engineer for Cloud Infrastructure
4 weeks ago
Toronto, Ontario, Canada Tecsys Inc. Full time. About Tecsys Inc.: Tecsys Inc. is a fast-growing company offering supply chain solutions to industry-leading healthcare systems, hospitals, and pharmacy businesses. We work with industry leaders to transform their supply chains through technology. If you thrive on tackling interesting challenges with continuous learning opportunities, then Tecsys Inc. could be...
-
Site Reliability Engineer
1 month ago
Toronto, Ontario, Canada Tecsys Inc. Full time. About the Role: We are looking for an exceptional Site Reliability Engineer to join our Network and Security Operations Center team. As a key member of our team, you will be responsible for ensuring the reliability and uptime of our platform and applications. Key Responsibilities: Collaborate with Engineering teams to support services through system design...
-
Paper Machine Reliability Specialist
4 weeks ago
Toronto, Ontario, Canada Major Recruitment Full time. Reliability Engineer - A Key Role in Our Success. We are a leading manufacturer of tissue brands and paper products, with a strong commitment to innovation and sustainability. Our headquarters is located in Mississauga, ON, and we have nearly 3,000 employees across North America. In this role, you will be responsible for managing our preventative and predictive...
-
Toronto, Ontario, Canada Teoresi Group Full time. About the Role: We are seeking an experienced Reliability and Safety Engineer to join our team in Toronto, Canada or Pittsburgh, US. As a key member of our team, you will be responsible for applying reliability and safety principles and disciplines at both subsystem and vehicle levels.
-
Cloud Native Reliability Specialist
2 weeks ago
Toronto, Ontario, Canada Royal Bank of Canada Full time. About the Role: We are seeking a highly skilled Cloud Native Reliability Specialist to join our team at Royal Bank of Canada. As a key member of our Digital team, you will play a crucial role in ensuring the health, security, and availability of our applications in production.