Reliability Systems Engineer
1 week ago
Job Summary
This role is responsible for ensuring the reliability, scalability, and performance of our systems. As a key member of our team, you will play a crucial part in building and implementing tools, automation, and best practices to enhance the stability of our infrastructure, streamline deployment pipelines, and reduce manual intervention in operations.
Key Responsibilities:
- Reliability and Performance: Proactively monitor, troubleshoot, and resolve issues in production and non-production environments to ensure maximum uptime and optimal performance.
- Automation: Develop scripts and tools to automate repetitive tasks, streamline deployment processes, and manage infrastructure-as-code.
- Monitoring and Incident Management: Set up and refine monitoring systems, define alert thresholds, and manage incident response processes to resolve critical issues efficiently.
- Scalability: Collaborate with development and infrastructure teams to design and implement scalable solutions that meet current and future demand.
- CI/CD & Deployment: Optimize and support our CI/CD pipelines to streamline build, test, and deployment processes.
- Security & Compliance: Partner with security teams to implement compliance policies, improve system security, and adhere to regulatory standards.
- Continuous Improvement: Conduct post-incident reviews, implement lessons learned, and recommend changes to increase system resilience and reliability.
Requirements:
- Experience: 3+ years in an SRE, DevOps, or related role in a high-availability environment.
- Technical Skills: Proficiency in programming/scripting (Python, Bash, or similar) and experience with infrastructure automation tools (e.g., Ansible, Terraform).
- Cloud Expertise: Experience with cloud platforms (AWS, Azure, GCP) and the monitoring services commonly used with them (e.g., CloudWatch, Datadog, or Prometheus).
- Systems Knowledge: Strong understanding of Linux/Unix systems, networking, and containerization (Docker, Kubernetes).
- Monitoring & Observability: Familiarity with monitoring and observability tools to maintain and troubleshoot system health (e.g., Grafana, Splunk).
- CI/CD Experience: Hands-on experience with CI/CD pipelines and tools (Jenkins, GitLab CI, etc.).
- Soft Skills: Strong problem-solving skills, ability to work cross-functionally, and a proactive, detail-oriented approach to tasks.
Estimated Salary Range: $120,000 - $180,000 per year, based on location and industry standards.
Benefits: We offer a comprehensive benefits package, including medical, dental, and vision coverage, as well as retirement savings options and paid time off.
About Us: The Royal Bank of Canada is a leading financial institution dedicated to helping our clients thrive and communities prosper. Our team is passionate about delivering exceptional service and innovative solutions to our customers.
-
Reliability Systems Engineer
1 month ago
Old Toronto, Ontario, Canada Chelsea Avondale Full time
Job Title: Asset Reliability Engineer
At Chelsea Avondale, we're pushing the boundaries of home insurance innovation. Our team of experts has developed cutting-edge risk modeling and insurance pricing technologies, which we deploy through our own insurance company. We're a group of talented individuals from diverse backgrounds, including insurance, software...
-
Reliability Systems Engineer
2 weeks ago
Toronto, Ontario, Canada Lorven Technologies Full time
Job Title: Reliability Systems Engineer
Location: Remote
Duration: Long term
A Bachelor's degree in Computer Science or related technical field, or equivalent practical experience. Advanced knowledge of SRE practices and technologies including Azure, Linux, and scripting languages. Expertise in various SRE tools such as Ansible, Azure Automation,...
-
Systems Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada Scotiabank Full time
As a key member of our team at Scotiabank, you will play a critical role in ensuring the reliability and performance of our production systems. Key Responsibilities: Contribute to in-depth data analysis to gauge service trends and drive improvements to production systems. Collaborate closely with SREs, Development, and Operations teams to assist in...
-
Reliability Engineer
1 month ago
Toronto, Ontario, Canada Scotiabank Full time
About the Role: We are seeking a highly skilled Systems Reliability Engineer to join our team at Scotiabank. As a key member of our Systems Reliability Office, you will be responsible for ensuring the stability and reliability of our technology portfolio. Key Responsibilities: Champion a customer-focused culture to deepen client relationships and leverage...
-
Senior Systems Reliability Engineer
2 weeks ago
Toronto, Ontario, Canada Vantage Full time
About the Role: We are seeking a highly skilled Senior Site Reliability Engineer to join our team at Vantage. As a key member of our engineering team, you will play a pivotal role in ensuring the seamless operation of our large-scale, distributed systems. Your expertise in software and systems engineering will be instrumental in building, maintaining, and...
-
Senior System Reliability Engineer
2 weeks ago
Toronto, Ontario, Canada Interac Corp. Full time
Senior System Reliability Engineer
We are seeking a skilled Senior System Reliability Engineer to join our team at Interac Corp. in Canada. About the Role: This is an exciting opportunity to work on high-performance payment systems, focusing on Site (Application) Reliability Engineering activities, including proactive monitoring, responding to alerts and...
-
Senior Systems Engineer
1 month ago
Toronto, Ontario, Canada Safran Landing Systems Full time
Job Description
Assist in the development and certification of the landing gear system, including hydro-mechanical, electrical, and control systems designed per software and complex hardware (DO-178/DO-254). Liaise with customers and airworthiness authorities on matters pertaining to certification and system development. Define requirements applicable to the...
-
Senior Systems Engineer
4 weeks ago
Toronto, Ontario, Canada Safran Landing Systems Full time
Job Description
As a Senior Systems Engineer at Safran Landing Systems, you will play a key role in the development and certification of the Landing Gear System. This includes working on hydro-mechanical, electrical, and control systems designed per Software and Complex Hardware (DO-178/DO-254). You will liaise with customers and airworthiness authorities on...
-
Site Reliability Engineer
1 month ago
Toronto, Ontario, Canada The Toronto-Dominion Bank (Canada) Full time
Job Summary
We are seeking a highly skilled Site Reliability Engineer to join our team at The Toronto-Dominion Bank (Canada). As a Site Reliability Engineer, you will play a critical role in ensuring the reliability, scalability, and performance of our systems and applications.
Key Responsibilities
Provide technical leadership and expertise in designing and...
-
Reliability Engineering Specialist
2 weeks ago
Toronto, Ontario, Canada Criteo Full time
About the Role: This is a challenging opportunity for an experienced engineer to join Criteo's PRE team as a Site Reliability Engineer. The role involves working closely with product engineering to improve the reliability of our apps, systems, and pipelines, assessing where optimization is needed most, and telling stories with meaningful monitoring. Key...
-
Reliability Systems Specialist
4 weeks ago
Toronto, Ontario, Canada Lorven Technologies Full time
Job Title: Reliability Systems Specialist
Location: Remote
Duration: Long term
A Bachelor's degree in Computer Science or related technical field, or equivalent practical experience.
Advanced knowledge of reliability engineering practices and technologies.
Hands-on experience in reliability tools (Ansible, Azure Automation, Catchpoint).
Azure, Linux.
Dynatrace,...
-
Reliable Systems Specialist
1 week ago
Toronto, Ontario, Canada Flinks Full time
About Flinks: A Pioneering Force in Financial Data Management
Flinks is at the forefront of open banking and financial data management, empowering consumers to take control of their financial lives. Our mission is to unlock the full potential of financial data, enabling innovative solutions for fintechs and banks. As a leading provider of data infrastructure,...
-
Reliability Engineering Expert
2 weeks ago
Toronto, Ontario, Canada Criteo Full time
About the Role: Criteo is seeking a talented Site Reliability Engineer to join our PRE team. What You'll Do: As a Site Reliability Engineer, you'll work closely with product engineering to improve the reliability of our apps, systems, and pipelines. You'll assess where optimization is needed most and tell stories with meaningful monitoring. How You'll Make an...
-
System Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada Scotiabank Full time
About the Role: We are seeking a highly skilled System Reliability Engineer to join our team at Scotiabank. As a key member of our Systems Reliability Office, you will be responsible for ensuring the stability and reliability of our technology portfolio. Key Responsibilities: Champion a customer-focused culture to deepen client relationships and leverage...
-
Senior Reliability Engineer
1 month ago
Toronto, Ontario, Canada Metrolinx Full time
Job Title: Senior Reliability Engineer
Job Summary: Metrolinx is a leading transportation agency in the Greater Golden Horseshoe region, operating GO Transit, UP Express, and the PRESTO fare payment system. We are committed to providing reliable and efficient transportation services to our customers. As a Senior Reliability Engineer, you will play a critical...
-
Senior Site Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada Criteo Full time
About the Role: We are seeking a skilled Senior Site Reliability Engineer to join our team at Criteo. As a key member of our Product Reliability Engineering group, you will work closely with product engineering to improve the reliability of our apps, systems, and pipelines. Your Responsibilities: Collaborate with product engineering to identify and prioritize...
-
Toronto, Ontario, Canada Criteo Full time
Company Overview: Criteo is a leader in the AdTech industry, pushing the boundaries of online advertising and driving innovation. As a Site Reliability Engineer on our team, you will be at the forefront of building and maintaining scalable systems that deliver exceptional results. About the Role: This role offers a unique blend of technical expertise and...
-
Cloud Reliability Engineering Manager
4 weeks ago
Toronto, Ontario, Canada The Home Depot Canada Full time
About The Home Depot Canada
The Home Depot Canada is a leading retailer of home improvement products and services, committed to delivering exceptional customer experiences and driving business growth. We are seeking a highly skilled Cloud Reliability Engineering Manager to join our team and lead our Site Reliability Engineers in ensuring the reliability,...
-
Senior Reliability Engineer
4 weeks ago
Toronto, Ontario, Canada Flinks Full time
About Flinks
We're not just building data infrastructure; we're shaping the future of finance. Our mission is to empower consumers with control over their financial data and unlock its full potential. We equip fintechs and banks with cutting-edge data tools, enabling them to create innovative, client-centric products that are transforming the financial...
-
Site Reliability Engineer
1 month ago
Toronto, Ontario, Canada SGS Full time
Job Title: Site Reliability Engineer
We are seeking a highly skilled Site Reliability Engineer to join our team at SGS Canada. As a key member of our infrastructure team, you will play a critical role in ensuring the reliability, supportability, scalability, and performance of our .NET stack applications. Key Responsibilities: Partner with developers and...