Reliability Engineer

4 weeks ago


Old Toronto, Ontario, Canada | Emburse, Inc. | Full time

As a Cloud Reliability Engineer - Automation Expert at Emburse, Inc., you will be responsible for developing software and software fixes to integrate internal systems. You will ensure code quality, test and distribute code updates, and monitor the health and stability of the servers.

Key Responsibilities:
  • Meet and exceed Key Performance Indicators (KPIs) and SLAs; maintain an error budget and adhere to it.
  • Identify, evaluate, and execute preventative measures to minimize and avoid impact to the customer experience.
  • Employ deep troubleshooting skills to improve the availability, performance, and security of CR and Emburse services, ensuring they are designed for 24/7 availability and operational readiness.
  • Code and automate applications on cloud platforms.
  • Work with engineering leadership to build shared services that meet the requirements and needs of the platform and application teams.
  • Collaborate with Cloud Platform and Operations leaders on developing narratives, backlog grooming, epic planning, and overall sprint planning.
  • Ensure the platform maintains a high degree of reliability: at least four nines (99.99% availability).
  • Define non-functional requirements as part of the product lifecycle to influence new designs, standards, and methods for scalable, highly available distributed systems.
  • Own technically intricate issues that cross between DevOps, databases, networking, code, infrastructure, and people; drive them to satisfactory completion.
  • Work closely with product stakeholders to align operational priorities and planning with the product and engineering roadmap.
  • Prepare and present engineering-related documents to key stakeholders.
  • Provide recommendations and feedback in review sessions and design reviews.
  • Mentor SRE I and SRE II engineers.
  • Help guide more junior engineers in best practices.
  • Conduct and assist with investigation, testing, and deployment activities; identify and mitigate risks in development activities.
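
The error-budget and four-nines targets above reduce to simple arithmetic. The sketch below is a minimal illustration, assuming availability is measured as a fraction of time over a 30-day window; the function names and the release-freeze rule are hypothetical and are not Emburse's actual policy.

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(slo: float, window_days: int = 30) -> float:
    """Downtime (in minutes) that an availability SLO permits over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1, e.g. 0.9999")
    return (1.0 - slo) * window_days * MINUTES_PER_DAY


def remaining_budget_minutes(slo: float, observed_downtime_minutes: float,
                             window_days: int = 30) -> float:
    """Error budget left in the window; a negative value means it is overspent."""
    return allowed_downtime_minutes(slo, window_days) - observed_downtime_minutes


def should_freeze_releases(slo: float, observed_downtime_minutes: float,
                           window_days: int = 30) -> bool:
    """Hypothetical policy: pause risky releases once the budget is exhausted."""
    return remaining_budget_minutes(slo, observed_downtime_minutes, window_days) <= 0.0


if __name__ == "__main__":
    four_nines = 0.9999
    print(f"30-day budget: {allowed_downtime_minutes(four_nines):.2f} min")
    print(f"365-day budget: {allowed_downtime_minutes(four_nines, 365):.2f} min")
    print(f"Freeze after 5 min of downtime? {should_freeze_releases(four_nines, 5.0)}")
```

At 99.99%, the budget works out to about 4.32 minutes per 30 days, or about 52.56 minutes over a 365-day year, which is why a four-nines target leaves little room for manual recovery.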

Requirements:
  • Bachelor's degree in Computer Science or a STEM field required.
  • Minimum of 7 years' experience in an engineering role required.
  • Deep understanding of infrastructure as code, scripting, self-healing, containers, and DevOps tooling highly desirable.
  • Experience with Ansible and Terraform highly desirable.
  • Excellent written and verbal communication skills in English.
  • Experience with the full lifecycle of SaaS implementations as well as infrastructure as code.
  • Excellent follow-up and project management skills.
  • Proven ability to create and maintain new tools.
  • Excellent troubleshooting skills.
  • Excellent technical skills; up to 70% of the job is hands-on in a distributed Linux environment.
  • Strong scripting skills; object-oriented programming (OOP) experience is a plus.
  • Ability to liaise with other teams to help align priorities.
  • Experience working with an offshore team.
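
The "self-healing" skill listed above can be illustrated with a small supervisor loop. This is a sketch under stated assumptions: `is_healthy` and `restart` are hypothetical callables standing in for a real health check and restart action (for example, an HTTP probe and a service restart), and the backoff values are illustrative. In practice this job is often delegated to an orchestrator rather than a hand-written loop.

```python
import time
from typing import Callable


def self_heal(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    max_restarts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Restart an unhealthy service until it recovers, up to max_restarts attempts.

    Returns True if the service ends up healthy, False if automation gives up
    and the issue should be escalated to a human.
    """
    if is_healthy():
        return True  # Nothing to heal.
    for attempt in range(max_restarts):
        restart()
        # Exponential backoff: wait 1x, 2x, 4x ... the base delay before re-checking.
        sleep(backoff_seconds * (2 ** attempt))
        if is_healthy():
            return True
    return False
```

Injecting `sleep` keeps the loop testable without real delays, and returning `False` instead of retrying forever keeps a persistent fault from being masked by endless restarts.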
