Cloud Reliability Engineer

1 month ago


Old Toronto, Canada Ascend Fundraising Solutions Full time

We are seeking a skilled Cloud Reliability Engineer to collaborate with our IT team in Toronto. In this role, you will work closely with the client services team to diagnose, troubleshoot, and resolve system reliability issues.

Responsibilities:

  1. Take ownership of customer-reported issues and drive them to resolution.
  2. Develop proactive measures to prevent recurring issues.
  3. Escalate unresolved issues to internal teams using standard procedures.

Infrastructure Management:

  1. Design, configure, deploy, and maintain AWS infrastructure using best practices.
  2. Implement Infrastructure as Code (IaC) using Terraform for scalability, repeatability, and maintainability.
  3. Collaborate with the development team to optimize .NET applications for peak performance in a cloud environment.

Monitoring and Alerting:

  1. Design and implement advanced system monitoring solutions for high performance, availability, and security.
  2. Use monitoring tools proactively to identify and diagnose infrastructure and application-level issues.
  3. Collaborate on defining Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.

Reliability and Availability:

  1. Optimize cloud resource availability, performance, and cost using best practices.
  2. Plan and execute disaster recovery drills and ensure high availability of critical systems.
  3. Respond promptly to system alerts, lead incident resolution, and contribute to post-mortem analyses.

Automation and Optimization:

  1. Automate repetitive tasks related to infrastructure provisioning, configuration, and deployment.
  2. Ensure continuous deployment and continuous integration best practices are implemented and maintained.

Collaboration and Knowledge Sharing:

  1. Collaborate with developers, product managers, and other teams to ensure seamless and stable application deployment.
  2. Document processes, architectures, and best practices to facilitate knowledge sharing.

Requirements:

  1. AWS certifications such as AWS Certified Solutions Architect or AWS Certified DevOps Engineer.
  2. Experience with monitoring and alerting tools in the AWS ecosystem.
  3. Familiarity with Site Reliability Engineering (SRE) philosophy, SLOs, SLIs, and Error Budgets.
  4. Strong analytical and troubleshooting skills.
  5. Excellent communication and collaboration skills.

Why Work at Ascend Fundraising Solutions:

  1. An intellectually curious, dedicated team willing to get the job done.
  2. Opportunity to make a significant impact on the business in the short and long term.
  3. Contribute to a company that supports charities and NPOs in funding their causes.
  4. Beautiful downtown Toronto office with lake views and proximity to transit.
  5. Hybrid work environment.


  • Old Toronto, Canada Mastech Inc. Full time

    Mastech Digital is a leading provider of IT staffing and digital transformation services. We are currently seeking a highly skilled Cloud Reliability Engineer to join our client's team in the United States. Responsibilities of the Cloud Reliability Engineer include: Designing and implementing scalable and reliable cloud architectures. Collaborating with...


  • Old Toronto, Canada The Home Depot Canada Full time

    About The Job: As a Cloud Reliability Engineer Lead at The Home Depot Canada, you will play a crucial role in ensuring the reliability, performance, and operational support of our eCommerce systems. Job Overview: This position requires a strong background in reliability reviews, performance engineering practices, production engineering, and operational support,...

  • Reliability Engineer

    4 weeks ago


    Old Toronto, Canada Thomson Reuters Full time

    About the Role: We are seeking a skilled Reliability Engineer - Cloud Systems to join our team at Thomson Reuters. As a Reliability Engineer - Cloud Systems, you will be responsible for analyzing and resolving chronic and major issues affecting our cloud-based services. Key responsibilities include: Designing and implementing scalable systems and...


  • Old Toronto, Canada Thomson Reuters Full time

    Site Reliability Engineer Job Description: This role is part of our Service Management Organization and involves IT Service Management, cloud providers, software development, and technology infrastructure experience. The Site Reliability Engineer will analyze chronic and major issues, evaluate products and their services, and make recommendations to improve...


  • Old Toronto, Canada Sentry Full time

    Sentry is on a mission to simplify software development and improve application performance. We need a skilled AWS Site Reliability Engineer to join our team and help us achieve our goals. This role involves ensuring the uptime and reliability of our hosted platform, architecting and automating services and systems to meet scaling demands, and collaborating...


  • Old Toronto, Canada Chelsea Avondale Full time

    Chelsea Avondale is the world’s most cutting-edge home insurance group. We have developed sophisticated risk modeling and insurance pricing technologies for home insurance and deploy that technology through our own insurance company. Our team consists of some of the brightest minds in insurance, software development, finance, and operations. Our group...

  • Cloud Engineer

    4 weeks ago


    Old Toronto, Canada LanceSoft Full time

    Description: Business group: Data and Analytics Technology. Time Tracking Employees. In partnership with the Customer Insights Data and Analytics teams and our IT partners, the Data and Analytics Technology team supports the bank's Data and Analytics needs with tooling, projects, and IT operational support. The Cloud Engineer role will be responsible for...


  • Old Toronto, Canada Lorien Full time

    Hybrid - Manchester. We are currently working with a leading gambling company dedicated to providing exceptional gaming experiences. They are looking for an experienced Site Reliability Engineer with a strong skill set in system reliability to join their world-class technology team. This role is ideal for someone who has 4+ years of experience within the...


  • Toronto, Ontario, Canada Thomson Reuters Full time

    We are seeking an experienced Senior SRE to join our Shared Capabilities, Service Reliability and Operation team in Toronto. As a Cloud Native Site Reliability Engineer, you will be responsible for implementing site reliability engineering and DevOps best practices, building and maintaining monitoring for all aspects of infrastructure, micro-services, usage...


  • Old Toronto, Canada Quantumbricks Full time

    Job Title: DevOps Engineer. Job Description: Work closely with Engineering stakeholders to design and maintain a reliable, scalable, and secure platform. Collaborate with the Engineering team to identify areas for improvement and implement solutions. Optimize existing deployment tooling and infrastructure, including but not limited to creating and maintaining new...


  • Old Toronto, Canada HOOPP Thames Limited Full time

    About the Role: We are seeking a highly skilled Cloud Infrastructure Engineer to join our IT Investment Solutions Group at HOOPP. As a Cloud Infrastructure Engineer, you will play a critical role in designing, implementing, and managing our cloud infrastructure to support the organization's strategic objectives. Responsibilities: Design, deploy, and...


  • Greater Toronto Area, Canada GlossGenius Full time

    About GlossGenius: GlossGenius is a leading fintech company empowering small business owners to succeed by offering a range of business management tools, including booking and scheduling, marketing, analytics, payment processing, and more. Our platform serves over 75,000 entrepreneurs daily. As a pioneering force in the industry, GlossGenius is expanding its...


  • Old Toronto, Canada Sentry Full time

    Bad software is everywhere, and we’re tired of it. Sentry is on a mission to help developers write better software faster, so we can get back to enjoying technology. With more than $217 million in funding and 100,000+ organizations that believe we’re on to something, we're building performance and error monitoring tools that help companies like Disney,...

  • Cloud Engineer

    2 months ago


    Old Toronto, Canada Scotiabank Full time

    Requisition ID: 206977. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Scotiabank has embarked on the journey to modernize both development practices and tools. One of the main areas of transformation is the public cloud and the various platform technologies that support both development and operations on...

  • Cloud Engineer

    3 weeks ago


    Old Toronto, Canada Ontario Health Full time

    Job Title: Senior Cloud Engineer. Ongoing development and implementation of cloud-based systems and infrastructure for Ontario Health. Key Responsibilities: Design, implement, and manage cloud-based infrastructure and applications. Collaborate with cross-functional teams to ensure efficient and secure cloud services. Provide expert-level guidance on cloud...

  • Cloud Engineer

    1 month ago


    Old Toronto, Canada Scotiabank Full time

    Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Scotiabank has embarked on the journey to modernize both development practices and tools. One of the main areas of transformation is the public cloud and the various platform technologies that support both development and operations on the cloud. The aim is...


  • Old Toronto, Canada Infotree Global Solutions Full time

    About Infotree Global Solutions: Infotree Global Solutions is a leading provider of innovative solutions, and we're seeking an experienced Site Reliability Engineer to lead our team. Your Role: As our Site Reliability Engineering Lead, you will be responsible for supervising a team of skilled engineers and ensuring the reliability and scalability of our global...


  • Toronto, Ontario, Canada LTIMindtree Full time

    About Us: LTIMindtree is a global technology consulting and digital solutions company. We enable enterprises to reimagine business models, accelerate innovation, and maximize growth by harnessing digital technologies. Job Title: SRE Engineer. Location: Mississauga, Ontario (Remote). Job Description: We are seeking an experienced Site Reliability Engineer with 10+...


  • Old Toronto, Canada Thomson Reuters Full time

    (Canada) Site Reliability Engineer (Contract). Contract (9 months 4 days). Published 3 days ago. New Relic, Data Dog. Site Reliability Engineer in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze chronic...


  • Old Toronto, Canada Scotiabank Full time

    As a Principal Cloud Engineer – Cloud Operations Engineering, you will contribute to the overall success of the Cloud and Platform Engineering department at Scotiabank. Your primary objective will be to ensure the stability and dependability of our cloud platform, which serves millions of customers every day. Key Responsibilities: You will be responsible for...