Site Reliability Engineer - Automation

2 months ago


Old Toronto, Canada Ascend Fundraising Solutions Full time

We are currently seeking a full-time Site Reliability Engineer to join our IT team. In this role, you will collaborate closely with the client services team to diagnose, troubleshoot, and resolve issues related to system reliability.

RESPONSIBILITIES:

  1. Take ownership of customer-reported issues and see problems through to resolution.
  2. Develop preventive measures to avoid recurring issues.
  3. Follow standard procedures for escalating unresolved issues to the appropriate internal teams.

Infrastructure Management:

  1. Design, configure, deploy, and maintain AWS infrastructure using best practices.
  2. Implement Infrastructure as Code (IaC) using Terraform for scalability, repeatability, and maintainability.
  3. Collaborate with the development team to optimize .NET applications for peak performance in a cloud environment.

Monitoring and Alerting:

  1. Design and implement advanced system monitoring solutions for high performance, availability, and security.
  2. Use monitoring tools proactively to identify and diagnose infrastructure and application-level issues.
  3. Collaborate on defining Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.

Reliability and Availability:

  1. Optimize cloud resource availability, performance, and cost using best practices.
  2. Plan and execute disaster recovery drills and ensure high availability of critical systems.
  3. Respond promptly to system alerts, lead incident resolution, and contribute to post-mortem analyses.

Automation and Optimization:

  1. Automate repetitive tasks related to infrastructure provisioning, configuration, and deployment.
  2. Implement and maintain continuous integration and continuous deployment (CI/CD) best practices.

Collaboration and Knowledge Sharing:

  1. Collaborate with developers, product managers, and other teams to ensure seamless and stable application deployment.
  2. Document processes, architectures, and best practices to facilitate knowledge sharing.

WHAT WE SEEK IN OUR IDEAL CANDIDATE:

  1. AWS certifications such as AWS Certified Solutions Architect or AWS Certified DevOps Engineer.
  2. Experience with monitoring and alerting tools in the AWS ecosystem.
  3. Familiarity with Site Reliability Engineering (SRE) philosophy, SLOs, SLIs, and Error Budgets.
  4. Strong analytical and troubleshooting skills.
  5. Excellent communication and collaboration skills.

YOUR EXPERIENCE & SKILLS:

  1. Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
  2. 5+ years of experience managing and operating AWS environments.
  3. Familiarity with best practices in monitoring, logging, and alerting.

WHY WORK AT ASCEND?

  1. A culture of intellectual curiosity and dedication, with a team willing to get the job done.
  2. Opportunity to make a significant impact on the business in the short and long term.
  3. Contribute to a company that supports charities and NPOs in funding their causes.
  4. Beautiful downtown Toronto office with lake views and proximity to transit.
  5. Hybrid work environment.

  • Old Toronto, Canada Full time

    Product: Global Platform Engineering. Your role: supervise a team of Site Reliability Engineers; report metrics on application performance and incidents; act proactively and responsively to infrastructure and application failures; build and automate failover and recovery workflows; implement an observability and monitoring stack for infrastructure and application...


  • Old Toronto, Canada RBC Full time

    RBC is seeking a Lead SRE for our US Cash Management Technology. This is a brand-new system to serve our corporate clients. You will be heavily involved in shaping the future technology landscape of RBC by delivering key business value for a transformational project in our Banking Technology while implementing strategic components servicing across all...


  • Old Toronto, Canada Soda Full time

    Job Title: Site Reliability Engineer. Location: Poland (fully remote). Salary: 324K PLN or 27.3K monthly. Start: ASAP. Stack: AWS, Docker, Kubernetes, Terraform, Jenkins, Ansible, Linux, JavaScript, and Lambda. Are you a seasoned DevOps/SRE professional passionate about building high-performance, scalable systems? I am working with a Media/IT...


  • Old Toronto, Canada Tbwa ChiatDay Inc Full time

    Automate and Optimize Brick-and-Mortar Retail. Focal Systems is the industry leader in retail AI solutions, revolutionizing brick-and-mortar retail with deep learning computer vision. As a Silicon Valley-based startup, we have more than doubled in size every year since inception. Our Mission: We are looking for smart, creative, and passionate individuals who want...


  • Old Toronto, Canada Olx Full time

    Site Reliability Engineer. Remote Poland, Poland. OLX – Engineering / Full-time / Remote. At OLX, we work together to build a more sustainable world through trade. We make it safe, smart, and convenient to buy and sell cars, find housing, get jobs, buy and sell household goods, and more. Our colleagues around the world help to serve millions of people around...


  • Old Toronto, Canada Mastech Inc. Full time

    Mastech Digital is an IT Staffing and Digital Transformation Services company. Mastech Digital provides digital and mainstream technology staff as well as Digital Transformation Services for all American Corporations. We are currently seeking a Site Reliability Engineer (GCP) for our client in the Consulting domain. We value our professionals, providing...


  • Toronto, Canada CB Canada Full time

    Site Reliability Engineer. On behalf of our client in the banking sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description: Azure cloud; Jira and Confluence; CI/CD; experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...


  • Old Toronto, Canada Sentry Full time

    The Site Reliability Engineering team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this by leveraging automation tools to automatically spin up and scale services to meet the traffic demands of 1,000,000+ developers. Sentry receives over a billion events a day and processes terabytes of...


  • Old Toronto, Canada Tecsys Full time

    Tecsys is a fast-growing innovator offering supply chain solutions to industry-leading healthcare systems, hospitals, and pharmacy businesses, as well as to distributors, retailers, and 3PLs. As a Cloud Infrastructure Specialist, you will be responsible for ensuring the reliability and uptime of our platform and applications in a data-driven way to support internal and...


  • Old Toronto, Canada Thomson Reuters Full time

    (Canada) Site Reliability Engineer (Contract). Contract (9 months 4 days). Published 3 days ago. New Relic, Datadog. Site Reliability Engineer in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze chronic...


  • Old Toronto, Canada Tecsys Inc. Full time

    Having recognized the advantages of remote work, including its effect on employee morale and productivity and the impact of reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...


  • Old Toronto, Canada Loblaw Companies Ltd - Head Office Full time

    Cloud Engineering Opportunity. We are seeking an experienced Site Reliability Engineer to join our team at Loblaw Companies Ltd - Head Office. This role offers a unique opportunity to design, develop, and maintain cloud-native solutions using services like Kubernetes, App Engine, Cloud Functions, Cloud SQL, BigQuery, and Pub/Sub on Google Cloud Platform and...


  • Toronto, ON, Canada PointsBet Canada Full time

    SITE RELIABILITY ENGINEER. As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. PointsBet is a sports & casino betting operator...


  • Old Toronto, Canada Tbwa ChiatDay Inc Full time

    Company Description: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley-based startup that has more than doubled in size every year since inception. Our mission is to automate and optimize brick-and-mortar retail using deep learning computer vision. We are looking for smart, creative, and passionate people who want to help...


  • Toronto, Canada PointsBet Canada Full time

    SITE RELIABILITY ENGINEER. ABOUT THE ROLE: As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient...

