Site Reliability Engineer - Automation
2 months ago
We are currently seeking a full-time Site Reliability Engineer to join our IT team. In this role, you will collaborate closely with the client services team to diagnose, troubleshoot, and resolve issues related to system reliability.
RESPONSIBILITIES:
- Take ownership of customer-reported issues and see problems through to resolution.
- Develop preventive measures to avoid recurring issues.
- Follow standard procedures for escalating unresolved issues to the appropriate internal teams.
Infrastructure Management:
- Design, configure, deploy, and maintain AWS infrastructure using best practices.
- Implement Infrastructure as Code (IaC) using Terraform for scalability, repeatability, and maintainability.
- Collaborate with the development team to optimize .NET applications for peak performance in a cloud environment.
Monitoring and Alerting:
- Design and implement advanced system monitoring solutions for high performance, availability, and security.
- Use monitoring tools proactively to identify and diagnose infrastructure and application-level issues.
- Collaborate on defining Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets.
Reliability and Availability:
- Optimize cloud resource availability, performance, and cost using best practices.
- Plan and execute disaster recovery drills and ensure high availability of critical systems.
- Respond promptly to system alerts, lead incident resolution, and contribute to post-mortem analyses.
Automation and Optimization:
- Automate repetitive tasks related to infrastructure provisioning, configuration, and deployment.
- Ensure continuous deployment and continuous integration best practices are implemented and maintained.
Collaboration and Knowledge Sharing:
- Collaborate with developers, product managers, and other teams to ensure seamless and stable application deployment.
- Document processes, architectures, and best practices to facilitate knowledge sharing.
WHAT WE SEEK IN OUR IDEAL CANDIDATE:
- AWS certifications such as AWS Certified Solutions Architect or AWS Certified DevOps Engineer.
- Experience with monitoring and alerting tools in the AWS ecosystem.
- Familiarity with Site Reliability Engineering (SRE) philosophy, SLOs, SLIs, and Error Budgets.
- Strong analytical and troubleshooting skills.
- Excellent communication and collaboration skills.
YOUR EXPERIENCE & SKILLS:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent experience.
- 5+ years of experience managing and operating AWS environments.
- Familiarity with best practices in monitoring, logging, and alerting.
WHY WORK AT ASCEND?
- Intellectual curiosity, dedication, and a team willing to get the job done.
- Opportunity to make a significant impact on the business in the short and long term.
- Contribute to a company that supports charities and NPOs in funding their causes.
- Beautiful downtown Toronto office with lake views and proximity to transit.
- Hybrid work environment.
-
Lead Site Reliability Engineer
2 months ago
Old Toronto, Canada https://www.energyjobline.com/sitemap.xml Full time
Product: Global Platform Engineering. Your role: Supervise a team of Site Reliability Engineers. Report metrics on application performance and incidents. Act proactively and responsively to infrastructure and application failures. Build and automate failover and recovery workflows. Implement observability and monitoring stack for infrastructure and application...
-
Lead Site Reliability Engineer
2 weeks ago
Old Toronto, Canada RBC Full time
RBC is seeking a Lead SRE for our US Cash Management Technology. This is a brand-new system to serve our corporate clients. You will be heavily involved in shaping the future technology landscape of RBC, by delivering key business values for a transformational project in our Banking Technology while implementing strategic components servicing across all...
-
AWS Site Reliability Engineer
1 month ago
Old Toronto, Canada Soda Full time
Job Description. Job Title: Site Reliability Engineer. Location: Poland - Fully Remote. Salary: 324K PLN or 27.3K monthly. Start: ASAP. Stack: AWS, Docker, Kubernetes, Terraform, Jenkins, Ansible, Linux, JavaScript, and Lambda. Are you a seasoned DevOps/SRE professional passionate about building high-performance, scalable systems? I am working with a Media/IT...
-
Site Reliability Engineering Manager
1 week ago
Old Toronto, Canada Tbwa ChiatDay Inc Full time
Automate and Optimize Brick and Mortar Retail. Focal Systems is the industry leader in retail AI solutions, revolutionizing brick and mortar retail with deep learning computer vision. As a Silicon Valley-based startup, we have more than doubled in size every year since inception. Our Mission: We are looking for smart, creative, and passionate individuals who want...
-
AWS Site Reliability Engineer
1 month ago
Old Toronto, Canada Olx Full time
Site Reliability Engineer. Remote Poland, Poland. OLX – Engineering / Full-time / Remote. At OLX, we work together to build a more sustainable world through trade. We make it safe, smart, and convenient to buy and sell cars, find housing, get jobs, buy and sell household goods, and more. Our colleagues around the world help to serve millions of people around...
-
Digital Site Reliability Engineer
2 months ago
Old Toronto, Canada Mastech Inc. Full time
Mastech Digital is an IT Staffing and Digital Transformation Services company. Mastech Digital provides digital and mainstream technology staff as well as Digital Transformation Services for all American Corporations. We are currently seeking a Site Reliability Engineer (GCP) for our client in the Consulting domain. We value our professionals, providing...
-
Site Reliability Engineer
7 months ago
Toronto, Canada CB Canada Full time
Site Reliability Engineer. On behalf of our client in the Banking Sector, PROCOM is looking for a Site Reliability Engineer. Site Reliability Engineer – Job Description: Azure cloud; Jira and Confluence; CI/CD; experience with automating (provisioning, configuration management, deployment) and integrating Azure PaaS solutions (Azure App services, Azure...
-
AWS Site Reliability Engineer
2 months ago
Old Toronto, Canada Sentry Full time
The Site Reliability Engineering team is responsible for the deployment, configuration, maintenance, and monitoring of Sentry's hosted platform. We do this by leveraging automation tools to automatically spin up and scale services to meet the traffic demands of 1,000,000+ developers. Sentry receives over a billion events a day and processes terabytes of...
-
AWS Site Reliability Engineer
3 weeks ago
Old Toronto, Canada Tecsys Full time
Tecsys is a fast-growing innovator offering supply chain solutions to industry-leading healthcare systems, hospitals, and pharmacy businesses to distributors, retailers, and 3PLs. As a Cloud Infrastructure Specialist, you will be responsible for ensuring the reliability and uptime of our platform and applications in a data-driven way to support internal and...
-
Site Reliability Engineering Linux or Windows
2 months ago
Old Toronto, Canada Thomson Reuters Full time
(Canada) Site Reliability Engineer (Contract). Contract (9 months 4 days). Published 3 days ago. New Relic, Datadog. Site Reliability Engineer - in the Service Management Organization. Do you have experience in IT Service Management, working with cloud providers, software development, and technology infrastructure? The Site Reliability Engineer will analyze chronic...
-
AWS Site Reliability Engineer
2 weeks ago
Old Toronto, Canada Tecsys Inc. Full time
Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...
-
Site Reliability Engineer
2 weeks ago
Old Toronto, Canada Loblaw Companies Ltd - Head Office Full time
Cloud Engineering Opportunity. We are seeking an experienced Site Reliability Engineer to join our team at Loblaw Companies Ltd - Head Office. This role offers a unique opportunity to design, develop, and maintain cloud native solutions using services like Kubernetes, AppEngine, Cloud Functions, CloudSql, BigQuery, Pub/Sub on Google Cloud Platform and...
-
Site Reliability Engineer
2 weeks ago
Toronto, ON, Canada PointsBet Canada Full time
SITE RELIABILITY ENGINEER. As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. PointsBet is a sports & casino betting operator...
-
Senior Site Reliability Engineer, Data
1 week ago
Old Toronto, Canada Tbwa ChiatDay Inc Full time
Company Description: Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley based startup that has more than doubled in size every year since inception. Our mission is to automate and optimize brick and mortar retail using deep learning computer vision. We are looking for smart, creative and passionate people who want to help...
-
AWS Site Reliability Engineer
1 month ago
Old Toronto, Canada Tecsys Full time
Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...
-
Site Reliability Engineer
3 weeks ago
Toronto, Canada PointsBet Canada Full time
SITE RELIABILITY ENGINEER. ABOUT THE ROLE: As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient...
-
Site Reliability Engineer
2 weeks ago
Toronto, ON, Canada PointsBet Canada Full time
SITE RELIABILITY ENGINEER. ABOUT THE ROLE: As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient...
-
Site Reliability Engineer
3 weeks ago
Toronto, Ontario, Canada PointsBet Canada Full time
SITE RELIABILITY ENGINEER. ABOUT THE ROLE: As a Site Reliability Engineer (SRE), you will ensure the reliability, scalability, and performance of our product. You will lead efforts in proactive monitoring, incident management, and automation, collaborating across teams to implement best practices in reliability engineering. Your expertise will drive resilient...