Reliability Engineer
2 weeks ago
Chelsea Avondale is the world's most cutting-edge home insurance group. We have developed sophisticated risk modeling and insurance pricing technologies for home insurance and deploy that technology through our own insurance company.
Our team consists of some of the brightest minds in insurance, software development, finance, and operations. Our group includes our scientific research & engineering division (Skynet Software) and Canadian property & casualty insurance company (Max Insurance).
Together, our group is transforming the Canadian and global insurance landscape.
JOB DESCRIPTION:
Chelsea Avondale is looking for a Reliability Engineer with a background in infrastructure system engineering to support the growth of a secure, dynamic, and scalable IT environment across the group. Our business is going through rapid growth, and it is essential that our systems infrastructure keeps pace.
The Reliability Engineer will play a crucial role in ensuring the reliability, scalability, and performance of our systems, enabling the continuous delivery of our products and services. They will be accountable for ensuring overall availability, as well as enhancing Engineering teams' capability to design, build and operate robust systems at scale.
This position is ideal for candidates who have an extraordinary sense of responsibility and are not afraid to roll up their sleeves. Our IT environment is not toolkit-rich. We are NOT looking for someone who wants to spend months installing a large number of tools from their preferred toolkit. We take pride in maintaining a fundamental stack of technologies, much of it in Python, and we are looking for someone who shares this mentality.
If you thrive in a high-performance culture and are eager for work that is both challenging and constantly evolving, this role is perfect for you. We strongly encourage and help our team members to improve and enhance their personal skill sets within our organization. On your journey with us, you will have the opportunity to learn and grow rapidly, taking on more responsibilities.
RESPONSIBILITIES:
- Play an integral role in the design, implementation & maintenance of AWS cloud server environments.
- Design, implement, and maintain robust monitoring and alerting systems in Python to detect and respond to incidents in a timely manner.
- Collaborate with cross-functional teams to enhance reliability of our systems and services.
- Design, configure, deploy, and maintain infrastructure on AWS using best practices and industry standards.
- Conduct post-incident analysis to identify root causes, implement corrective actions, and prevent similar issues in the future.
- Assist in capacity planning & optimize services to provide scalable, stable, & secure systems.
- Implement high availability and disaster recovery solutions to provide data redundancy, resilience, and data loss prevention.
- Assist with the implementation of select network engineering solutions including firewalls, load balancing, VPNs & LANs, where necessary.
PREFERRED EXPERIENCE & SKILLS:
- Bachelor's degree in Computer Science, Computer Engineering, Electrical Engineering, or related field.
- 1+ years of experience as a Reliability Engineer or similar role, with a focus on maintaining high-performance, scalable, and reliable web systems.
- We also encourage highly motivated new grads to apply.
- Hands-on experience with AWS cloud environments – instances, CloudWatch, EFS, etc.
- Proficiency in Python is a must.
- Experience using NGINX for reverse proxy, load balancing, and caching.
- Experience with Unix / Windows server configuration, administration, performance tuning and troubleshooting.
- Working knowledge of web technologies (web servers, DNS, SSL, browsers).
- Working knowledge of web development processes (source control, deployment, etc.).
- Experience load testing, pen testing, and providing security for cloud resources is beneficial.
Skynet Software welcomes and encourages applications from people with disabilities. Accommodations are available on request for candidates taking part in all aspects of the selection process.
-
Site Reliability Engineer
3 days ago
Toronto, Ontario, Canada Procom Full time
Site Reliability Engineer (SRE) / Ingénieur Fiabilité des Sites. On behalf of our banking client, Procom is seeking a Site Reliability Engineer (SRE) for a 12-month contract role. This position is a hybrid role, 3 days a week onsite at our client's Montréal, Quebec office. Site Reliability Engineer - Job Description: The Site Reliability Engineer is...
-
Site-Reliability Engineer
2 weeks ago
Toronto, Ontario, Canada Aarorn Technologies Inc Full time
Job Title: Site-Reliability Engineer (SRE). Location: Toronto, ON (3x onsite a week). Employment Type: Contract. Job Description: We are seeking a highly skilled Site Reliability Engineer (SRE) to enhance the reliability, performance, and efficiency of mission-critical batch workloads within Capital Markets Technology. In this role, you will serve as the technical...
-
Site Reliability Engineer
7 days ago
Toronto, Ontario, Canada Tecsys Inc. Full time
Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...
-
Performance Reliability Engineer
2 weeks ago
Toronto, Ontario, Canada Cerebras Systems Full time
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to...
-
Site Reliability Engineer
2 weeks ago
Toronto, Ontario, Canada Pixomondo Full time
We're seeking an experienced Site Reliability Engineer to join our team and lead infrastructure automation, CI/CD workflows, and deployment operations for a custom web platform. You'll be working with a modern DevOps stack including GitHub Actions, GCP, Kubernetes, Terraform, PostgreSQL, CodeDeploy, and Cloudflare to ensure our platform is robust, scalable,...
-
Site Reliability Engineer
7 days ago
Toronto, Ontario, Canada Kablamo Full time
Reports to: Technical Support Manager. Location: Toronto (Hybrid). Role Type: Full time. Level: Intermediate/Mid. Introduction: Kablamo is a fast-growing cloud digital product development company. Founded in 2017 in Australia, the business has grown quickly over the last several years, including the expansion of the team to Canada in 2021. We are proud to have...
-
Manager, Site Reliability Engineer
3 days ago
Toronto, Ontario, Canada Command Alkon USA Full time
Title: Manager, Site Reliability Engineer (SRE). Summary of Role: The Site Reliability Engineer (SRE) Manager leads the teams responsible for ensuring the availability, performance, and reliability of mission-critical systems. This role bridges the gap between software engineering and operations by implementing automation, observability, and scalability...
-
Site Reliability Engineer
3 days ago
Toronto, Ontario, Canada McCain Foods Full time
Position Title: Site Reliability Engineer. Position Type: Regular - Full-Time. Position Location: Toronto HQ. Requisition ID: 36904. Our Global Technology team's goal is to leverage technology and data to drive profitable growth, focus on enhancing customer experience and to further our purpose of 'Celebrating real connections through delicious, planet-friendly food'....
-
Site Reliability Engineer
3 days ago
Toronto, Ontario, Canada Xplor Full time $125,000 - $150,000
Company Description: Take a seat on the Xplor rocketship and join us as Site Reliability Engineer to help people succeed across the world. From dropping your kids off at childcare, getting something at home repaired, going to the gym or a fitness studio, to picking up your dry cleaning, our software, payments, and commerce-enabling solutions help everyday...
-
Site Reliability Engineer
3 days ago
Toronto, Ontario, Canada Apptoza Inc. Full time
Hi, hope you are doing great. If you are fine with the JD below, please share your updated resume ASAP. Site Reliability Engineer. Location: Toronto (Onsite). Duration: 6 months. Exp Required: 10 years. Job Description: Job Title: SRE. Technical/Functional Skills: • 8+ years of overall IT experience. • Advanced Linux / Unix support experience required. • Strong shell...