Site Reliability Engineer
2 days ago
Intelcom | Dragonfly
With more than 100 sorting stations and operations across three continents, Intelcom | Dragonfly is Canada's leader in last-mile logistics. Our vision is clear: to deliver fast, accurate, and reliable service powered by cutting-edge technology.
A Strategic Role at the Heart of Logistics
Responsibilities
- Incident Management: Detect and respond to issues, ensuring rapid recovery to minimize downtime. Current on-call contributors need better coordination and structure in investigations. This role involves off-hours events, but these are cyclical, with quieter periods. Define and implement an escalation process. Communicate the process to stakeholders across the business and ensure they adhere to it. Document incident reports and conduct post-mortems to promote continuous improvement.
- Collaboration: Work closely with development and operations teams to ensure smooth deployment and operation of applications. Provide primary operational support and engineering for large-scale distributed software applications. Collaborate with development teams to improve services through rigorous testing and release procedures. Participate in system design consulting, platform management, and capacity planning. This requires diligent follow-up and close collaboration with all teams.
- Influence: Create sustainable systems and services through automation and enhancements. Promote a culture of innovation and continuous improvement within the SRE team and the broader organization. Coordinate with the SRE team manager in establishing and executing operational policies that promote agility and scalability. Coordinate and mentor other SRE team members, fostering professional growth and development.
- Automation: Automate repetitive tasks to improve efficiency and reduce human error. Improve the reliability, quality, and time-to-market of our software solutions. Measure and optimize system performance in anticipation of business needs.
- Monitoring and Alerting: Implement and enhance monitoring systems (e.g., Datadog) to track the health and performance of applications and infrastructure. There are existing systems, but additional ones are needed. Monitor and maintain the production environment, ensuring high availability and system health. Gather and process metrics from operating systems and applications to assist in performance tuning and fault finding. Develop a health monitoring dashboard that gives our various stakeholders visibility into the production environment.
- Disaster Recovery: Prepare and implement disaster recovery plans to manage unexpected outages.
- Performance Optimization: Continuously improve system performance and scalability.
- Capacity Planning: Ensure the infrastructure can handle current and future demands.
- Chaos Engineering: Intentionally introduce failures to test system resilience and improve robustness.
Qualifications
- Bachelor's degree in software engineering, computer science or equivalent.
- 3+ years of experience in cloud management, development and/or SRE responsibilities.
- Experience in Agile methodology and technical project execution. Knowledgeable in DevOps concepts, AWS, Azure, GCP, observability tools (Datadog, Cloudflare), Terraform, PagerDuty, and how to integrate them.
Other Skills
- Strong initiative and resilience, with a demonstrated ability to explore new ideas and innovative approaches to solving complex problems.
- Excellent interpersonal and communication skills in both French and English.
- Able and comfortable working in a fast-moving environment.
Schedule: Primarily daytime hours, but on-call availability is required for the initial months to observe and refine existing processes.
Join Our Team
Be part of a dynamic and innovative company at the forefront of the last-mile delivery industry. If you are a strategic thinker and results-driven leader who is passionate about driving business growth, we'd love to hear from you.
Why Join Us?
Benefits
At Intelcom | Dragonfly, you'll thrive in a flexible and stimulating environment, surrounded by passionate talent. You'll also enjoy a wide range of benefits:
- On-site gym with a personal trainer
- Employer-provided lunch of your choice
- Comprehensive group insurance
- Group RRSP plan
- Wellness days
- Partial reimbursement for public transportation
- Employee Assistance Program
…and much more.
Diversity & Inclusion
At Intelcom | Dragonfly, we move forward guided by strong values: collaboration, innovation, excellence, and responsibility.
We embrace diversity, ensure equity, and foster a true sense of belonging.
Accommodation measures are available for individuals with disabilities throughout our recruitment process, in compliance with the law. Please let us know if you have any specific needs.
-
Site Reliability Engineer
6 days ago
Montreal, Quebec, Canada Open Systems Technologies Full time
Job Title: Site Reliability Engineer. Location: Montreal – Hybrid – 3 days/week. Term: 12 months contract plus extension. The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for client's ServiceNow SaaS implementation. Reporting to a Site...
-
Site Reliability Engineer
6 days ago
Montreal, Quebec, Canada Roshan Consulting Services Full time
Company Description: Roshan Consulting empowers businesses to optimize operations and enhance efficiency through innovative strategies and technologies tailored to their unique needs. Our mission is to drive digital transformation and deliver sustainable growth by offering services such as Robotic Process Automation (RPA), business process optimization, and...
-
Site Reliability Engineer
3 days ago
Montreal, Quebec, Canada Open Systems Technologies Full time
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for Morgan Stanley's ServiceNow SaaS implementation. Reporting to a Site Reliability Engineering & Operations Lead. This role requires delivering a range of SRE practices within a...
-
Site Reliability Engineer
19 hours ago
Montreal, Quebec, Canada Tecsys Inc. Full time
Having recognized the advantages of remote work, including employee morale, productivity, reduced commuting on employee wellbeing and the environment, we are proud to be a digital-first company. The technologies and programs in which we invested have provided a fantastic foundation to this end. Our digital-first work environment, together with our...
-
Site Reliability Engineering
2 weeks ago
Montreal, Quebec, Canada Intelcom Full time
Make your internship count. At Intelcom, interns don't just observe, they contribute meaningfully to real projects that shape how we operate. You'll gain hands-on experience, grow your skills, and explore long-term career opportunities in a fast-moving, innovation-driven environment. Ride the next mile with us. We are seeking a Site Reliability Engineering (SRE)...
-
Site Reliability Engineer
2 weeks ago
Montreal, Quebec, Canada Omiz Staffing Solutions (OSS) Full time
Position: Site Reliability Engineer. Location: Montreal, QC Canada (Hybrid – 3-4 days onsite in a week). Duration: Long-Term Contract. Job Description: Delivery of improvements that will maximize the availability and performance of supported systems through optimized and automated operational tasks, collaborating on the development of operational tools, ongoing...
-
Senior Site Reliability Engineer
2 weeks ago
Montreal, Quebec, Canada Botpress Technologies Inc. Full time
Description: Help bring AI agents to companies worldwide. Over the next decade, autonomous agents will redefine how we work. Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic. Our product works today and at scale, across industries, regions, and limitless use cases. As the 3rd...
-
Senior Site Reliability Engineer
2 weeks ago
Montreal, Quebec, Canada Botpress Technologies Inc. Full time
Help bring AI agents to companies worldwide. Over the next decade, autonomous agents will redefine how we work. Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic. Our product works today and at scale, across industries, regions, and limitless use cases. As the 3rd fastest-growing B2B AI...
-
Senior Site Reliability Engineer
2 weeks ago
Montreal, Quebec, Canada Botpress Full time
Help bring AI agents to companies worldwide. Over the next decade, autonomous agents will redefine how we work. Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic. Our product works today and at scale, across industries, regions, and limitless use cases. As the 3rd fastest-growing B2B AI...
-
Senior Site Reliability Engineer
2 weeks ago
Montreal, Quebec, Canada Orion Innovation Full time
Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide range of clients across many industries...