Site Reliability Engineer
4 weeks ago
Site Reliability Engineer (SRE) - ServiceNow, Application Infrastructure
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to drive reliability engineering, operations, and customer support services for a ServiceNow SaaS implementation. Reporting to a Site Reliability Engineering & Operations Lead, this role involves delivering SRE practices within a global community of engineers.
The position focuses on implementing ServiceNow Software as a Service, which supports IT service management and integrates with technologies like chatbots, on-call escalation, incident management, SQL databases, APIs, and web infrastructure. This role combines development, process improvement, and production-side operational responsibilities, including occasional participation in on-call rotations.
We welcome candidates from diverse backgrounds, whether transitioning from development, infrastructure, or system administration, who are passionate about reliability and resilience principles.
Key Responsibilities:
- Optimize System Reliability:
  - Drive improvements to maximize system availability and performance by automating operational tasks, developing tools, managing technical debt, and participating in architecture reviews.
- ServiceNow and Infrastructure Support:
  - Troubleshoot ServiceNow issues and related on-premise capabilities in a Linux environment, collaborating to identify root causes and implement lasting improvements.
- Observability and Monitoring:
  - Design and deliver solutions for metrics, logging, tracing, and alerting to measure and improve system reliability.
- On-Call Support:
  - Participate in a global on-call rotation, ensuring dependability and responsiveness during agreed hours, with time-off in lieu for on-call duties.
- Documentation and Knowledge Sharing:
  - Contribute to and maintain thorough documentation of the ServiceNow environment and its dependencies.
- Technical Debt Management:
  - Identify and prioritize technical debt impacting client satisfaction and operational efficiency.
- Process Feedback:
  - Provide input on policies and procedures to enhance SRE practices, operational efficiency, and system safety.
Required Skills:
- ServiceNow Expertise:
  - Experience in ServiceNow administration or development (preferred but not mandatory; on-the-job training available).
- Programming Skills:
  - Proficiency in at least one programming language (e.g., Python).
- Communication and Collaboration:
  - Strong verbal and written communication skills, with the ability to build effective relationships with global teams.
- Problem Solving:
  - Ability to respond to technical emergencies, troubleshoot effectively, and implement sustainable solutions.
- Teamwork and Dependability:
  - A committed team player with a client-focused approach.
Preferred Skills:
- ServiceNow administration or development experience.
- Familiarity with Linux environments and operational troubleshooting.
- Knowledge of observability tools and techniques (metrics, logging, tracing).
-
Site Reliability Engineer
2 weeks ago
Montreal, Canada Soho Square Solutions Full time
Site Reliability Engineer (SRE) - ServiceNow, Application Infrastructure
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to drive reliability engineering, operations, and customer support services for a ServiceNow SaaS implementation. Reporting to a Site Reliability Engineering & Operations Lead, this role involves...
-
Site Reliability Engineer
1 month ago
Montreal, Quebec, Canada Soho Square Solutions Full time
Site Reliability Engineer (SRE) - ServiceNow, Application Infrastructure
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to drive reliability engineering, operations, and customer support services for a ServiceNow SaaS implementation. Reporting to a Site Reliability Engineering & Operations Lead, this role involves...
-
Site Reliability Engineer
7 months ago
Montreal, Canada Lyft Full time
At Lyft, our mission is to improve people’s lives with the world’s best transportation. Imagine cities where streets are safe, communities thrive, and personal cars are a thing of the past. We envision a future where shared and active transportation modes are the norm, fostering vibrant, connected neighborhoods. As a leader in micromobility, Lyft powers...
-
Site Reliability Engineer
5 hours ago
Montreal, Canada LanceSoft, Inc. Full time
Site Reliability Engineer
Montreal, Quebec, Canada Hybrid
Duration: 12+ months
Responsibilities: • Are interested in distributed systems and working with highly scalable and reliable services. • Like to work in a fast-moving environment and you aren't afraid to change things to make them better. • Enjoy new technological challenges and solving hard...
-
AWS Site Reliability Engineer
4 weeks ago
Montreal, Canada SAP SE Full time
We help the world run better
At SAP, we enable you to bring out your best. Our company culture is focused on collaboration and a shared passion to help the world run better. The Reliability Engineering organization provides a multitude of products and services related to operations and continuity of business delivery. The Site Reliability Engineering teams...
-
Site Reliability Engineer
1 day ago
Montreal, Quebec, Canada LanceSoft, Inc. Full time
Unlock a career as a Site Reliability Engineer at LanceSoft, Inc., a cutting-edge technology company based in Montreal, Quebec, Canada. We are seeking an experienced and highly motivated individual to join our team.
Job Type: Full-time
Duration: 12+ months
Company Overview
LanceSoft, Inc. is a leading technology firm dedicated to delivering innovative solutions...
-
Site Reliability Engineering Leader
2 days ago
Montreal, Quebec, Canada Royal Bank of Canada Full time
Transform Your Career with a Leadership Role in Site Reliability Engineering
We are seeking an experienced Senior Site Reliability Engineer to join our team at the Royal Bank of Canada. As a key member of our Digital Branch SRE organization, you will play a critical role in developing, implementing, and supporting SRE solutions for applications supported by...
-
AWS Site Reliability Engineer
3 weeks ago
Montreal, Canada SAP SE Full time
We help the world run better
At SAP, we enable you to bring out your best. Our company culture is focused on collaboration and a shared passion to help the world run better. We focus every day on building the foundation for tomorrow and creating a workplace that embraces differences, values flexibility, and is aligned to our purpose-driven and...
-
AWS Site Reliability Engineer
4 months ago
Montreal, Canada Alltech Consulting Services Full time
Job Description Level 4
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations, and customer support services for Company’s ServiceNow SaaS implementation. Reporting to a Site Reliability Engineering & Operations Lead, this role requires delivering a range of SRE...
-
Site Reliability Engineer
1 day ago
Montreal, Quebec, G4F, CA LanceSoft, Inc. Full time
Site Reliability Engineer
Montreal, Quebec, Canada Hybrid
Duration: 12+ months
Responsibilities: • Are interested in distributed systems and working with highly scalable and reliable services. • Like to work in a fast-moving environment and you aren't afraid to change things to make them better. • Enjoy new technological challenges and solving hard...
-
Montreal, Quebec, Canada Alltech Consulting Services Full time
We are seeking an experienced Site Reliability Engineer to join our team at Alltech Consulting Services. As a key member of our Application Infrastructure department, you will play a vital role in driving the reliability engineering, operations, and customer support services for our ServiceNow SaaS implementation. The ideal candidate will have experience in...
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada LanceSoft, Inc. Full time
Location: Montreal (Hybrid 3 days)
Duration: 12+ Months
Job Profile
Systems Reliability Engineering (SRE) is a discipline focused on improving system service availability, observability, scalability, performance, and resilience across *** by applying sound software engineering principles and adopting the latest technology and tooling.
Responsibilities: Are...
-
Site Reliability Engineer
2 weeks ago
Montreal, Canada Experience AI Solutions Full time
Senior Systems Administrator
Start Date: as soon as possible
Type of employment: permanent
Location: Montreal, QC (hybrid model for working in the office)
Number of Positions: 1
Language skills: Excellent English language skills
Perks: Work for a multinational, award winning, socially responsible company with an operational presence in many...
-
Site Reliability Engineer
2 weeks ago
Montreal, Canada Experience AI Solutions Full time
Senior Systems Administrator
Start Date: as soon as possible
Type of employment: permanent
Location: Montreal, QC (hybrid model for working in the office)
Number of Positions: 1
Language skills: Excellent English language skills
Perks: Work for a multinational, award winning, socially responsible company with an operational presence in many countries, having been...
-
Site Reliability Engineer
2 days ago
Montreal, Canada Experience AI Solutions Full time
Senior Systems Administrator
Start Date: as soon as possible
Type of employment: permanent
Location: Montreal, QC (hybrid model for working in the office)
Number of Positions: 1
Language skills: Excellent English language skills
Perks: Work for a multinational, award winning, socially responsible company with an operational presence in many countries, having been...
-
Site Reliability Engineer
4 weeks ago
Montreal, Quebec, Canada LanceSoft, Inc. Full time
Location: Montreal (Hybrid 3 days)
Duration: 12+ Months
Job Profile
Systems Reliability Engineering (SRE) is a discipline focused on improving system service availability, observability, scalability, performance, and resilience across *** by applying sound software engineering principles and adopting the latest technology and tooling.
Responsibilities: Are...
-
Technical Site Reliability Engineering
1 month ago
Montreal, Canada Ubisoft Entertainment Full time
Technical Site Reliability Engineering (SRE) Lead
Full-time
Contract: Permanent
Flexible Working Organization: Hybrid
Ubisoft’s 19,000 team members, working across more than 30 countries around the world, are bound by a common mission to enrich players’ lives with original and memorable gaming experiences. If you are excited about solving game-changing...
-
Site Reliability Engineer
2 months ago
Montreal, Canada National Bank Full time
As a Specialist in site reliability engineering on the National Bank Data Protection team, you will ensure the operational reliability of data protection assets. With your experience and knowledge in the operational management of high-availability assets (HA), you will have a positive impact on the Bank's stability and reputation with its internal and...
-
Site Reliability Specialist
4 weeks ago
Montreal, Quebec, Canada LanceSoft, Inc. Full time
Job Summary
We are seeking a skilled Site Reliability Engineer to join our team at LanceSoft, Inc. in Montreal (Hybrid 3 days). This is a long-term contract position with a duration of 12+ Months.
About the Role
In this role, you will be responsible for improving system service availability, observability, scalability, performance, and resilience across various...