Site Reliability Engineer

3 weeks ago


Montreal, Canada Compunnel, Inc. Full time

Client’s Application Infrastructure (AI) division is seeking a Site Reliability Engineer (SRE) to join the Client Development Environment team. This role is focused on driving reliability, operational efficiency, and support for core development lifecycle tools used by over 17,000 developers across the firm. The ideal candidate will play a critical role in scaling and maintaining high-performing systems, ensuring system resilience, and working closely with developers to maximize productivity while minimizing manual operational effort.

Job Responsibilities:

- Gain and maintain full-stack knowledge of Morgan Stanley’s development environment
- Ensure maximum availability and performance of systems through architecture reviews, problem management, and plant optimization
- Automate plant management tasks and develop tools to reduce operational effort and support costs
- Identify and address technical debt that impacts developer productivity or system reliability
- Collaborate with other SREs across Application Infrastructure to implement shared solutions
- Troubleshoot complex issues across the full development stack
- Enhance Ops team product knowledge to reduce issue escalation rates
- Consult with internal developer clients to help troubleshoot and optimize use of Client tooling
- Experiment with emerging technologies, tools, and techniques to improve operations
- Participate in a global on-call rotation with compensatory time off
- Champion operational responsiveness and a strong culture of reliability and automation

Required Skills:

- Programming/scripting experience for task automation (Python preferred)
- Hands-on experience with observability tools like Prometheus and Grafana
- Experience with version control (Bitbucket, GitHub), issue tracking (Jira), and CI tools (Jenkins, GitHub Actions, Azure DevOps)
- Familiarity with automated testing and deployment pipelines
- Strong interpersonal and communication skills
- Proven collaboration capabilities within technical stakeholder groups

Preferred Skills:

- Familiarity with SRE principles such as SLOs, error budgets, toil reduction, and blameless postmortems
- Experience with containerization technologies such as Docker and orchestration tools like Kubernetes
- Prior exposure to large-scale development environments or developer tooling platforms

Certifications:

Not specified. Relevant certifications in Linux, Python, Kubernetes, or SRE practices are a plus.

Education:

Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)




  • Montreal, Canada ApTask Full time

    Looking for an intermediate candidate with 2 to 5 years' experience. The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for the client's ServiceNow SaaS implementation. Reporting to a Site Reliability...


  • Montreal, Quebec, Canada Open Systems Technologies Full time

    Job Title: Site Reliability Engineer
    Location: Montreal – Hybrid – 3 days/week
    Term: 12-month contract plus extension
    The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for client's ServiceNow SaaS implementation. Reporting to a Site...


  • Montreal, Canada Compunnel, Inc. Full time

    Client is seeking an experienced Site Reliability Engineer (SRE) to support and enhance the reliability, performance, and operational efficiency of our global ServiceNow SaaS platform. As part of the Application Infrastructure (AI) team, you will be instrumental in advancing SRE practices, ensuring seamless integration and stability across on-premise...


  • Montreal, Canada Compunnel, Inc. Full time

    We are seeking a Site Reliability Engineer (SRE) to support and enhance the reliability engineering, operations, and customer support for our ServiceNow SaaS platform. This is a hybrid role combining automation, process improvement, and production support with a strong emphasis on building and maintaining reliable and scalable systems. As part of a global...


  • Montreal, Canada AKUR8 Full time

    Site Reliability Engineer – AKUR8 – Paris, Île-de-France, France Overview Akur8 is a fast-growing Insurtech scale‑up that transforms insurance pricing and reserving with transparent machine learning. Our SaaS platform injects speed, performance and reliability into insurers’ pricing processes. With teams in 8 global cities and over 320 clients...


  • Montreal (administrative region), Canada Canonical Full time

    Site Reliability Engineer Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and...


  • Montreal (administrative region), Canada TMC Canada Full time

    Summary The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for Morgan Stanley's ServiceNow SaaS implementation. Reporting to a Site Reliability Engineering & Operations Lead. This role requires delivering a range of SRE practices...