Site Reliability Engineer
3 weeks ago
Client’s Application Infrastructure (AI) division is seeking a Site Reliability Engineer (SRE) to join the Client Development Environment team. This role is focused on driving reliability, operational efficiency, and support for core development lifecycle tools used by over 17,000 developers across the firm. The ideal candidate will play a critical role in scaling and maintaining high-performing systems, ensuring system resilience, and working closely with developers to maximize productivity while minimizing manual operational effort.
Job Responsibilities:
- Gain and maintain full-stack knowledge of Morgan Stanley’s development environment
- Ensure maximum availability and performance of systems through architecture reviews, problem management, and plant optimization
- Automate plant management tasks and develop tools to reduce operational effort and support costs
- Identify and address technical debt that impacts developer productivity or system reliability
- Collaborate with other SREs across Application Infrastructure to implement shared solutions
- Troubleshoot complex issues across the full development stack
- Enhance Ops team product knowledge to reduce issue escalation rates
- Consult with internal developer clients to help troubleshoot and optimize use of Client tooling
- Experiment with emerging technologies, tools, and techniques to improve operations
- Participate in a global on-call rotation with compensatory time off
- Champion operational responsiveness and a strong culture of reliability and automation
Required Skills:
- Programming/scripting experience for task automation (Python preferred)
- Hands-on experience with observability tools like Prometheus and Grafana
- Experience with version control (Bitbucket, GitHub), issue tracking (Jira), and CI tools (Jenkins, GitHub Actions, Azure DevOps)
- Familiarity with automated testing and deployment pipelines
- Strong interpersonal and communication skills
- Proven collaboration capabilities within technical stakeholder groups
Preferred Skills:
- Familiarity with SRE principles such as SLOs, error budgets, toil reduction, and blameless postmortems
- Experience with containerization technologies such as Docker and orchestration tools like Kubernetes
- Prior exposure to large-scale development environments or developer tooling platforms
Certifications:
Not specified. Relevant certifications in Linux, Python, Kubernetes, or SRE practices are a plus.
Education:
Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada ApTask Full time
Looking for an intermediate candidate with 2 to 5 years' experience. The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for the client's ServiceNow SaaS implementation. Reporting to a Site Reliability...
-
Site Reliability Engineer
3 hours ago
Montreal, Quebec, Canada Open Systems Technologies Full time
Job Title: Site Reliability Engineer
Location: Montreal – Hybrid – 3 days/week
Term: 12 months contract plus extension
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for client's ServiceNow SaaS implementation. Reporting to a Site...
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada Compunnel, Inc. Full time
Client is seeking an experienced Site Reliability Engineer (SRE) to support and enhance the reliability, performance, and operational efficiency of our global ServiceNow SaaS platform. As part of the Application Infrastructure (AI) team, you will be instrumental in advancing SRE practices, ensuring seamless integration and stability across on-premise...
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada Compunnel, Inc. Full time
We are seeking a Site Reliability Engineer (SRE) to support and enhance the reliability engineering, operations, and customer support for our ServiceNow SaaS platform. This is a hybrid role combining automation, process improvement, and production support with a strong emphasis on building and maintaining reliable and scalable systems. As part of a global...
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada Compunnel, Inc. Full time
Client’s Application Infrastructure (AI) division is seeking a Site Reliability Engineer (SRE) to join the Client Development Environment team. This role is focused on driving reliability, operational efficiency, and support for core development lifecycle tools used by over 17,000 developers across the firm. The ideal candidate will play a critical role in...
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada AKUR8 Full time
Site Reliability Engineer – AKUR8 – Paris, Île-de-France, France
Overview
Akur8 is a fast-growing Insurtech scale‑up that transforms insurance pricing and reserving with transparent machine learning. Our SaaS platform injects speed, performance and reliability into insurers’ pricing processes. With teams in 8 global cities and over 320 clients...
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada AKUR8 Full time
Site Reliability Engineer – AKUR8 – Paris, Île-de-France, France
Overview
Akur8 is a fast-growing Insurtech scale‑up that transforms insurance pricing and reserving with transparent machine learning. Our SaaS platform injects speed, performance and reliability into insurers’ pricing processes. With teams in 8 global cities and over 320 clients...
-
Site Reliability Engineer
4 weeks ago
Montreal (administrative region), Canada Canonical Full time
Site Reliability Engineer
Canonical is a leading provider of open source software and operating systems to the global enterprise and technology markets. Our platform, Ubuntu, is widely used in breakthrough enterprise initiatives such as public cloud, data science, AI, engineering innovation, and IoT. Our customers include the world's leading public cloud and...
-
Site Reliability Engineer
3 weeks ago
Montreal (administrative region), Canada TMC Canada Full time
Summary
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for Morgan Stanley's ServiceNow SaaS implementation. Reporting to a Site Reliability Engineering & Operations Lead, this role requires delivering a range of SRE practices...