Site Reliability Engineer
4 weeks ago
Client’s Application Infrastructure (AI) division is seeking a Site Reliability Engineer (SRE) to join the Client Development Environment team. This role is focused on driving reliability, operational efficiency, and support for core development lifecycle tools used by over 17,000 developers across the firm. The ideal candidate will play a critical role in scaling and maintaining high-performing systems, ensuring system resilience, and working closely with developers to maximize productivity while minimizing manual operational effort.

Job Responsibilities:
Gain and maintain full-stack knowledge of Morgan Stanley’s development environment
Ensure maximum availability and performance of systems through architecture reviews, problem management, and plant optimization
Automate plant management tasks and develop tools to reduce operational effort and support costs
Identify and address technical debt that impacts developer productivity or system reliability
Collaborate with other SREs across Application Infrastructure to implement shared solutions
Troubleshoot complex issues across the full development stack
Enhance Ops team product knowledge to reduce issue escalation rates
Consult with internal developer clients to help troubleshoot and optimize use of Client tooling
Experiment with emerging technologies, tools, and techniques to improve operations
Participate in a global on-call rotation with compensatory time-off
Champion operational responsiveness and a strong culture of reliability and automation

Required Skills:
Programming/scripting experience for task automation (Python preferred)
Hands-on experience with observability tools like Prometheus and Grafana
Experience with version control (Bitbucket, GitHub), issue tracking (Jira), and CI tools (Jenkins, GitHub Actions, Azure DevOps)
Familiarity with automated testing and deployment pipelines
Strong interpersonal and communication skills
Proven collaboration capabilities within technical stakeholder groups

Preferred Skills:
Familiarity with SRE principles such as SLOs, error budgets, toil reduction, and blameless postmortems
Experience with containerization technologies such as Docker and orchestration tools like Kubernetes
Prior exposure to large-scale development environments or developer tooling platforms

Certifications: Not specified. Relevant certifications in Linux, Python, Kubernetes, or SRE practices are a plus.

Education: Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada ApTask Full time Looking for an intermediate candidate with 2 to 5 years' experience. The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for the client's ServiceNow SaaS implementation. Reporting to a Site Reliability...
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada Botpress Full time Help bring AI agents to companies worldwide. Over the next decade, autonomous agents will redefine how we work. Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic. Our product works today and at scale, across industries, regions, and limitless use...
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada Compunnel Inc. Full time Site Reliability Engineer – KUMDC Long Term Contract The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to drive reliability engineering, operations, and customer support services for ServiceNow SaaS implementation. Reporting to the Site Reliability Engineering & Operations Lead, this role involves delivering SRE...
-
Site Reliability Engineer
3 weeks ago
Montreal, Canada Compunnel Inc. Full time Site Reliability Engineer – KUMDC5681698 Long Term Contract The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to drive reliability engineering, operations, and customer support services for ServiceNow SaaS implementation. Reporting to the Site Reliability Engineering & Operations Lead, this role involves delivering...
-
Site Reliability Engineer
1 day ago
Montreal, Canada Open Systems Technologies Full time Site Reliability Engineer (SRE), ServiceNow, Application Infrastructure Location: Montreal – Hybrid – 3 days/week The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for client’s ServiceNow SaaS implementation. Reporting to a Site...
-
Site Reliability Engineer
3 days ago
Montreal, Canada Open Systems Technologies Full time Site Reliability Engineer (SRE), ServiceNow, Application Infrastructure Location: Montreal – Hybrid – 3 days/week The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for client’s ServiceNow SaaS implementation. Reporting to a Site...
-
Site Reliability Engineer
1 week ago
Montreal, Canada Compunnel, Inc. Full time Client is seeking an experienced Site Reliability Engineer (SRE) to support and enhance the reliability, performance, and operational efficiency of our global ServiceNow SaaS platform. As part of the Application Infrastructure (AI) team, you will be instrumental in advancing SRE practices, ensuring seamless integration and stability across on-premise...
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada Compunnel, Inc. Full time Client is seeking an experienced Site Reliability Engineer (SRE) to support and enhance the reliability, performance, and operational efficiency of our global ServiceNow SaaS platform. As part of the Application Infrastructure (AI) team, you will be instrumental in advancing SRE practices, ensuring seamless integration and stability across on-premise...
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada Botpress, Inc. Full time Help bring AI agents to companies worldwide. Over the next decade, autonomous agents will redefine how we work. Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic. Our product works today and at scale, across industries, regions, and limitless use cases. As the 3rd fastest-growing B2B AI...
-
Site Reliability Engineer
4 weeks ago
Montreal, Canada Compunnel, Inc. Full time We are seeking a Site Reliability Engineer (SRE) to support and enhance the reliability engineering, operations, and customer support for our ServiceNow SaaS platform. This is a hybrid role combining automation, process improvement, and production support with a strong emphasis on building and maintaining reliable and scalable systems. As part of a global...