Senior Site Reliability Engineer
2 weeks ago
Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide range of clients across many industries including financial services, professional services, telecommunications and media, consumer products, automotive, industrial automation, professional sports and entertainment, life sciences, ecommerce, and education.

Role
Senior Site Reliability Engineer (SRE)

Type
Remote, working EST hours

Clearance Requirement
Must be eligible for up to a Top-Secret Security Clearance.

Job Summary
The Senior Site Reliability Engineer (SRE) will play a critical, hands-on role in ensuring the reliability, scalability, and performance of systems supporting highly classified government projects within an air-gapped deployment. This position demands expertise in both DevOps methodologies and deep coding skills to maintain uptime, resilience, and stringent compliance in a secure, disconnected environment.

Key Responsibilities
Develop robust automation, configuration management, and toolsets primarily using Ruby and Shell Scripting (CLI/PowerShell) to manage infrastructure and deployment pipelines (Git/Infrastructure Automation).
Implement and manage advanced observability solutions with Grafana and Prometheus, along with Splunk and Elastic, to monitor system health and proactively identify issues in an air-gapped setting.
Collaborate closely with the Lead, Infrastructure, and Security Specialists to rapidly resolve incidents and significantly improve overall system resilience.
Create and maintain comprehensive documentation for system configurations, runbooks, and disaster recovery procedures tailored for a classified environment.
Contribute an intermediate level of proficiency in Go to team projects and codebase.

Must-Have Requirements
8+ years of experience in DevOps or SRE using Ruby for writing robust automation and tooling.
Observability and monitoring with Grafana and Prometheus.
CLI tools including Shell Scripting and/or PowerShell for operational tasks.
Deploying Kubernetes in production environments.
Git and various Infrastructure Automation tools.
Deep administrative experience with Linux operating systems.

Nice-to-Have Requirements
Experience with Go programming language.
Prior experience in government or defense-related SRE roles.
Experience with Python for scripting and data analysis.
Familiarity with packaging and deployment using Helm.
Knowledge of network security protocols, specifically IPSec.

Seniority Level
Mid-Senior Level

Employment Type
Full-time

Job Function
Engineering and Information Technology

Industries
IT Services and IT Consulting

Equal Opportunity Employer
Orion is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, citizenship status, disability status, genetic information, protected veteran status, or any other characteristic protected by law.

Candidate Privacy Policy
What information we collect during our application and recruitment process and why we collect it;
How we handle that information; and
How to access and update that information.
Your use of Orion services is governed by any applicable terms in this notice and our general Privacy Policy.
-
Senior Site Reliability Engineer
2 weeks ago
Montreal West, Canada Orion Innovation Full time
Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide...
-
Site Reliability Engineer
3 days ago
Montreal, Canada ApTask Full time
Looking for an intermediate candidate with 2 to 5 years' experience. The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive the reliability engineering, operations and customer support services for the client's ServiceNow SaaS implementation. Reporting to a Site Reliability...
-
Site Reliability Engineer
3 days ago
Montreal, Canada Botpress Full time
Help bring AI agents to companies worldwide. Over the next decade, autonomous agents will redefine how we work. Botpress allows companies to build and deploy advanced AI agents that move beyond conversation into real business logic. Our product works today and at scale, across industries, regions, and limitless use...
-
Site Reliability Engineer
2 weeks ago
Montreal, Canada Open Systems Technologies Full time
Site Reliability Engineer (SRE), ServiceNow, Application Infrastructure
Location: Montreal, Hybrid, 3 days/week
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for the client's ServiceNow SaaS implementation. Reporting to a Site...
-
Site Reliability Engineer
2 weeks ago
Montreal, Canada LanceSoft, Inc. Full time
Job Title: Site Reliability Engineer
Experience Level: Level 4 (advanced): 7-15 years
Location: Montreal (Day 1 onboarding onsite / in-office presence 3x per week)
Duration: 12+ months contract
Primary Responsibilities: Provide L3 support for ***'s private cloud, including on-call...
-
Senior Site Reliability Engineer
1 day ago
Montreal, Canada Orion Innovation Full time
Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide range of clients across many industries...