Site Reliability Engineer
3 weeks ago
Location: Remote, Canada

Our client is a fast-growing provider of AI-driven edge-computing platforms that keep industrial operations safe, smart, and always on. Their distributed hardware and software suite processes high-volume video and sensor data at the edge, delivering real-time insight for customers who cannot afford downtime. As they scale internationally, they are building a dedicated Site Reliability team to strengthen observability, automation, and uptime across a fleet of remote devices.

In this high-visibility role you will be the guardian of system reliability, owning incident response and long-term reliability engineering for mission-critical edge deployments. Your work will directly enable factories, energy sites, and transportation hubs to run with confidence around the clock.

Key Responsibilities
Act as first responder during the 24x7 on-call rotation, triaging and resolving production incidents across Linux-based edge devices and cloud services.
Lead root-cause analysis and deliver durable fixes that eliminate classes of failures.
Build and tune dashboards, alerts, and health checks using Prometheus, Grafana, and log aggregation tools for real-time fleet visibility.
Automate operational tasks with Python or Bash to reduce toil and improve response times.
Evolve CI/CD pipelines, configuration management, and infrastructure-as-code to support reliable, repeatable deployments.
Run load tests, network validation, and hardware burn-in to surface issues pre-production.
Create concise SOPs, runbooks, and post-incident reports that raise the bar for operational excellence.
Partner with software, hardware, and customer-success teams to embed reliability best practices early in the development lifecycle.

What You'll Need to Succeed
Strong hands-on Linux administration experience (Ubuntu or embedded distributions) and comfort working with ARM-based systems.
Proficiency in a scripting language such as Python or Bash for automation and diagnostics.
Solid networking fundamentals (TCP/IP, routing, DNS, VPNs, VLANs, firewalls) and familiarity with tools like tcpdump or nmap.
Experience operating modern observability stacks (Prometheus, Grafana, ELK/EFK, or Loki) and container technologies such as Docker.
Proven ability to troubleshoot distributed systems under pressure and communicate findings clearly to technical and non-technical stakeholders.
Willingness to share on-call responsibilities that span evenings, weekends, and holidays on a rotational basis.

Nice-to-Have Extras
Exposure to GPU-accelerated, computer-vision, or machine-learning workloads.
Familiarity with embedded edge hardware platforms and industrial automation protocols.
Prior SRE, DevOps, or Systems Engineering experience supporting always-on, customer-facing solutions.
Experience writing customer-facing operational documentation or SOPs.

Work Environment & Schedule
100 percent remote within Canada.
Core coverage needs are 9:00 a.m. – 9:00 p.m. Eastern Time; on-call rotation is shared globally for true 24x7 support.
Standard 40-hour workweek with flexibility to swap shifts inside the team.

Compensation & Benefits
Competitive salary plus bonus eligibility.
Choice of full-time employment or contract engagement, with comprehensive health benefits available through our employer-of-record partner.
Expense coverage for approved home-office and professional-development costs.
Opportunity to work with cutting-edge AI and edge-computing technology in a high-impact role.

Why Join
You will be the reliability champion for a product that makes real-world industrial sites safer and smarter every day. If you love digging into complex systems, writing clean automation, and seeing your work translate into measurable uptime for customers, we would love to meet you.

About Blue Signal: Blue Signal is an award-winning, executive search firm specializing in various specialties. Our recruiters have a proven track record of placing top-tier talent across industry verticals, with deep expertise in numerous professional services. Learn more at bit.ly/46Gs4yS
-
Site Reliability Engineer
5 days ago
Location(s): Canada : Ontario : Toronto Scotiabank Global Site Full time Requisition ID: 244027 Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
-
Site Reliability Engineer
5 days ago
Location(s): Canada : Ontario : Toronto Scotiabank Global Site Full time Requisition ID: 244026 Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
-
Site Reliability Engineer
3 weeks ago
Canada SPECTRAFORCE Full time Job Title: DevOps/Site Reliability Engineer Duration: 12+ months Core hours of the position: somewhat flexible, but able to attend meetings and collaborate with team members between 8 am Pacific and 3 pm Pacific. Team members are located in Pacific, Mountain, Central, and East time zones Top 3 items to see on resumes 5+ years of experience in DevOps, Site...
-
Site Reliability Engineer
3 weeks ago
Canada SPECTRAFORCE Full time Job Title: DevOps/Site Reliability Engineer Duration: 12+ months Locations: Ontario, Toronto, Vancouver, Montreal (100% remote) Core hours of the position: somewhat flexible, but able to attend meetings and collaborate with team members between 8 am Pacific and 3 pm Pacific. Team members are located in Pacific, Mountain, Central, and East time zones Top 3...
-
Site Reliability Engineer
3 weeks ago
Canada Blue Signal Search Full time Site Reliability Engineer Location: Remote, Canada Our client is a fast-growing provider of AI-driven edge-computing platforms that keep industrial operations safe, smart, and always on. Their distributed hardware and software suite processes high-volume video and sensor data at the edge, delivering real-time insight for customers who cannot afford...