Senior Live-Ops Site Reliability Engineer

3 weeks ago


Canada eDynamic Learning Full time

Location: Remote (Anywhere in Canada)

Company Overview

eDynamic Learning is celebrating 16 years of serving educators. Founded by a classroom teacher, we’re on a mission to empower educators with accessible and equitable resources, guiding students on their journey to life after graduation. We are dedicated to supporting teachers and programs that facilitate student exploration of interests, career options, and skill acquisition through Career and Technical Education (CTE). We prioritize quality and the development of vital life readiness skills, including interpersonal communication and financial literacy. Our commitment to fostering exploration starts early, with resources tailored to middle school students. Our rich courseware catalog and Learning Blade resource have a proven track record of expanding STEM, computer science, and career interest and awareness.

As the largest publisher of CTE and elective digital curriculum in North America, we offer a vast catalog of over 250 courses spanning grades 6-12. Our CTE pathway curriculum aligns to 14 career clusters, preparing students for nearly 100 industry certifications. To help bring our curriculum to learners, we provide professional development as well as virtual instructional services, supported by certified teachers, that facilitate personalized learning.

eDynamic Learning doesn’t stop at coursework alone. We are passionate about helping students grow their skills through experiential learning via our Knowledge Matters virtual simulation instructional materials and projects. Our simulations are true hands-on learning in a virtual environment. We take pride in the fact that our solutions and services are designed to empower educators and students alike, enabling them to take a transformative journey of exploration, engage in learning, and participate in real-world experiences.

In July 2025, eDynamic Learning was acquired by Pearson.
Role Overview

We are seeking a Senior Live-Ops Site Reliability Engineer (SRE) to ensure the performance, reliability, and scalability of eDynamic Learning’s platforms and services. In this role, you will be a key member of the engineering operations team, responsible for maintaining uptime, optimizing production systems, and building automation that scales. You’ll work closely with software engineering, DevOps, and infrastructure teams to deliver seamless and reliable experiences for students and educators across North America. This position combines hands-on engineering, systems design, and incident management in a mission-driven, fast-paced environment.

Responsibilities

  • Own the availability, reliability, and performance of production systems and services
  • Design and maintain scalable infrastructure to support high-traffic educational applications
  • Build monitoring, alerting, and observability systems to proactively detect and resolve issues
  • Lead incident response and postmortem processes to improve resilience and reduce downtime
  • Develop automation tools and scripts to streamline deployments, operations, and recovery
  • Collaborate closely with engineering and DevOps teams to design and implement fault-tolerant systems
  • Continuously refine CI/CD pipelines and deployment processes for speed and safety
  • Champion best practices in infrastructure-as-code (IaC), security, and configuration management
  • Partner with development teams to ensure reliable service releases and smooth rollouts
  • Analyze capacity trends and system performance to plan for future growth
  • Mentor junior engineers and contribute to an operational culture of transparency, ownership, and continuous learning

Ideal Qualifications

  • Bachelor’s Degree in Computer Science or equivalent experience
  • 8+ years of experience in systems engineering, DevOps, or Site Reliability Engineering roles
  • Proven experience managing mission-critical, high-availability production environments
  • Strong background in Linux systems administration and performance tuning
  • Expertise with AWS infrastructure and related services
  • Proficiency with Docker, Kubernetes, and infrastructure-as-code tools such as Terraform or CloudFormation
  • Solid programming/scripting skills in Python, Bash, or similar
  • Experience with CI/CD pipelines, deployment automation, and Git-based workflows
  • Deep understanding of networking, HTTP, and distributed systems principles
  • Familiarity with monitoring and observability tools (Datadog, Prometheus, Grafana, etc.)
  • Legally eligible to work in Canada and/or the U.S.

Skills

  • Self-starter who thrives in a remote, fast-paced environment
  • Strong problem-solving and debugging skills
  • Excellent communication and collaboration abilities
  • Strong incident management, root cause analysis, and troubleshooting skills

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Internet Publishing



  • Canada Targeted Talent Full time

    Overview: We are looking for an experienced Senior Site Reliability Engineer for our client. This is a permanent position that is remote to start, with later relocation to Calgary or Winnipeg. Our client is a global enterprise company with a product that you've likely used. Experience with coding/software development, along with Site Reliability will be the...


  • Canada Thinkific Full time

    Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific. We’re looking for a Senior Site Reliability Engineer...


  • Canada Akamai Technologies Full time

    Senior Site Reliability Engineer Join Akamai Technologies as we build a reliable, secure, and scalable Internet. We are looking for a Senior Site Reliability Engineer to help us solve complex performance and reliability challenges. Job Description Are you passionate about cutting‑edge technology and ready to tackle some of the Internet’s most difficult...


  • Canada DuckDuckGo Full time

    6 days ago. Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...


  • Canada Orion Innovation Full time

    Job Description: Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote (Working EST hours) Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical...


  • Canada TextNow Full time

    This range is provided by TextNow. Your actual pay will be based on your skills and experience; talk with your recruiter to learn more. Base pay range: CA$113,400.00/yr - CA$162,000.00/yr. We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that's because we're made up of...


  • Canada Orion Innovation Full time

    We are seeking a highly specialized and experienced Senior Site Reliability Engineer (SRE) to drive the reliability, performance, and automation of our core platform. This role requires an exceptional blend of deep programming expertise in both Ruby and Go, coupled with hands-on mastery of Linux systems, advanced networking concepts (specifically IPSec),...


  • Canada Orion Innovation Full time

    Senior Site Reliability Engineer (SRE) with Kubernetes & Rancher Location: Canada - Remote (Working EST hours) Job Type: Full-time About the Role Are you an exceptional Site Reliability Engineer with a passion for building and maintaining highly resilient and secure systems? We are seeking a Senior SRE to join our team and play a critical role in managing...


  • Canada TekRek Full time

    This range is provided by TekRek. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Base pay range CA$90.00/hr - CA$120.00/hr Senior Site Reliability Engineer – Distributed Systems, Kubernetes, AWS/GCP The Company TekRek has partnered with a fast‑scaling AI infrastructure company building one of the...


  • Canada D-Wave Full time

    D-Wave (NYSE: QBTS) is a leader in the development and delivery of quantum computing systems, software, and services. We are the world’s first commercial supplier of quantum computers, and the only company building both annealing and gate-model quantum computers. Our mission is...