Senior Live-Ops Site Reliability Engineer
2 days ago
Location: Remote (Anywhere in Canada)
Company Overview
eDynamic Learning is celebrating 16 years of serving educators. Founded by a classroom teacher, we're on a mission to empower educators with accessible and equitable resources, guiding students on their journey to life after graduation. We are dedicated to supporting both teachers and programs that facilitate student exploration of interests, career options, and skill acquisition through Career and Technical Education (CTE). We prioritize quality and the development of vital life readiness skills, including interpersonal communication and financial literacy.
Our commitment to fostering exploration starts early, with resources tailored to middle school students. Our rich courseware catalog and Learning Blade resource have a proven track record of expanding STEM, computer science, and career interest and awareness.
As the largest publisher of CTE and elective digital curriculum in North America, we offer a vast catalog of over 250 courses spanning grades 6-12. Our CTE pathway curriculum aligns to 14 career clusters, preparing students for nearly 100 industry certifications. To help bring our curriculum to learners, we provide professional development as well as virtual instructional services, supported by certified teachers, that facilitate personalized learning.
eDynamic Learning doesn't stop at coursework alone. We are passionate about helping students grow their skills through experiential learning, using our Knowledge Matters virtual simulation instructional materials and projects. Our simulations provide true hands-on learning in a virtual environment.
We take pride in the fact that our solutions and services are designed to empower educators and students alike, enabling them to take a transformative journey of exploration, engage in learning, and participate in real-world experiences.
In July 2025, eDynamic Learning was acquired by Pearson.
Role Overview
We are seeking a Senior Live-Ops Site Reliability Engineer (SRE) to ensure the performance, reliability, and scalability of eDynamic Learning's platforms and services.
In this role, you will be a key member of the engineering operations team, responsible for maintaining uptime, optimizing production systems, and building automation that scales. You'll work closely with software engineering, DevOps, and infrastructure teams to deliver seamless and reliable experiences for students and educators across North America.
This position combines hands-on engineering, systems design, and incident management in a mission-driven, fast-paced environment.
Responsibilities
- Own the availability, reliability, and performance of production systems and services
- Design and maintain scalable infrastructure to support high-traffic educational applications
- Build monitoring, alerting, and observability systems to proactively detect and resolve issues
- Lead incident response and postmortem processes to improve resilience and reduce downtime
- Develop automation tools and scripts to streamline deployments, operations, and recovery
- Collaborate closely with engineering and DevOps teams to design and implement fault-tolerant systems
- Continuously refine CI/CD pipelines and deployment processes for speed and safety
- Champion best practices in infrastructure-as-code (IaC), security, and configuration management
- Partner with development teams to ensure reliable service releases and smooth rollouts
- Analyze capacity trends and system performance to plan for future growth
- Mentor junior engineers and contribute to an operational culture of transparency, ownership, and continuous learning
Ideal Qualifications
- Bachelor's Degree in Computer Science or equivalent experience
- 8+ years of experience in systems engineering, DevOps, or Site Reliability Engineering roles
- Proven experience managing mission-critical, high-availability production environments
- Strong background in Linux systems administration and performance tuning
- Expertise with AWS infrastructure and related services
- Proficiency with Docker, Kubernetes, and infrastructure-as-code tools such as Terraform or CloudFormation
- Solid programming/scripting skills in Python, Bash, or similar
- Experience with CI/CD pipelines, deployment automation, and Git-based workflows
- Deep understanding of networking, HTTP, and distributed systems principles
- Familiarity with monitoring and observability tools (Datadog, Prometheus, Grafana, etc.)
- Legally eligible to work in Canada and/or the U.S.
Skills
- Self-starter who thrives in a remote, fast-paced environment
- Strong problem-solving and debugging skills
- Excellent communication and collaboration abilities
- Strong incident management, root cause analysis, and troubleshooting skills
-
Senior Site Reliability Engineer
2 days ago
Canada (Remote) Glia Full time $120,000 - $180,000 per year
About Glia
Glia is the leading AI customer service solution for banks and credit unions. Our platform unifies AI and human agents across every voice and digital conversation through our proprietary ChannelLess Architecture. With AI for All, organizations overcome the tradeoff between efficiency and experience by using AI to automate conversations and elevate...
-
Senior Site Reliability Engineer
2 days ago
Remote, Canada INNOSPHERE SDG LTD. Full time $120,000 - $140,000 per year
Our ideal candidate is able to work with our product and platform engineering teams to create reliable, repeatable, and performant infrastructure, as well as facilitate product deployments into that infrastructure. Our primary focus is within public cloud, where we primarily utilize AWS. Engineers must be familiar with cloud native approaches, including but...
-
Staff Site Reliability Engineer
1 week ago
Remote - USA & Canada Boulevard Full time US$181,125 - US$258,750 per year
Who is Boulevard?
Boulevard provides the first and only client experience platform for appointment-based, self-care businesses. We empower our customers to give their clients more of the magical moments that matter most. Before launching in 2016, our founders spent months interviewing salon managers and working behind front desks to understand their pain...
-
Site Reliability Engineer III
2 days ago
Remote (ON, Canada) Guidewire Full time $120,000 - $150,000 per year
Summary
At Guidewire, we deliver the software that Property and Casualty (P&C) insurance companies rely on to protect their customers during crises, natural disasters, accidents, and cyber risks. Our core applications enable insurers to sell and underwrite policies, settle claims, and bill their customers. We also offer a suite of innovative products for data...
-
Remote - United States, Remote - Canada Paxos Full time US$150,000 - US$250,000 per year
About Paxos
Today's financial infrastructure is archaic, expensive, inefficient and risky, supporting a system that leaves out more people than it lets in. So we're rebuilding it. We're on a mission to open the world's financial system to everyone by enabling the instant movement of any asset, any time, in a trustworthy way. For over a decade, we've...
-
Remote, Americas; Remote, Canada GitLab Full time US$124,300 - US$266,400 per year
GitLab is an open-core software company that develops the most comprehensive AI-powered DevSecOps Platform, used by more than 100,000 organizations. Our mission is to enable everyone to contribute to and co-create the software that powers our world. When everyone can contribute, consumers become contributors, significantly accelerating human progress. Our...
-
Site Reliability Engineer
1 week ago
Remote CA BC Red Hat Full time $125,000 - $175,000 per year
About the Job
We're seeking a Site Reliability Engineer (SRE) with passion for maintaining highly reliable cloud-based services. In this role, you will support Red Hat's software manufacturing services on our hybrid cloud infrastructure. You will partner with development, quality engineering and release engineering colleagues to support the health and...
-
FedRAMP Site Reliability Engineer
1 week ago
Remote, Ontario, Canada Confluent Full time $120,000 - $180,000 per year
We're not just building better tech. We're rewriting how data moves and what the world can do with it. With Confluent, data doesn't sit still. Our platform puts information in motion, streaming in near real-time so companies can react faster, build smarter, and deliver experiences as dynamic as the world around them. It takes a certain kind of person to join...
-
Cloud Site Reliability Engineer
9 hours ago
Toronto, Ontario / Remote, Canada Smile Digital Health Full time US$1,000,000 - US$1,440,000 per year
Working for a company like Smile Digital Health means supporting our mandate for #BetterGlobalHealth. We strive towards this goal every day, and the results can be seen in the impact of our innovative health data platform and data management solutions, which are used in over 20 countries. We were #19 on Deloitte's Technology Fast 50 Ranking for Smile...
-
Principal Site Reliability Engineer
1 week ago
Remote CA BC Red Hat Full time $900,000 - $1,200,000 per year
About the Job
We're seeking a Site Reliability Engineer (SRE) with passion for maintaining highly reliable cloud-based services. In this role, you will support Red Hat's software manufacturing services on our hybrid cloud infrastructure. You will partner with development, quality engineering and release engineering colleagues to support the health and...