Site Reliability Engineer, India
2 weeks ago
We’re looking for highly motivated, passionate site reliability engineers to join our growing team. At evertz.io, our teams are building services used by major players in broadcast and media. Our services are hosted in AWS with a Serverless First mindset. As part of this role, you will help harden our multi-tenant SaaS platform. You will use best-in-class observability tooling to debug incidents and identify and implement improvements to ensure reliability. You will automate processes and build tools to reduce toil.

Responsibilities
Debug incidents and drive improvements to the multi-tenant SaaS platform using observability tooling.
Translate SLOs and SLIs into actionable reliability improvements.
Automate processes and build tooling to reduce toil and improve efficiency.
Collaborate with cross-functional teams on reliability, monitoring, and incident response.

Skills and experience you will bring
At least 3 years of hands-on experience managing critical, high-availability production infrastructure with proven reliability and uptime improvements.
Proficient in at least one programming language (such as Python, Java, or Rust), with experience designing and building production-quality automation or tools.
At least 3 years working with monitoring, log aggregation, and observability platforms (e.g., Datadog, CloudWatch, Honeycomb, Splunk, New Relic) and using data-driven insights to resolve issues.
Excellent analytical skills to understand end-to-end use cases, map system flows, debug complex issues, and anticipate failure points.
Experience translating SLOs/SLIs into actionable improvements; strong focus on reliability, monitoring, and observability.
At least 3 years with cloud technologies, particularly AWS (CloudFormation, Lambda, DynamoDB, SQS, SNS, EC2, S3, AWS CLI, Boto3).
Solid foundation in Linux systems administration, networking, and security.
Familiarity with CI/CD pipelines (e.g., Jenkins, AWS CodePipeline).
Additional skills and experience that will make you stand out
Experience architecting and deploying serverless applications in cloud environments.
Experience with infrastructure-as-code tools like Terraform or CloudFormation for reproducible environments.
Participation in production on-call rotations and incident management.
Expertise in performance optimization for core AWS services (Lambda, DynamoDB, API Gateway, SQS, EventBridge, EC2).
Experience supporting systems with frequent, high-velocity deployments.
Familiarity with security compliance frameworks (e.g., OWASP, ISO, CSA, PCI) and hands-on threat assessments and remediation.
Security practices including penetration testing, threat modeling, and use of security tools.
Experience with advanced deployment strategies (canary, A/B testing, blue/green, etc.).
Hands-on experience with chaos engineering to improve fault tolerance.
Track record of championing reliability, continuous improvement, and operational excellence.

Experience and working arrangement
Experience: 4 to 6+ years
Education: Graduate in Computer Science and Information Technology
Work mode: Remote/Hybrid
Office timing: 1 pm to 9 pm IST

The Team
The evertz.io Engineering Team builds next-generation systems for content management and distribution in the Media and Entertainment industry. Our technology stack includes a Serverless microservice architecture on AWS, with Python, Rust, and Java; a UI built with Angular, TypeScript, and NgRx; and CI/CD involving AWS, Jenkins, Nexus, Bazel, and our release-management application. The team collaborates across regions in agile, low-bureaucracy environments, with opportunities for growth, mentorship, and continued learning. The team emphasizes trust, openness, and inclusivity.

Please note, this email address will respond only to privacy concerns.
When you apply to a job on this site, your personal data will be collected by Evertz Microsystems Ltd, located in Burlington, Ontario, Canada, and processed for recruitment-related activities in accordance with our privacy policy. Your data may be processed under applicable data protection laws. Your personal data will be retained as long as necessary to evaluate your application. You may have rights regarding access, correction, erasure, and restriction of processing under applicable data protection laws. For regional rights, consult the privacy policy or contact the data protection officer.
-
Site Reliability Engineer
4 days ago
Location(s): Canada : Ontario : Toronto. Scotiabank Global Site. Full time. Requisition ID: 244026. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
-
Site Reliability Engineer
4 days ago
Location(s): Canada : Ontario : Toronto. Scotiabank Global Site. Full time. Requisition ID: 244027. Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...
-
Site Reliability Engineer
3 weeks ago
Canada. SPECTRAFORCE. Full time. Job Title: DevOps/Site Reliability Engineer. Duration: 12+ months. Locations: Ontario, Toronto, Vancouver, Montreal (100% remote). Core hours of the position: somewhat flexible, but must be able to attend meetings and collaborate with team members between 8 am and 3 pm Pacific. Team members are located in the Pacific, Mountain, Central, and Eastern time zones. Top 3...
-
Site Reliability Engineer
3 weeks ago
Canada. SPECTRAFORCE. Full time. Job Title: DevOps/Site Reliability Engineer. Duration: 12+ months. Core hours of the position: somewhat flexible, but must be able to attend meetings and collaborate with team members between 8 am and 3 pm Pacific. Team members are located in the Pacific, Mountain, Central, and Eastern time zones. Top 3 items to see on resumes: 5+ years of experience in DevOps, Site...
-
Site Reliability Engineer
3 weeks ago
Canada. Blue Signal Search. Full time. Site Reliability Engineer. Location: Remote, Canada. Our client is a fast-growing provider of AI-driven edge-computing platforms that keep industrial operations safe, smart, and always on. Their distributed hardware and software suite processes high-volume video and sensor data at the edge, delivering real-time insight for customers who cannot afford downtime....