Senior SRE — Scale AI Infra with Kubernetes

1 week ago


Canada TekRek Full time

An AI infrastructure company in Canada is looking for a Senior Site Reliability Engineer to build and maintain systems supporting data and AI workloads. You will automate system operations and develop tooling for incident management. Ideal candidates have a deep understanding of distributed systems, experience with cloud technologies (AWS/GCP), and a passion for solving complex problems. This role offers competitive pay and the chance to work in a fast-paced, innovative environment.



  • Canada High Tech Genesis Full time

    Overview: SRE-DevSecOps Engineer role at High Tech Genesis. Allowed Staffing Countries: Canada, Costa Rica, Mexico or Brazil (Remote). Term: Contract. High Tech Genesis is seeking a 3-month contractor who can hit the ground running to support our SaaS platform on AWS. Responsibilities: Kubernetes/EKS Operations – Manage, troubleshoot, and...


  • Canada Cohere AI Full time

    A leading AI infrastructure company in Canada is seeking a Staff Software Engineer to build and scale ML-optimized HPC infrastructure. You will work closely with AI researchers to ensure optimal performance of AI workloads and systems. The ideal candidate has deep expertise in ML infrastructure and strong skills in Kubernetes and Python. This role offers...


  • Canada MeshyAI Full time

    About Meshy Headquartered in Silicon Valley, Meshy is the leading 3D generative AI company on a mission to Unleash 3D Creativity by transforming the content creation pipeline. Meshy makes it effortless for both professional artists and hobbyists to create unique 3D assets—turning text and images into stunning 3D models in minutes. Our world-class team...


  • Canada Oscilar Full time

    Overview: DevOps/Site Reliability Engineer (SRE) role at Oscilar. Shape the future of trust in the age of AI. At Oscilar, we're building the most advanced AI Risk Decisioning Platform. Banks, fintechs, and digitally native organizations rely on us to manage their fraud,...


  • Canada Meshy Full time

    Overview: Headquartered in Silicon Valley, Meshy is the leading 3D generative AI company on a mission to Unleash 3D Creativity by transforming the content creation pipeline. Meshy makes it effortless for both professional artists and hobbyists to create unique 3D assets—turning text and images into stunning 3D models in just minutes. Our world-class team of...


  • Canada Orion Innovation Full time

    A leading IT services company is looking for a Senior Site Reliability Engineer with expertise in Kubernetes and Rancher to ensure the reliability of mission-critical systems. This remote position, available in Canada, requires over 8 years of experience in SRE roles, particularly in secure environments. The role offers competitive compensation and the...


  • Toronto, Canada (Hybrid) Tubi Full time $120,000 - $180,000 per year

    About Tubi: Boldly built for every fandom, Tubi is a free streaming service that entertains over 100 million monthly active users. Tubi offers the world's largest collection of Hollywood movies and TV shows, thousands of creator-led stories and hundreds of Tubi Originals made for the most passionate fans. Headquartered in San Francisco and founded in 2014,...


  • Canada DataRobot, Inc. Full time

    A leading AI solutions provider is seeking a Senior Backend Engineer to join their AI Compute team in Canada. The role involves developing and supporting features, building secure micro-services, and ensuring quality in CI/CD pipelines. Ideal candidates should have extensive experience in Kubernetes and Python, with a strong grasp of automated testing and...


  • Canada HRB Full time

    A leading entertainment company seeks a Senior Site Reliability Engineer to enhance the performance and reliability of its infrastructure. The role involves managing cloud technologies, employing Kubernetes orchestration, and collaborating with teams for continuous improvement while leveraging AI technologies. Experience with Terraform, Oracle EBS, and...


  • Canada Orion Innovation Full time

    A cutting-edge technology firm is seeking a Senior Site Reliability Engineer to manage mission-critical infrastructure. This fully remote position requires 8+ years of experience and expertise in Kubernetes and observability tools like Prometheus and Grafana. The ideal candidate will thrive in challenging air-gapped environments and have a passion for system...