Senior Site Reliability Engineer

5 days ago


Canada Menlo Ventures Full time

About the Company

Clarifai is a leading compute orchestration AI platform specializing in computer vision and generative AI. We empower organizations to transform unstructured image, video, text, and audio data into actionable insights, significantly faster and more accurately than manual processes. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been at the forefront of AI innovation since achieving the top five placements in the 2013 ImageNet Challenge. Our diverse, globally distributed team operates across the United States, Canada, Estonia, Argentina, and India. We have secured $100M in funding, including a $60M Series C round, backed by industry leaders such as Menlo Ventures, Union Square Ventures, Lux Capital, NEA, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm, and Osage. Clarifai is proud to be an equal-opportunity workplace committed to building and maintaining a diverse and inclusive team.

Your Impact

Clarifai’s platform is a Kubernetes-native distributed system that requires the orchestration of many components. Efficiently serving and training large neural networks presents unique design and infrastructure challenges. You will be critical to solving these challenges both in the cloud and in on-premise environments. You will also be responsible for our broader cloud infrastructure, development tools, and environments.

The Opportunity

  • Ensure the smooth operation and high availability of Clarifai's core services
  • Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
  • Develop Kubernetes resources and custom tooling for seamless cloud and on-premise deployments
  • Design and implement scalable, secure, and cost-effective infrastructure solutions
  • Partner with teams across the organization to identify and solve engineering challenges

Requirements

  • BS/BA in Computer Science or a related degree
  • Good knowledge of cloud providers (AWS, GCP, or similar)
  • Expertise with Kubernetes (EKS, GKE, self-hosted) and Infrastructure as Code using Terraform and Helm
  • Solid understanding of web and networking (HTTP, TLS, DNS, etc.)
  • Experience with CI/CD pipelines using tools such as GitHub Actions, ArgoCD, and Atlantis
  • Strong interpersonal skills working with teams across different time zones and regions

Great to Have

  • Knowledge of basic microservice architecture principles
  • Familiarity with security best practices for cloud-based systems
  • Experience with relational databases, message queues, and key-value stores
  • Experience writing Python, Go, or any other popular programming language
  • Familiarity with any RPC framework
  • Experience developing and building custom Kubernetes operators



  • , , Canada Thinkific Full time

    Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Senior Site Reliability Engineer to join us at Thinkific. We’re looking for a Senior Site Reliability Engineer...


  • , , Canada DuckDuckGo Full time

    6 days ago. Who We Are: Hi, we're DuckDuckGo, the online protection company and remote-first team of 300+ on a mission to raise the standard of trust online. Founded in 2008 and profitable since 2014, our annual revenue now exceeds $100 million USD. Millions use our...


  • , , Canada Sage Recruiting Inc. Full time

    This range is provided by Sage Recruiting Inc. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Base pay range: CA$180,000.00/yr - CA$200,000.00/yr. Senior Site Reliability Engineer (Founding Role). Location: Canada. About the Role: This team is building a brand-new fintech platform from the ground up and is...


  • , , Canada TextNow Full time

    This range is provided by TextNow. Your actual pay will be based on your skills and experience — talk with your recruiter to learn more. Base pay range: CA$113,400.00/yr - CA$162,000.00/yr. We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that's because we're made up of...


  • (s): Canada : Ontario : Toronto Scotiabank Global Site Full time

    Requisition ID: 245210. Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture. The Team: Global Banking and Markets Engineering (GBME) is the fast-moving, award-winning technology engine that powers Scotiabank's Corporate, Investment Banking and Capital Markets businesses. The Role: GBME is searching for a Site...


  • , BC, Canada Orion Innovation Full time

    Overview: Senior Site Reliability Engineer (SRE) with Kubernetes and Rancher. Full-time role focused on building and maintaining highly resilient, secure systems, including in air-gapped environments. Responsibilities: System Architecture & Management: Design, architect, and maintain highly reliable, multi-tenant systems using Kubernetes and related tools...


  • (s): Canada : Ontario : Toronto Scotiabank Global Site Full time

    Requisition ID: 244026. Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture. Overview: As a Site Reliability Engineer (SRE), you will join the Digital Engineering Operations team, responsible for ensuring the operations and reliability of Scotiabank digital applications. You will have the opportunity to drive...


  • , , Canada Bitcomplete Full time

    Join us as a Senior Site Reliability Engineer to help us run an industry-scale GPU cluster via Kubernetes. Together with senior members of our team, you will combine your strong understanding of system scaling and security practices with your cloud-native expertise to stand up and maintain Kubernetes clusters from scratch. Your role will also be pivotal in...


  • , , Canada Paxos Full time

    About Paxos: Today’s financial infrastructure is archaic, expensive, inefficient and risky — supporting a system that leaves out more people than it lets in. So we’re rebuilding it. We’re on a mission to open the world’s financial system to everyone by enabling the instant movement of any asset, any time, in a trustworthy way. For over a decade,...


  • , , Canada Medium Full time

    We believe communication belongs to everyone. We exist to democratize phone service. TextNow is evolving the way the world connects and that’s because we’re made up of people with curious minds who bring an optimistic, yet critical lens into the work we do. We’re the largest provider of free phone service in the nation. And we’re just getting...