Senior DevOps Engineer, ML Infrastructure

1 week ago


Montreal, Quebec, Canada Serve Robotics Full time

At Serve Robotics, we're reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It's designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses.

The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los Angeles, Miami, Dallas, Atlanta and Chicago while doing commercial deliveries. We're looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.

Who We Are

We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.

As a Senior DevOps Engineer on the Machine Learning (ML) Infrastructure team, you will help design, build, and maintain our petabyte-scale data and ML platform that powers data partnerships, ML research, and autonomy engineering. You will play a key role in ensuring reliability, security, scalability, and performance across our internal systems, and maintain a suite of internal tools used by dozens of engineers. Your work will make a significant impact on our autonomous capabilities and act as a catalyst for the entire autonomy team, helping us train our next generation of ML models.

Responsibilities

  • Deploy and maintain our ML training orchestration system that operates across multiple platforms.
  • Manage cloud and on-premises environments for large-scale distributed data processing and ML training/inference systems.
  • Automate deployment pipelines, monitoring, and alerting for ML and data services.
  • Collaborate closely with data scientists, ML engineers, and autonomy teams to streamline experimentation and model deployment.
  • Maintain and improve CI/CD systems to support rapid development and testing.
  • Implement best practices for system security, reliability, and observability.
  • Optimize infrastructure costs and ensure efficient resource utilization.
  • Support internal developer productivity through tooling, documentation, and hands-on assistance.

Qualifications

  • Bachelor's or Master's degree in Computer Science, Engineering, or equivalent experience.
  • 5+ years of experience as a DevOps, SRE, or Infrastructure Engineer, preferably supporting ML or data-intensive systems.
  • Strong experience with cloud platforms (AWS, GCP, or Azure), containers (Docker), and container orchestration (Kubernetes).
  • Proficiency in infrastructure-as-code tools such as Terraform or Helm.
  • Solid understanding of CI/CD systems (GitLab CI, Jenkins, ArgoCD, etc.).
  • Experience with Python and SQL.
  • Experience with cloud security, IAM (Identity and Access Management), and access control.
  • Experience analyzing and optimizing hardware performance.
  • Experience with GPU cluster management.

What Makes You Stand Out

  • Experience managing large-scale distributed data processing systems.
  • Experience analyzing and optimizing ML training workloads.
  • Background in observability stacks (Prometheus, Grafana, ELK, OpenTelemetry).
  • Contributions to open-source DevOps or ML infrastructure projects.

Please note: The base salary range listed in this job description reflects compensation for candidates based in the United States. While we prefer candidates located in the U.S., we are also open to qualified talent working remotely across:

Canada - Base salary range (Canada - all locations): $130K - $160K CAD

Compensation Range: $155K - $195K


