Senior DevOps Engineer, ML Infrastructure
2 weeks ago
At Serve Robotics, we’re reimagining how things move in cities. Our personable sidewalk robot is our vision for the future. It’s designed to take deliveries away from congested streets, make deliveries available to more people, and benefit local businesses.
The Serve fleet has been delighting merchants, customers, and pedestrians along the way in Los Angeles, Miami, Dallas, Atlanta and Chicago while doing commercial deliveries. We’re looking for talented individuals who will grow robotic deliveries from surprising novelty to efficient ubiquity.
Who We Are
We are tech industry veterans in software, hardware, and design who are pooling our skills to build the future we want to live in. We are solving real-world problems leveraging robotics, machine learning and computer vision, among other disciplines, with a mindful eye towards the end-to-end user experience. Our team is agile, diverse, and driven. We believe that the best way to solve complicated dynamic problems is collaboratively and respectfully.
As a Senior DevOps Engineer on the Machine Learning (ML) Infrastructure team, you will help design, build, and maintain our petabyte-scale data and ML platform that powers data partnerships, ML research, and autonomy engineering. You will play a key role in ensuring reliability, security, scalability, and performance across our internal systems, and maintain a suite of internal tools used by dozens of engineers. Your work will make a significant impact on our autonomous capabilities and act as a catalyst for the entire autonomy team, helping us train our next generation of ML models.
Responsibilities
- Deploy and maintain our ML training orchestration system that operates across multiple platforms.
- Manage cloud and on-premise environments for large-scale distributed data processing and ML training/inference systems.
- Automate deployment pipelines, monitoring, and alerting for ML and data services.
- Collaborate closely with data scientists, ML engineers, and autonomy teams to streamline experimentation and model deployment.
- Maintain and improve CI/CD systems to support rapid development and testing.
- Implement best practices for system security, reliability, and observability.
- Optimize infrastructure costs and ensure efficient resource utilization.
- Support internal developer productivity through tooling, documentation, and hands-on assistance.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent experience.
- 5+ years of experience as a DevOps, SRE, or Infrastructure Engineer, preferably supporting ML or data-intensive systems.
- Strong experience with cloud platforms (AWS, GCP, or Azure), containers (Docker), and container orchestration (Kubernetes).
- Proficiency in infrastructure-as-code tools such as Terraform or Helm.
- Solid understanding of CI/CD systems (GitLab CI, Jenkins, ArgoCD, etc.).
- Experience with Python and SQL.
- Experience with cloud security, IAM (Identity and Access Management), and access control.
- Experience analyzing and optimizing hardware performance.
- Experience with GPU cluster management.
What Makes You Stand Out
- Experience managing large-scale distributed data processing systems.
- Experience analyzing and optimizing ML training workloads.
- Background in observability stacks (Prometheus, Grafana, ELK, OpenTelemetry).
- Contributions to open-source DevOps or ML infrastructure projects.
* Please note: The base salary range listed in this job description reflects compensation for candidates based in the United States. While we prefer candidates located in the U.S., we are also open to qualified talent working remotely across:
Canada - Base salary range (Canada - all locations): $130k - $160k CAD