Staff Software Engineer, GPU Infrastructure
1 week ago
Staff Software Engineer, GPU Infrastructure (HPC)
This position is posted by Jobgether on behalf of a partner company in the United States and Canada. As a Staff Software Engineer in GPU infrastructure, you will design, build, and operate high‑performance computing clusters to accelerate AI and machine learning workloads. You will collaborate closely with researchers and engineers to ensure workloads run reliably, efficiently, and at scale across cloud environments. The role covers optimizing infrastructure for cost, performance, and stability; providing self‑service tools for ML teams; troubleshooting complex issues; implementing automation and observability best practices; and driving innovation in distributed GPU/TPU systems. This position offers opportunities to mentor engineers, influence infrastructure strategy, and directly impact the development of cutting‑edge AI models in a fast‑paced, collaborative environment.

Accountabilities
Design, deploy, and manage Kubernetes‑based GPU/TPU superclusters across multiple clouds for AI/ML workloads
Optimize HPC infrastructure for distributed training frameworks such as JAX, PyTorch, and TensorFlow
Identify and resolve performance bottlenecks, system failures, and infrastructure issues
Build self‑service tools that let researchers monitor, debug, and optimize AI/ML training jobs independently
Implement best practices for automation, observability, and infrastructure‑as‑code (IaC)
Collaborate closely with AI researchers and ML engineers to translate emerging needs into robust infrastructure solutions
Mentor team members, conduct code reviews, document processes, and foster a culture of knowledge sharing

Requirements
Deep expertise in ML/HPC infrastructure, including GPU/TPU clusters and distributed training frameworks
Proven experience with cloud‑native Kubernetes deployments at scale
Strong programming skills in Python and Go; open‑source contributions preferred
Knowledge of Linux internals, RDMA networking, and performance optimization for ML workloads
Demonstrated ability to collaborate with research teams and solve complex infrastructure challenges
Self‑directed problem‑solving mindset with the ability to drive impact in fast‑paced environments
Experience building scalable, resilient, and maintainable infrastructure systems

Benefits
Inclusive and collaborative work culture
Opportunities to work on cutting‑edge AI research and infrastructure projects
Weekly lunch stipend, in‑office meals, and snacks
Comprehensive health and dental benefits, including a mental health budget
100% parental leave top‑up for up to six months
Personal enrichment benefits for arts, fitness, and workspace improvement
Remote‑flexible work options, co‑working stipend, and offices in major global cities
Six weeks of vacation (30 working days)
-
Staff Software Engineer, GPU Infrastructure
2 weeks ago
Canada | Cohere AI | Full time
Who are we? Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we...
-
Staff Software Engineer, GPU Infrastructure
2 weeks ago
Canada | Cohere | Full time
Who are we? Our mission is to scale intelligence to serve humanity. We’re training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we...
-
Staff Software Engineer, GPU Infrastructure
1 week ago
Canada | Cohere | Full time | $120,000 - $180,000 per year
Who are we? Our mission is to scale intelligence to serve humanity. We're training and deploying frontier models for developers and enterprises who are building AI systems to power magical experiences like content generation, semantic search, RAG, and agents. We believe that our work is instrumental to the widespread adoption of AI. We obsess over what we...
-
Canada | Jobgether | Full time
A leading technology staffing platform seeks a Staff Software Engineer in GPU Infrastructure. In this role, you will design and optimize high-performance computing clusters for AI and machine learning. Collaborating with researchers, you'll drive innovations in distributed GPU systems, with a focus on performance and stability. This position offers a...
-
GPU Cloud Platform Engineer
6 days ago
Canada | Yotta Labs | Full time
About Yotta Labs: Yotta Labs is pioneering the development of a Decentralized Operating System (DeOS) for AI workload orchestration at a planetary scale. Our mission is to democratize access to AI resources by aggregating geo-distributed GPUs, enabling high-performance computing for AI...
-
Software Engineer – GPU
4 days ago
Edmonton, Alberta, Canada | Huawei Technologies Canada Co. | Full time | $100,000 - $120,000 per year
Job description: Huawei Canada has an immediate 12-month contract opening for a Software Engineer. About the team: The Software-Hardware System Optimization Lab continuously improves the power efficiency and performance of smartphone products through software-hardware system optimization and architecture innovation. We continuously track the trends of...
-
Staff Software Engineer (Compute)
3 weeks ago
Canada | DataRobot | Full time
Job Description: DataRobot delivers AI that maximizes impact and minimizes business risk. Our platform and applications integrate into core business processes so teams can develop, deliver, and govern AI at scale....
-
Staff Software Engineer, Platform & Infrastructure
3 weeks ago
Canada | Qualified | Full time
Qualified is the Agentic Marketing Platform for B2B companies. With Piper, the AI SDR Agent, Qualified offers a whole new way to grow inbound pipeline. Piper operates across both the website and email, working to...
-
Staff Infrastructure Software Engineer, Metadata
3 weeks ago
Canada | Dropbox | Full time
Dropbox is a Virtual First company. For this role, we are currently only authorized to hire candidates from the following provinces: Alberta, British Columbia,...
-
Staff Software Engineer (Fleet)
2 days ago
ON, Canada | DataRobot | Full time