Senior SRE: AI/ML GPU HPC Infra

2 weeks ago

Toronto, Canada Boson AI Full time

A technology company in Toronto is seeking a Senior Site Reliability Engineer to manage and optimize a cutting-edge GPU cluster. The role involves hands-on lifecycle management of HPC infrastructure, troubleshooting, and developing automation for operational efficiency. Candidates should have over 5 years of experience in SRE or HPC and be proficient in Linux and Kubernetes. The position offers a competitive salary of $150,000 - $250,000 a year.

Senior HPC Engineer — AI/ML Infra on Massive GPU Cluster

4 weeks ago

Toronto, Canada Boson AI Full time

A leading technology company in Toronto is seeking a Senior High Performance Computing Engineer to manage one of the most advanced GPU clusters. You'll handle the full lifecycle of HPC infrastructure, from planning to deployment, and work closely with engineering teams. Candidates should have 5+ years of experience in HPC operations, proficiency in Linux,...
HPC Engineer, AI/ML Infrastructure

3 weeks ago

Toronto, Canada Boson AI Full time

Base pay range: CA$150,000.00/yr - CA$250,000.00/yr. About The Role: We're looking for a Senior High Performance Computing Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, terabit networking, and hundreds of servers. You'll be hands‑on with the full...
HPC Engineer, AI/ML Infrastructure

4 weeks ago

Toronto, Canada Boson AI Full time

Base pay range: CA$150,000.00/yr - CA$250,000.00/yr. About The Role: We're looking for a Senior High Performance Computing Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, terabit networking, and hundreds of servers. You'll be hands‑on with the full...
Senior HPC

4 weeks ago

Toronto, Canada Boson AI Full time

A leading tech company in Toronto is seeking a Senior High Performance Computing Engineer to manage a GPU cluster and support ML teams. This role requires 5+ years of HPC operations experience, proficiency in Linux systems, and knowledge of Kubernetes. Candidates will develop automation solutions and optimize infrastructure in a dynamic environment. The...
HPC Engineer, AI/ML Infrastructure

1 week ago

Toronto, Ontario, Canada Boson AI Full time US$150,000 - US$250,000

About The Role: We're looking for a Senior High Performance Computing Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, terabit networking, and hundreds of servers. You'll be hands-on with the full lifecycle of HPC infrastructure: planning, building,...
HPC Engineer, AI/ML Infrastructure

1 week ago

Toronto, Ontario, Canada Boson AI Full time $120,000 - $180,000 per year

About The Role: We're looking for a Senior High Performance Computing Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, terabit networking, and hundreds of servers. You'll be hands-on with the full lifecycle of HPC infrastructure: planning, building,...
Site Reliability Engineer, AI/ML Infrastructure

3 weeks ago

Toronto, Canada Boson AI Full time

About The Role: We're looking for a Senior Site Reliability Engineer to help us run one of the most exciting GPU clusters around: our Toronto datacenter packed with NVIDIA H100 and A100 GPUs, over 20PB of Ceph storage, terabit networking, and hundreds of servers. You'll be hands‑on with the full lifecycle of HPC infrastructure: planning, building, testing,...

