HPC Engineer
2 weeks ago
Job Title: HPC Engineer – AI Workloads & Infrastructure
Location: Toronto, ON (Hybrid)
Department: Operations – High Performance Computing (HPC)
Company: iVedha Inc
About iVedha
iVedha is a leading provider of cloud and managed services, helping enterprises modernize their IT infrastructure and accelerate digital transformation. Be part of Canada's next-generation sovereign AI infrastructure team, delivering high-performance computing solutions for AI, ML, and scientific workloads. Our mission is to empower innovators with secure, scalable, and sustainable compute platforms.
Role Overview
We are seeking an HPC Engineer to join our operational team supporting AI workloads in a high-performance computing environment. This role focuses on building and managing HPC compute nodes, deploying Kubernetes clusters, and orchestrating bare-metal and virtualized environments. You will also work with advanced storage technologies such as VAST Data and MooseFS, ensuring seamless integration with GPU-accelerated infrastructure.
Key Responsibilities
- Design, deploy, and maintain HPC clusters for AI/ML workloads, including GPU-accelerated compute nodes (NVIDIA DGX/HGX platforms).
- Implement and manage Kubernetes for containerized AI workloads, ensuring scalability and high availability.
- Configure and optimize bare-metal servers, VMs, and virtualized environments for HPC applications.
- Integrate and manage high-performance storage systems (VAST, MooseFS, Lustre, or similar parallel file systems).
- Implement job scheduling and orchestration using Slurm or equivalent tools for AI and HPC workloads.
- Monitor and tune system performance for GPU utilization, network throughput, and storage I/O.
- Automate deployment and configuration using Foreman, Ansible, Terraform, or similar tools.
- Collaborate with AI engineers, DevOps, and data teams to optimize infrastructure for LLM training, fine-tuning, and inference pipelines.
- Ensure security, compliance, and data integrity across HPC environments.
Required Skills & Experience
- 3+ years in HPC engineering, systems administration, or AI infrastructure roles.
- Strong experience with Linux (RHEL/CentOS/Ubuntu) in HPC environments.
- Hands-on experience with Kubernetes, Docker, and container orchestration for AI workloads.
- Familiarity with GPU clusters, CUDA, NCCL, and NVIDIA ecosystem tools.
- Knowledge of high-speed interconnects (InfiniBand, RoCE) and networking for HPC.
- Experience with parallel/distributed file systems (VAST, MooseFS, Lustre, GPFS).
- Proficiency in automation and scripting (Python, Bash, Ansible).
- Understanding of job schedulers (Slurm, PBS, Torque) and workload optimization.
Nice-to-Have
- Experience with cloud HPC platforms (Azure HPC, AWS ParallelCluster, or similar).
- Familiarity with AI/ML frameworks (PyTorch, TensorFlow) and MLOps pipelines.
- Exposure to observability tools (Prometheus, Grafana) for HPC environments.
Why Join iVedha?
- Work on cutting-edge AI infrastructure projects powering Canada's sovereign AI ecosystem.
- Collaborate with a world-class team of engineers and AI specialists.
- Competitive compensation, benefits, and opportunities for career growth in HPC and AI.
Apply Now:
Send your resume to with the subject line: HPC Engineer – AI Infrastructure.