HPC Engineer

2 weeks ago


Toronto, Ontario, Canada | iVedha Inc. | Full time | $120,000 - $180,000 per year

Job Title: HPC Engineer – AI Workloads & Infrastructure

Location: Toronto, ON (Hybrid)
Department: Operations – High Performance Computing (HPC)
Company: iVedha Inc.

About iVedha

iVedha is a leading provider of cloud and managed services, helping enterprises modernize their IT infrastructure and accelerate digital transformation. Be part of Canada's next-generation sovereign AI infrastructure team, delivering high-performance computing solutions for AI, ML, and scientific workloads. Our mission is to empower innovators with secure, scalable, and sustainable compute platforms.

Role Overview

We are seeking an HPC Engineer to join our operations team supporting AI workloads in a high-performance computing environment. This role focuses on building and managing HPC compute nodes, deploying Kubernetes clusters, and orchestrating bare-metal and virtualized environments. You will also work with advanced storage technologies such as VAST Data and MooseFS, integrating them with GPU-accelerated infrastructure.

Key Responsibilities

  • Design, deploy, and maintain HPC clusters for AI/ML workloads, including GPU-accelerated compute nodes (NVIDIA DGX/HGX platforms).
  • Implement and manage Kubernetes for containerized AI workloads, ensuring scalability and high availability.
  • Configure and optimize bare-metal servers, VMs, and virtualized environments for HPC applications.
  • Integrate and manage high-performance storage systems (VAST, MooseFS, Lustre, or similar parallel file systems).
  • Implement job scheduling and orchestration using Slurm or equivalent tools for AI and HPC workloads.
  • Monitor and tune system performance for GPU utilization, network throughput, and storage I/O.
  • Automate deployment and configuration using Foreman, Ansible, Terraform, or similar tools.
  • Collaborate with AI engineers, DevOps, and data teams to optimize infrastructure for LLM training, fine-tuning, and inference pipelines.
  • Ensure security, compliance, and data integrity across HPC environments.

Required Skills & Experience

  • 3+ years in HPC engineering, systems administration, or AI infrastructure roles.
  • Strong experience with Linux (RHEL/CentOS/Ubuntu) in HPC environments.
  • Hands-on experience with Kubernetes, Docker, and container orchestration for AI workloads.
  • Familiarity with GPU clusters, CUDA, NCCL, and NVIDIA ecosystem tools.
  • Knowledge of high-speed interconnects (InfiniBand, RoCE) and networking for HPC.
  • Experience with parallel/distributed file systems (VAST, MooseFS, Lustre, GPFS).
  • Proficiency in automation and scripting (Python, Bash, Ansible).
  • Understanding of job schedulers (Slurm, PBS, Torque) and workload optimization.

Nice-to-Have

  • Experience with cloud HPC platforms (Azure HPC, AWS ParallelCluster, or similar).
  • Familiarity with AI/ML frameworks (PyTorch, TensorFlow) and MLOps pipelines.
  • Exposure to observability tools (Prometheus, Grafana) for HPC environments.

Why Join iVedha?

  • Work on cutting-edge AI infrastructure projects powering Canada's sovereign AI ecosystem.
  • Collaborate with a world-class team of engineers and AI specialists.
  • Competitive compensation, benefits, and opportunities for career growth in HPC and AI.

Apply Now:
Send your resume to

with the subject line: "HPC Engineer – AI Infrastructure".