System Administrator

3 days ago


Edmonton, Canada | Alberta Machine Intelligence Institute | Full time

Robert Craig | Director, IT

**About Amii**

Alberta Machine Intelligence Institute (Amii) is one of Canada’s three main institutes for artificial intelligence (AI) and machine learning. Our world-renowned researchers drive fundamental and applied research at the University of Alberta (and other academic institutions), training some of the world’s top scientific talent. Our cross-functional teams work collaboratively with Alberta-based businesses and organizations to build AI capacity and translate scientific advancement into industry adoption and economic impact.

**About the Role**

We are seeking an HPC System Administrator for a **full-time, 4-year term position**. The System Administrator, High Performance Cluster (HPC) is critical to maintaining the stability, security, and performance of our mission-critical infrastructure, enabling our AI researchers and engineers to focus on pioneering innovations that advance the mission of 'AI for good and for all'.

Reporting to the Director, IT, the System Administrator, HPC is responsible for the day-to-day operation and maintenance of the data center's systems infrastructure. This includes servers, storage, network devices, and related software. This role requires a proactive approach to problem-solving, a commitment to best practices, and the ability to work effectively in a fast-paced environment.

The position focuses on achieving excellence in four main accountabilities:

- System Maintenance & Optimization
- Security Management
- Technical Support
- HPC Administration & Support

**Required Skills / Expertise**

**Key Responsibilities**:

- Assists in expanding HPC resources based on user needs and growth projections, and maintains capacity planning models for scalability and performance
- Builds, configures, and maintains **high-performance computing clusters**, including compute, storage, and networking components
- Oversees daily operations and maintenance of the High-Performance Computing (HPC) Cluster running on Linux and SLURM, including monitoring system health and performance, and managing job queues and SLURM configurations for optimized scheduling and resource allocation
- Designs, configures, and troubleshoots **high-speed networking** (InfiniBand, Ethernet, VLANs, etc.) to optimize cluster performance
- Manages and maintains Linux-based and Windows servers, ensuring high availability, performance, and security by performing regular updates, patches, and backups, while also configuring and managing essential network services such as DNS, DHCP, NFS, and SNMP
- Assists in the development and maintenance of comprehensive documentation for systems, configurations, procedures, and policies
- Collaborates with other departments to align IT initiatives with organizational goals
- Plans, tests, and deploys system upgrades and patches, keeping systems updated with security and performance enhancements, and coordinating maintenance to minimize user impact
- Monitors system logs and performance metrics to proactively resolve issues, troubleshoot problems with vendors and support teams, and manage monitoring tools for real-time system health
- Implements and maintains virtualization and containerization solutions (e.g., VMware) to optimize resource use and ensure secure, efficient operation
- Recommends and updates standard tech packages for staff, considering job requirements, the latest technology, and budget, and deploys tech packages to new staff
- Monitors system performance, usage, and resource availability, proactively identifying and resolving issues that could impact performance or user experience
- Collaborates with Researchers and Machine Learning Scientists to understand computational needs and implement solutions that enhance usability, throughput, and system efficiency
- Administers workload management and scheduling systems (Slurm) to enable efficient resource allocation and job execution across the cluster
- Drives continuous improvement of HPC services and support models, identifying opportunities to enhance efficiency, usability, and researcher experience
- Prepares regular reports on system performance, security incidents, and project status for management review
- Provides technical support to users by assisting with job submissions, troubleshooting issues, and resolving problems
- Evaluates and recommends new hardware and software solutions to enhance HPC capabilities

**Qualifications**:

- Post-secondary degree in Computer Science, Information Technology, or a related field (Nice to have); equivalent experience will be considered
- 3+ years of experience in system administration, preferably in an HPC environment
- Strong understanding of Linux (e.g., CentOS, RHEL, Ubuntu) and Windows Server operating systems
- Experience with virtualization technologies (e.g., VMware, Hyper-V) (Nice to have)
- Knowledge of scripting languages (e.g., Bash, Python, PowerShell) (Nice to have)
- Insight into HPC hardware components (CPUs, GPUs, memory, interconnec


