Software Engineer
4 weeks ago
Luma’s mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. This requires a massive, reliable, and performant GPU infrastructure that pushes the boundaries of scale. Our SRE team is the foundation of our research and product velocity, responsible for the thousands of NVIDIA and AMD GPUs across multiple providers that power our work.

Where You Come In
We are looking for a hands‑on, first‑principles engineer who is fluent in Linux, comfortable operating close to the metal, and capable of architecting systems for the next generation of AI infrastructure. You will build, maintain, and scale Luma’s infrastructure across on‑prem and multi‑vendor clouds (AWS & OCI), serving as the bridge between hardware vendors, cloud providers, and our research teams.

What You’ll Do
Architect for Reliability & Scale: Participate in critical re‑architecture sessions to redesign our systems for higher efficiency and scale. You won't just maintain existing clusters; you will help define how our next‑generation infrastructure operates.
Own Multi‑Cloud GPU Clusters: Take end‑to‑end ownership of our production clusters for training and inference across AWS and OCI, ensuring high availability and peak performance.
Drive Security & Compliance: Assist in achieving and maintaining security certifications (SOC 2 Type 1 & 2, ISO standards) by implementing robust infrastructure security practices in a fast‑moving AI startup environment.
Deep Linux Performance Tuning: Use your mastery of Linux systems to troubleshoot and optimise performance at the OS and kernel level.
Build Robust Automation: Write high‑quality tools and automation in Python, Go, or Bash to manage, monitor, and heal our infrastructure without relying on heavy operational toil.
Debug Complex Hardware/Software Failures: Serve as the final escalation point for the most challenging GPU, networking (InfiniBand/RDMA), and system‑level issues, often collaborating directly with hardware vendors like NVIDIA.

Who You Are
8+ years of experience as an SRE, production engineer, or infrastructure engineer in a fast‑paced, large‑scale environment.
Deep Linux Mastery: You possess deep, hands‑on expertise in Linux, containerised systems, and debugging low‑level system performance.
Cloud Infrastructure Expert: You have strong experience with providers like AWS or OCI.
Tenacious Troubleshooter: You thrive on solving complex, low‑level problems where hardware and software intersect.
Startup DNA: You are energetic and thrive in a less structured, fast‑paced environment.
Security‑Minded: You possess a working knowledge of security best practices and familiarity with compliance frameworks, such as SOC 2 and ISO.
Expert in High‑Performance Networking: You have practical experience with InfiniBand, RDMA, or RoCE and understand how to optimise throughput for massive distributed training jobs.

What Sets You Apart (Bonus Points)
Deep expertise with GPU tooling for NVIDIA and AMD GPUs like DCGM or ROCm.
Experience managing large‑scale GPU clusters for AI/ML workloads (training or inference).
Familiarity with job management systems based on Kubernetes or orchestration frameworks like Ray.

Compensation
The base pay range for this role is $170,000 – $360,000 per year.
-
Software Test Engineer
3 weeks ago
London, Canada Aversan Full time
Overview: Software Test Engineer – Aversan Inc. is a trusted multi-service engineering and electronics manufacturing company. Aversan delivers leading-edge and reliable safety-critical electronics and software systems to the aerospace, defence, and space industries. We are currently seeking a qualified Software Test Engineer to join our team. The...
-
Embedded Software Engineer
24 hours ago
London, Ontario, Canada Aversan Full time
Aversan Inc., 15 days ago, London, Ontario, Mid Level, contract
About the role: Aversan Inc. is a trusted multi-service engineering and electronics manufacturing company. Aversan delivers leading-edge and reliable safety-critical electronics and software systems to the aerospace, defence, and space industries. We are currently seeking a qualified Embedded Software...
-
Research Engineer
5 days ago
London, Canada Huawei Full time
Huawei Canada has an immediate 12-month contract opening for a Research Engineer. About the team: The Intelligent Testing Technology Team, currently a part of the Waterloo Research Centre, is at the forefront of integrating large language models (LLMs) with formal methods to advance artificial intelligence. By harnessing LLMs' strengths in natural language...
-
Embedded Software Engineer
4 weeks ago
London, Canada Insight Global Full time
Insight Global is seeking an Intermediate Embedded Software Engineer to join a top avionics company in Ottawa. This role involves developing and integrating embedded software solutions for aviation and connectivity systems. Ideal candidates will have strong technical expertise in C/C++, Linux environments, and networking, with a passion for innovation and...
-
Software Engineering Manager
2 weeks ago
London, Canada Navtech, Inc. Full time
Aviation. It connects our world, brings people together, provides opportunities, accelerates economic growth, and is just so very cool! Come work for NAVBLUE, a leading services company wholly owned by Airbus, dedicated to Flight Operations & Air Traffic Management solutions and services for airlines, airports, and Air Navigation Service Providers. We...
-
Senior Software Engineer
5 days ago
London, Canada Affirm Full time
Overview: Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. Responsibilities: Site Reliability Engineering at Affirm supports Engineering partners to “Operate What They Own” with excellence to protect the customer experience. SRE defines...
-
Senior Software Engineer
2 weeks ago
London, Canada Affirm Full time
Overview: Affirm is reinventing credit to make it more honest and friendly, giving consumers the flexibility to buy now and pay later without any hidden fees or compounding interest. Responsibilities: Site Reliability Engineering at Affirm supports Engineering partners to “Operate What They Own” with excellence to protect the customer experience. SRE...
-
Software Engineer
4 weeks ago
London, Canada lumalabs.ai Full time
Luma’s mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. This requires a massive, reliable, and performant GPU infrastructure that pushes the boundaries of scale. Our SRE team is the foundation of our research and product velocity, responsible for the thousands of...
-
Software Engineer
5 days ago
London, Canada lumalabs.ai Full time
Luma’s mission is to build multimodal AI to expand human imagination and capabilities. We believe that multimodality is critical for intelligence. This requires a massive, reliable, and performant GPU infrastructure that pushes the boundaries of scale. Our SRE team is the foundation of our research and product velocity, responsible for the thousands of...
-
On-Site Software Test Engineer – MBSE
4 weeks ago
London, Canada Aversan Full time
A trusted engineering company in London, ON is looking for a Software Test Engineer to design and execute test plans ensuring compliance with software requirements. Candidates should have a Bachelor's degree in Software or Computer Engineering and at least 3 years of experience in software testing methodologies. Strong analytical and communication skills are...