Runtime Engineer
1 week ago
Location: SF Bay Area / Toronto | Full-time | Hybrid
Compensation: Competitive salary (based on experience & location) + Equity + Bonus
About the Role
This is your opportunity to join a mission-driven startup building the foundation of sustainable AI. The team is creating a high-performance compiler and runtime that allows developers to "build once, deploy anywhere"—from the cloud to the edge—while optimizing efficiency and scalability.
As a Runtime Engineer, you will focus on developing multi-target runtime systems, leveraging advanced techniques in concurrency, parallelization, and hardware optimization. Your work will directly shape the performance and accessibility of next-generation AI infrastructure.
What You Will Do
Design, develop, and optimize multi-target runtime systems.
Use modern techniques in parallelization and partitioning to generate highly efficient kernels.
Rapidly prototype and explore new approaches using data-driven methods.
Benchmark and analyze compiler outputs across diverse hardware platforms.
Collaborate with product teams to align runtime design with ML engineers' needs.
Build tools to identify, analyze, and resolve performance bottlenecks.
Required:
Strong expertise in asynchronous and concurrent programming.
4+ years of experience with C/C++ (C++14 or newer).
Solid understanding of hardware architectures (vector vs. scalar registers, memory hierarchies).
Experience with OS kernel or hypervisor development.
Preferred:
Experience with CUDA, ROCm, or GPU programming.
Background in HPC (high-performance computing).
Advanced degree (MS/PhD) in Computer Science or related field.
Familiarity with deep learning frameworks (PyTorch, JAX, Triton).
Experience programming for large compute clusters.
Benefits:
Salary: Competitive, adjusted for experience and geography.
Equity + performance-based bonus opportunities.
Medical, dental, and vision coverage.
Retirement savings plan.
Supplemental wellness benefits.
Hybrid flexibility (SF Bay Area or Toronto).
Why Join:
Build core AI infrastructure – work on runtime systems that power sustainable, scalable AI.
Cutting-edge engineering – apply advanced concurrency, GPU programming, and HPC techniques.
Mission-driven culture – focus on innovation with environmental responsibility.
High-growth environment – contribute to core architecture at a pivotal stage for the company.
-
Principal Runtime Expert
2 weeks ago
Greater Toronto Area, Canada | Huawei Canada | Full time | $150,000 - $180,000 per year
About the Company
We're in search of a top-tier Runtime Expert with extensive hands-on experience in Java/GraalVM/JavaScript runtime systems. Your deep expertise in JVM internals, GraalVM compiler optimization, JS engine architecture, and large-scale runtime performance tuning will drive critical technical initiatives.
About the Role
You'll lead the design and...
-
Toronto, Ontario, Canada | Tenstorrent University Jobs | Full time | $90,000 - $120,000 per year
As a Software Engineering PEY intern on the Metal Runtime team at Tenstorrent, you'll get hands-on experience working on the low-level software that powers our AI accelerators. You'll learn how high-performance runtime systems are built, how software interacts with custom silicon, and what it means to work close to the metal. This role is onsite based out...
-
Senior Runtime Performance Engineer
3 weeks ago
Toronto, Canada | Cerebras Systems | Full time
Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to...
-
Compiler Tech Lead
1 week ago
San Francisco, CA | Toronto, ON | Hybrid | Amadeus Search | Full time | US$120,000 - US$200,000 per year
Role: Compiler Tech Lead
Location: SF Bay Area / Toronto | Full-time | Hybrid
Compensation: Competitive salary (based on experience & location) + Equity + Bonus
About the Role
This is your opportunity to take on a leadership role at a mission-driven startup building sustainable AI infrastructure. The team is creating a high-performance, portable compiler that...