LLM Inference Deployment Engineer

1 day ago

U.S., Canada, Germany, Norway EnCharge AI Full time US$120,000 - US$200,000 per year

EnCharge AI is a leader in advanced AI hardware and software systems for edge-to-cloud computing. EnCharge's robust and scalable next-generation in-memory computing technology provides orders-of-magnitude higher compute efficiency and density compared to today's best-in-class solutions. The high-performance architecture is coupled with seamless software integration and will make the immense potential of AI accessible in power-, energy-, and space-constrained applications. EnCharge AI launched in 2022 and is led by veteran technologists with backgrounds in semiconductor design and AI systems.

About the Role

EnCharge AI is seeking an LLM Inference Deployment Engineer to optimize, deploy, and scale large language models (LLMs) for high-performance inference on its energy-efficient AI accelerators. You will work at the intersection of AI frameworks, model optimization, and runtime execution to ensure efficient model execution and low-latency AI inference.

Responsibilities

Deploy and optimize trained LLMs (GPT, LLaMA, Mistral, Falcon, etc.) from libraries such as Hugging Face.
Utilize inference runtimes such as ONNX Runtime and vLLM for efficient execution.
Optimize batching, caching, and tensor parallelism to improve LLM scalability in real-time applications.
Develop and maintain high-performance inference pipelines using Docker, Kubernetes, and inference servers.

Qualifications

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
Experience in LLM inference deployment, model optimization, and runtime engineering.
Strong expertise in LLM inference frameworks (PyTorch, ONNX Runtime, vLLM, TensorRT-LLM, DeepSpeed).
In-depth knowledge of the Python programming language for model integration and performance tuning.
Strong understanding of high-level model representations and experience implementing framework-level optimizations for Generative AI use cases.
Experience with containerized AI deployments (Docker, Kubernetes, Triton Inference Server, TensorFlow Serving, TorchServe).
Strong knowledge of LLM memory optimization strategies for long-context applications.
Experience with real-time LLM applications (chatbots, code generation, retrieval-augmented generation).

EnCharge AI is an equal employment opportunity employer in the United States.

Senior Software Engineer

3 weeks ago

Canada LLM Full time

LLM.co delivers private, secure large language model (LLM) solutions tailored for enterprises operating in highly regulated industries such as law, healthcare, finance, and government. We build and deploy domain-specific AI tools that help our clients gain insight and efficiency while maintaining full control over their data and compliance requirements....
Senior Software Engineer

3 weeks ago

Canada LLM Full time

A technology solutions provider in Canada is seeking a Senior Software Engineer to architect backend systems for secure LLM deployment. The ideal candidate should have over 5 years of experience in backend systems engineering, proficient in Python and APIs, and familiarity with cloud infrastructure. This role offers competitive salary, remote flexibility,...
Deployment Engineer, AI Inference

4 weeks ago

Canada Cerebras Full time

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to...
AI Runtime Engineer

1 week ago

U.S., Canada, Germany, Norway EnCharge AI Full time US$125,000 - US$175,000 per year

EnCharge AI is a leader in advanced AI hardware and software systems for edge-to-cloud computing. EnCharge's robust and scalable next-generation in-memory computing technology provides orders-of-magnitude higher compute efficiency and density compared to today's best-in-class solutions. The high-performance architecture is coupled with seamless software...
Head of Inference Platform Engineering

3 weeks ago

Canada Cerebras Full time

A pioneering AI hardware company is seeking a technical engineering leader for their Inference Service Platform. The role involves leading a team to tackle scaling challenges for LLM inference while ensuring high reliability and performance. Ideal candidates should have strong experience in distributed systems, inference optimization, and technical...
AI Compiler Engineer

2 days ago

U.S., Canada, Germany, Norway EnCharge AI Full time US$120,000 - US$200,000 per year

EnCharge AI is a leader in advanced AI hardware and software systems for edge-to-cloud computing. EnCharge's robust and scalable next-generation in-memory computing technology provides orders-of-magnitude higher compute efficiency and density compared to today's best-in-class solutions. The high-performance architecture is coupled with seamless software...
AI Inference Engineer — Open-Source Integrations

7 days ago

Canada Cerebras Full time

A leading AI technology company in Canada seeks an experienced software engineer to develop open-source libraries and applications for its innovative inference platform. The role involves collaborating with engineering teams and creating demo applications that showcase the platform's advantages. Candidates should have a degree in computer science, 4+ years...
Embedded SW Engineer

1 day ago

U.S., Canada, Germany, Norway EnCharge AI Full time US$120,000 - US$180,000 per year

EnCharge AI is a leader in advanced AI hardware and software systems for edge-to-cloud computing. EnCharge's robust and scalable next-generation in-memory computing technology provides orders-of-magnitude higher compute efficiency and density compared to today's best-in-class solutions. The high-performance architecture is coupled with seamless software...
Senior Software Engineer, AI Inference Platform

22 hours ago

Canada Cerebras Full time

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer‑scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry‑leading training and inference speeds and empowers machine learning users...
AI Research Engineer

7 days ago

U.S., Canada, Germany, Norway EnCharge AI Full time US$120,000 - US$180,000 per year

EnCharge AI is a leader in advanced AI hardware and software systems for edge-to-cloud computing. EnCharge's robust and scalable next-generation in-memory computing technology provides orders-of-magnitude higher compute efficiency and density compared to today's best-in-class solutions. The high-performance architecture is coupled with seamless software...
