AI Compute Infrastructure Engineer
1 week ago
Cerebras Systems has pioneered a groundbreaking chip and system that revolutionizes deep learning applications. Our system empowers ML researchers to achieve unprecedented speeds in training and inference workloads, propelling AI innovation to new horizons.
Condor Galaxy 1 (CG-1) is a supercomputer set to revolutionize the world of artificial intelligence. With an astounding processing power of 4 ExaFLOPs, 54 million cores, and a cutting-edge 64-node architecture, the CG-1 is the first milestone of a larger project that will redefine the possibilities of AI.
About The Role
As a Software Quality Engineer on our team, you will use your knowledge to influence better software design, bug prevention strategies, testability, scalability, and other advanced quality concepts. This position will play a major role in the quality of Cerebras software. We are looking for engineers with a broad set of technical skills who are ready to tackle the biggest at-scale problems in HW-based deep learning accelerators.
The successful completion and deployment of the CG-1, the first of nine powerful supercomputers, is a significant achievement for Cerebras. As we enter phase 2 of the project with CG-2, we are taking a bold step towards creating a network of interconnected supercomputers that will collectively deliver a mind-boggling 36 ExaFLOPs of AI compute power upon completion.
Cerebras is building a team of exceptional people to work together on big problems. Join us.
Responsibilities
- Monitor and oversee CG health to ensure stability and security
- Manage and customize Kubernetes (k8s), cluster, and cloud features on CGs
- Provide solutions to ML users using tools and components available in a vast Linux-based ecosystem: compute, storage, and networking.
- Configure, deploy and debug container-based services on orchestration platforms like Kubernetes.
- Provide 24/7 monitoring and support using automated tools and hands-on manual troubleshooting
- Support training and inference in the data center for LLMs (50B to 500B parameter models), multi-modal models, Mistral, etc.
- Adapt and make progress in a fast-paced and constantly evolving environment.
- Document processes and procedures needed to efficiently operate CGs.
Requirements
- BS CS/EE, MS CS/EE
- Relevant experience in managing/maintaining compute infrastructure
- Proficiency with Python and other common programming languages
- Experience with container orchestration and workload management platforms like Kubernetes and SLURM
- Familiarity with ML frameworks like PyTorch, TensorFlow, etc.
- Strong knowledge and demonstrated experience with:
- Good understanding of cloud infrastructure design, deployment and maintenance
Cerebras Systems is committed to creating an equal and diverse environment and is proud to be an equal opportunity employer. We celebrate different backgrounds, perspectives, and skills. We believe inclusive teams build better products and companies. We try every day to build a work environment that empowers people to do their best work through continuous learning, growth and support of those around them.
DevOps Engineer
1 week ago
Old Toronto, Ontario, Canada Nestbox AI Inc Full time
Nestbox AI is a leading technology company headquartered in New York and Toronto. We are at the forefront of the third wave of generative AI, enabling enterprise clients to bring their custom AI visions to life quickly and securely. Our platform is revolutionizing industries such as technology, finance, healthcare, and education. At Nestbox AI, we are...
-
Computational Design Engineer: EDA
1 week ago
Toronto, Ontario, Canada Axiomatic-AI Full time
Computational Design Engineer: EDA & Scientific Computing + Hiring now
Axiomatic_AI's mission: Axiomatic_AI is launching with the aim to accelerate R&D by "Automated Interpretable Reasoning" (AIR), a verifiably truthful AI model built for reasoning in science and engineering. Axiomatic_AI is hiring top talent interested in a future of human...
-
AI Engineer Senior
1 week ago
Toronto, Ontario, Canada Zuswork Full time
Job Type: Full Time
Position: Senior Cybersecurity and AI Engineer
Location: Remote but comfortable following the Eastern Region (EST) or Central Region (CST) time zone.
Industry: SaaS - Asset Management
They are a team of passionate and dedicated individuals building great software. Thousands of organizations across AV rental, event management, construction,...
-
AI Software Developer + Hiring now
1 week ago
Toronto, Ontario, Canada Axiomatic-AI Full time
AI Software Developer (Toronto, Canada - hybrid)
Axiomatic_AI's mission: Axiomatic_AI is readying to launch with the aim to accelerate R&D by "Automated Interpretable Reasoning" (AIR), a verifiably truthful AI model built for reasoning in science and engineering. Axiomatic_AI is hiring top talent interested in a...
-
Developer for reasoning AI + Hiring now
1 week ago
Toronto, Ontario, Canada Axiomatic-AI Full time
Axiomatic_AI's mission: Axiomatic_AI is launching with the aim to accelerate R&D by "Automated Interpretable Reasoning" (AIR), a verifiably truthful AI model built for reasoning in science and engineering. Axiomatic_AI is hiring top talent interested in a future of human reasoning aided by not replaced by ...
-
Lead Machine Learning Engineer
1 week ago
Old Toronto, Ontario, Canada Maker AI Full time
Lead Machine Learning Engineer (NLP) at Maker AI (United States)
Full Time
Work Location: Toronto, Canada, United States
Salary Offered: Not Specified
Experience Required: No experience required
Remote Work: Yes
Stock Options: No
Draft (formerly Contentfly) is revolutionizing the content infrastructure companies use to boost their revenue. We are a fully remote...
-
Verification Engineer
1 week ago
Toronto, Ontario, Canada Untether AI Full time
When you join Untether AI, you will be part of the team that creates innovative hardware and delivers industry leading AI performance and efficiency. Utilizing non Von Neumann techniques, you will help propel sustainable AI inference. Formal technologies include Formal Property Verification, Sequential Equivalence Checking and Data path Validation. ...
-
AI Senior Engineer
1 week ago
Toronto, Ontario, Canada Blanc Labs Full time
Blanc Labs is a premier partner for global enterprises, leading the way in digitization, automation, and the development of next-generation digital products and services. Our expertise in digital transformation powers businesses to accelerate service delivery, drive customer engagement, and foster growth. Blanc Labs is at the forefront of AI/ML innovation...
-
Cloud Infrastructure Architect/Engineer
1 week ago
Old Toronto, Ontario, Canada Tundra Technical Solutions Inc. Full time
Cloud Infrastructure Architect / Engineer
6 Month Contract
Hybrid, Toronto
We are seeking a highly skilled Cloud Infrastructure Architect / Engineer to join our growing cloud data platform team. You will play a pivotal role in supporting a critical program focused on the enablement of advanced analytics for enterprise consumption. In this role, you will be...
-
AI Engineer
2 months ago
Toronto, Ontario, Canada Infotek Consulting Inc. Full time
Overview: We are seeking a talented and innovative AI Engineer to join our team. As an AI Engineer, you will be responsible for designing, developing, and deploying artificial intelligence (AI) and machine learning (ML) solutions to solve complex business problems and drive innovation. Your expertise in AI algorithms, data analytics, and software engineering...
-
AI Engineer
4 weeks ago
Toronto, Ontario, Canada Infotek Consulting Inc. Full time
Overview: We are seeking a talented and innovative AI Engineer to join our team. As an AI Engineer, you will be responsible for designing, developing, and deploying artificial intelligence (AI) and machine learning (ML) solutions to solve complex business problems and drive innovation. Your expertise in AI algorithms, data analytics, and software engineering...
-
SDE (AI/ML Engineer)
1 week ago
Toronto, Ontario, Canada Merican Inc Full time
Job role: SDE (AI/ML Engineer)
Job Location: Remote
Job Description: As a Senior Full-Stack Engineer, you'll be integral to our team, responsible for developing the infrastructure that supports our groundbreaking media and entertainment AI initiatives. This role combines creative problem-solving with high-level engineering to enhance our ability to...
-
Senior Cybersecurity and AI Engineer
1 week ago
Toronto, Ontario, Canada Zuswork Full time
Hiring for a SaaS-based client based in Carson City, NV
Job Type: Full Time
Position: Senior Software Engineer - Systems (EndPoint)
Location: Preferably Eastern Region (EST) or Central Region (CST)
Industry: SaaS - Asset Management
About the team: They are a team of passionate and dedicated individuals building great software. Who strives for excellence in all...
-
Formal Verification Engineer
1 week ago
Toronto, Ontario, Canada Untether AI Full time
We're looking for best in class engineers to join our existing top-notch team. When you join Untether AI, you will be part of the team that creates innovative hardware and delivers industry leading AI performance and efficiency. Utilizing non Von Neumann techniques, you will help propel sustainable AI inference. As part of this talented team of engineers,...
-
Toronto, Ontario, Canada Oracle Full time $142,900 - $338,600
Oracle Senior Director, AI Infrastructure Optical Network Performance, Madison, Wisconsin
Oracle is looking for a Senior Director in AI infrastructure Optical Network Performance in Oracle Cloud Infrastructure (OCI). We are building the next generation of cloud. Our customers run their business on our cloud, and our mission is to provide them with...
-
Toronto, Ontario, Canada Oracle Full time
Oracle Senior Director, AI Infrastructure Optical Network Performance - Madison, Wisconsin
Oracle is seeking a Senior Director for AI infrastructure Optical Network Performance in Oracle Cloud Infrastructure (OCI). Join us in shaping the future of cloud computing.
-
Research engineer in machine learning
1 week ago
Toronto, Ontario, Canada Borealis AI Full time
Borealis AI is looking for an enthusiastic Senior Research Engineer who's excited by the opportunity of being at the forefront of machine learning technology, and working on extremely challenging problems in the financial services industry. As a Senior Research Engineer, you'll be part of a collaborative team delivering AI projects end to end – everything...
-
Sales Engineer
1 week ago
Toronto, Ontario, Canada STAN AI Full time
Location: Toronto, ON
Employment Type: Full-Time, In-Office
Company Overview: STAN AI is a leading innovator in Artificial Intelligence in the community management industry, dedicated to providing cutting-edge solutions that empower businesses to achieve their goals. We are committed to delivering exceptional products and services that exceed customer...
-
Sales Engineer
1 week ago
Toronto, Ontario, Canada STAN AI Full time
Location: Toronto, ON
Employment Type: Full-Time, In-Office
STAN AI is a pioneering company in Artificial Intelligence within the community management sector, focused on offering innovative solutions that empower businesses to reach their objectives. We are dedicated to providing top-notch products and services that surpass customer expectations. We are in search...