Infrastructure Architect – GPU Test Automation Farm
2 weeks ago
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you'll discover the real differentiator is our culture. We push the limits of innovation to solve the world's most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.
Together, we advance your career.
The Role
AMD is looking for a highly skilled and experienced systems deployment architect to design, plan, and lead the deployment of a large-scale GPU test automation farm in a datacenter-style environment. This individual will translate AMD's test and validation vision into a robust, modular, and scalable infrastructure capable of supporting continuous integration and validation for next-generation products.
The Person
The ideal candidate combines deep technical expertise in infrastructure design with hands-on experience building large compute farms and automation systems, and has a strong understanding of datacenter operational constraints. They demonstrate strong architectural judgment, operational discipline, and a practical understanding of the technologies that enable scalable infrastructure.
Key Responsibilities
- Architect and design a distributed, large-scale GPU test automation farm optimized for performance, scalability, and reliability.
- Lead the deployment and operation of infrastructure in datacenter-like environments, ensuring compliance with standards for power, cooling, networking, and management systems.
- Define and enforce best practices for system configuration, monitoring, and fault tolerance to ensure high availability and performance.
- Collaborate with cross-functional teams (QA, IT, software, datacenter ops, and engineering) to deliver seamless test workflows and system integration.
- Evaluate and implement technologies that improve deployment efficiency, system observability, and scalability (containerization, virtualization, orchestration, MaaS, etc.).
- Mentor engineers in infrastructure design principles and contribute to the overall architectural vision of AMD's GPU validation environment.
Preferred Experience
- Proven expertise in GPU or HPC cluster environments, including system provisioning, scheduling, and performance tuning.
- Expert background in Windows and Linux administration, including automation tools and scripting.
- Experience with automation frameworks (Ansible, Terraform, etc.) and CI/CD pipelines for infrastructure deployment.
- Hands-on experience with MaaS (Metal-as-a-Service) platforms for large-scale bare-metal provisioning.
- Knowledge of Network Boot (PXE, iPXE, UEFI) configurations and automation.
- Experience building or integrating inventory health management systems, including real-time monitoring of servers, network devices, and supporting services.
- Skilled in space allocation and racking strategies in datacenter or lab environments.
- Deep understanding of power planning for dense compute environments.
- Experience with network design and topology optimization for high-throughput data paths.
Academic Credentials
- Bachelor's or Master's degree in Computer Science, Computer Engineering, Electrical Engineering, or equivalent.
LOCATION:
Markham, Ontario, Canada
Benefits offered are described: AMD benefits at a glance.
AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity, inclusive employers and will consider all applicants without regard to age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third-party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status, or any other characteristic protected by law. We encourage applications from all qualified candidates and will accommodate applicants' needs under the respective laws throughout all stages of the recruitment and selection process.