Senior HPC ML Applications Engineer - GPU & CPU
Austin, TX 
Posted 13 days ago
Job Description


What you do at AMD changes everything

We care deeply about transforming lives with AMD technology to enrich our industry, our communities and the world. Our mission is to build great products that accelerate next-generation computing experiences - the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world's most important challenges. We strive for execution excellence, while being direct, humble, collaborative and inclusive of diverse perspectives. This is who we are at our best. One Company. One Team.
AMD together we advance_

Senior HPC ML Applications Engineer - GPU & CPU

The Role:

We are looking for our next team member to join our growing HPC Data Center GPU (DCGPU) team. You will enable and optimize HPC applications and provide performance and systems expertise to our internal partners and customers, from before first silicon through production, on systems and solutions based on our EPYC processors and Instinct (MI) GPU accelerators.

This is a hands-on role. You will work independently and with other AMD engineers to tackle HPC functional and performance issues, collaborating with our customer-facing organizations, our internal R&D, and other key engineering groups. You will work with a variety of partners, including the 'Mega Datacenters' and HPC cloud providers, on the bring-up, design, debug, and performance of the world's largest HPC systems, making a significant impact at a global level and growing the success and market penetration of AMD GPUs in HPC.

The Person:

Very strong solution-oriented mindset

Expertise in HPC application performance testing and debug on CPU and/or GPU

Strong technical ownership and ability to lead technical relationships with both customers and HPC partners

Ability to independently prioritize opportunities to deliver results on time

Proven success establishing relationships internally and across a network of customers and partners

Excellent verbal and written communication skills

Key Responsibilities:

Seek maximum HPC performance while achieving the highest quality on AMD EPYC plus Instinct systems through a combination of performance optimization, HPC workload debug and characterization, compilers, math libraries, and lower-level AMD-internal toolsets

Feed back performance bottlenecks and functional issues to the relevant engineering groups during bring-up to improve quality and performance

Partner with our internal development and validation teams, supporting them with deeper HPC application and system-level expertise

Attend and lead high-value technical HPC discussions to present the AMD GPU value proposition and its application to HPC

Take technical ownership of customer and partner issues and resolve them, submitting JIRA tickets and driving them to resolution

Collaborate on future architectures, functional validation and performance testing

Participate in internal working groups to resolve engineering issues; contribute to the debug and testing of unreleased GPU-based solutions and their readiness for HPC workloads

Document and publish system health and performance results, as well as the procedures you develop and their automation

Preferred Experience:

Proven HPC application experience balanced with partner or customer-facing experience

HPC application bring-up, functional triage, and performance profiling; monitoring tools; and software performance optimization

Expertise building large codes from source, including linking the appropriate math libraries, optimizing compiler flags, and working with different compilers and MPI libraries

Understanding of system-level hardware and the impact of its configuration on performance, such as InfiniBand and shared parallel filesystems

Proven understanding of baseline testing with synthetic benchmarks: HPL, STREAM, DGEMM, HPCG, HPCC

Linux administration; understanding of HPC middleware setup

Nice to Haves:

Experience with very large codes, such as weather models, and their tuning for greater scalability

Any experience reading, inspecting, or writing assembly

Understanding of memory and cache hierarchy and methods to query performance/latency at each level

Understanding of HPC dataflow down to the register level


Location:

Austin, Texas



Requisition Number: 175741
Country: United States State: Texas City: Austin
Job Function: Design


AMD does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services. AMD and its subsidiaries are equal opportunity employers. We consider candidates regardless of age, ancestry, color, marital status, medical condition, mental or physical disability, national origin, race, religion, political and/or third party affiliation, sex, pregnancy, sexual orientation, gender identity, military or veteran status.

 

Job Summary
Start Date
As soon as possible
Employment Term and Type
Regular, Full Time
Required Experience
Open