Azure Specialized Compute drives the hardware roadmap, software, and services that enable our users to run technical computing workloads in Azure, from batch workloads to AI & machine learning to traditional HPC simulations to remote visualization. We are responsible for providing the most scalable MPI platform and the most powerful GPU clusters for our end customers in their quest to answer some of the most difficult questions of science and industry.
This is an exciting time for HPC + AI, as both are undergoing a massive shift. AI technologies are being merged with existing HPC approaches, and both are moving to the cloud. At this critical juncture, we are looking for an AI benchmarking technical lead to join our benchmarking initiative. This team member will be responsible for running industry-standard and customer-specific AI application benchmarks on our latest hardware offerings, showcasing the best of Azure. A typical team project includes gathering performance data and characteristics for key AI applications, then analyzing and optimizing each application to run best on Azure HPC infrastructure based on the latest GPUs, CPUs, and other accelerators. The benchmarking team works closely with product management and engineering, and is engaged in key customer performance evaluations.
A successful candidate will have experience with AI training and validation workloads and an interest in large-scale applications [think supercomputer scale]. The candidate should be a self-starter who is willing and able to mentor junior members of the team and provide training to the field team. Finally, the candidate must be coachable and a team player.
* Excellent problem-solving skills and analytical ability.
* Solid understanding of AI architecture and requirements [processor technology, networks, memory components etc.]
* Solid working knowledge of Linux and the ability to compile and modify AI codes that use C++, MPI, CUDA, Python, and OpenMP.
* Ability to use CPU and GPU profiling tools to identify performance bottlenecks.
* Proficient in one of the following AI frameworks: PyTorch, TensorFlow, MXNet
* Working knowledge of container orchestration including setting up and configuring Docker and Kubernetes.
* Working knowledge of Slurm is desired.
* Experience with scientific/engineering software for AI systems
* Experience/education in fields where AI is used, including Deep Learning, Computer Vision, Physics, Engineering, Data Analytics, etc.
* An understanding of the issues affecting AI application performance.
* Willingness to take feedback and be a team player.
* Ability to clearly communicate issues and results to stakeholders.
* Completed Master's degree or Ph.D. in Computer Science or a related field.
The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
* Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. We also consider qualified applicants regardless of criminal histories, consistent with legal requirements. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.
Redmond, WA
Microsoft Corporation develops, licenses, and supports software, services, devices, and solutions worldwide. The company’s Productivity and Business Processes segment offers Office 365 commercial products and services, such as Office, Exchange, SharePoint, Skype for Business, Microsoft Teams, and related Client Access Licenses (CALs); Office 365 consumer services, including Skype, Outlook.com, and OneDrive; the LinkedIn online professional network; and Dynamics business solutions comprising financial management, enterprise resource planning, customer relationship management, supply chain management, and analytics applications for small and medium businesses, large organizations, and divisions of enterprises.
The company’s Intelligent Cloud segment licenses server products and cloud services, such as SQL Server, Windows Server, Visual Studio, System Center, and related CALs, as well as Azure, a cloud platform. It also offers enterprise services, including premier support and Microsoft consulting services, to assist customers in developing, deploying, and managing Microsoft server and desktop solutions, and provides training and certification to developers and IT professionals.
Its More Personal Computing segment offers Windows OEM, volume, and other non-volume licensing of the Windows operating system; patent licensing, Windows Internet of Things, and MSN display advertising; Surface, PC accessories, and other devices; Xbox hardware and software and services; and Bing and Bing Ads search advertising. It markets its products through original equipment manufacturers, distributors, and resellers; and online and Microsoft retail stores.
Microsoft Corporation collaborates with E.ON, NIIT Technologies Ltd., CUNA Mutual Group, and Mastercard Incorporated; has strategic alliances with Nielsen Holdings plc and PAREXEL International Corp.; and has a strategic partnership with SK Telecom Co., Ltd. The company was founded in 1975 and is headquartered in Redmond, Washington.