Facebook

Research Engineer, PyTorch at Scale (PhD)

Posted on: 6 Feb 2021

Menlo Park, CA

Job Description

The PyTorch Large Scale Training team is enabling cutting-edge research and massive production workloads. The team is building an efficient and scalable deep learning training system that combines high-performance compute and networking. We're focusing on state-of-the-art distributed training solutions across data, model, and hybrid parallelism. The team is also developing a communication library that supports GPU/CPU collective and point-to-point primitives, aiming for a step-function change in performance, scalability, efficiency, and reliability. We actively engage with the community through publications, open source software, participation in technical conferences and workshops, and collaborations with academia. Facebook AI researchers and engineers work from our offices around the globe.

Research Engineer, PyTorch at Scale (PhD) Responsibilities

* Develop high-performance and scalable algorithms for GPU/CPU collective and point-to-point communication.

* Analyze and improve efficiency, scalability, and stability of existing communication solutions.

* Collaborate with other research teams to evaluate and incorporate innovative solutions.

* Code using a mixture of C++ and Python.

* Define use cases and develop methodology and benchmarks to evaluate different approaches and tradeoffs.

Minimum Qualifications

* 4+ years of experience, or currently has, or is in the process of obtaining, a PhD in Computer Science, Electrical Engineering, or an equivalent field.

* Experience in deep learning algorithms and techniques (e.g., convolutional neural networks, recurrent networks).

* Software design and programming experience in C, C++, Java, Python or similar language for development, debugging, testing and performance analysis.

* Experience crossing multi-disciplinary boundaries to drive efficient system solutions.

* Must obtain work authorization in country of employment at the time of hire and maintain ongoing work authorization during employment.

Preferred Qualifications

* Experience with Distributed Systems and High-Performance Computing.

* Experience in driving and delivering state-of-the-art solutions.

* Experience in developing a mainstream machine learning framework, e.g., PyTorch, TensorFlow, or Caffe.

About the Facebook company

Facebook's mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities. We're just getting started.

Facebook is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, you may contact us at accommodations-ext@fb.com.

Facebook, Inc. provides various products to connect and share through mobile devices, personal computers, and other surfaces worldwide. The company’s products include Facebook, which enables people to connect, share, discover, and communicate with each other on mobile devices and personal computers; Instagram, a community for sharing photos, videos, and messages; Messenger, a messaging application for people to connect with friends, family, groups, and businesses across platforms and devices; and WhatsApp, a messaging application for use by people and businesses to communicate in a private way. It also provides Oculus, a hardware, software, and developer ecosystem that allows people to come together and connect with each other through its Oculus virtual reality products. As of December 31, 2018, it had approximately 1.52 billion daily active users. The company was founded in 2004 and is headquartered in Menlo Park, California.