The Network.AI group is a new team within Facebook Infrastructure. The group's charter spans the design and operation of the AI networking infrastructure, including the network switches and the host-side systems, as well as forward-looking projects such as transport evolution. Network Engineers at Facebook are hybrid software/network engineers who design, build and operate our worldwide data center network. This team owns the complete lifecycle of the AI network in the data center, from planning, design, product definition and QA to deployment and monitoring. Simple and scalable network design, automation and data analytics are the keys to meeting our demands. In this role, you will be responsible for conceiving, developing and deploying network software, systems and tools that keep the AI data center network operating at maximum reliability, scalability and efficiency. Do you like developing innovative solutions to some of the most complex scaling and reliability challenges out there? Do you want to build and operate the hyper-scale data center network that powers the world's largest social network? Do you want to ship code in production that positively impacts the experience of billions of users worldwide? Then this is the role for you.
Network.AI Engineer Responsibilities
* (Re)Design, deploy, manage and maintain the Facebook datacenter networks for AI infrastructure worldwide
* Develop software that improves the reliability, efficiency and velocity of building and operating the AI datacenter network
* Participate in the network on-call rotation and be an escalation contact for site events. Analyze data and identify the root cause of network issues. Build monitoring systems and software robots that can debug and remediate network issues at scale
* Test new network platforms before they are deployed in production
* Build automation that improves the safety and reliability of our network software CI/CD pipeline
* Partner with the best engineers in the industry on the coolest stuff around: the code and systems you work on will be in production and used by billions of users all around the world
Minimum Qualifications
* 2+ years of experience in one or more higher-level programming languages (Python, C, C++, Go, etc.)
* Understanding of TCP/IP
* 7+ years of experience with RoCE, InfiniBand and RDMA, including an understanding of typical configurations and performance
* 7+ years of experience in configuration and maintenance of network devices and NMS, or applications such as web servers, load balancers, relational databases, storage systems and messaging systems
* Experience in developing and understanding network device configuration for at least one vendor (Arista, Juniper, Cisco, Brocade, Ciena, Infinera, Linux, etc.)
* Experience in understanding and mitigating network hardware and topology failures
Preferred Qualifications
* BS or MS in Computer Science or Computer Engineering or Electrical Engineering
* Experience in a service provider or hyper-scale network in engineering or design roles
* Knowledge of TCP/IP congestion control algorithms (DCTCP, CUBIC)
* Knowledge of Network QoS and Scheduling algorithms (WRR/SP)
* Understanding of the internals of router/switch hardware, NPUs/data planes and optics
* Understanding of the design principles and troubleshooting of distributed systems
Locations
About the Facebook company
Facebook's mission is to give people the power to build community and bring the world closer together. Through our family of apps and services, we're building a different kind of company that connects billions of people around the world, gives them ways to share what matters most to them, and helps bring people closer together. Whether we're creating new products or helping a small business expand its reach, people at Facebook are builders at heart. Our global teams are constantly iterating, solving problems, and working together to empower people around the world to build community and connect in meaningful ways. Together, we can help people build stronger communities. We're just getting started.
Facebook is committed to providing reasonable support (called accommodations) in our recruiting processes for candidates with disabilities, long-term conditions, mental health conditions or who are neurodivergent, and to candidates with sincerely held religious beliefs or requiring pregnancy-related support. If you need support, please reach out to accommodations-ext@fb.com.
Menlo Park, CA
Facebook, Inc. provides various products to connect and share through mobile devices, personal computers, and other surfaces worldwide. The company’s products include Facebook, which enables people to connect, share, discover, and communicate with each other on mobile devices and personal computers; Instagram, a community for sharing photos, videos, and messages; Messenger, a messaging application for people to connect with friends, family, groups, and businesses across platforms and devices; and WhatsApp, a messaging application for use by people and businesses to communicate in a private way. It also provides Oculus, a hardware, software, and developer ecosystem that allows people to come together and connect with each other through its Oculus virtual reality products. As of December 31, 2018, it had approximately 1.52 billion daily active users. The company was founded in 2004 and is headquartered in Menlo Park, California.