Tuesday, April 30, 2019 - 12:00pm
Location: 8102 Gates Hillman Centers
Speaker: Daehyeok Kim, Ph.D. Student, https://daehyeok.kim/
A Framework for Unleashing the Potential of In-network Computing
Recent advances in programmable network hardware, such as programmable network switches and programmable network interface cards (NICs) with Field Programmable Gate Array (FPGA) or Remote Direct Memory Access (RDMA) capability, enable the network data plane to perform computation directly on packets, going beyond its traditional role of packet forwarding. This trend creates new opportunities for accelerating a wide range of data center workloads, including networking and distributed systems applications. However, we have observed that the heterogeneity of these devices poses many practical challenges to exploiting the full potential of in-network computing.
In particular, we have identified the following unexplored challenges. First, because of their architectural differences, it is not trivial to let different types of network devices access each other's computing and storage resources without incurring performance penalties. Second, handling device failures is challenging because hardware-specific computing and resource constraints prevent traditional failure-handling methods from being applied directly. Third, each type of network device offers different trade-offs for running different kinds of computation, which can make it difficult for developers to decide how to partition a workload across devices.
In this thesis, we address these challenges by proposing a programming and runtime framework for an in-network computing platform. We argue that an appropriate framework can effectively reduce the complexity arising from device heterogeneity by providing higher-level abstractions. Our work will consist of three components. First, we propose a programming framework whose abstractions help application developers explore a large design space without being exposed to this complexity. Second, we propose abstractions and runtime APIs that allow applications to access remote computing and storage resources while hiding the complexity of communication between heterogeneous devices. Lastly, we propose abstractions and runtime APIs that support fault tolerance for applications while masking hardware-specific resource constraints. We demonstrate the effectiveness and feasibility of our approach with a prototype implementation and a preliminary evaluation.
Thesis Committee:
Srinivasan Seshan (Co-chair)
Vyas Sekar (Co-chair)
Nick McKeown (Stanford University)