Lluís Vilanova
Slashing the Disaggregation Tax in Heterogeneous Data Centers with FractOS
Disaggregated heterogeneous data centers promise higher efficiency, lower total cost of ownership, and more flexibility for data center operators. However, current software stacks can levy a high tax on application performance. Applications and OSes are designed for systems where local PCIe-connected devices are centrally managed by CPUs; in a disaggregated system, however, this centralization introduces unnecessary messages over the shared data center network.
This talk presents FractOS, a distributed OS designed to minimize the network overheads of disaggregation in heterogeneous data centers. FractOS elevates devices to first-class citizens in the system, enabling direct peer-to-peer data transfers and task invocations among them, without centralized application and OS control. FractOS achieves this through: (1) new abstractions to express distributed applications across services and disaggregated devices, (2) new mechanisms that enable devices to securely interact with each other and with other data center services, and (3) a distributed and isolated OS layer that implements these abstractions and mechanisms and can run on host CPUs and SmartNICs. Accelerating a heterogeneous application with FractOS improves its performance by 47% while reducing network traffic by 3x.
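To make the centralization argument concrete, here is a toy message-count model in Python. It is a minimal sketch under assumed costs, not FractOS's actual protocol, API, or cost model. It assumes a linear pipeline of stages, each running on a different disaggregated device, and counts network messages under two hypothetical orchestration styles: a CPU-centric one, where the CPU invokes every stage, receives every completion, and stages intermediate data through its own memory, and a peer-to-peer one, where the CPU invokes only the first stage and devices invoke each other and exchange data directly. The names `MessageCount`, `cpu_centric`, and `peer_to_peer` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MessageCount:
    control: int  # commands and completion notifications (assumed cost: 1 each)
    data: int     # bulk data transfers between pipeline stages (assumed cost: 1 each)

    @property
    def total(self) -> int:
        return self.control + self.data


def cpu_centric(stages: int) -> MessageCount:
    """Hypothetical CPU-mediated pipeline: the CPU sends a command to every
    stage and receives every completion; intermediate results travel
    device -> CPU -> next device, crossing the network twice per hop."""
    if stages < 1:
        raise ValueError("need at least one stage")
    control = 2 * stages     # one command + one completion per stage
    data = 2 * (stages - 1)  # each inter-stage hop is relayed through the CPU
    return MessageCount(control, data)


def peer_to_peer(stages: int) -> MessageCount:
    """Hypothetical device-to-device pipeline: the CPU invokes only the first
    stage, each device invokes the next one directly, and the last device
    notifies the CPU; data moves directly between devices."""
    if stages < 1:
        raise ValueError("need at least one stage")
    control = 1 + (stages - 1) + 1  # initial invoke, device-to-device invokes, final completion
    data = stages - 1               # one direct transfer per inter-stage hop
    return MessageCount(control, data)


if __name__ == "__main__":
    for n in (2, 3, 5):
        c, p = cpu_centric(n), peer_to_peer(n)
        print(f"{n} stages: CPU-centric {c.total} messages, peer-to-peer {p.total} messages")
```

Under these assumptions, a three-stage pipeline needs 10 messages when mediated by the CPU and 6 when devices talk directly, and the gap grows with pipeline length. These numbers come only from the toy model; the 47% and 3x figures above come from the talk's evaluation, not from this sketch.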
Biography
Lluís Vilanova is a Lecturer (Assistant Professor) in the Department of Computing at Imperial College London, where he is a member of the Large-Scale Data & Systems (LSDS) group and co-founder of the Cloud, Data and Exascale Computing Hub at Imperial-X. His work focuses on the intersection between computer architecture, systems, and security, with a special interest in using co-design to break the zero-sum game we typically see between security features and performance. Previously, Lluís was a post-doctoral researcher with Yoav Etsion at the Technion's Electrical Engineering department. He received his PhD degree from the Computer Architecture Department at Universitat Politècnica de Catalunya, where he also participated as a researcher at the Barcelona Supercomputing Center.