University of California, San Diego
Monday, April 28th, 11am
CEPSR 750
Abstract:
It is now common for individual applications such as
search engines and social networks to serve a billion users. Enabling
that growth has been a relentless pursuit of scalability, in hardware
and software, that has culminated in expensive and power-hungry data
centers. Yet the delivered performance of these systems has lagged
behind their potential capability by an order of magnitude or more.
In this talk, I will describe two projects focused on improving data
center resource efficiency. The first is a MapReduce implementation
built upon an I/O-optimized distributed sorting system called
TritonSort. When applied to the 100 TB GraySort benchmark, it improved
upon the absolute performance of the previous world record holder by
25% while using 66 times fewer servers. As a result, it also earned the 100
TB JouleSort record for energy-efficient data processing. The second
project focuses on the design of the underlying data center network.
Existing scalable data center network designs promise full bisection
bandwidth between all servers, but at significant cost,
complexity, and power consumption. Instead, we propose a hybrid
electrical/optical switch architecture that can deliver a nearly 3x
reduction in cost and a 6x reduction in power consumption relative to the
state of the art.
Bio: George Porter is a Research Scientist at UCSD and the Associate Director of UCSD's Center for Networked Systems. His research interests include data-intensive computing and data center networking. He has received a Google Focused Research Award and a NetApp Faculty Fellowship. He received his B.S. from the University of Texas at Austin and his Ph.D. from the University of California, Berkeley.