Spring 2015

DISTINGUISHED LECTURE SERIES

Applying theory to practice (and practice to theory)
Ronald Fagin
IBM Research – Almaden
Tuesday, February 17

ABSTRACT: The speaker will talk about applying theory to practice, with a focus on two IBM case studies. In the first case study, the practitioner initiated the interaction. This interaction led to the following problem. Assume that there is a set of “voters” and a set of “candidates”, where each voter assigns a numerical score to each candidate. There is a scoring function (such as the mean or the median), and a consensus ranking is obtained by applying the scoring function to each candidate’s scores. The problem is to find the top k candidates, while minimizing the number of database accesses. The speaker will present an algorithm that is optimal in an extremely strong sense: not just in the worst case or the average case, but in every case! Even though the algorithm is only 10 lines long (!), the paper containing the algorithm won the 2014 Gödel Prize, the top prize for a paper in theoretical computer science.
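
For readers curious about the flavor of such an algorithm, here is a hedged Python sketch in the spirit of the threshold-style top-k procedure the abstract describes (our illustrative code, not the speaker's; it assumes each voter's scores are available both as a descending sorted list and by direct lookup):

```python
# Hedged sketch of a threshold-style top-k algorithm in the spirit of the one
# described above. Assumes a monotone scoring function (e.g. the mean).
# "sorted_lists" models sorted access; "scores" models random access.
import heapq

def threshold_top_k(sorted_lists, scores, agg, k):
    """sorted_lists: per-voter lists of (score, candidate), descending.
    scores[candidate][voter]: direct lookup of one candidate's scores.
    agg: monotone scoring function mapping one score per voter to a number."""
    n_voters = len(sorted_lists)
    seen = set()
    top = []  # min-heap holding the best k (aggregate, candidate) pairs so far
    for depth in range(len(sorted_lists[0])):
        last = []  # score seen at this depth in each sorted list
        for lst in sorted_lists:
            score, cand = lst[depth]
            last.append(score)
            if cand not in seen:
                seen.add(cand)  # random access: fetch all of cand's scores
                total = agg([scores[cand][v] for v in range(n_voters)])
                heapq.heappush(top, (total, cand))
                if len(top) > k:
                    heapq.heappop(top)
        # agg(last) bounds the score of any candidate not yet encountered,
        # so we can stop as soon as the current top k all meet that bound.
        if len(top) == k and top[0][0] >= agg(last):
            break
    return sorted(top, reverse=True)
```

Because the scoring function is monotone, the threshold agg(last) is the best score any unseen candidate could still achieve, which is what allows the early stop on every instance.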

The interaction in the second case study was initiated by theoreticians, who wanted to lay the foundations for “data exchange”, in which data is converted from one format to another. Although this problem may sound mundane, the issues that arise are fascinating, and this work made data exchange a new subfield, with special sessions in every major database conference.

This talk will be completely self-contained, and the speaker will derive morals from the case studies. The talk is aimed at both theoreticians and practitioners, to show them the mutual benefits of working together.

BIOGRAPHY: Ronald Fagin is an IBM Fellow at IBM Research – Almaden. IBM Fellow is IBM’s highest technical honor. There are currently 86 active IBM Fellows (out of 430,000 IBM employees worldwide), and there have been only 257 IBM Fellows in the 51-year history of the program. Ron received his B.A. in mathematics from Dartmouth College and his Ph.D. in mathematics from the University of California at Berkeley. He is a Fellow of IEEE, ACM, and AAAS (American Association for the Advancement of Science). He has co-authored three papers that won Best Paper Awards and three papers that won Test-of-Time Awards, all in major conferences. One of his papers won the 2014 Gödel Prize. He was named Docteur Honoris Causa by the University of Paris. He won the IEEE Technical Achievement Award, the IEEE W. Wallace McDowell Award, and the ACM SIGMOD Edgar F. Codd Innovations Award (a lifetime achievement award in databases). He is a member of the US National Academy of Engineering and the American Academy of Arts and Sciences.
Progress Toward an Engineering Discipline of Software
Mary Shaw
Carnegie Mellon University
Wednesday, March 11
ABSTRACT: Is “software engineering” really engineering? The term was coined in 1968 to call attention to problems with software production. Both theory and practice for software have evolved since then, but do we yet have a true engineering discipline? Classical engineering disciplines have emerged from craft practice and commercialization through the infusion of codified knowledge and science. Using this emergence pattern as a point of reference, I will sketch the evolution of software engineering, drawing on civil engineering and software architecture for examples that show the progressive codification of informal knowledge toward rigorous models and tools. This will provide the basis for assessing the maturity of the field and identifying our next challenges.
BIOGRAPHY: Mary Shaw is the Alan J. Perlis University Professor of Computer Science in the Institute for Software Research at Carnegie Mellon University. Her research interests lie in the area of software engineering, particularly software architecture and design of systems used by real people. She has received the United States’ National Medal of Technology and Innovation, the ACM SIGSOFT Outstanding Research Award (with David Garlan), the IEEE Computer Society TCSE’s Distinguished Educator Award, and CSEE&T’s Nancy Mead Award for Excellence in Software Engineering Education. She is a fellow of the ACM, the IEEE, and the AAAS.
Combining Statistics and Semantics using Probabilistic Soft Logic
Lise Getoor
University of California Santa Cruz
Tuesday, April 28
ABSTRACT: One of the challenges in big data analytics is to efficiently learn and reason collectively about extremely large, heterogeneous, incomplete, noisy, interlinked data. Collective reasoning requires the ability to exploit both the logical and relational structure in the data and the probabilistic dependencies. In this talk I will give an overview of our recent work on probabilistic soft logic (PSL), a framework for collective, probabilistic reasoning in relational domains. PSL is able to reason holistically about both entity attributes and relationships among the entities. The underlying mathematical framework, which we refer to as a hinge-loss Markov random field, supports extremely efficient, exact inference. This family of graphical models captures logic-like dependencies with convex hinge-loss potentials. I will survey applications of PSL to diverse problems ranging from information extraction to computational social science. Our recent results show that by building on state-of-the-art optimization methods in a distributed implementation, we can solve large-scale problems with millions of random variables orders of magnitude faster than existing approaches.
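
To make the hinge-loss idea concrete, here is a hedged sketch (our rule, names, and numbers, not from the talk) of how a single soft logical rule becomes a convex penalty under the Lukasiewicz relaxation that hinge-loss Markov random fields build on:

```python
# Hedged illustration of a hinge-loss potential of the kind the abstract
# attributes to PSL: a logical rule over [0,1]-valued atoms is relaxed, via
# Lukasiewicz logic, into a convex penalty max(0, linear_expr)**p.
def hinge_potential(friends_ab, votes_a, votes_b, p=1):
    """Penalty for violating the rule: Friends(A,B) AND Votes(A) -> Votes(B)."""
    # Lukasiewicz AND(x, y) = max(0, x + y - 1); the rule's distance to
    # satisfaction is max(0, body - head), which is a hinge loss.
    body = max(0.0, friends_ab + votes_a - 1.0)
    return max(0.0, body - votes_b) ** p
```

The potential is zero whenever the (soft) rule is satisfied and grows linearly (or quadratically, for p=2) with the degree of violation, which is what keeps MAP inference a convex problem.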
BIOGRAPHY: Lise Getoor is a professor in the Computer Science Department at the University of California, Santa Cruz. Her research areas include machine learning, data integration, and reasoning under uncertainty, with an emphasis on graph and network data. She is an AAAI Fellow, serves on the Computing Research Association and International Machine Learning Society boards, was co-chair of ICML 2011, and is a recipient of an NSF CAREER Award and eight best paper and best student paper awards. She received her PhD from Stanford University, her MS from the University of California, Berkeley, and her BS from the University of California, Santa Barbara, and was a professor at the University of Maryland, College Park from 2001 to 2013.

CS SEMINAR

Fast Flipped Algorithms for Inference on Big Data
Sasha Aravkin
IBM Research
Thursday, March 12

ABSTRACT: Inference for large-scale problems in machine learning and big data often requires imposition of additional structural assumptions via regularization. The resulting optimization problems are challenging both because of the scale of the data and the nature of regularization.

We present a general framework and efficient algorithms for denoising problems (where a functional is minimized subject to a constraint on fitting the observed data). This framework captures a wide range of structured inference problems, including sparse formulations, matrix completion, and robust principal component analysis.
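
In the notation commonly used for such formulations (our notation, for illustration only), a denoising problem of the kind described above takes the form:

```latex
\min_{x} \; \phi(x) \quad \text{subject to} \quad \rho(Ax - b) \le \sigma
```

where \phi is a regularizer encoding structure (for example, the \ell_1 norm in sparse formulations or the nuclear norm in matrix completion), \rho measures the misfit between the model Ax and the observed data b, and \sigma is an estimate of the noise level.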

We highlight the importance of efficient first-order solvers, and illustrate the practical impact of the approach with synthetic and real examples using Netflix datasets, geophysical data interpolation, and cloud removal applications.

BIOGRAPHY: Aleksandr Aravkin is a research staff member at the IBM T.J. Watson Research Center, and an adjunct professor with the Computer Science and IEOR departments at Columbia University. He received his Ph.D. in optimization and M.S. in statistics from the University of Washington in 2010. His current research interests include convex and variational analysis, and algorithm design with applications to data science, robust and sparse inference and learning, dynamic systems, PDE-constrained optimization, log-linear models, and matrix decomposition and recovery problems.
Hardware and Software for Approximate Computing
Adrian Sampson
University of Washington
Monday, March 23
ABSTRACT: Correctness is a fundamental tenet in computer systems. In many cases, however, perfect answers are unnecessary or even impossible: small errors can be acceptable in applications such as vision, machine learning, speech recognition, search, graphics, and physical simulation. Approximate computing is a new research direction that improves efficiency by carefully relaxing correctness constraints. My research seeks to get the most out of approximate computing using collaboration between software and hardware. In hardware, I have proposed CPU designs, accelerators, and storage systems that can gracefully trade off quality for efficiency. In software, I have designed type systems, debugging tools, compilers, and a language construct called a probabilistic assertion for controlling approximation’s impact. Together, software and hardware for approximate computing can exploit applications’ latent resiliency to yield new performance and energy benefits.
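
As a rough illustration of the probabilistic-assertion idea mentioned above, one can require a property to hold with at least a given probability rather than always. The sketch below is our simplification, checked by naive sampling (the actual work uses distribution extraction and statistical tests), but it conveys the contract:

```python
# Hedged sketch of a "probabilistic assertion": the property need only hold
# with probability >= prob. Here we check it by brute-force sampling.
import random

def passert(condition, prob, samples=10000, seed=0):
    """Return True if condition(rng) holds with frequency >= prob."""
    rng = random.Random(seed)
    hits = sum(condition(rng) for _ in range(samples))
    return hits / samples >= prob

# e.g. an approximate component that is correct ~99% of the time (made-up
# numbers) easily satisfies a 95% probabilistic assertion.
ok = passert(lambda rng: rng.random() < 0.99, prob=0.95)
```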
BIOGRAPHY: Adrian Sampson is a Ph.D. candidate at the University of Washington, where his advisors are Luis Ceze and Dan Grossman. His research combines computer architecture, programming languages, and compilers. His work on approximate computing has been supported by fellowships from Facebook, Qualcomm, and Google.

FACULTY CANDIDATE SEMINARS

Preventing Information Leaks with Jeeves
Jean Yang
MIT CSAIL
Monday, February 23

ABSTRACT: Information leaks are more and more prevalent. This should be no surprise: not only are users sharing more information, but there is a growing amount of code that handles sensitive data. Unless we make it easier to create secure programs, information leaks will only get worse.

I present a language, Jeeves, where programs preserve privacy and security properties by construction. In Jeeves, the programmer specifies expressive information flow policies separately from other functionality and relies on the language runtime to automatically differentiate behavior. Jeeves is the first language to factor out information flow, preventing information leaks by reducing the opportunity for programmer error. To help programmers reason about Jeeves programs, I defined what it means for Jeeves to enforce programmer-specified policies and proved that Jeeves correctly does so.
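
The flavor of factoring policies out of application code can be suggested with a toy sketch (ours, and far simpler than Jeeves itself, which uses symbolic evaluation and constraint solving): a sensitive value carries two facets, and the runtime, not the display code, picks one per viewer.

```python
# Hedged sketch of the policy-agnostic idea: the policy is attached to the
# value once, and code that displays the value never branches on it.
class Sensitive:
    def __init__(self, private, public, policy):
        self.private, self.public = private, public
        self.policy = policy  # viewer -> True if allowed the private facet

    def show(self, viewer):
        # The "runtime" differentiates behavior; callers just call show().
        return self.private if self.policy(viewer) else self.public

# The policy is stated once, separately from any display logic.
gpa = Sensitive(3.9, "hidden", policy=lambda viewer: viewer == "advisor")
```

Any code path that renders `gpa` gets the right facet automatically, which is the sense in which leaks are prevented by construction rather than by per-call-site checks.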

To demonstrate the feasibility of this programming model in practice, I implemented Jeeves in Scala and Python and created Jacqueline, a Jeeves-based web framework that enforces policies end-to-end. I describe how I used Jacqueline to implement a course manager, a health record manager, and a conference management system that has been used for an academic workshop.

BIOGRAPHY: Jean Yang received an AB from Harvard University and is a PhD candidate in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL). Jean’s research interests are in programming language design and software verification. Jean is a recipient of the National Science Foundation Graduate Research Fellowship, the Facebook Fellowship, and the Levine Fellowship. Jean’s work on Verve, an operating system automatically verified end-to-end for type safety, won Best Paper Award at PLDI 2010.
Improving Resource Efficiency in Cloud Computing
Christina Delimitrou
Stanford University
Wednesday, February 25

ABSTRACT: Cloud computing promises flexibility, high performance, and low cost. However, despite its prevalence, most datacenters hosting cloud services still operate at very low utilization, posing serious scalability concerns.

The goal of my work is to improve the scalability of these systems, by increasing their utilization, while guaranteeing high performance for each submitted application. A crucial system component to achieve this goal is the cluster manager; the system that orchestrates where applications are placed and how many resources they receive. In this talk, I will describe a new approach in cluster management that relies on two main insights. First, it automates resource management by leveraging practical data mining techniques. Second, it provides the user with a high-level, declarative interface that centers around performance, not resource reservations. Using these insights, I designed and built a datacenter scheduler (Paragon), a cluster manager (Quasar), and scalable provisioning systems for public clouds. In settings with several hundred servers, I demonstrated that this approach achieves high application performance and improves system utilization. Several production systems, including Twitter and AT&T, have since adopted similar cluster management approaches.

BIOGRAPHY: Christina Delimitrou is a PhD candidate in the Electrical Engineering Department at Stanford University, working in computer architecture and systems. As part of her PhD work, she built practical systems for cluster management and scheduling in large-scale datacenters. She is the recipient of a Facebook Research Fellowship and a Stanford Graduate Fellowship. Christina has earned an MS from Stanford and a diploma in Electrical and Computer Engineering from the National Technical University of Athens.
Algorithmic design via efficient data representations
Alexandr Andoni
UC Berkeley
Monday, March 2

ABSTRACT: Modern datasets are at scales that would be unthinkable just a decade ago. This demands rethinking the principles of algorithmic design used to handle such datasets. In this talk, I will describe how such principles emerge from the methods of efficient data representations.

The first illustration will be the Nearest Neighbor Search (NNS) problem — a ubiquitous massive-dataset problem that is of key importance in machine learning and other areas. A central technique for tackling this problem is Locality-Sensitive Hashing (LSH), a data representation method that has seen a lot of success in both theory and practice. I will present the best possible LSH-based algorithm for NNS under the Euclidean distance. Then, I will show a new method that, for the first time, provably outperforms the LSH-based algorithms.
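
For Euclidean distance, a classic LSH family (the standard random-projection construction; the parameters below are illustrative, not the speaker's) buckets points by a quantized random projection, so nearby points tend to collide:

```python
# Hedged sketch of a single Euclidean LSH function:
# h(x) = floor((<x, a> + b) / w) for Gaussian a and random offset b.
# Close points usually land in the same bucket; far points usually do not.
import random

def make_lsh(dim, w=4.0, seed=0):
    rng = random.Random(seed)
    a = [rng.gauss(0, 1) for _ in range(dim)]  # random projection direction
    b = rng.uniform(0, w)                      # random offset within a bucket
    def h(x):
        proj = sum(ai * xi for ai, xi in zip(a, x))
        return int((proj + b) // w)            # quantize into width-w buckets
    return h

h = make_lsh(dim=3)
```

In practice many such functions are concatenated and repeated across hash tables to trade off collision probability against query time.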

Taking a broader perspective, I will also describe how the lens of “efficient data representation” leads to new fast algorithms for other longstanding questions. Specifically, I will discuss a classic dynamic programming problem (edit distance) and a Linear Programming problem (earth-mover distance). Finally, I will briefly mention how the above ideas enable an algorithmic framework for parallel computation systems such as MapReduce.

BIOGRAPHY: Alexandr Andoni is a computer scientist focused on advancing algorithmic foundations of massive data. His research interests broadly revolve around sublinear algorithms, high-dimensional geometry, and theoretical machine learning. Alexandr graduated from MIT in 2009, with a PhD thesis on Nearest Neighbor Search, under the supervision of Piotr Indyk. During 2009–2010, he was a postdoc at the Center for Computational Intractability at Princeton, as well as a visitor at NYU and IAS. Alexandr then joined Microsoft Research Silicon Valley, where he was a researcher until 2014. Currently, Alexandr is a visiting scientist at the Simons Institute for the Theory of Computing at UC Berkeley.
When Logic Meets Physics: Compositional Design of Cyber-Physical Systems Using Contracts
Pierluigi Nuzzo
University of California at Berkeley
Wednesday, March 4

ABSTRACT: In cyber-physical systems (CPS), computation, communication and control are tightly combined with physical processes of different nature. Without a comprehensive modeling formalism and a rigorous, unifying design framework, CPS integration remains ad hoc. In this talk, we introduce a design methodology that addresses the complexity and heterogeneity of cyber-physical systems by using assume-guarantee contracts to formalize the design process and enable the realization of system architectures and control algorithms in a hierarchical and compositional way.

In our methodology, the design is carried out as a sequence of refinement steps from a high-level specification to an implementation built out of a library of components at the lower level. To enable formal requirement analysis and early detection of inconsistencies, we represent the top-level system requirements as contracts, by leveraging a front-end pattern-based specification language and a set of back-end languages, including mixed integer-linear constraints and temporal logic. We show how key analysis tasks, such as refinement checking, can indeed be made more efficient when a system is described based on a precharacterized library of components and contracts. We then illustrate how top-level contracts can be refined to achieve independent implementation of system architecture and control algorithm, by combining synthesis from requirements, optimization and simulation-based design space exploration methods. At the heart of our architecture design framework, we propose two optimization-based algorithms to tackle the complexity of exact reliability computation, and enable scalable co-design of large, safety-critical systems for cost and fault tolerance.
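
The refinement relation underlying such checks is often summarized as "assume less, guarantee more"; a minimal sketch (ours, with behaviors abstracted as plain sets and ignoring details such as contract saturation) is:

```python
# Hedged sketch of assume-guarantee contract refinement: a contract is a pair
# (assumptions, guarantees), each a frozenset of allowed behaviors. C' can
# replace C in any context when C' assumes less and guarantees more.
def refines(c_prime, c):
    a_p, g_p = c_prime
    a, g = c
    # Weaker assumptions (a is contained in a_p) and
    # stronger guarantees (g_p is contained in g).
    return a <= a_p and g_p <= g
```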

We demonstrate, for the first time, the effectiveness of a contract-based approach on a real-life example of industrial relevance, namely the design of aircraft electric power distribution systems (EPS). In our framework, optimal selection of large, industrial-scale EPS architectures can be performed in a few minutes, while design validation of EPS controllers based on linear temporal logic contracts shows up to two orders of magnitude improvement in terms of execution time with respect to conventional techniques.

BIOGRAPHY: Pierluigi Nuzzo is a Ph.D. candidate in Electrical Engineering and Computer Sciences at the University of California at Berkeley. He received the Laurea degree in Electrical Engineering (summa cum laude) from the University of Pisa and the Sant’Anna School of Advanced Studies, Pisa, Italy. His research interests include: methodologies and tools for cyber-physical system and mixed-signal system design; contracts, interfaces and compositional methods for embedded system design; energy-efficient analog and mixed-signal circuit design. Before joining U.C. Berkeley, he held research positions at the University of Pisa and IMEC, Leuven, Belgium, working on the design of energy-efficient A/D converters, frequency synthesizers for reconfigurable radio, and design methodologies for mixed-signal integrated circuits. Pierluigi received First Place in the operational category and Best Overall Submission in the 2006 DAC/ISSCC Design Competition, a Marie Curie Fellowship from the European Union in 2006, the University of California at Berkeley EECS departmental fellowship in 2008, the U.C. Berkeley Outstanding Graduate Student Instructor Award in 2013, and the IBM Ph.D. Fellowship in 2012 and 2014.
Performance and Reliability in Modern Storage Systems
Vijay Chidambaram
University of Wisconsin-Madison
Monday, March 9

ABSTRACT: Storage services form the platform on which widely-used cloud services, mobile applications, data analytics engines, and transactional databases are built. Such services are trusted with irreplaceable personal and commercial information by users, companies, and even governments.

The designers of storage services often have to choose between performance and reliability. If the developer makes the system reliable, performance is often significantly reduced. If the developer instead maximizes performance, a crash could lead to data loss and corruption.

In this talk, I describe how to build systems that achieve both strong reliability and high performance. In many systems, reliability is maintained by carefully ordering updates to storage. The key insight is that the low-level mechanism used to enforce ordering is overloaded: it provides durability as well as ordering. I introduce a new primitive, osync(), that decouples ordering from durability of writes. I present Optimistic Crash Consistency, a new crash-recovery protocol that builds on osync() to provide strong reliability guarantees and high performance. I implement these techniques in the Optimistic File System (OptFS) and show that it improves performance by 10X for some workloads. With researchers at Microsoft, I employ the principles of Optimistic Crash Consistency in a distributed storage system, resulting in 2-5X performance improvements.
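
The contrast between the traditional primitive and osync() can be sketched as follows (toy Python, with an in-memory log standing in for a file; the osync() method is our stand-in for the proposed primitive, not a real OS call):

```python
# Hedged sketch: fsync() forces ordering AND durability (and waits for the
# disk); the proposed osync() promises only that earlier writes reach disk
# before later ones, without waiting, which is all journaling needs.
class Log:
    """Toy in-memory log standing in for a file; records barrier calls."""
    def __init__(self):
        self.ops = []
    def write(self, data):
        self.ops.append(("write", data))
    def fsync(self):
        self.ops.append(("fsync",))  # traditional: ordering + durability
    def osync(self):
        self.ops.append(("osync",))  # proposed: ordering only, no waiting

def journal_commit(log, data, commit):
    # Correctness needs ordering (the commit record must never precede its
    # data on disk), not immediate durability, so the cheap barrier suffices.
    log.write(data)
    log.osync()
    log.write(commit)
```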

BIOGRAPHY: Vijay Chidambaram is a Ph.D. candidate in the Department of Computer Sciences at the University of Wisconsin-Madison. His current research focus is to ensure the reliability of applications in the rapidly changing landscape of storage and cloud computing. Specifically, he has contributed new reliability techniques in (local and distributed) storage systems, and built frameworks for finding reliability bugs in applications. His work has resulted in patent applications by Samsung and Microsoft. He was awarded the Microsoft Research Fellowship in 2014, and the University of Wisconsin-Madison Alumni Scholarship in 2009.
Teaching Computer Science through Research and Real-World Applications
Paul Blaer
Columbia University
Tuesday, March 24
ABSTRACT: Computer Science instruction is most effective when interesting applications are presented together with the basic course material. Using connections to real-world systems and to research is critical for keeping students engaged and excited. In this talk, I present a variety of problems which have been used in my teaching of undergraduate classes. These problems allow students to create versions of existing real-world systems, to address particular parts of research questions, and to work with actual data obtained in the field. I also use examples from my doctoral and subsequent research to design projects for students at the introductory, intermediate, and advanced levels.
BIOGRAPHY: Paul Blaer is Director of Computing Research Facilities and Adjunct Assistant Professor in Columbia University’s Department of Computer Science. He received his Ph.D. in Computer Science at Columbia University in 2008. His research interests include mobile robotics, 3-D vision, and computer science education.
Anticensorship in the Network
Eric Wustrow
University of Michigan
Wednesday, March 25

ABSTRACT: Many countries restrict the information their citizens can access on the Internet by using a sophisticated censorship infrastructure to block websites, keywords, or communication that the government deems inappropriate. Although proxies can occasionally be used to circumvent such blocks, these proxies can themselves be discovered and blocked by the censor, resulting in an arms race between censors and circumventors.

In this talk, I will describe a fundamentally new approach to anticensorship that leverages the unique network perspective of ISPs to create proxies that make censoring an all-or-nothing decision. These proxies are designed to resist a censor’s attempt to detect or prevent usage, even when all information for using the proxy is made public. Blocking access to such a system involves additionally blocking economically or politically desirable sites unrelated to the censorship system, a cost that most censors are unwilling to pay. I will also describe the challenges to deploying this type of system in practice, as well as ongoing work to overcome these obstacles. Ultimately, these efforts enable a state-level response to state-sponsored censorship, giving sympathetic governments the tools needed to encourage Internet freedom.

Deep Learning for Decision Making and Control
Sergey Levine
UC Berkeley
Monday, March 30
ABSTRACT: A remarkable feature of human and animal intelligence is the ability to autonomously acquire new behaviors. My work is concerned with designing algorithms that aim to bring this ability to robots and simulated characters. A central challenge in this field is to learn behaviors with representations that are sufficiently general and expressive to handle the wide range of motion skills that are necessary for real-world applications, such as general-purpose household robots. These representations must also be able to operate on raw, high-dimensional inputs and outputs, such as camera images, joint torques, and muscle activations. I will describe a class of guided policy search algorithms that tackle this challenge by transforming the task of learning control policies into a supervised learning problem, with supervision provided by simple, efficient trajectory-centric methods. I will show how this approach can be applied to a wide range of tasks, from locomotion and push recovery to robotic manipulation. I will also present new results on using deep convolutional neural networks to directly learn policies that combine visual perception and control, learning the entire mapping from rich visual stimuli to motor torques on a real robot. I will conclude by discussing future directions in deep sensorimotor learning and how advances in this emerging field can be applied to a range of other areas.
BIOGRAPHY: Sergey Levine is a postdoctoral researcher working with Professor Pieter Abbeel at UC Berkeley. He completed his PhD in 2014 with Vladlen Koltun at Stanford University. His research focuses on robotics, machine learning, and computer graphics. In his PhD thesis, he developed a novel guided policy search algorithm for learning rich, expressive locomotion policies. In later work, this method enabled learning a range of robotic manipulation tasks, as well as end-to-end training of policies for perception and control. He has also developed algorithms for learning from demonstration, inverse reinforcement learning, and data-driven character animation.
Beyond Worst Case Analysis of Graph Partitioning Algorithms
Konstantin Makarychev
Microsoft Research
Monday, April 6
ABSTRACT: Many combinatorial optimization problems are much simpler in practice than in the worst case. One of the challenges in the area of approximation algorithms is to explain this phenomenon and to design algorithms that work well in real life. In this talk, we will first discuss different ways of modeling “real-life” instances. Then, I will present a new semi-random semi-adversarial model for graph partitioning problems, a planted model with permutation-invariant random edges (PIE). This model is much more general than stochastic models considered previously. Finally, I will describe a “constant factor” approximation algorithm for finding the minimum balanced cut in graphs from this model.
BIOGRAPHY: Konstantin Makarychev is a researcher at Microsoft Research, focusing on algorithms and theoretical computer science. He received his PhD from Princeton University in 2007, advised by Moses Charikar.
Promise of Data Science to Advance Medicine & Education
Ansaf Salleb-Aouissi
Columbia University
Tuesday, April 7

ABSTRACT: Data Science is establishing itself as a groundbreaking field of the 21st century: a field in which big data plays a central role in reshaping almost every aspect of life, and one that leverages the deluge of data being collected everywhere, data with the potential to be used for great benefit to society. Today, many problems long considered unsolvable can be revisited from a data science perspective. Among the fields that stand to benefit most from Data Science are education and medicine.

Inspired by my ongoing “Introduction to Data Science” course, I will first give an overview of Data Science, its success stories, and its process. I will then talk about three data science problems related to medicine and education, drawn from my research. I will share ongoing efforts in studying (1) understanding infant colic, (2) predicting preterm birth, and (3) optimizing online self-learning. I will also stress the interdisciplinary nature of Data Science and the importance of Computer Science as one of the main contributing disciplines. I will finally discuss integrating Data Science into the curriculum to train a data-aware next generation of computer scientists.

BIOGRAPHY: Ansaf Salleb-Aouissi joined Columbia University’s Center for Computational Learning Systems as an Associate Research Scientist in 2006 after a postdoctoral fellowship at INRIA Rennes (France). She is also an adjunct assistant professor in the Columbia University Computer Science department. Her research interests lie in machine learning and data science. She has done research on frequent pattern mining, rule learning, ranking, and action recommendation, and has worked on data science projects including machine learning for the power grid. Her current research interests include crowdsourcing, medical informatics, and educational data mining. Ansaf has published peer-reviewed papers in top-quality journals, conferences, and books, including JMLR, TPAMI, ECML, PKDD, COLT, IJCAI, ECAI, and AISTATS. She has recently received a National Science Foundation award to study preterm birth and a multi-phase grant from Pearson Education to advance research on online self-learning.
Security and Privacy for Robots and Brains
Tamara Bonaci
University of Washington
Wednesday, April 8

ABSTRACT: My research interests focus on the security and privacy of cutting-edge cyber-physical technologies. For such systems, my goals are to identify the possible security and privacy threats that can arise, evaluate their impact, and develop tools to prevent the identified attacks. More specifically, I focus on the question of how cyber-physical systems leverage human idiosyncrasies, and how that opens systems to exploitation and attacks. In doing so, I draw on systems and control theory, information theory, machine learning, and experimental studies.

Two of the most interesting systems I have been working with are teleoperated robots and brain-computer interfaces (BCIs). While these systems appear to be different at first glance, in this talk I will discuss how, in the security and privacy context, they are duals of each other. The motivation for my work with teleoperated robots comes from the observation that everyone controls robots in a unique way. My collaborators and I introduce Fitts’ law as a novel way of quantifying the impact of cyber security attacks, and we show that cyber attacks do not impact all components of a telerobotic procedure equally. Typically, there is a difference in impact between a “free movement” component and a “fine motor (homing)” component of a teleoperated procedure.
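
Fitts' law itself is a simple predictive model of targeted movement; in its common Shannon formulation it reads as below (the constants a and b are fitted empirically per operator; the values here are illustrative only):

```python
# Hedged sketch of Fitts' law in the Shannon formulation:
# movement time = a + b * log2(D / W + 1), where D is the distance to the
# target and W its width. The index of difficulty is measured in bits.
import math

def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predicted movement time (seconds) for a target at a given distance."""
    index_of_difficulty = math.log2(distance / width + 1)  # bits
    return a + b * index_of_difficulty
```

Degradation of the fitted constants under an attack is one plausible way such a model can quantify impact on operator performance.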

I will then show to what extent it is possible to misuse brain-computer interfaces to extract private information about users, such as their preferences, PINs and passwords. Our research focuses on identifying what people recognize, and one of our main contributions is the demonstration that, through the use of visual stimuli, people can recognize sensitive things in their life without even realizing it. This leads to privacy risks when such stimuli are a part of a legitimate BCI application. Finally, I will present our proposed approaches to mitigate the identified attacks. We observe that two types of approaches are needed. The first approach is technical in nature, such as our proposed BCI Anonymizer and operator signature validation system. The second approach, however, is policy-based, and it involves working closely with other fields, including law, policy, and ethics.

BIOGRAPHY: Tamara Bonaci is a PhD candidate in the Department of Electrical Engineering at the University of Washington. She is a member of the UW BioRobotics Lab and the UW Tech Policy Lab. She received her B.Sc. in Electrical Engineering from the University of Zagreb, Croatia, and her M.Sc. from the University of Washington. Tamara’s research lies at the intersection of security and privacy, neural engineering, robotics, and control/dynamical systems theory. Her current research focuses on security and privacy issues of brain-computer interfaces and teleoperated robots. Tamara’s research and outreach activities have been recognized by the UW Society of Women Engineers Outstanding Female Award. Tamara has served as a Student Leadership Council Officer for the NSF Engineering Research Center for Sensorimotor Neural Engineering, and as a co-chair of the UW Electrical Engineering Graduate Students Association.

Building an Operating System for the Data Center
Simon Peter
University of Washington
Thursday, April 9

ABSTRACT: Data centers run a range of important applications with ever-increasing performance demands, from cloud and server computing to Big Data and eScience. However, the scaling of CPU frequency has stalled in recent years, leading to hardware architectures that no longer transparently scale software performance. Two trends stand out: 1) Instead of frequency, hardware architectures increase the number of CPU cores, leaving complex memory system performance and CPU scheduling tradeoffs exposed to software. 2) Network and storage I/O performance continues to improve, but without matching improvements in CPU frequency. Software thus faces ever-increasing I/O efficiency demands.

In my research, I address how operating systems (OSes) can handle these growing complexity and performance pressures to avoid becoming the limiting factor to performance. I first explain how current OS architecture is already inadequate to address these trends, limiting application performance. I then present how the OS can be redesigned to eliminate the performance limitations without compromising on existing features or application compatibility. I finish with an outlook on how these hardware trends affect software in the future and present ideas to address them.

BIOGRAPHY: Simon is a postdoctoral research associate at the University of Washington, where he leads research in operating systems and networks. His postdoctoral advisors are Tom Anderson and Arvind Krishnamurthy. Simon received a Ph.D. in Computer Science from ETH Zurich in 2012 and an MSc in Computer Science from the Carl-von-Ossietzky University Oldenburg, Germany in 2006.

Simon’s research focus is on data center performance issues. For his work on the Arrakis high I/O performance operating system, he received the Jay Lepreau best paper award (2014) and the Madrona prize (2014). Previously, Simon has worked on the Barrelfish multicore operating system and conducted further award-winning systems research at various locations, including Microsoft Research Silicon Valley and Cambridge, Intel Labs Germany, UC Riverside, and the NSF.

Quantum Computing and its Applications
Anargyros Papageorgiou
Columbia University
Monday, April 13

ABSTRACT: Quantum computers potentially have a significant advantage over classical computers for a number of important problems. Shor’s algorithm for integer factorization and Grover’s search algorithm are two early examples demonstrating the potential benefits. We give an overview of quantum computing, briefly introducing the model of computation and its properties. We provide insights by discussing a number of its applications.
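As a concrete illustration of the quantum advantage, Grover’s algorithm finds a marked item among N candidates with roughly (π/4)·√N oracle queries, versus N/2 on average classically. The following small classical simulation of the amplitude dynamics is illustrative only (the helper name and parameters are my own):

```python
import math
import numpy as np

def grover_search(n_items, marked):
    """Simulate Grover's algorithm on a uniform superposition over
    n_items basis states, amplifying the amplitude of `marked`."""
    state = np.full(n_items, 1 / math.sqrt(n_items))
    iterations = int(round(math.pi / 4 * math.sqrt(n_items)))  # ~O(sqrt N)
    for _ in range(iterations):
        state[marked] *= -1               # oracle: flip the marked phase
        state = 2 * state.mean() - state  # diffusion: invert about the mean
    # Measurement probability of each outcome is the squared amplitude
    return int(np.argmax(state ** 2)), state[marked] ** 2

# For N = 16 items, 3 iterations suffice to find the marked item with
# high probability (about 0.96).
index, prob = grover_search(16, marked=3)
```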

BIOGRAPHY: Anargyros Papageorgiou is a research scientist in the Department of Computer Science at Columbia University. His research deals with the complexity of multivariate problems, applications in computational finance and quantum chemistry, and quantum computing. He received the 2008 Information-Based Complexity Prize and is an Associate Editor of the Journal of Complexity. In addition, he has taught undergraduate and graduate courses in the department.

Learning Systems: Systems and Abstractions for Large-Scale Machine Learning
Joseph Gonzalez
UC Berkeley
Monday, April 20

ABSTRACT: The challenges of advanced analytics and big data cannot be addressed by developing new machine learning algorithms or new large-scale computing systems in isolation. Some of the recent advances in machine learning have come from new systems that can apply complex models to big data problems. Likewise, some of the recent advances in systems have exploited fundamental properties of machine learning to reach new points in the system design space. By considering the design of scalable learning systems from both perspectives, we can address bigger problems, expose new opportunities in algorithm and system design, and define the new fundamental abstractions that will accelerate research in these complementary fields.

In this talk, I will present my research in learning systems spanning the design of efficient inference algorithms, the development of graph processing systems, and the unification of graphs and unstructured data. I will describe how the study of graphical model inference and power-law graph structure shaped the common abstractions in contemporary graph processing systems, and how new insights in system design enabled order-of-magnitude performance gains over general-purpose data-processing systems. I will then discuss how lessons learned in the context of specialized graph-processing systems can be lifted to more general data-processing systems, enabling users to view data as graphs and tables interchangeably while preserving the performance gains of specialized systems. Finally, I will present a new direction for the design of learning systems that looks beyond traditional analytics and model fitting to the entire machine learning life-cycle spanning model training, serving, and management.
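
The vertex-centric gather-apply-scatter pattern popularized by graph processing systems such as GraphLab/PowerGraph can be sketched with PageRank. The code below is a minimal single-machine illustration; the function and variable names are my own, not any system’s API:

```python
def pagerank_gas(edges, n_vertices, damping=0.85, iterations=20):
    """Toy PageRank in the gather-apply-scatter (GAS) style."""
    out_degree = [0] * n_vertices
    in_neighbors = [[] for _ in range(n_vertices)]
    for src, dst in edges:
        out_degree[src] += 1
        in_neighbors[dst].append(src)

    rank = [1.0 / n_vertices] * n_vertices
    for _ in range(iterations):
        new_rank = []
        for v in range(n_vertices):
            # Gather: sum contributions from in-neighbors
            total = sum(rank[u] / out_degree[u] for u in in_neighbors[v])
            # Apply: update the vertex value
            new_rank.append((1 - damping) / n_vertices + damping * total)
        # Scatter is implicit here: new ranks flow along out-edges
        rank = new_rank
    return rank

# Tiny 3-vertex example graph
ranks = pagerank_gas([(0, 1), (1, 2), (2, 0), (2, 1)], n_vertices=3)
```

Real systems distribute the gather and scatter phases across machines and, for power-law graphs, split high-degree vertices to balance load; this sketch only shows the per-vertex computation pattern.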

BIOGRAPHY: Joseph Gonzalez is a postdoc in the UC Berkeley AMPLab and cofounder of GraphLab. Joseph received his PhD from the Machine Learning Department at Carnegie Mellon University where he worked with Carlos Guestrin on parallel algorithms and abstractions for scalable probabilistic machine learning. Joseph is a recipient of the AT&T Labs Graduate Fellowship and the NSF Graduate Research Fellowship.

Rise of the Planet of the Apps: Security and Privacy in the Age of Bad Code
Suman Jana
Stanford University
Wednesday, April 22

ABSTRACT: Computing is undergoing a major shift. Third-party applications hosted in online software markets have become ubiquitous on all kinds of platforms: mobile phones, Web browsers, gaming devices, even household robots. These applications often include yet more third-party code for advertising, analytics, etc. These trends have dramatically increased the amount of bad code throughout the software stack – buggy code, malicious code, code that overcollects private information intentionally or by accident, overprivileged code vulnerable to abuse – as well as the amount of sensitive data processed by bad code.

In this talk, I will demonstrate that existing application platforms are ill-suited to dealing with bad code, thus causing security and privacy problems. I will then show how to isolate bad code without affecting its useful functionality, by redesigning the interfaces across the software stack and controlling the information released to the applications by the platform. I will also show how automated testing can identify bad code and help developers improve their applications.

BIOGRAPHY: Suman Jana is a postdoctoral researcher at Stanford University. He earned his PhD in 2014 from the University of Texas, where he was supported by the Google PhD Fellowship. He is broadly interested in identifying fundamental flaws in existing systems and building new systems with strong security and privacy guarantees. Suman received the 2014 PET Award for Outstanding Research in Privacy-Enhancing Technologies, Best Practical Paper Award from the 2014 IEEE Symposium on Security and Privacy (Oakland), Best Student Paper Award from the 2012 IEEE Symposium on Security and Privacy, and the 2012 Best Applied Security Paper Award. Suman’s research has been widely covered in popular media, and his code has been deployed at Google, Mozilla, and Apache.