Spring 2008

DISTINGUISHED LECTURE SERIES

Recent Evolutionary Genomics of the Human Genome
David Haussler
UC Santa Cruz
Tuesday, May 6th, 2008
ABSTRACT: With our ability to sequence entire genomes, we have for the first time the opportunity to compare the genomes of present-day species, and deduce the trajectories by which they diversified from a common ancestral genome. Starting with a small shrew-like ancestor in the Cretaceous period approximately 100 million years ago, the different species of placental mammals radiated outward, creating a stunning diversity of forms from whales to armadillos to humans. From the genomes of present-day species, it is possible to computationally reconstruct what most of the DNA bases in the genome of the common ancestor of placental mammals must have looked like. We can then deduce most of the changes that led to humans. In so doing, we discover how Darwinian evolution has shaped us at the molecular level.

Because most random mutations to functionally important regions of DNA reduce fitness, these changes usually disappear over time in a process known as negative selection. From its unusually high conservation between species, it is immediately evident that at least 5% of the human genome has been under negative selection during most of mammalian evolution, and is hence likely to be functionally important. Protein-coding genes and structural RNA genes stand out among the negatively selected regions because of their distinctive pattern of restricted DNA base substitutions, insertions and deletions. However, most of the DNA under negative selection in mammalian genomes, and indeed vertebrate genomes in general, does not appear to be part of protein-coding genes, and shares no sequence similarity with any DNA in the genomes of invertebrates. Experimental evidence suggests that many of these unclassified functional elements serve to regulate genes involved in embryonic development.
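To make the idea of scanning for negatively selected regions concrete, here is a minimal sketch, not the methods actually used in this work (which rely on phylogenetic models rather than raw identity): it flags windows of a multiple alignment with unusually high column conservation. The alignment, window size, and threshold below are all illustrative.

# Sketch only: flag putatively constrained regions in a multiple alignment by
# sliding-window column conservation. Real analyses use phylogenetic models
# (phastCons-style HMMs and the like), not raw identity.

def column_conservation(column):
    """Fraction of sequences agreeing with the most common non-gap base."""
    bases = [b for b in column if b != '-']
    if not bases:
        return 0.0
    return max(bases.count(b) for b in set(bases)) / len(bases)

def conserved_windows(alignment, window=50, threshold=0.9):
    """Yield (start, end) windows whose mean conservation exceeds threshold."""
    ncols = len(alignment[0])
    scores = [column_conservation([seq[i] for seq in alignment]) for i in range(ncols)]
    for start in range(ncols - window + 1):
        if sum(scores[start:start + window]) / window >= threshold:
            yield start, start + window

# Toy three-species alignment (hypothetical sequences): the first ten columns
# are identical, the last eight diverge.
alignment = [
    "ACGTACGTAC" + "GATCGATC",
    "ACGTACGTAC" + "TTAAGGCC",
    "ACGTACGTAC" + "CCGGTTAA",
]
print(list(conserved_windows(alignment, window=8, threshold=0.9)))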

Overlaid on the background of negative selection, we occasionally see a short segment of widely conserved DNA that has rapidly changed in a particular lineage, suggesting possible positive selection for a modified function in that lineage. The most dramatic example of this in the last 5 million years of human evolution occurs in a previously unstudied RNA gene expressed in the developing cerebral cortex, known as Human Accelerated Region 1 (HAR1). This gene is turned on only in a select set of neurons, during the time in fetal development when these neurons orchestrate the formation of the substantially larger cortex of the human brain. It will be many years before the biology of such examples is fully understood, but right now we relish the opportunity to get a first peek at the molecular tinkering that transformed our animal ancestors into humans.

BIOGRAPHY: David Haussler’s research lies at the interface of mathematics, computer science, and molecular biology. He develops new statistical and algorithmic methods to explore the molecular evolution of the human genome, integrating cross-species comparative and high-throughput genomics data to study gene structure, function, and regulation.

Haussler is credited with pioneering the use of mathematical computer models (hidden Markov models, stochastic context-free grammars, and discriminative kernel methods) for analyzing DNA, RNA, and protein sequences. As a collaborator on the international Human Genome Project, his team posted the first publicly available computational assembly of the human genome sequence on the internet. Most recently, Haussler has focused on broadly exploring the functional elements of the human genome, primarily through interspecies comparisons. His findings have shed light on the possible functionality of what was once considered to be “junk” DNA, and his lab has identified and explored the function of genomic elements that have remained conserved for millions of years and then undergone rapid evolution in newer species.

Haussler received his PhD in computer science from the University of Colorado at Boulder. He is a member of the National Academy of Sciences and the American Academy of Arts and Sciences and a fellow of AAAS and AAAI. He has won a number of awards, including the 2006 Dickson Prize for Science from Carnegie Mellon University and the 2003 ACM/AAAI Allen Newell Award.

 

Recent directions in nonparametric Bayesian machine learning
Zoubin Ghahramani
University of Cambridge
Wednesday, March 26, 2008
ABSTRACT: Machine learning is an interdisciplinary field which seeks to develop both the mathematical foundations and practical applications of systems that learn, reason and act. Machine learning draws from many fields, ranging from Computer Science and Engineering to Psychology, Neuroscience, and Statistics. Because uncertainty, data, and inference play a fundamental role in the design of systems that learn, statistical methods have recently emerged as one of the key components of the field of machine learning.

In particular, Bayesian methods, based on the work of Reverend Thomas Bayes in the 1700s, describe how probabilities can be used to represent the degrees of belief of a rational agent. Bayesian methods work best when they are applied to models that are flexible enough to capture the complexity of real-world data. Recent work on non-parametric Bayesian methods provides this flexibility. I will touch upon key developments in the field, including Gaussian processes, Dirichlet processes, and the Indian buffet process (IBP). Focusing on the IBP, I will describe how this can be used in a number of applications such as collaborative filtering, bioinformatics, cognitive modelling, independent components analysis, and causal discovery. Finally, I will outline the main challenges in the field: how to develop new models, new fast inference algorithms, and compelling applications.
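As a concrete illustration of the Indian buffet process mentioned above, the short sketch below generates a binary feature-assignment matrix using the standard "customers and dishes" construction. The parameter values and function names are my own, not from the talk.

import math
import random

def sample_poisson(lam, rng):
    """Knuth's algorithm for sampling a Poisson random variable."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        k += 1
        p *= rng.random()
        if p <= L:
            return k - 1

def sample_ibp(num_customers, alpha, seed=0):
    """Draw a binary feature matrix from the Indian buffet process (IBP).

    Customer i takes each previously sampled dish k with probability m_k / i
    (m_k = number of earlier customers who took dish k), then samples
    Poisson(alpha / i) brand-new dishes.
    """
    rng = random.Random(seed)
    dish_counts = []                       # m_k for each dish seen so far
    rows = []                              # chosen dish indices per customer
    for i in range(1, num_customers + 1):
        chosen = [k for k, m in enumerate(dish_counts) if rng.random() < m / i]
        for k in chosen:
            dish_counts[k] += 1
        new_dishes = sample_poisson(alpha / i, rng)
        first_new = len(dish_counts)
        dish_counts.extend([1] * new_dishes)
        chosen.extend(range(first_new, first_new + new_dishes))
        rows.append(set(chosen))
    num_dishes = len(dish_counts)
    return [[1 if k in row else 0 for k in range(num_dishes)] for row in rows]

for row in sample_ibp(num_customers=6, alpha=2.0):
    print(row)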

BIOGRAPHY: Zoubin Ghahramani is Professor of Information Engineering at the University of Cambridge, UK, and is also Associate Research Professor of Machine Learning at Carnegie Mellon University, USA. He obtained BA and BSE degrees from the University of Pennsylvania, and a PhD in 1995 from MIT working with Prof Mike Jordan. He did a postdoc in Computer Science at the University of Toronto working with Prof Geoff Hinton. His work has included research on human sensorimotor control, cognitive science, statistics, and machine learning. His current focus is on Bayesian approaches to statistical machine learning, and he has recently started a research programme on machine learning applications to information retrieval. He has published over 100 peer reviewed papers, and serves on the editorial boards of several leading journals in the field, including JMLR, JAIR, Annals of Statistics, Machine Learning, and Bayesian Analysis. He is Associate Editor in Chief of IEEE Transactions on Pattern Analysis and Machine Intelligence, currently the IEEE’s highest impact journal. He also serves on the Board of the International Machine Learning Society, and was Program Chair of the 2007 International Machine Learning Conference.

 

Wireless Sensing: From Ecosystems to Human Systems
Deborah Estrin
University of California, Los Angeles
Monday, February 18th, 2008
ABSTRACT: Miniaturization and Moore’s Law have enabled us to combine sensing, computation and wireless communication in integrated, low-power devices, and to embed networks of these devices in the physical world. By placing sensing devices up close to the physical phenomena, we are now able to study details in space and time that were previously unobservable. Looking back over the past few years, we have made significant progress toward the vision of programmable, multi-modal, multi-scale observatories. We have made our greatest strides in these applications using: judicious application of server-side and in situ processing, mobility at multiple scales, and multi-scale data and models as context for in situ measurements. We are now applying these lessons learned and technical approaches to human as well as natural systems, in particular by exploring the use of the installed base of image, location, and acoustic sensors that we all carry around in our pockets or on our belts: mobile phones. In this talk I will draw upon experiences with pilots and prototypes at CENS.

BIOGRAPHY: Deborah Estrin (Ph.D. MIT, 1985; BSEE UCB, 1980) is a Professor of Computer Science at UCLA, holds the Jon Postel Chair in Computer Networks, and is Founding Director of the NSF-funded Center for Embedded Networked Sensing (CENS). CENS’ mission is to explore and develop innovative, end-to-end, distributed sensing systems, from ecosystems to human systems. Estrin’s earlier work addressed Internet protocol design and scaling, in particular, inter-domain and multicast routing. Since the late 1990s her work has focused on multi-disciplinary, experimental-systems research as applied to a variety of environmental monitoring challenges. Most recently this work includes participatory-sensing systems, at the personal and community level, leveraging the location, acoustic, image, and attached-sensor data streams increasingly available from mobile phones.

Estrin chaired the 1998 ISAT study on sensor networks and the 2001 NRC study on Networked Embedded Computing which produced the report Embedded Everywhere. She served as a founding member of the National Ecological Observatory Network (NEON) Advisory board, and is currently a member of the NRC Computer Science and Telecommunications Board (CSTB), and TTI/Vanguard. Estrin was selected as the first ACM-W Athena Lecturer in 2006, was awarded the Anita Borg Institute’s Women of Vision Award for Innovation in 2007, and was elected as a member of the American Academy of Arts and Sciences in 2007.

 

FACULTY CANDIDATE SEMINARS

Stabilizing Internet Routing: or, A Story of Heterogeneity
Brighten Godfrey
UC Berkeley
Monday, April 7, 2008
ABSTRACT: A significant cause of the unreliability of end-to-end communications on the Internet is route instability: dynamic changes in routers’ selected paths. Instability is becoming even more problematic due to the increasing prevalence of real-time applications and concerns about the scalability of the Internet routing architecture. Yet Route Flap Damping, the main mechanism for combating instability, has introduced unexpected pathologies and reduced availability.

This talk describes a more principled approach to stabilizing Internet routing. We identify general approaches to achieve stability, and quantify their inherent tradeoffs with other objectives via upper and lower bounds. I will describe Stable Route Selection (StaRS), a new approach which uses flexibility in route selection to improve stability without sacrificing availability. Simulation and experimental results show that StaRS improves stability and end-to-end reliability while deviating only slightly from preferred routes, and closely approaching our theoretical lower bound. These results indicate that StaRS is a promising, easily deployable way to safely stabilize Internet routing.
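The talk presents StaRS itself; as a loose illustration of the general idea of trading a small amount of route preference for stability, the toy sketch below keeps the current route unless an alternative beats it by a margin. The scoring function, margin, and route names are hypothetical, and this is not the StaRS algorithm.

# Illustrative sketch only (not the actual StaRS algorithm): a router that
# keeps its current route unless an alternative is better by a margin,
# trading a small deviation from the most-preferred route for fewer changes.

def select_route(current, candidates, preference, margin=0.1):
    """Return the route to use. `preference` maps a route to a score
    (higher is better); all names here are invented for illustration."""
    available = [r for r in candidates if r is not None]
    if not available:
        return None
    best = max(available, key=preference)
    if current in available and preference(best) <= preference(current) * (1 + margin):
        return current            # stick with the working route: stability
    return best                   # switch only for a clearly better route

# Usage: scores shift slightly, but the selection does not flap.
scores = {"via_A": 1.00, "via_B": 0.97}
route = select_route(None, ["via_A", "via_B"], scores.get)
scores["via_B"] = 1.02            # B edges ahead, but not by the margin
route = select_route(route, ["via_A", "via_B"], scores.get)
print(route)                      # still "via_A"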

StaRS’s stability improvements are enabled by dramatic heterogeneity in route failure patterns. I will present the case that StaRS is an instance of a much more general principle: that heterogeneity — variation in reliability, processing speed, bandwidth, or other metrics — should quite often be viewed as an advantage. This thesis is supported by practical and theoretical results in a variety of settings including distributed hash tables, overlay multicast, and job scheduling.

BIOGRAPHY: Brighten Godfrey’s research concerns distributed and networked systems, including Internet routing architecture, distributed algorithms, analysis of networks, peer-to-peer systems and overlay networks. He is presently a Ph.D. candidate advised by Ion Stoica at UC Berkeley.

 

Fighting concurrency bugs
Shan Lu
University of Illinois, Urbana-Champaign
Wednesday, April 2, 2008
ABSTRACT: Driven by the hardware shift to multi-core architectures, concurrency is being brought into mainstream software development. Unfortunately, concurrent programs are prone to concurrency bugs, because of the inherent complexity of concurrency and the sequential thinking habits of programmers. Concurrency bugs are also non-deterministic, which makes them especially difficult for developers to reproduce and diagnose. Improving the reliability of concurrent programs is a critical and urgent task.

This talk presents recent work on understanding, detecting, and exposing concurrency bugs. I will first present two techniques that detect concurrency bugs from the angle of programmers’ synchronization intention. The first one is AVIO (under technology transfer to Intel). AVIO automatically infers programmers’ atomicity intentions from correct executions. It detects violations of the inferred intentions and reports concurrency bugs. The second technique is MUVI. MUVI automatically infers variable correlation relationships from the source code and detects unsynchronized concurrent accesses to correlated variables. During this talk, I will also briefly discuss some findings from our study of the characteristics of real-world concurrency bugs. These findings have motivated the above bug detection work and inspired us to improve concurrent program testing.
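For readers unfamiliar with the bug class these tools target, the toy example below shows an atomicity violation: a check and a use that the programmer intends to execute atomically can be interleaved by another thread. It illustrates the bug pattern only, not how AVIO or MUVI detect it; all names are invented.

import threading

# A classic atomicity violation: the programmer intends the check of `buf`
# and its use to execute atomically, but another thread can interleave
# between them.

buf = "hello"

def reader():
    if buf is not None:              # check ...
        # <-- resetter() may set buf = None right here
        try:
            print(len(buf))          # ... use: the intended atomic region
        except TypeError:
            print("atomicity violation: buf became None between check and use")

def resetter():
    global buf
    buf = None

t1 = threading.Thread(target=reader)
t2 = threading.Thread(target=resetter)
t1.start(); t2.start(); t1.join(); t2.join()

# Fix: hold one lock across both the check and the use.
lock = threading.Lock()
def reader_fixed():
    with lock:
        if buf is not None:
            print(len(buf))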

BIOGRAPHY: Shan Lu is a Ph.D. candidate in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Her research areas are software systems and computer architecture. She is interested in the software reliability problem, with a focus on concurrent software systems. She received the W.J. Poppelbaum Memorial Award from the University of Illinois in 2007 for research and academic accomplishments. Her recent work on concurrency bug detection is under technology transfer to Intel and was selected as an IEEE Micro Top Pick in 2006.

 

Building secure systems from buggy code with information flow control
Nickolai Zeldovich
Stanford University
Monday, March 31, 2008
ABSTRACT: Today, computer security resembles an arms race: the bad guys constantly find new ways to break in, and being safe requires staying one step ahead of them in cutting off avenues of attack. This strategy is simply too risky and too expensive in the long run. In this talk, I will argue that we need to address security at a much more fundamental level, and I will show how re-designing operating systems, network protocols, and hardware can provide a solid foundation for building applications in a way that eliminates or radically reduces vulnerabilities.

Much of the challenge in building secure applications stems from the fact that real systems are constantly evolving, and that most programmers are not security-conscious, resulting in code rife with bugs that cause security vulnerabilities. Instead of trying to fix all code, this talk will argue that we should protect data, by controlling how it can move through the system. The key insight is that data protection cuts across layers: any piece of data in an application can also be viewed as memory or files by the OS, or as physical pages by the hardware. Consequently, even data in buggy applications can be protected by the OS or by hardware, despite the latter two being at a much lower level of abstraction.

In particular, I will first describe how a low-level information flow control mechanism can be provided by a small OS kernel, hardware, or network protocol, and then show how the same mechanism can be used throughout the system to enforce security policies ranging from those traditionally found in Unix to those that can ensure the privacy of user data in a web server built from largely untrusted code.
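As a rough illustration of label-based information flow control in the spirit of this line of work (not the actual kernel mechanism described in the talk), the sketch below attaches secrecy tags to data and allows a flow only if the destination’s label contains every tag on the source. All names are invented for the example.

# Toy label-based information flow check, in the spirit of decentralized
# information flow control (illustration only). A flow from source to
# destination is allowed only if the destination's label includes every
# secrecy tag on the source, i.e. data may only move to places at least as
# restricted as where it came from.

class Labeled:
    def __init__(self, value, tags):
        self.value = value
        self.tags = frozenset(tags)       # secrecy categories, e.g. {"alice"}

def flow(src, dst_tags):
    """Copy src into a container labeled dst_tags, or refuse."""
    dst_tags = frozenset(dst_tags)
    if not src.tags <= dst_tags:
        raise PermissionError(f"flow blocked: {set(src.tags)} not within {set(dst_tags)}")
    return Labeled(src.value, dst_tags)

secret = Labeled("alice's password", {"alice"})
flow(secret, {"alice", "audit-log"})      # ok: the label only grows
try:
    flow(secret, set())                   # would declassify to public
except PermissionError as e:
    print(e)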

BIOGRAPHY: Nickolai Zeldovich is a postdoc at Stanford University, where he recently received his Ph.D. Previously he received M.Eng. and S.B. degrees from MIT. His research interests are in security, operating systems, and networking.

 

Reining in Fabrication Cost with Brick and Mortar Chips
Martha Mercaldi Kim
University of Washington
Monday, March 10, 2008
ABSTRACT: Over the years Moore’s Law has provided exponential growth in the raw computational resources available to hardware architects. At the same time, however, chip fabrication costs have also skyrocketed, resulting in expenses that relatively few institutions can afford. As part of my dissertation, I have proposed “brick and mortar” chips to mitigate these high fabrication costs while offering the performance and integration of a modern ASIC. This work includes several architectural design choices and innovations, including a polymorphic on-chip network design.

I will demonstrate multiple modes of network customization that this design allows, including topology and buffering resources. This single polymorphic network fabric can be configured to mimic the topology and performance of optimally designed application-specific networks with no appreciable overhead. This network is not only a critical enabler of brick and mortar-based designs, but also has broad applicability to any chip requiring an on-chip network. When used with brick and mortar, however, low-cost, high-performance custom chips can become a reality.

BIOGRAPHY: Martha Mercaldi Kim is a Ph.D. candidate in Computer Science and Engineering at the University of Washington. Prior to moving to Seattle she earned her B.A. in Computer Science at Harvard University and an M.Eng. in Embedded Systems Design from the University of Lugano in Lugano, Switzerland. Her research interests are in computer architecture, particularly its boundaries: the hardware/software interaction as well as the architecture/circuit interaction. As a graduate student she was a member of the WaveScalar research group and developed an architecture to enable reduced-cost chip designs.

 

Mace: Systems and Language Support for Building Correct, High-Performance Networked Services
Charles (Chip) Killian
UC San Diego
Wednesday, March 5, 2008
ABSTRACT: Building distributed systems is particularly difficult because of the asynchronous, heterogeneous, and failure-prone environment where these systems must run. This asynchrony makes verifying the correctness of systems implementations even more challenging. Tools for building distributed systems must strike a compromise between reducing programmer effort and increasing system efficiency. Mace is a C++ language extension, compiler, runtime, and toolset that translates a concise but expressive distributed system specification into a C++ implementation. Mace exploits a natural decomposition of distributed systems into a layered, event-driven state machine.

A key design principle of Mace is to separate each service algorithm from the implementation mechanics (serialization, dispatch, synchronization, etc.), debugging code (logging and property testing), and its utility services (lower-level services providing a specified interface). Our experience indicates that precisely because Mace imposes limits on the design structure of distributed systems, it supports the implementation of a wide variety of high-level supporting tools, including model checking, simulation, live debugging, and visualization.
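The sketch below is not Mace syntax; it only illustrates, in plain Python, the layered event-driven state-machine decomposition described above: the service logic lives in small event handlers, while dispatch and other mechanics sit in generic plumbing and a lower-layer utility service. All class and method names are invented.

# Illustrative sketch (not Mace itself): a tiny service written as an
# event-driven state machine, with dispatch kept out of the service logic.

class LoopbackTransport:
    """Stand-in for a lower-layer utility service."""
    def send(self, dst, msg):
        print(f"send to {dst}: {msg}")
    def broadcast(self, msg):
        print(f"broadcast: {msg}")

class PingService:
    """Service algorithm only: on a 'ping' message, reply with 'pong'."""
    def __init__(self, transport):
        self.transport = transport
        self.state = "idle"

    # --- event handlers (the state-machine transitions) ---
    def on_message(self, src, msg):
        if msg == "ping":
            self.state = "replying"
            self.transport.send(src, "pong")
            self.state = "idle"

    def on_timer(self, name):
        if name == "keepalive":
            self.transport.broadcast("ping")

class Dispatcher:
    """Generic mechanics: queue events and deliver them one at a time."""
    def __init__(self, service):
        self.service, self.queue = service, []

    def deliver(self, event, *args):
        self.queue.append((event, args))
        while self.queue:
            ev, a = self.queue.pop(0)
            getattr(self.service, "on_" + ev)(*a)

svc = PingService(LoopbackTransport())
Dispatcher(svc).deliver("message", "node1", "ping")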

Mace is fully operational, has been in development for four years, and has been used to build a wide variety of Internet-ready distributed systems. This talk will describe both the Mace programming language design and MaceMC, the first model checker that can find liveness violations in unmodified systems implementations.

 

New Primitives and Metrics for Distributed Systems
Byung-Gon Chun
University of California, Berkeley
Monday, March 3, 2008
ABSTRACT: With the advent of data centers and “cloud computing”, distributed systems are becoming much larger and far more sophisticated, with computation spread over thousands of hosts and complex execution paths. In this talk I will discuss new approaches to securing and understanding these complex systems.

I will first describe how we can build more robust systems using a new trusted primitive called Attested Append-Only Memory (A2M). By accepting additional assumptions about a small trusted component, we obtain improved safety and liveness bounds in the presence of Byzantine faults. I will then present a way of characterizing the complexity of general networked systems. I will describe a metric based on distributed state dependencies, and apply it to routing and classical distributed systems.
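To give a feel for the primitive, here is a toy attested append-only log: each append returns an attestation binding a monotonically increasing sequence number to a running digest, so a faulty host cannot show different histories for the same slot without the attestations disagreeing. The interface and key handling are invented for illustration and are not the actual A2M design.

import hashlib
import hmac

# Toy attested append-only memory (illustration only). A trusted component
# holds a secret key; every append returns an attestation over (sequence
# number, running digest), which makes equivocation detectable.

class AttestedLog:
    def __init__(self, key):
        self._key = key
        self._seq = 0
        self._digest = b"\x00" * 32

    def append(self, entry):
        self._seq += 1
        self._digest = hashlib.sha256(self._digest + entry).digest()
        tag = hmac.new(self._key, b"%d|" % self._seq + self._digest,
                       hashlib.sha256).hexdigest()
        return self._seq, self._digest, tag       # attestation for this slot

    def verify(self, seq, digest, tag):
        expected = hmac.new(self._key, b"%d|" % seq + digest,
                            hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, tag)

log = AttestedLog(key=b"trusted-component-secret")
attestation = log.append(b"op: transfer $10")
print(log.verify(*attestation))                   # True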

BIOGRAPHY: Byung-Gon Chun is a postdoctoral researcher at the International Computer Science Institute, funded by Intel Corporation. He received his Ph.D. in Computer Science in 2007 from the University of California at Berkeley. His research interests span distributed systems and networks with emphasis on fault tolerance, security, complexity, and system troubleshooting.

 

Intuitive Global Connectivity for Personal Mobile Devices
Bryan Ford
MIT
Monday, February 25th, 2008
ABSTRACT: Network-enabled mobile devices are quickly becoming ubiquitous in the lives of ordinary people, but current technologies for providing ubiquitous global connectivity between these devices still require experts to set up and manage. Users must allocate and maintain global domain names in order to connect to their devices globally via DNS, they must allocate a static IP address and run a home server to use Mobile IP or set up a virtual private network, they must configure firewalls to permit desired remote access traffic while filtering potentially malicious traffic from unknown parties, and so on. This model of “management by experts” works for organizations with administrative staff, but is infeasible for most consumers who wish to set up and manage their own personal networks.

The Unmanaged Internet Architecture (UIA) is a suite of design principles and experimental protocols that provide robust, efficient global connectivity among mobile devices while relying for configuration only on simple, intuitive management concepts. UIA uses “personal names” rather than traditional global names as handles for accessing personal devices remotely. Users assign these personal names via an ad hoc device introduction process requiring no central allocation. Once assigned, personal names bind securely to the global identities of their target devices independent of network location. Each user manages one namespace, shared among all the user’s devices and always available on each device. Users can also name other users to share resources with trusted acquaintances. Devices with naming relationships automatically arrange connectivity when possible, both in ad hoc networks and using global infrastructure when available.

We built a prototype implementation of UIA that demonstrates the utility and feasibility of these design principles. The prototype includes an overlay routing layer that leverages the user’s social network to provide robust connectivity in spite of network failures and asymmetries such as NATs, a new transport protocol implementing a novel stream abstraction that more effectively supports the highly parallelized and media-oriented applications demanded on mobile devices, and a flexible security framework based on proof-carrying authorization (PCA) that provides “plug-in” interoperability with existing secure naming and authentication systems.
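As a rough illustration of the personal-naming idea (not the actual UIA protocols), the sketch below binds a short local name to a device’s cryptographic identity rather than to a network address, so the binding survives when the device moves. All names and data are hypothetical.

import hashlib

# Illustration only: at introduction time a short personal name is bound to a
# device's stable cryptographic identity (here, a hash of its public key),
# never to an IP address, so the name keeps working when the device moves.

class Namespace:
    def __init__(self):
        self.bindings = {}                      # personal name -> device identity

    def introduce(self, name, device_pubkey):
        eid = hashlib.sha256(device_pubkey).hexdigest()[:16]
        self.bindings[name] = eid               # binding is location-independent
        return eid

    def resolve(self, name, locator_service):
        """Map name -> identity, then identity -> current addresses."""
        eid = self.bindings[name]
        return eid, locator_service.get(eid, [])

ns = Namespace()
eid = ns.introduce("phone", device_pubkey=b"fake-public-key-bytes")
# The overlay routing layer (not shown) tracks where that identity currently is:
locations = {eid: ["192.0.2.7:4747", "relay-via-laptop"]}
print(ns.resolve("phone", locations))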

BIOGRAPHY: Bryan Ford began his systems research career as an undergraduate in the Flux group at the University of Utah, where he developed novel kernel structuring and component reuse techniques. After a break to join Phobos Inc., a successful networking startup, he returned to research as a graduate student at MIT, where he has pursued a diverse array of interests including programming languages, peer-to-peer and ubiquitous device networking, storage systems, and virtual machines.

 

Storage Systems for Global Scale Datacenters
Hakim Weatherspoon
Cornell University
Monday, February 11th, 2008
ABSTRACT: Digital information plays an increasingly critical role in scientific research, military systems and other enterprises, and this trend has important implications. First, many systems are increasingly being distributed over a global network of datacenters, which is emerging as an important distributed systems paradigm. Second, storage systems in these environments must ensure the durability, integrity, and accessibility of digital data, and do so under potentially turbulent conditions. For example, in large scale distributed systems, servers continuously fail; data should remain durable despite constant failure.

Antiquity is a distributed storage system designed for these sorts of challenging environments. It maintains data securely, consistently, and with high availability in a dynamic wide-area environment. At the core of the system is a novel secure log structure that permits Antiquity to guarantee the integrity of stored data, even under extreme stress. Data is replicated on multiple servers in a manner that ensures that it can be retrieved later even when some replicas are inaccessible. Moreover, unlike prior fault-tolerant systems, the Antiquity fault-tolerance protocols can handle high levels of node churn, regenerating data on the fly when necessary to handle faults ranging from server outages to Byzantine (malicious) attacks.
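The following toy loop illustrates only the replica-maintenance aspect described above: keep each data extent on a target number of servers and regenerate copies when replicas become unreachable. The real system wraps this in a secure log and Byzantine-fault-tolerant protocols that the sketch omits; all names and parameters are invented.

import random

# Toy replica-maintenance loop (illustration only): keep each extent on
# `target` servers and regenerate copies when replicas fail, so the data
# stays durable despite constant failure.

def repair(replicas, live_servers, target=4, rng=random):
    """Return an updated replica set of size `target` drawn from live servers."""
    alive = [s for s in replicas if s in live_servers]
    needed = max(0, target - len(alive))
    candidates = [s for s in live_servers if s not in alive]
    return alive + rng.sample(candidates, min(needed, len(candidates)))

servers = {f"s{i}" for i in range(10)}
replicas = ["s0", "s1", "s2", "s3"]
servers -= {"s1", "s3"}                  # two replica holders fail
replicas = repair(replicas, servers)
print(replicas)                          # two survivors plus two new copies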

Further, I will present SMFS, a remote mirroring solution targeted for settings where high-speed, high-latency links connect a pair of datacenters. SMFS provides strong disaster tolerance guarantees with asynchronous performance: mirroring response times are more typical of a high-speed LAN setting. Not only does the approach provide reliability through mirroring, but there are conditions under which it offers dramatic power savings. Longer term, we see SMFS and Antiquity as two examples of a family of innovative solutions addressing a range of demanding problems seen in turbulent, mission-critical, and power-constrained settings.

BIOGRAPHY: Hakim Weatherspoon is currently a post-doctoral fellow at Cornell University. His work covers various aspects of information systems, distributed systems, network systems, and peer-to-peer systems with a focus on fault-tolerance, reliability, security, and performance of Internet-scale systems. He previously received his PhD in Computer Science from the University of California, Berkeley.

 

Enabling Parallel Computing in the Mainstream
Sanjeev Kumar
Intel Research
Monday, February 4th, 2008
ABSTRACT: Chip multiprocessors (CMPs) are now commonplace and the number of processors on each machine is growing rapidly. While parallel machines are becoming ubiquitous, the difficulty of writing parallel programs remains a significant obstacle to the widespread adoption of parallel computing. In this talk, I will present two ideas that tackle this problem.

First, I will present a hybrid transactional memory scheme. One major source of difficulty in writing parallel programs is implementing correct synchronization between threads. Locks are the most common synchronization mechanism but have a number of well-known pitfalls. Transactional memory is an alternative mechanism that makes parallel programming easier. Early transactional memory proposals were implemented either entirely in hardware or in software. Both approaches have their limitations. We proposed the first hybrid hardware/software transactional memory scheme that approaches the performance of a hardware scheme in the common case and gracefully falls back to a software scheme when a transaction exceeds the capability of the hardware.
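Real hardware transactions cannot be expressed in a short sketch, but the fallback structure common to hybrid schemes can: try a bounded optimistic path a few times, then fall back to a software path that always succeeds (here, a single global lock). This is a structural illustration with invented names, not the scheme proposed in the talk.

import threading

# Structural sketch of a hybrid transactional memory's fallback policy
# (illustration only; real HTM is a hardware feature Python cannot express).

_fallback_lock = threading.Lock()

class Abort(Exception):
    """Raised when the optimistic path detects a conflict or overflows."""

def run_transaction(txn, optimistic_attempts=3):
    for _ in range(optimistic_attempts):
        try:
            return txn(optimistic=True)        # fast path, may raise Abort
        except Abort:
            continue                           # retry in hardware-like mode
    with _fallback_lock:                       # software fallback: serialize
        return txn(optimistic=False)

# Hypothetical transaction body for illustration.
counter = {"value": 0}
def increment(optimistic):
    if optimistic and counter["value"] % 2:    # pretend odd values conflict
        raise Abort()
    counter["value"] += 1
    return counter["value"]

print(run_transaction(increment))              # 1 (fast path)
print(run_transaction(increment))              # 2 (falls back to the lock)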

Second, I will present an architectural proposal that supports fine-grained parallelism. To harness the parallelism on a CMP, applications must expose their thread-level parallelism to the hardware. One common approach to doing this is to decompose a program into parallel “tasks” and allow an underlying software layer to schedule these tasks to different threads. Software task scheduling can provide good parallel performance as long as tasks are large compared to the software overheads. However, a significant number of RMS (recognition, mining, and synthesis) applications have small tasks for which software task schedulers achieve only limited parallel speedups. This work proposes a hardware technique to accelerate dynamic task scheduling on scalable CMPs. The proposal requires relatively simple hardware and delivers close to optimal performance on a number of RMS benchmarks with fine-grained parallelism.
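For context, the sketch below is a plain software task scheduler of the kind the proposal accelerates (not the hardware mechanism itself): every task dispatch goes through a synchronized queue, which is exactly the per-task overhead that dominates when tasks are tiny. All names are invented.

import threading
import queue

# Minimal software task scheduler (the baseline being accelerated, not the
# hardware proposal). Each enqueue and dequeue pays synchronization overhead.

def run_tasks(tasks, num_workers=4):
    work = queue.Queue()
    for t in tasks:
        work.put(t)

    def worker():
        while True:
            try:
                task = work.get_nowait()       # per-task scheduling overhead
            except queue.Empty:
                return
            task()
            work.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()

results = []
lock = threading.Lock()
def tiny_task(i):
    with lock:
        results.append(i * i)                  # fine-grained work item

run_tasks([lambda i=i: tiny_task(i) for i in range(100)])
print(len(results))                            # 100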

BIOGRAPHY: Sanjeev Kumar is a Staff Researcher in Microprocessor Technology Labs at Intel Corporation. Prior to joining Intel in 2002, he received his Ph.D. in Computer Science from Princeton University. The goal of his research is to make it easier to write correct parallel programs that achieve the desired performance objectives. His current research has two components: parallelizing mainstream applications to obtain an in-depth understanding of such applications, and proposing hardware and software techniques that enable such applications on mainstream parallel machines.