2014-2015 DISTINGUISHED LECTURE SERIES

February 17, 2015

Ronald Fagin, IBM Research - Almaden

Applying theory to practice (and practice to theory)

Bio:
Ronald Fagin is an IBM Fellow at IBM Research - Almaden. IBM Fellow is IBM's highest technical honor. There are currently 86 active IBM Fellows (out of 430,000 IBM employees worldwide), and there have been only 257 IBM Fellows in the 51-year history of the program. Ron received his B.A. in mathematics from Dartmouth College and his Ph.D. in mathematics from the University of California at Berkeley. He is a Fellow of IEEE, ACM, and AAAS (American Association for the Advancement of Science). He has co-authored three papers that won Best Paper Awards and three papers that won Test-of-Time Awards, all in major conferences. One of his papers won the 2014 Gödel Prize. He was named Docteur Honoris Causa by the University of Paris. He won the IEEE Technical Achievement Award, the IEEE W. Wallace McDowell Award, and the ACM SIGMOD Edgar F. Codd Innovations Award (a lifetime achievement award in databases). He is a member of the US National Academy of Engineering and the American Academy of Arts and Sciences.

Abstract:

The speaker will talk about applying theory to practice, with a focus on two IBM case studies. In the first case study, the practitioner initiated the interaction. This interaction led to the following problem. Assume that there is a set of "voters" and a set of "candidates", where each voter assigns a numerical score to each candidate. There is a scoring function (such as the mean or the median), and a consensus ranking is obtained by applying the scoring function to each candidate's scores. The problem is to find the top k candidates while minimizing the number of database accesses. The speaker will present an algorithm that is optimal in an extremely strong sense: not just in the worst case or the average case, but in every case! Even though the algorithm is only 10 lines long (!), the paper containing the algorithm won the 2014 Gödel Prize, the top prize for a paper in theoretical computer science.
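To make the problem concrete, the following is a minimal sketch (not the speaker's exact 10-line formulation) of a threshold-style algorithm for this top-k aggregation problem. It assumes each voter's scores are available as a list sorted by descending score, that random access to any candidate's score is possible, and that the scoring function is monotone (as the mean and median are):

```python
import heapq

def threshold_algorithm(lists, agg, k):
    """Find the top-k candidates under a monotone scoring function.

    lists: one list per voter of (score, candidate) pairs,
           sorted by descending score.
    agg:   monotone scoring function, e.g. the mean.
    Returns the top-k (candidate, aggregate score) pairs.
    """
    # Simulate random access: candidate -> score, per voter.
    random_access = [{c: s for s, c in lst} for lst in lists]
    aggregates = {}  # candidate -> aggregate score
    depth = 0
    while True:
        # Sorted access: read one entry from each voter's list in parallel.
        last_seen = []
        for lst in lists:
            score, cand = lst[depth]
            last_seen.append(score)
            if cand not in aggregates:
                # Random-access this candidate's score from every voter.
                scores = [ra[cand] for ra in random_access]
                aggregates[cand] = agg(scores)
        # Threshold: the best score any unseen candidate could still achieve.
        threshold = agg(last_seen)
        best = heapq.nlargest(k, aggregates.items(), key=lambda kv: kv[1])
        if len(best) >= k and all(s >= threshold for _, s in best):
            return best
        depth += 1

# Two voters, three candidates, scoring function = mean.
votes = [
    [(0.9, 'a'), (0.8, 'b'), (0.3, 'c')],
    [(0.9, 'b'), (0.7, 'a'), (0.2, 'c')],
]
top2 = threshold_algorithm(votes, lambda xs: sum(xs) / len(xs), 2)
# 'b' has mean 0.85 and 'a' has mean 0.8, found without ever
# fully reading candidate 'c'.
```

The halting condition is what yields the strong optimality the abstract refers to: the algorithm stops as soon as the current top-k scores match or exceed the best aggregate any not-yet-seen candidate could possibly have.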

The interaction in the second case study was initiated by theoreticians, who wanted to lay the foundations for "data exchange", in which data is converted from one format to another. Although this problem may sound mundane, the issues that arise are fascinating, and this work made data exchange a new subfield, with special sessions in every major database conference.

This talk will be completely self-contained, and the speaker will derive morals from the case studies. The talk is aimed at both theoreticians and practitioners, to show them the mutual benefits of working together.

March 11, 2015

Mary Shaw, Carnegie Mellon University

Progress Toward an Engineering Discipline of Software

Bio:
Mary Shaw is the Alan J. Perlis University Professor of Computer Science in the Institute for Software Research at Carnegie Mellon University. Her research interests lie in the area of software engineering, particularly software architecture and design of systems used by real people. She has received the United States' National Medal of Technology and Innovation, the ACM SIGSOFT Outstanding Research Award (with David Garlan), the IEEE Computer Society TCSE's Distinguished Educator Award, and CSEE&T's Nancy Mead Award for Excellence in Software Engineering Education. She is a fellow of the ACM, the IEEE, and the AAAS.

Abstract:

Is "software engineering" really engineering? The term was coined in 1968 to call attention to problems with software production. Both theory and practice for software have evolved since then, but do we yet have a true engineering discipline? Classical engineering disciplines have emerged from craft practice and commercialization through the infusion of codified knowledge and science. Using this emergence pattern as a point of reference, I will sketch the evolution of software engineering, drawing on civil engineering and software architecture for examples that show the progressive codification of informal knowledge toward rigorous models and tools. This will provide the basis for assessing the maturity of the field and identifying our next challenges.

April 28, 2015

Lise Getoor, University of California Santa Cruz

Combining Statistics and Semantics using Probabilistic Soft Logic

Bio:
Lise Getoor is a professor in the Computer Science Department at the University of California, Santa Cruz. Her research areas include machine learning, data integration, and reasoning under uncertainty, with an emphasis on graph and network data. She is an AAAI Fellow, serves on the boards of the Computing Research Association and the International Machine Learning Society, was co-chair of ICML 2011, and is a recipient of an NSF CAREER Award and eight best paper and best student paper awards. She received her PhD from Stanford University, her MS from the University of California, Berkeley, and her BS from the University of California, Santa Barbara, and was a professor at the University of Maryland, College Park from 2001–2013.

Abstract:

One of the challenges in big data analytics is to efficiently learn and reason collectively about extremely large, heterogeneous, incomplete, noisy, interlinked data. Collective reasoning requires the ability to exploit both the logical and relational structure in the data and the probabilistic dependencies. In this talk I will give an overview of our recent work on probabilistic soft logic (PSL), a framework for collective, probabilistic reasoning in relational domains. PSL is able to reason holistically about both entity attributes and relationships among the entities. The underlying mathematical framework, which we refer to as a hinge-loss Markov random field, supports extremely efficient, exact inference. This family of graphical models captures logic-like dependencies with convex hinge-loss potentials. I will survey applications of PSL to diverse problems ranging from information extraction to computational social science. Our recent results show that by building on state-of-the-art optimization methods in a distributed implementation, we can solve large-scale problems with millions of random variables orders of magnitude faster than existing approaches.
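To illustrate what a "convex hinge-loss potential" looks like, the following sketch (a simplification, not PSL's actual API) grounds a single soft-logic rule under Łukasiewicz semantics. The rule, the predicate names, and the truth values are invented for illustration; the key point is that the potential is a hinge function, hence convex, which is what makes inference in a hinge-loss Markov random field a convex optimization problem:

```python
def hinge_loss(body_truths, head_truth):
    """Hinge-loss potential for one grounded rule body -> head.

    Truth values are soft, in [0, 1]. Under Lukasiewicz logic the
    body conjunction has truth max(0, sum(body) - (n - 1)), and the
    rule's "distance to satisfaction" is max(0, T(body) - T(head)).
    The result is 0 when the rule is satisfied and grows linearly
    with the degree of violation -- a convex hinge in each argument.
    """
    n = len(body_truths)
    body = max(0.0, sum(body_truths) - (n - 1))
    return max(0.0, body - head_truth)

# Hypothetical grounded rule: Friend(A,B) & Votes(A,P) -> Votes(B,P).
# A and B are close friends (0.9) and A clearly votes for P (1.0),
# but Votes(B,P) is currently low (0.2): the rule is strongly
# violated, yielding a large potential (about 0.7).
violation = hinge_loss([1.0, 0.9], 0.2)

# A satisfied grounding incurs zero potential.
satisfied = hinge_loss([0.2, 0.3], 0.9)
```

Inference then minimizes a weighted sum of such potentials over all groundings; because every term is a convex hinge, the overall objective is convex, which is the property behind the efficient, exact inference the abstract mentions.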

Other Lectures