Spring 2014



Deep Learning and the Representation of Natural Data
Yann LeCun
Facebook AI Research and Center for Data Science, New York University
Friday, February 21, 2014
ABSTRACT: The combined emergence of very large datasets, powerful parallel computers, and new machine learning methods has enabled the deployment of highly accurate computer perception systems and is opening the door to a wide deployment of AI systems.

A key component in systems that can understand natural data is a module that turns the raw data into a suitable internal representation. But designing and building such a module, often called a feature extractor, requires a considerable amount of engineering effort and domain expertise.

The main objective of ‘Deep Learning’ is to come up with learning methods that can automatically produce good representations of data from labeled or unlabeled samples. Deep learning allows us to construct systems that are trained end to end, from raw inputs to ultimate output. Instead of having a separate feature extractor and predictor, deep architectures have multiple stages in which the data is represented hierarchically: features in successive stages are increasingly global, abstract, and invariant to irrelevant transformations of the input.

The convolutional network model (ConvNet) is a particular type of deep architecture that is somewhat inspired by biology and consists of multiple stages of filter banks, interspersed with non-linear operations and spatial pooling. ConvNets have become the record holder for a wide variety of benchmarks and competitions, including object detection, localization, and recognition in images, semantic image segmentation and labeling (2D and 3D), acoustic modeling for speech recognition, drug design, handwriting recognition, biological image segmentation, etc.
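The stage structure described above (filter bank, non-linearity, spatial pooling) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the abstract does not say which non-linearity or pooling operator is used, so the rectifier (ReLU) and non-overlapping max pooling below are assumed choices, and the two filters are hand-picked rather than learned.

```python
def conv2d_valid(image, kernel):
    """Slide one filter over the image ('valid' cross-correlation, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap):
    """Pointwise non-linearity (assumed here: rectifier)."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    out_h = len(fmap) // size
    out_w = len(fmap[0]) // size
    return [
        [
            max(fmap[i * size + di][j * size + dj]
                for di in range(size) for dj in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def convnet_stage(image, filter_bank, pool_size=2):
    """One stage: apply every filter, then the non-linearity, then pooling."""
    return [max_pool(relu(conv2d_valid(image, k)), pool_size) for k in filter_bank]


# Toy input: a 4x5 image with a dark-to-bright vertical edge.
image = [[0, 0, 1, 1, 1] for _ in range(4)]
# Hypothetical hand-picked filters: one responds to dark-to-bright edges,
# the other to bright-to-dark edges.
filter_bank = [[[-1.0, 1.0]], [[1.0, -1.0]]]
feature_maps = convnet_stage(image, filter_bank)
print(feature_maps)
```

Only the first feature map responds to the edge in this toy image. In a real ConvNet the filters are learned by backpropagation and several such stages are stacked, so that later stages see pooled, more global features.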

The most recent speech recognition and image understanding systems deployed by Facebook, Google, IBM, Microsoft, Baidu, NEC and others use deep learning, and many use convolutional networks. Such systems use very large and very deep ConvNets with billions of connections, trained using backpropagation with stochastic gradient descent and heavy regularization. But many new applications require the use of unsupervised feature learning methods. A number of methods based on sparse auto-encoders will be presented.

Several applications will be shown through videos and live demos, including a category-level object recognition system that can be trained on the fly, a system that can label every pixel in an image with the category of the object it belongs to (scene parsing), a pedestrian detector, and object localization and detection systems that rank first on the ImageNet Large Scale Visual Recognition Challenge data. Specialized hardware architectures that run these systems in real time will also be described.

BIOGRAPHY: Yann LeCun is Director of AI Research at Facebook, and Silver Professor of Data Science, Computer Science, Neural Science, and Electrical Engineering at New York University, affiliated with the NYU Center for Data Science, the Courant Institute of Mathematical Sciences, the Center for Neural Science, and the Electrical and Computer Engineering Department.

He received the Electrical Engineer Diploma from Ecole Superieure d’Ingenieurs en Electrotechnique et Electronique (ESIEE), Paris in 1983, and a PhD in Computer Science from Universite Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories in Holmdel, NJ in 1988. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU as a professor in 2003, after a brief period as a Fellow of the NEC Research Institute in Princeton. From 2012 to 2014 he directed NYU’s initiative in data science and became the founding director of the NYU Center for Data Science. He was named Director of AI Research at Facebook in late 2013 and retains a part-time position on the NYU faculty.

His current interests include AI, machine learning, computer perception, mobile robotics, and computational neuroscience. He has published over 180 technical papers and book chapters on these topics as well as on neural networks, handwriting recognition, image processing and compression, and on dedicated circuits and architectures for computer perception. The character recognition technology he developed at Bell Labs is used by several banks around the world to read checks and was reading between 10 and 20% of all the checks in the US in the early 2000s. His image compression technology, called DjVu, is used by hundreds of web sites and publishers and millions of users to access scanned documents on the Web. Since the mid-1980s he has been working on deep learning methods, particularly the convolutional network model, which is the basis of many products and services deployed by companies such as Facebook, Google, Microsoft, Baidu, IBM, NEC, AT&T and others for image and video understanding, document recognition, human-computer interaction, and speech recognition.

LeCun has been on the editorial board of IJCV, IEEE PAMI, and IEEE Trans. Neural Networks, was program chair of CVPR’06, and is chair of ICLR 2013 and 2014. He is on the science advisory board of the Institute for Pure and Applied Mathematics, and has advised many large and small companies about machine learning technology, including several startups he co-founded. He is the lead faculty at NYU for the Moore-Sloan Data Science Environment, a $36M initiative in collaboration with UC Berkeley and the University of Washington to develop data-driven methods in the sciences. He is the recipient of the 2014 IEEE Neural Network Pioneer Award.



Teaching the Craft of Code
David Pritchard
Princeton University
Thursday, February 27, 2014
ABSTRACT: Programming is a skill that is best learned actively. I will discuss several websites that I have developed with this aim: Computer Science Circles (Python), Websheets (Java), and a Java execution visualizer.

While these are student-centric tools, the instructor’s perspective is also important. What do instructors learn about their students from these tools? How can we maximally enable creativity and efficiency on the part of the educator? Along the way I’ll discuss open-source software and correctional institutes.

BIOGRAPHY: David Pritchard is a Lecturer in the Department of Computer Science at Princeton University. His main interests are combinatorics, linear programs, approximation algorithms, probabilistic methods, and computational methods. In 2010, he defended his Ph.D. in the Combinatorics and Optimization department at the University of Waterloo. His work was on approximation algorithms: the design of polynomial-time algorithms that find provably good approximate solutions to NP-hard problems. Pritchard has 21 publications in journals and conference proceedings; these publications span additional areas such as computational geometry, bioinformatics, and education.

David Pritchard received a B.S. in Mathematics and Computer Science and an M.Eng. in Computer Science from MIT in 2005. He received a Ph.D. from the University of Waterloo, Department of Combinatorics and Optimization, in January 2010.

Machine Learning Paves the Way for Prediction of Preterm Birth
Ansaf Salleb-Aouissi
Columbia University
Tuesday, March 4, 2014
ABSTRACT: Huge amounts of data are being collected everywhere: when we browse the web, go to the doctor’s office, visit the supermarket, or watch a movie, we are providing information that fills in records in a database. Advances in fields like machine learning have shown promise with respect to digging through the data to make it more useful.

In the first part of my talk, I will present ongoing research by my group in medical informatics with emphasis on the application of machine learning to the prediction of preterm birth. I will present our analysis of a dataset collected by the NIH-NICHD Maternal Fetal Medicine Units (MFMU) Network, a high-quality dataset covering over 3,000 singleton pregnancies with detailed study visits and biospecimen collection at 24, 26, 28 and 30 weeks gestation. Multiple processing steps were required to prepare this rich and highly structured data. We faced several challenges including: (1) presence of missing data, (2) varying sample size over time, (3) skewed class distributions due to the preterm birth rate, and (4) the need to consider different subsets of data, such as nulliparous (first-time) mothers. In the second part, I will describe our efforts toward harnessing Electronic Health Records to prepare data for machine learning. I will show our preliminary work on prediction of preterm birth using a 5-year snapshot of data for mothers and babies from the New York Presbyterian Hospital EHR systems.

I will conclude my talk with my education and research objectives and how they intertwine with each other.

BIOGRAPHY: Ansaf Salleb-Aouissi joined Columbia University’s Center for Computational Learning Systems as an Associate Research Scientist in 2006 after a Postdoctoral Fellowship at INRIA (France). Her research interests lie in Machine Learning. She has worked on large-scale projects including the power grid. Her current research includes pattern discovery, crowdsourcing and medical informatics. Ansaf has published several peer-reviewed papers in high-quality journals, conferences and books including TPAMI, ECML, PKDD, COLT, IJCAI, ECAI and AISTATS. Ansaf received an Engineer degree in Computer Science from the University of Science and Technology Houari Boumediene, Algeria, and M.S. and Ph.D. degrees from the University of Orleans (France).
Matching on-the-fly: Sequential Allocation with Higher Power & Efficiency
Adam Kapelner
Wharton School of the University of Pennsylvania
Thursday, March 6, 2014
ABSTRACT: We propose a dynamic allocation procedure that increases power and efficiency when measuring an average treatment effect in fixed-sample randomized trials with sequential allocation. Subjects arrive iteratively and are either randomized or paired via a matching criterion to a previously randomized subject and administered the alternate treatment. We develop estimators for the average treatment effect that combine information from both the matched pairs and unmatched subjects, as well as an exact test. We illustrate the method’s higher efficiency and power over several competing allocation procedures, both in simulations and in data from a clinical trial.
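The allocation rule in the abstract can be illustrated with a short sketch. It is a simplified reading, not the speaker's procedure: the abstract does not specify the matching criterion, so the Euclidean distance and fixed `threshold` below are assumptions, as are the function name `allocate_sequentially` and the choice to keep unmatched subjects in a "reservoir". The estimators and the exact test are not shown.

```python
import math
import random


def allocate_sequentially(covariates, threshold, rng=None):
    """Assign treatments (0 or 1) to subjects in arrival order.

    Each new subject is matched to the closest still-unmatched earlier subject
    if that subject is within `threshold` (assumed criterion); otherwise the
    new subject is randomized and waits in the reservoir for a later match.
    """
    rng = rng or random.Random(0)
    reservoir = []  # indices of randomized subjects that are still unmatched
    treatment = []  # treatment[i] is the arm assigned to subject i
    pairs = []      # (earlier_index, later_index) for each matched pair

    for i, x in enumerate(covariates):
        best = min(reservoir,
                   key=lambda j: math.dist(covariates[j], x),
                   default=None)
        if best is not None and math.dist(covariates[best], x) <= threshold:
            # Matched: give the alternate treatment to the new subject.
            treatment.append(1 - treatment[best])
            reservoir.remove(best)
            pairs.append((best, i))
        else:
            # No close match available: randomize and keep for later matching.
            treatment.append(rng.randint(0, 1))
            reservoir.append(i)
    return treatment, pairs, reservoir


# Hypothetical one-dimensional covariates for five arriving subjects.
subjects = [(0.0,), (5.0,), (0.1,), (5.2,), (10.0,)]
treatment, pairs, reservoir = allocate_sequentially(subjects, threshold=0.5)
print(treatment, pairs, reservoir)
```

With these toy covariates, the third and fourth subjects are paired with the first and second, and the fifth stays unmatched. Within each pair the two subjects always receive opposite treatments, which is what lets pair differences contribute to the treatment-effect estimate.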
BIOGRAPHY: Adam Kapelner is a Ph.D. candidate in Statistics at the Wharton School of the University of Pennsylvania. He received a B.S. in Mathematical and Computational Science from Stanford University in 2006 and an A.M. in Statistics from the Wharton School of the University of Pennsylvania in 2012. Kapelner received a National Science Foundation Graduate Research Fellowship from May 2012 to April 2013 and a J. Parker Bursk Memorial Award for Excellence in Teaching in December 2013. Adam has a passion for teaching and mentoring at both the undergraduate and graduate levels. His research focuses on statistical methodology in collaboration with Professor Abba Krieger and Bayesian non-parametric machine learning in collaboration with Professor Ed George.
Lower Bounds for the Capacity of the Deletion Channel
Eleni Drinea
Columbia University
Thursday, March 27, 2014
ABSTRACT: The capacity of a communication channel is the maximum rate at which information can be reliably transmitted over the channel. In this work I consider the capacity of the binary deletion channel, where bits are deleted independently with a certain probability. This represents perhaps the simplest channel with synchronization errors but a characterization of its capacity remains an open question. I will present several techniques to lower bound the capacity, including Markov chain methods, Poisson-repeat channels, and ideas from renewal theory. The quality of these lower bounds is evaluated via numerical simulations.
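For readers unfamiliar with the channel model, the following minimal sketch simulates a binary deletion channel as defined above: each transmitted bit is dropped independently with probability `d`, and the receiver sees only the surviving bits, with no indication of where deletions occurred. The function names are illustrative, and the sketch does not implement any of the lower-bounding techniques from the talk.

```python
import random


def deletion_channel(bits, d, rng):
    """Return the received sequence: each bit is deleted independently with probability d."""
    return [b for b in bits if rng.random() >= d]


def is_subsequence(received, sent):
    """Check that `received` can be obtained from `sent` by deleting symbols."""
    remaining = iter(sent)
    return all(bit in remaining for bit in received)


rng = random.Random(2014)
sent = [rng.randint(0, 1) for _ in range(20)]
received = deletion_channel(sent, d=0.3, rng=rng)
print("sent:    ", sent)
print("received:", received)
print("valid output:", is_subsequence(received, sent))
```

Unlike an erasure channel, the output carries no markers for the lost positions, so the receiver cannot tell which bits were deleted. That loss of synchronization is what makes the capacity hard to characterize.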
BIOGRAPHY: Eleni Drinea received a B.S. degree in computer engineering in 1999 from the University of Patras, Greece. She obtained her Ph.D. in computer science from Harvard University in 2005. She then joined the New England Complex Systems Institute in 2006 as a postdoctoral fellow working on information-theoretic tools for data analysis. From 2007 to 2009 she was a research associate with the School of Computer and Communication Sciences at EPFL, Switzerland, where her interests centered around reliable wireless communication and network coding. She is currently an adjunct professor at Columbia University.
Machine Learning for College Counseling
Zephaniah Grunschlag
Riverdale Country School
Friday, March 28, 2014
ABSTRACT: In recent decades college graduation rates have soared for wealthier young adults but have stagnated for the poor. While many colleges are committed to admitting and supporting poorer students, they have been stymied by a dearth of qualified low-income applicants. Studies have shown that a majority of highly qualified low-income students employ inadequate strategies in the college application process and simply do not apply to the elite institutions that would likely accept them and offer support. A paucity of college counseling resources is largely to blame. Admitster.com is a web app that we hope will help bridge the college counseling gap by enabling students to more easily assess and improve their chances of gaining admission to American universities. I will describe Admitster.com in the context of computer-aided decision making. I’ll talk about technical issues that must be overcome in designing such a system. One issue that often arises is that the variables of a dataset are only partially known for each observation. In such situations we expect machine learning results to become less robust as we increase the number of variables, because fewer observations have all of those variables available. I’ll propose a pairwise variable ensembling technique that attempts to increase the reliability of predictions in these situations without imputing missing values.
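The abstract does not describe how the pairwise variable ensembling works, so the sketch below is only one plausible reading, not the speaker's method. It assumes one small model per pair of variables, each trained on the observations where both variables are known, with predictions averaged over the pairs available in the query. The nearest-neighbor base model, the function names, and the variable names `gpa`, `test`, and `essay` (all scaled to 0..1) are hypothetical.

```python
import math
from itertools import combinations


def fit_pairwise_ensemble(rows, targets):
    """Build one training set per variable pair from rows where both values are known.

    rows: list of dicts mapping variable name -> value, with None for missing.
    Returns a dict mapping (var_a, var_b) -> list of ((value_a, value_b), target).
    """
    variables = sorted({name for row in rows for name in row})
    models = {}
    for a, b in combinations(variables, 2):
        points = [((row[a], row[b]), y)
                  for row, y in zip(rows, targets)
                  if row.get(a) is not None and row.get(b) is not None]
        if points:
            models[(a, b)] = points
    return models


def predict(models, query, k=1):
    """Average k-nearest-neighbor predictions over all pairs known in the query."""
    votes = []
    for (a, b), points in models.items():
        if query.get(a) is None or query.get(b) is None:
            continue  # this pair model cannot be used; nothing is imputed
        q = (query[a], query[b])
        nearest = sorted(points, key=lambda p: math.dist(p[0], q))[:k]
        votes.append(sum(y for _, y in nearest) / len(nearest))
    if not votes:
        raise ValueError("query has no pair of known variables covered by the ensemble")
    return sum(votes) / len(votes)


# Toy data: only the last row has all three variables, so a model that needs
# complete rows could train on one observation; each pair model gets two.
rows = [
    {"gpa": 0.9, "test": 0.9, "essay": None},
    {"gpa": 0.2, "test": None, "essay": 0.3},
    {"gpa": None, "test": 0.1, "essay": 0.2},
    {"gpa": 0.8, "test": 0.8, "essay": 0.9},
]
targets = [1.0, 0.0, 0.0, 1.0]  # 1.0 = admitted (hypothetical labels)

models = fit_pairwise_ensemble(rows, targets)
print(predict(models, {"gpa": 0.85, "test": 0.9, "essay": None}))
```

The design point this illustrates is the trade-off named in the abstract: restricting each model to two variables keeps more observations per model than requiring all variables at once, at the cost of ignoring higher-order interactions.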
BIOGRAPHY: Zeph Grunschlag is a mathematics and computer science teacher at the Riverdale Country School. He is also a cofounder of Admitster.com, a personalized college counseling tool for high school students. He received his A.B. in Mathematics from Princeton University in 1992. He completed his Ph.D. in Mathematics at UC Berkeley in 1999, where he wrote a dissertation on Algorithms in Geometric Group Theory. He was a lecturer in Columbia’s Computer Science Department from 1999 to 2006. Later he worked as a software engineer in industry before returning to teaching in 2010.