October 23, 2017
John R. Smith, IBM T. J. Watson Research Center
Huge amounts of images and video are being generated and consumed across all industries. For decades, this type of visual data has been the darkest of "dark data," eluding our ability to effectively understand it. However, this is no longer the case. Ongoing developments in computer vision, deep learning, and AI are producing dramatic advances in technology: we are now able to learn effective representations of images and video that allow us to accurately recognize, describe, search, and answer questions from this data. In this talk, we discuss opportunities and challenges for scaling visual comprehension for industries where the price of not seeing is simply too high.
We show how these new capabilities can provide transformational impact for industry problems related to Cloud, IoT, Healthcare, Media, and Safety and Security. We present some of the recent projects at IBM Research on visual comprehension and discuss future directions.
Dr. John R. Smith is an IBM Fellow and Manager of Multimedia and Vision at the IBM T. J. Watson Research Center. He leads IBM's research and development on image and video analysis, including IBM Watson Visual Recognition, the IBM Multimedia Analysis and Retrieval System (IMARS), Intelligent Video Analytics (IVA), Skin Cancer Image Analysis, and Video Understanding for Augmented Creativity.
Dr. Smith served as co-General Chair of the ACM Intl. Conf. on Multimedia Retrieval (ICMR 2016) in New York City. Previously, he was Editor-in-Chief of IEEE MultiMedia from 2010 to 2014, Associate Editor-in-Chief from 2006 to 2010, and Standards Editor from 2003 to 2006. Earlier, Dr. Smith led IBM's participation in the MPEG-7/MPEG-21 standards and served as Chair of the MPEG Multimedia Description Schemes Subgroup and co-Project Editor of the MPEG-7 Standard.
Earlier in his career, with Prof. Shih-Fu Chang at Columbia University, Dr. Smith developed some of the earliest approaches for content-based image and video retrieval, including VisualSEEk, a content-based image retrieval system, and WebSEEk, one of the first image and video search engines for the Web, in 1995.
Dr. Smith is a Fellow of IEEE.
November 13, 2017
Dan Spielman, Yale
The Laplacian matrices of graphs arise in many fields, including Machine Learning, Computer Vision, Optimization, Computational Science, and of course Network Analysis. We will explain what these matrices are and why they appear in so many applications. We then survey recent ideas that allow us to solve systems of linear equations in Laplacian matrices in nearly linear time, emphasizing the utility of graph sparsification---the approximation of a graph by a sparser one---and a recent algorithm of Kyng and Sachdeva that uses random sampling to accelerate Gaussian Elimination.
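As a concrete illustration of the objects in the abstract, here is a minimal sketch (in Python with NumPy; the example graph is arbitrary and not taken from the talk) of a graph Laplacian and what it means to solve a Laplacian linear system. The dense least-squares solve below is only for illustration; the point of the talk is that specialized solvers handle such systems on huge sparse graphs in nearly linear time.

```python
import numpy as np

# Laplacian L = D - A of an undirected graph, where D is the
# diagonal degree matrix and A the adjacency matrix.
# Example graph (hypothetical): the path 0-1-2-3.
edges = [(0, 1), (1, 2), (2, 3)]
n = 4

A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))
L = D - A

# Key properties: L is symmetric and its rows sum to zero
# (the all-ones vector is in its null space).
assert np.allclose(L, L.T)
assert np.allclose(L @ np.ones(n), 0)

# A Laplacian system L x = b is solvable when b sums to zero.
# Here b injects one unit of "flow" at vertex 0 and extracts it
# at vertex 3; x is the resulting vector of potentials.
b = np.array([1.0, 0.0, 0.0, -1.0])
x, *_ = np.linalg.lstsq(L, b, rcond=None)
assert np.allclose(L @ x, b)
```

The least-squares call returns the minimum-norm solution, which sidesteps the fact that L is singular (solutions are only defined up to an additive constant).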
Daniel Alan Spielman received his B.A. in Mathematics and Computer Science from Yale in 1992 and his Ph.D. in Applied Mathematics from M.I.T. in 1995. He spent a year as an NSF Mathematical Sciences Postdoc in the Computer Science Department at U.C. Berkeley and then taught in the Applied Mathematics Department at M.I.T. until 2005. Since 2006, he has been a Professor at Yale University. He is presently the Henry Ford II Professor of Computer Science, Statistics and Data Science, Mathematics, and Applied Mathematics.
He has received many awards, including the 1995 ACM Doctoral Dissertation Award, the 2002 IEEE Information Theory Paper Award, the 2008 and 2015 Gödel Prizes, the 2009 Fulkerson Prize, the 2010 Nevanlinna Prize, the 2014 Pólya Prize, an inaugural Simons Investigator Award, and a MacArthur Fellowship. He is a Fellow of the Association for Computing Machinery and a member of the Connecticut Academy of Science and Engineering. His main research interests include the design and analysis of algorithms, network science, machine learning, digital communications, and scientific computing.
November 20, 2017
Yann LeCun, Facebook AI Research & New York University
Deep learning is at the root of revolutionary progress in visual and auditory perception by computers, and is pushing the state of the art in natural language understanding, dialog systems, and language translation. Deep learning systems are deployed everywhere from self-driving cars to content filtering, search, and medical image analysis. Almost all of the deployed applications of deep learning use supervised learning, in which the machine is trained to predict human-provided annotations. While reinforcement learning has been very successful in games and a few real-world applications, it requires an inordinately large number of trials to learn complex concepts. In contrast, humans and animals learn vast amounts of knowledge about the world by observation, with very little feedback from intelligent teachers and very few interactions with the environment. Humans (and many animals) construct complex predictive models of the world that give them "common sense", allowing them to interpret percepts, to fill in missing information, to predict future events, and to plan a course of action. Enabling machines to learn predictive models of the world is a major challenge on the road to significant progress in AI. I will describe a number of promising approaches towards learning predictive models that can handle the intrinsic uncertainty of the natural world, particularly variations of adversarial training.
Yann LeCun is Director of AI Research at Facebook and Silver Professor at New York University, affiliated with the Courant Institute, the Center for Neural Science, and the Center for Data Science, for which he served as founding director until 2014. He received an EE Diploma from ESIEE (Paris) in 1983 and a PhD in Computer Science from Université Pierre et Marie Curie (Paris) in 1987. After a postdoc at the University of Toronto, he joined AT&T Bell Laboratories. He became head of the Image Processing Research Department at AT&T Labs-Research in 1996, and joined NYU in 2003 after a short tenure at the NEC Research Institute. In late 2013, LeCun became Director of AI Research at Facebook, while remaining on the NYU faculty part-time. He was a visiting professor at the Collège de France in 2016. His research interests include machine learning and artificial intelligence, with applications to computer vision, natural language understanding, robotics, and computational neuroscience. He is best known for his work in deep learning and the invention of the convolutional network method, which is widely used for image, video, and speech recognition. He is a member of the US National Academy of Engineering, the recipient of the 2014 IEEE Neural Network Pioneer Award, the 2015 IEEE Pattern Analysis and Machine Intelligence Distinguished Researcher Award, the 2016 Lovie Award for Lifetime Achievement, and an honorary doctorate from IPN, Mexico.
November 29, 2017
Aviv Regev, MIT
Reconstructing the circuits that control how cells detect environmental triggers and adopt specific fates is a fundamental challenge across all areas of biology. Genomic research on circuitry has initially used observational approaches that infer regulation from correlations in molecular profiles, but such approaches cannot distinguish correlation from causation. More recently, approaches that use single perturbations have helped determine the function of individual components. However, because interactions in circuits are non-linear, we cannot predict how a circuit will function simply by testing individual effects. In principle, a massive number of combinations of perturbations would have to be tested, even though the vast majority of genes do not interact, and such a massive search space is beyond the current scale of experimental biology. Here, I will describe emerging approaches that tackle this problem in the context of gene expression programs, regulatory sequences, and high-order genetic interactions by exploiting the fact that biological circuits are both sparse and structured. These approaches combine new experimental designs, focused on random sampling of the relevant biological space (perturbations, expression programs, etc.), with associated computational approaches for their analysis.
Aviv Regev, a computational and systems biologist, is a professor of biology at MIT, a Howard Hughes Medical Institute Investigator, the Chair of the Faculty and the director of the Klarman Cell Observatory and Cell Circuits Program at the Broad Institute of MIT and Harvard, and co-chair of the organizing committee for the international Human Cell Atlas project. She studies the molecular circuitry that governs the function of mammalian cells in health and disease and has pioneered many leading experimental and computational methods for the reconstruction of circuits, including in single-cell genomics.
Regev is a recipient of the NIH Director's Pioneer Award, a Sloan Research Fellowship, the Overton Prize from the International Society for Computational Biology (ISCB), the Earl and Thressa Stadtman Scholar Award from the American Society for Biochemistry and Molecular Biology, and the ISCB Innovator Award, and she is an ISCB Fellow (2016). Regev received her M.Sc. from Tel Aviv University, studying biology, computer science, and mathematics in the Interdisciplinary Program for the Fostering of Excellence. She received her PhD in computational biology from Tel Aviv University.
December 04, 2017
Rupal Patel, Founder & CEO, VocaliD; Professor at Northeastern University
Digital voices today continue to sound generic and robotic. Not only do they lack the clarity and naturalness of the human voice, they lack personality. At VocaliD, we can now reverse engineer a voice by taking speech recordings from a healthy talker and vocal samples from those who are unable to speak. That's because we've discovered that the prosodic cues in residual vocalizations contain enough vocal DNA to seed the personalization process. People of all ages from around the world are sharing their voice on our Human Voicebank platform. Voices are recorded using everyday technology like your computer, encrypted to protect confidentiality, stored in the cloud, typed for a match, and blended to create a unique vocal persona. For all the worry about how technology is depersonalizing us, here's a way in which technology can make us all a little more human. In this talk I will describe the advantages and challenges of gathering a crowdsourced speech corpus and our plans to make a portion of it available to the scientific community. I hope it will also spark a conversation about collaborations and extensions to the platform to fuel future innovation.
Rupal Patel is a Professor at Northeastern University with joint appointments in the Department of Communication Sciences and Disorders and the College of Computer and Information Science. A native of Canada, she earned her bachelor's degree from the University of Calgary, her master's and PhD from the University of Toronto, and completed post-doctoral training at the Massachusetts Institute of Technology. Rupal's research focuses on speech motor control in healthy talkers and those with neuromotor speech impairment; this empirical evidence is then applied to the design of technologies that enable, enrich, and enhance communication. She has published 60+ peer-reviewed journal articles, presented at several hundred national and international conferences, and garnered $8M+ in research funding from the National Institutes of Health, the National Science Foundation, and private foundations. In 2014 Rupal founded VocaliD, a technology company that is at the forefront of voice preservation, restoration, and analytics. Today, over 25,000 speakers from around the world are sharing their voice on The Human Voicebank platform to power the creation of unique vocal identities for those living with voicelessness. Rupal's work has been featured on TED, NPR, and in leading news and technology outlets such as The Wall Street Journal, Wired, Bloomberg, and BuzzFeed, and she was recently named one of Fast Company's 100 Most Creative People in Business.