E6998 Advanced Computer Vision

Information

Instructor: Carl Vondrick
Time: TR 4:10pm-5:25pm
Location: 227 Seeley Mudd
Term: Spring 2019

Office Hours

Carl: Tue/Thu 5:30pm-6pm, CEPSR 611
Dave: Mon 5:30pm-7:30pm, CS TA Room
Hassan: Tue 2:15pm-3:15pm

Grading

Class Participation 20%
Class Presentation 40%
Final Project 40%

Overview

This is an advanced seminar course that will focus on the latest research in computer vision and related fields. Students will read, present, and discuss papers, and complete a semester-long project. Topics will include visual recognition, self-supervised learning, cross-modal transfer, neural network interpretation, commonsense reasoning, vision and language, and embodied vision. Experience in deep learning is strongly recommended.

Announcements

Enrollment is capped at 30 students. You must have instructor approval to take the course.
The course website is under construction and subject to change.

Syllabus

The syllabus is subject to change as the course evolves.

List of suggested papers

Paper signup

Date	Presenter 1	Paper 1	Presenter 2	Paper 2
Jan 22	Carl Vondrick	Perception Beyond Measurement
Jan 24	Carl Vondrick	Ecological Vision
Jan 29	Justin Chou	What makes Paris look like Paris? Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. Siggraph 2012.	Sagar Lal	What Makes an Image Memorable?
Jan 31	Sebastian Cueva-Caro	Mask R-CNN Kaiming He, Georgia Gkioxari, Piotr Dollar, Ross Girshick. ICCV 2017	Simone Fobi	Light-Head R-CNN: In Defense of Two-Stage Object Detector Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun. arXiv Nov 2017
Feb 5	Nathan Silberman	Computer Vision for Healthcare	ExplainGAN: Model Explanation via Decision Boundary Crossing Transformations	Understanding Equivalence and Noninferiority Testing
Feb 7	Parita Pooj	Finding Tiny Faces	James Shin	PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas. arXiv Dec 2016
Feb 12	Project Pitch
Feb 14	Project Pitch
Feb 19	Dave Epstein	Low-shot Learning from Imaginary Data. Yu-Xiong Wang, Ross Girshick, Martial Hebert, Bharath Hariharan. CVPR, 2018 (Spotlight).	Jessie Liu	Matching Networks for One Shot Learning
Feb 21	Yicun Liu	Deformable Convolutional Networks	Terence Conlon	ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices Xiangyu Zhang, Xinyu Zhou, Mengxiao Lin, Jian Sun. arXiv July 2017
Feb 26	Aashish Misraa	BigTime: Learning Intrinsic Images by Watching the World. Zhengqi Li and Noah Snavely. CVPR 2018	Boyuan Chen	What happens if... Learning to Predict the Effect of Forces in Images
Feb 28	Ian Huang	From Recognition to Cognition: Visual Commonsense reasoning	Yiliang Shi	Inferring and Executing Programs for Visual Reasoning Justin Johnson, Bharath Hariharan, Laurens van der Maaten, Judy Hoffman, Li Fei-Fei, C. Lawrence Zitnick, Ross Girshick. ICCV 2017
Mar 5	Noah Snavely	3D Computer Vision
Mar 7	Lauren Arnett	Taskonomy: disentangling task transfer learning	Dimitri Leggas	Learning to Look Around: Intelligently Exploring Unseen Environments for Unknown Tasks
Mar 12	Max Ogryzko	Learning Features by Watching Objects Move. Deepak Pathak, Ross Girshick, Piotr Dollar, Trevor Darrell, Bharath Hariharan. CVPR 2017.	Mayank Saxena	Fighting Fake News: Image Splice Detection via Learned Self-Consistency Minyoung Huh, Andrew Liu, Andrew Owens, Alexei A. Efros in ECCV'18
Mar 14	Yueqi Wang	Cognitive Mapping and Planning for Visual Navigation	Suhyun Kim	Visual Curiosity: Learning to Ask Questions to Learn Visual Recognition
Mar 19	Spring Break
Mar 21	Spring Break
Mar 26	Lahav Lipson	Multimodal Unsupervised Image-to-Image Translation	Chris Alberti	Jointly Discovering Visual Objects and Spoken Words from Raw Sensory Input David Harwath, Adrià Recasens, Dídac Surís, Galen Chuang, Antonio Torralba and James Glass
Mar 28	TBD		TBD
Apr 2	Zheng Shou	Video
Apr 4	Connor Goggins	Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog	Yiru Chen	Network Dissection: Quantifying Interpretability of Deep Visual Representations David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba. CVPR 2017.
Apr 9	Suman Mulumudi	Playing Atari with Deep Reinforcement Learning	Niles Christensen	Investigating Human Priors for Playing Video Games
Apr 11	Roop Pal	Learning to Poke by Poking: Experiential Learning of Intuitive Physics	Vinay Ramesh	Learning to Fly by Crashing. Dhiraj Gandhi, Lerrel Pinto, Abhinav Gupta
Apr 16	Hassan Akbari	Neural architecture search with reinforcement learning.	Akarsh Zingade	Focal Loss for Dense Object Detection Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollar. ICCV 2017
Apr 18	Sonam Goenka	Unsupervised Image Captioning (Feng et al)	TBD
Apr 23	Project Studio
Apr 25	Project Studio
Apr 30	Project Studio
May 2	Poster Presentation

Course Policy

Class Participation: Discussion of papers will be a large component of this course. As such, class participation is a large part of the course grade (20%). Please come to class prepared to engage with the presenter and your peers in class. Remember, your peers have spent a significant amount of time preparing the lecture for the day, and participating in discussion will help them out. When you present your lecture, you will want your classmates to join in discussion too!

Presentation: During the course, you will have the chance to present your chosen paper to the class and lead discussion on it. In my experience, excellent talks are the result of extensive preparation. This means two things: a) everybody can give a great presentation because all it takes is practice, and b) you should practice your lecture many times before giving it to the class.

Late Policy: Since all assignments are discussion-based and require the whole class to be present, there will be no extensions. If you are unable to present on your scheduled day, it is your responsibility to find a classmate to swap with.

Academic Dishonesty: Plagiarism and cheating will result in a zero for the course. You are allowed to use images, code, slides, and material from papers and websites; however, you must cite the source.

Course Projects: You may complete the course project individually or in groups. You are encouraged to start the course project from the first day. If you do not have access to a GPU, please contact the course instructors and we will help you find one. At the end of the course, you will have the chance to present your project to your classmates.

Acknowledgements

We gladly acknowledge several instructors for making available the course material on which this class is based: James Hays, Philipp Krähenbühl, Devi Parikh, Abhinav Gupta, Alyosha Efros, Antonio Torralba.

Columbia University