We're sitting in the lounge overlooking the Manhattan skyline in the New York Times building. At the time of this interview, Jake is a data scientist with the Times, America's most famous newspaper (Jake now serves full-time as Executive Director of DataKind, which he also founded). We met Jake in the lobby of the Times building, alongside a massive wall of quietly clicking screens, each displaying tiny snippets of text from the current edition of the newspaper. After meeting us amid the bustle of activity one would expect from the lobby of a newspaper, Jake whisked us up many floors to sit in peace in one of the Times' many conference rooms.
JP: Sure. It's often hard to find ways to apply computer science and data analysis to "the greater good." Over the last few decades, we've seen the growth of a huge statistics and machine learning community. We've seen an evolution in how almost every decision in our world is made, and to sum it up, the geeks won. Data and technology are now, more than ever, driving every decision the world makes. It's exciting, and it's powerful! But we also have an issue: there are groups working to improve the world that have a lot of data but don't know how to make sense of it or how to use it. We have statisticians and computer scientists who want to help out but don't know exactly how. We try to bring these groups together to help do good things.
CS@CU: It seems like you're doing a lot of interesting things. Let's look back now on your time at Columbia. Are there any memorable CS-related experiences you had at Columbia?
JP: A big one was going to RoboCup in Italy. It was also great being taught directly by all of these famous, high-caliber professors.
CS@CU: Speaking of which, what were some of your favorite classes?
JP: Well, there was Prof. Nayar's Introduction to Computer Vision. I enjoyed taking natural language processing with Julia Hirschberg and classes with Steve Belhumeur. I also remember a great moment in Prof. Kender's class on artificial intelligence. He'd asked us whether all two-variable Boolean expressions had been discovered yet or not. Everyone in the class sat there thinking for a bit, and he asked us, "Are you computer scientists, or are you just hackers?" That class really taught me that computer science isn't just being able to write code. Really, it's a way to approach and solve problems.
CS@CU: What was your favorite part of living in New York as an undergraduate?
JP: There are so many cool groups in New York. There's Eyebeam, a great space for art and technology. There's Dorkbot, a monthly meeting of artists, scientists, designers, and more, who get together and make cool things. I'm also a huge comedy fan, so the Upright Citizens Brigade got me through many a weekend in the city.
CS@CU: Thanks for your time, Jake!
JP: Thank you!
CS@CU: Let's start with a brief introduction. Where are you from, what year did you graduate from Columbia, and on what CS track?
JP: I'm originally from Longmeadow, in western Massachusetts. I graduated in 2004 from Columbia on the Intelligent Systems track. I was in SEAS. I really enjoyed computer science at Columbia because of the flexibility in course selection. I was able to take a bunch of liberal arts courses that I believe really added to my education.
CS@CU: Can you tell us a little bit about what you do at the New York Times?
JP: I work as the data scientist in the Research and Development Lab. The purpose of the lab is to "look around corners," to try to predict what media will look like in two or three years, and to build prototypes of technologies based on those predictions. Currently, we do a lot of work with big data. We try to get data and tell stories around it. We both contribute to articles and do solid research.
CS@CU: Can you tell us a little bit about some of those projects?
JP: Sure, we have a few projects that I think are pretty cool.
- Project Cascade: Many people retweet stories that the New York Times releases. We try to figure out what caused them to tweet and answer other questions. Who clicked on the retweet? Who shared it?
- Real-time Twitter monitoring: What is the ideal time to tweet a story? Once we've tweeted it, what are ways to keep it trending and keep it in the public eye?
- OpenPaths: It was discovered a while ago that Apple, via the iPhone, was keeping track of the location data of many of their users. When people found this out, a lot of users insisted that Apple delete their data. Some members of the R&D Lab realized that, before everyone deleted that data, there was huge potential to learn from this inadvertently collected database of human mobility information. We decided to do something about it. The end result was OpenPaths, a web service that lets users anonymously upload and store their location data. Users can also give access to the data to researchers, if they choose.
CS@CU: When you're not working at the New York Times, you're also the founder of Data Without Borders (ed. note: Data Without Borders has been renamed DataKind). Could you tell us a little bit about this project?