who am i

Gideon Mendels

Researcher / Software Engineer - Columbia University

I'm a researcher, software engineer, and entrepreneur, currently working as a Graduate Research Assistant with Professor Julia Hirschberg. My research focuses on using web-collected data for language modeling, code-switching detection, and sentiment analysis, and on building automated tools for these tasks. I'm also interested in other topics in NLP, speech recognition, and machine learning.

I love building stuff! I have initiated and contributed to many software projects (mostly in Java and Python), started a successful e-commerce business, and like to work on 3D designs in my spare time. I also interned at Google, working on deep learning for document classification.

Publications

Improving speech recognition and keyword search for low resource languages using web data

Proc. Interspeech, Dresden, Germany 2015
Gideon Mendels, Erica Cooper, Victor Soto, Julia Hirschberg, Mark Gales, Kate Knill, Anton Ragni, Haipeng Wang

We describe the use of text data scraped from the web to augment language models for Automatic Speech Recognition and Keyword Search for Low Resource Languages. We scrape text from multiple genres including blogs, online news, translated TED talks, and subtitles. Using linearly interpolated language models, we find that blogs and movie subtitles are more relevant for language modeling of conversational telephone speech and obtain large reductions in out-of-vocabulary keywords.

Cross-Cultural Production and Detection of Deception from Speech

WMDD 2015
Sarah Ita Levitan, Guozhen An, Mandi Wang, Gideon Mendels, Julia Hirschberg, Michelle Levine, Andrew Rosenberg

Babler - Data Collection from the Web to Support Speech Recognition and Keyword Search

ACL 2016 WAC-X
Gideon Mendels, Erica Cooper, Julia Hirschberg

We describe a system to collect web data for Low Resource Languages in order to augment language model training data for Automatic Speech Recognition (ASR) and keyword search. The added data reduces Out-of-Vocabulary (OOV) rates, where OOV words are words in the test set that did not appear in the ASR training set. We test this system on seven Low Resource Languages from the IARPA Babel Program: Paraguayan Guarani, Igbo, Amharic, Halh Mongolian, Javanese, Pashto, and Dholuo.

education

Columbia University in the City of New York

M.Sc. in Computer Science

Expected Graduation December 2016

Columbia University in the City of New York

Bachelor's in Computer Science

Magna Cum Laude
Dean's list every semester
GS Honor Society Membership

contact info

Columbia University - Department of Computer Science
1214 Amsterdam Avenue
450 CS Building
New York, NY 10027

  • gm2597 _[at]_ columbia.edu