David Evans

Graduate Student, devans@cs.columbia.edu

Title: Steps Towards Multilingual Newsblasting

Time: Thursday, February 13, 12 noon to 1 pm

Place: CS Conference Room in MUDD

Abstract:

The general theme will be integrating multilingual sources into Newsblaster.

In particular, I plan to talk about some recent work evaluating the multilingual version of SimFinder (SimFinderML) on Japanese data (3 sets of 5 articles each, annotated for similarity by a native speaker), and about work I am doing with a project student to test the new machine-learning-based "Article Extractor" on Russian news sites. I'll give an overview of the new Article Extractor and compare the performance of the new Russian-trained models against models trained on English data for Russian article extraction.

I'll then discuss our ideas for integrating non-English data into Newsblaster, raise a few issues we face in doing so (the first being multilingual document clustering), and describe my plans for summarizing non-English documents with our sentence-extraction-based summarizer.

I hope this meeting will be more interactive, since I'd like feedback on some of the ideas and directions I'll be presenting. That should be OK for this meeting, right?