Treebank Transfer

Speaker Name:	Martin Jansche
Speaker Info:	Research Scientist, CCLS; jansche@cs.columbia.edu
Date:	Thursday, December 1
Time:	11:30am-12:30pm
Location:	CCLS Conference Room (Interchurch)

Abstract:

I will discuss a method for transferring annotation from a syntactically annotated corpus (a so-called treebank) in a source language to a target language. The approach assumes only that an unannotated text corpus exists for the target language; it does not require that the parameters of the mapping between the two languages be known. This general probabilistic approach, based on Data Augmentation, also applies beyond syntax and NLP to transferring knowledge between closely related datasets. I will discuss the model and its algorithmic challenges, and will present a novel algorithm for sampling from certain posterior distributions over trees.