I will discuss a method for transferring annotation from a
syntactically annotated corpus (a so-called treebank) in a source
language to a target language. This approach assumes only that an
(unannotated) text corpus exists for the target language, and does not
require that the parameters of the mapping between the two languages
be known. This general probabilistic approach, based on Data
Augmentation, also applies beyond syntax and NLP, to transferring
knowledge between closely related datasets. I will discuss the model
and the algorithmic challenges, and will present a novel algorithm for
sampling from certain posterior distributions over trees.