Generation-Heavy Hybrid Machine Translation (GHMT) is an asymmetrical
hybrid approach that addresses the issue of MT resource poverty in
source-poor/target-rich language pairs by exploiting available
symbolic and statistical target-language (TL) resources. This talk
presents a specific implementation of this approach where the expected
source-language (SL) resources include a syntactic parser and a simple
one-to-many translation dictionary. Expensive parallel resources,
such as transfer rules, complex interlingual lexicons, or even bitexts
are not used. Rich TL symbolic resources such as word lexical
semantics, categorial variations and subcategorization frames are used
to overgenerate multiple structural variations from a TL-glossed
syntactic dependency representation of SL sentences. This
SL-independent symbolic overgeneration accounts for possible
translation divergences, cases where the underlying concept or "gist"
of a sentence is distributed differently in two languages. The
overgeneration is constrained by multiple statistical TL models
including surface n-grams and structural n-grams. The first
implementation of this approach focused on Spanish-English MT. An
evaluation of this system will be presented, together with issues
arising in ongoing work on retargeting it to Chinese and Arabic.
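The overgenerate-then-rank idea can be sketched in miniature. The toy below is purely illustrative and not the actual GHMT system: the gloss dictionary and bigram counts are invented, the "overgeneration" is just the cartesian product of glosses rather than true structural variation, and real systems use trained language models rather than hand-set counts.

```python
from itertools import product
from math import log

# Hypothetical one-to-many SL->TL gloss dictionary (Spanish -> English).
GLOSSES = {
    "yo": ["I"],
    "tengo": ["have", "hold"],
    "hambre": ["hunger", "famine"],
}

# Invented bigram counts standing in for a TL surface n-gram model.
BIGRAM = {
    ("I", "have"): 50, ("I", "hold"): 2,
    ("have", "hunger"): 10, ("have", "famine"): 1,
    ("hold", "hunger"): 1, ("hold", "famine"): 1,
}

def score(words):
    """Sum of log bigram counts, with a small floor for unseen pairs."""
    return sum(log(BIGRAM.get(pair, 0.5)) for pair in zip(words, words[1:]))

def translate(sl_words):
    """Overgenerate all gloss combinations; let the TL model pick one."""
    candidates = product(*(GLOSSES[w] for w in sl_words))
    return max(candidates, key=score)

best = translate(["yo", "tengo", "hambre"])
print(" ".join(best))  # -> "I have hunger"
```

A full generation-heavy system would also overgenerate structural variants of the glossed dependency tree (e.g. producing "I am hungry" via lexical semantics and categorial variations, handling the divergence in "tengo hambre"), which this word-level sketch deliberately omits.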