Speaker Name: | Smaranda Muresan |
Speaker Info: | Graduate Student, NLP Group; smara@cs.columbia.edu |
Date: | Thursday February 3 |
Time: | 10:30am-11:30am |
Location: | Computer Science Conference Room (MUDD) |
Abstract:
Language understanding is an intrinsic component of many Natural Language
Processing applications, such as question answering, text mining, machine
translation and text summarization. Nevertheless, the majority of
state-of-the-art systems deployed for these applications use little, if
any, actual "understanding". This is because several challenges must
first be addressed: What representation is expressive enough to encode
the complexity of natural language semantics, yet simple enough to allow
inference? What properties must a grammar have to be learnable? What
learning paradigm is needed?
In my thesis I have focused on developing a unified approach that addresses these questions. In this talk I will present a new class of constraint-based grammars that capture both syntax and semantics, with an ontology-based semantic interpretation acting as a constraint at the grammar-rule level. These grammars, called lexicalized well-founded grammars, are learnable from a small number of positive, semantically annotated examples. I will discuss the semantic representation, the grammar properties and formalism, and the relational learning algorithm, together with the type of annotated data required for grammar induction. I will end the talk with a discussion of linguistic relevance and the presentation of an application context: the acquisition of medical terminological knowledge from text. |