Learning Constraint-based Grammars using a Domain Ontology

Speaker Name: Smaranda Muresan
Speaker Info: Graduate Student, NLP Group; smara@cs.columbia.edu
Date: Thursday February 3
Time: 10:30am-11:30am
Location: Computer Science Conference Room (MUDD)

Abstract:
Language understanding is an intrinsic component of many Natural Language Processing applications, such as question answering, text mining, machine translation and text summarization. Nevertheless, the majority of state-of-the-art systems deployed for these applications use little, if any, actual "understanding". This is because several challenges remain to be addressed: What representation is expressive enough to encode the complexity of natural language semantics, yet simple enough to allow inference? What properties must a grammar have to be learnable? What learning paradigm is needed?

In my thesis I have focused on developing a unified approach that addresses these questions. In this talk I will present a new type of constraint-based grammar that captures both syntax and semantics, using an ontology-based semantic interpretation as a constraint at the grammar-rule level. These grammars, lexicalized well-founded grammars, are learnable from a small number of positive, semantically annotated examples. I will discuss the semantic representation, the grammar properties and formalism, and the relational learning algorithm, together with the type of annotated data required for grammar induction. I will end the talk with a discussion of linguistic relevance and the presentation of an application context: the acquisition of medical terminological knowledge from text.
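To give a flavor of the idea of an ontology-based constraint at the grammar-rule level, here is a minimal illustrative sketch. All names here (the toy ontology, the `compose` rule, the example concepts) are invented for illustration and do not reflect the speaker's actual formalism or data.

```python
# Illustrative sketch only: a toy adjective-noun composition rule whose
# output is validated against a small domain ontology. The ontology maps
# each concept to the set of semantic properties it licenses; a rule
# application succeeds only when the composed meaning is ontologically
# well-formed. These names and the ontology itself are hypothetical.

ONTOLOGY = {
    "disease": {"severity", "location"},  # a disease can have a severity
    "idea": {"novelty"},                  # an idea cannot
    "severe": {"severity"},               # the property an adjective contributes
}

def compose(head, modifier):
    """Adj-Noun rule: attach the modifier's property to the head concept,
    but only if the ontology licenses that property for the concept."""
    prop = next(iter(ONTOLOGY[modifier]))
    if prop not in ONTOLOGY[head]:
        return None  # ontological constraint fails; the rule does not apply
    return {"concept": head, prop: modifier}

# "severe disease" is licensed; "severe idea" is ruled out by the ontology.
print(compose("disease", "severe"))
print(compose("idea", "severe"))
```

The point of the sketch is that the semantic interpretation is not computed after parsing but acts as a constraint during rule application, so ontologically ill-formed analyses are pruned as the grammar is applied.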