Speaker Name: | Smaranda Muresan |
Speaker Info: | PhD student, NLP Group; smara@cs.columbia.edu |
Date: | Thursday October |
Time: | 11:30am-12:30pm |
Location: |
Abstract:
The question "What does it mean to learn language?" is one of the
great topics of scientific inquiry. The problem has attracted many
researchers in linguistics, computer science, and cognitive science,
and attempts to answer it vary greatly from discipline to
discipline. In my thesis, I frame language learning as a grammar
learning problem: the grammar encodes both syntax and semantics, and an
ontology is used during learning to provide access to meaning.
In this talk, I will present a new, computationally efficient model for language learning, called Grammar Approximation by Representative Sublanguage (GARS). In this model, the language is taken to be a set of strings together with their syntactic-semantic representations. The learner is presented with a set of positive representative examples of the target language, together with an additional set of positive examples used for generalization, which we refer to as a representative sublanguage. The task of the learner is to induce a grammar that generates the target language.

Constraint-based grammar formalisms have been widely used to capture natural language. We define a new type of constraint-based grammar, Lexicalized Well-Founded Grammars (LWFGs), which are always learnable under the GARS model; that is, learning always converges to the target grammar. We show that the search space forms a grammar lattice and provide polynomial-time algorithms for grammar induction, proving their correctness. |
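The general flavor of lattice-based grammar induction from positive examples can be illustrated with a toy sketch. Everything below is hypothetical simplification for illustration only: the fixed rule inventory, the bounded-depth derivation check, and the bottom-up subset search are not the LWFG/GARS algorithm, which operates on syntactic-semantic representations and an ontology rather than plain strings.

```python
# Toy illustration of lattice-based grammar induction from positive examples.
# NOT the LWFG/GARS algorithm; candidate grammars are subsets of a fixed
# rule inventory, ordered by inclusion (a powerset lattice).

from itertools import combinations

# Hypothetical inventory of candidate CFG rules (lowercase = terminal).
RULES = {
    "S -> a S b",
    "S -> a b",
    "S -> a",
    "S -> b",
}

def generates(rules, target, max_depth=6):
    """True if 'target' (a space-separated string) is derivable from S
    using 'rules', exploring derivations up to 'max_depth' steps."""
    prods = {}
    for r in rules:
        lhs, rhs = r.split(" -> ")
        prods.setdefault(lhs, []).append(tuple(rhs.split()))
    limit = len(target.split()) + 2      # prune overly long sentential forms
    frontier = {("S",)}
    for _ in range(max_depth):
        new = set()
        for form in frontier:
            if all(s.islower() for s in form):       # all terminals
                if " ".join(form) == target:
                    return True
                continue
            # expand the leftmost nonterminal
            i = next(j for j, s in enumerate(form) if not s.islower())
            for rhs in prods.get(form[i], []):
                nf = form[:i] + rhs + form[i + 1:]
                if len(nf) <= limit:
                    new.add(nf)
        frontier = new
    return False

def induce(positives):
    """Return a smallest rule subset covering all positive examples,
    searching the lattice bottom-up (smaller grammars first)."""
    rules = sorted(RULES)
    for k in range(1, len(rules) + 1):
        for subset in combinations(rules, k):
            if all(generates(set(subset), s) for s in positives):
                return set(subset)
    return None

# From two positive examples, the learner picks the minimal grammar that
# also generalizes to longer strings of the same pattern (a^n b^n).
grammar = induce(["a b", "a a b b"])
```

Here `induce(["a b", "a a b b"])` settles on `{"S -> a S b", "S -> a b"}`, which generalizes to `a a a b b b` while rejecting strings like `a a b`; the real model's contribution is guaranteeing that this kind of search converges to the target grammar in polynomial time.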