Collocations and Lexical Resources for Natural Language Processing

talk by Margarita Alonso Ramos, Universidade da Coruña (Spain)


The aim of this talk is to present how collocations can be incorporated into lexical resources commonly used in Natural Language Processing. It is increasingly acknowledged within the NLP community that collocations are necessary for applications such as Machine Translation, Text Generation, Computer Assisted Language Learning, and Information Retrieval. However, most common lexical resources are still far from sufficiently refined with respect to collocations. We show how EuroWordNet and FrameNet can be enriched with a collocation database currently under construction, the Diccionario de colocaciones del español (DiCE). We begin by introducing the concept of collocation and the theoretical instrument used in our framework for its representation, lexical functions. Then, we show examples of the use of collocations in different NLP applications. Finally, we present DiCE and the ways we link it to EuroWordNet and FrameNet.


Towards Automatic Summarization of Patent Documentation

talk by Leo Wanner, ICREA and Universitat Pompeu Fabra (Spain)


Almost no written documentation is as opaque and hard to understand, and at the same time as important, as patent documentation. It is thus not surprising that summaries of patents and patent applications, covering the most important features and the novelty of the invention in question, are in high demand. So far, these summaries have been written manually by highly qualified specialists. One of the goals of the EC-funded PATExpert project is to develop a multilingual automatic patent summarization service. In our talk, we will first discuss the specifics of patent documentation, which prevent the application of certain popular extraction-oriented summarization techniques. Then, we present some surface-oriented and deep summarization techniques being worked on in PATExpert. The focus will be on surface-oriented summarization, which is based on syntactic dependency and discourse structure criteria and relies upon a prior simplification/paraphrasing of the material in question.