Extracting and Merging Glossaries from the Web

Speaker Name: Min San Co
Speaker Info: Master's Student; mtc38@cs.columbia.edu
Date: Thursday, September 30th
Time: 11:30am-12:30pm
Location: CS Conference Room

Abstract:
Glossaries are excellent sources of definitions for technical and domain-specific terms, and the web contains an abundance of them. In this talk, I will characterize the structure of the web's glossary space and introduce MetaGloss, a system that generates a meta-glossary by aggregating glossaries from different internet domains. The main benefit of MetaGloss is that it gives users a single coherent view of the collected terminology. I will describe how MetaGloss automatically crawls the web and identifies and extracts the glossaries it finds, and I will show how boosting is used to identify glossaries.
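
As a rough illustration of what aggregating glossaries into a meta-glossary could look like, here is a minimal sketch. It is not the MetaGloss implementation: the function names, the input layout (source domain mapped to term/definition pairs), and the normalization rule (lowercasing and collapsing whitespace) are all assumptions made for this example.

```python
from collections import defaultdict


def normalize(term):
    # Hypothetical normalization: lowercase and collapse whitespace so that
    # "Boosting" and "boosting " are treated as the same term.
    return " ".join(term.lower().split())


def merge_glossaries(glossaries):
    """Merge per-source glossaries into one term -> [(source, definition)] map.

    `glossaries` is assumed to map a source domain to a dict of
    term -> definition. Exact duplicate (source, definition) pairs are
    dropped; differing definitions are kept side by side.
    """
    merged = defaultdict(list)
    for source, entries in glossaries.items():
        for term, definition in entries.items():
            key = normalize(term)
            pair = (source, definition.strip())
            if pair not in merged[key]:
                merged[key].append(pair)
    return dict(merged)


# Illustrative input only; the domains and definitions are made up.
example = {
    "a.example": {"Boosting": "An ensemble method."},
    "b.example": {
        "boosting ": "Combining weak learners.",
        "Crawler": "A program that fetches pages.",
    },
}
meta_glossary = merge_glossaries(example)
```

Keeping every definition together with its source, rather than picking one winner per term, is one simple way to present a single view of the terminology while still showing where each definition came from.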