Clairlib: a Perl library for NLP, IR, and graph analysis

talk by Dragomir Radev, Nov. 30



INTRODUCTION

The University of Michigan's CLAIR (Computational Linguistics And
Information Retrieval) group (http://tangra.si.umich.edu/clair) is
happy to present the second release of Clairlib, the Clair library.

The Clair library is written in Perl and is intended to simplify a
number of generic tasks in Natural Language Processing (NLP),
Information Retrieval (IR), and Lexical Network Analysis. Its
architecture also allows external software to be plugged in with very
little effort.

Clairlib features a tiered architecture with a core shared by all
applications and subject-specific libraries (currently in political
science and bioinformatics).

FUNCTIONALITY

Native: Tokenization, Summarization, LexRank, Biased LexRank, Document
Clustering, Document Indexing, PageRank, Biased PageRank, Web Graph
Analysis, Bioinformatics Text Analysis, Political Science Text
Analysis, Network Building, Power Law Distribution Analysis, Network
Analysis and Computation (Watts-Strogatz Clustering Coefficient,
Cosines, Random Walks), TF, IDF
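
To make a few of the items above concrete (Tf, Idf, and cosines), here
is a minimal sketch in plain Perl. It does not use Clairlib's own
modules or API; the toy corpus, the sub names (tf, idf, cosine), and the
choice of log(N/df) as the IDF formula are all assumptions made for
illustration only.

```perl
use strict;
use warnings;

# Hypothetical toy corpus; document ids and text are illustrative only.
my %docs = (
    d1 => "the cat sat on the mat",
    d2 => "the dog sat on the log",
    d3 => "graphs and random walks",
);

# Term frequency: raw count of each lowercased word token in a string.
sub tf {
    my ($text) = @_;
    my %tf;
    $tf{$_}++ for grep { length } split /\W+/, lc $text;
    return \%tf;
}

# Inverse document frequency, using log(N / df). This is one common
# variant among several, chosen here as an assumption.
sub idf {
    my ($tfs) = @_;    # hashref: doc id => tf hashref
    my $n = keys %$tfs;
    my %df;
    for my $tf (values %$tfs) {
        $df{$_}++ for keys %$tf;
    }
    return { map { $_ => log($n / $df{$_}) } keys %df };
}

# Cosine similarity between two sparse vectors stored as hashrefs.
sub cosine {
    my ($u, $v) = @_;
    my ($dot, $nu, $nv) = (0, 0, 0);
    for my $t (keys %$u) {
        $nu  += $u->{$t} ** 2;
        $dot += $u->{$t} * $v->{$t} if exists $v->{$t};
    }
    $nv += $_ ** 2 for values %$v;
    return 0 if $nu == 0 || $nv == 0;    # avoid division by zero
    return $dot / sqrt($nu * $nv);
}

# Build one tf-idf vector per document.
my %tfs = map { $_ => tf($docs{$_}) } keys %docs;
my $idf = idf(\%tfs);
my %vec;
for my $d (keys %tfs) {
    $vec{$d} = { map { $_ => $tfs{$d}{$_} * $idf->{$_} } keys %{ $tfs{$d} } };
}

printf "cos(d1,d2) = %.3f\n", cosine($vec{d1}, $vec{d2});
printf "cos(d1,d3) = %.3f\n", cosine($vec{d1}, $vec{d3});
```

In this toy corpus d1 and d2 share several words and so get a positive
similarity, while d1 and d3 share none and get a similarity of zero.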

Imported: Stemming, Sentence Segmentation, Web Page Download, Web
Crawling, XML Parsing, XML Tree Building, XML Writing

FUNDING

This work has been supported in part by grants R01 LM008106
"Representing and Acquiring Knowledge of Genome Regulation" and U54
DA021519 "National Center for Integrative Bioinformatics," both from
the National Institutes of Health, as well as grants IDM 0329043
"Probabilistic and Link-Based Methods for Exploiting Very Large Textual
Repositories" and DHB 0527513 "The Dynamics of Political Representation
and Political Rhetoric," both from the National Science Foundation.

ABOUT

The Clair Library is developed by the CLAIR group at the University of
Michigan. It encompasses the functionality of MEAD and perltree, two
of CLAIR's earlier releases.

Project design: Dragomir R. Radev
Main implementers: Anthony Fader, Mark Hodges, and Dragomir R. Radev

Additional code by: Timothy Allison, Michael Dagitses, Aaron Elkiss,
Gunes Erkan, Scott Gifford, Mark Joseph, Samuela Pollack, and Adam
Winkel