The following describes this database release of the Enron email corpus. The database was built with Arbre, a Python framework designed to serve as an intermediary between users and the databases in which they store their datasets. Arbre provides general models that can be inherited from and extended for a wide variety of applications, while still giving each application a consistent, common foundation.
The goal of this release is to maximize the interoperability of analytical tools across the Enron corpus and other corpora by establishing a uniform terminology and format for representing their common features. This should reduce the need to learn the idiosyncrasies of a particular dataset before being able to perform useful analysis on it.
For a description of this corpus, please see:
Agarwal, Apoorv and Omuya, Adinoyi and Harnly, Aaron and Rambow, Owen. A Comprehensive Gold Standard for the Enron Organizational Hierarchy. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), July 2012, Jeju Island, Korea.
We ask that you cite this paper if you use the corpus.
This work was supported by the National Science Foundation under grant IIS-0713548.