Setup

The corpus is supplied in a binary-encoded format (BSON) that can be rendered human-readable by restoring it into a MongoDB database (available here). You will need the MongoDB binaries (mongorestore, mongod and mongo) to restore the corpus.

Restore

This section explains how to restore the data to a human-readable format and gives a few sample queries.

The compressed release contains the BSON files of our Enron email corpus. To restore all the data, first extract the archive to a location of your choice, e.g. /home/user/enron.

Then run the following command to restore the database:

$ mongorestore --dbpath %DATABASE_DIRECTORY /home/user/enron

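Replace %DATABASE_DIRECTORY with the directory where the database files should be stored. Note that the --dbpath option of mongorestore was removed in MongoDB 3.0; with newer releases, start mongod first and run mongorestore against the running server instead.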

Once the command completes, %DATABASE_DIRECTORY will contain several enron.X data files. You can now start the database server in the background with:

$ mongod --dbpath %DATABASE_DIRECTORY &
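The restore and start steps can also be scripted. The following is a minimal sketch in Python (an assumption; the steps themselves are plain shell commands) using hypothetical helper names setup_commands and run_setup. Here db_dir plays the role of %DATABASE_DIRECTORY and dump_dir is the folder the release was extracted to. It wraps exactly the two commands shown above, so it assumes a MongoDB release whose mongorestore still accepts --dbpath.

```python
import subprocess
from pathlib import Path


def setup_commands(db_dir: str, dump_dir: str) -> list[list[str]]:
    """Build the argv lists for the restore and start steps.

    db_dir stands in for %DATABASE_DIRECTORY; dump_dir is where the
    release was extracted (e.g. /home/user/enron).
    """
    # Fail early if the extracted release contains no BSON files at all.
    if not any(Path(dump_dir).rglob("*.bson")):
        raise FileNotFoundError(f"no .bson files found under {dump_dir}")
    restore = ["mongorestore", "--dbpath", db_dir, dump_dir]
    start = ["mongod", "--dbpath", db_dir]
    return [restore, start]


def run_setup(db_dir: str, dump_dir: str) -> subprocess.Popen:
    """Restore the corpus, then launch mongod in the background (like `&`)."""
    restore, start = setup_commands(db_dir, dump_dir)
    # mongod refuses to start if its dbpath does not exist, so create it.
    Path(db_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(restore, check=True)  # blocks until mongorestore finishes
    return subprocess.Popen(start)  # caller should terminate() it when done
```

For example, proc = run_setup("/data/enron-db", "/home/user/enron") (the first path is a hypothetical choice) restores the corpus and leaves mongod running until proc.terminate() is called. Building the argument lists separately from running them keeps the paths easy to inspect before anything touches the disk.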

Then open the mongo shell, which connects to the local server (in MongoDB 6.0 and later, use mongosh instead):

$ mongo

You can now run commands from the shell. For example, to switch to the enron database and view one email:

> use enron

> db.emails.findOne()

To find the number of emails in the database:

> db.emails.find().count()
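The same two queries can also be issued from Python. This is a minimal sketch assuming the third-party pymongo driver is installed and mongod is listening on the default localhost:27017 without authentication; connect and peek_and_count are hypothetical helper names. It uses count_documents({}) rather than find().count() because PyMongo 4 no longer provides a cursor count() method.

```python
def connect(uri: str = "mongodb://localhost:27017"):
    """Return the emails collection of the restored enron database.

    Assumes pymongo is installed and a mongod is running at `uri`.
    """
    # Imported lazily so peek_and_count can be used without pymongo.
    from pymongo import MongoClient

    return MongoClient(uri)["enron"]["emails"]


def peek_and_count(emails):
    """Python counterpart of db.emails.findOne() and the count query."""
    sample = emails.find_one()  # one arbitrary document, or None if empty
    total = emails.count_documents({})  # an empty filter matches every document
    return sample, total
```

Usage: sample, total = peek_and_count(connect()). Passing the collection in, rather than connecting inside peek_and_count, keeps the query logic independent of how the connection is made.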
