The corpus is supplied in a binary-encoded format (BSON) that can be made human-readable by restoring it into a MongoDB database (available here). You will need the MongoDB binaries to restore the corpus.
Below, we describe how to restore the data to a human-readable form and give a few sample queries.
The compressed release contains the BSON files of our Enron email corpus. To restore all the data, extract the archive to any location, e.g. /home/user/enron.
Then run the following command to restore the database:
$ mongorestore --dbpath %DATABASE_DIRECTORY /home/user/enron
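%DATABASE_DIRECTORY is the directory where you want the database files to be stored. Note that the --dbpath option of mongorestore is only available in MongoDB versions before 3.0.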
Once this command has completed, %DATABASE_DIRECTORY will contain several enron.X files. You can now start the database with:
$ mongod --dbpath %DATABASE_DIRECTORY &
Connect to the mongo shell with:
$ mongo
You can now run commands from the shell. For example, to select the enron database and view a single email:
> use enron
> db.emails.findOne()
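Since the layout of an email document is not described here, it can help to list the field names before writing queries. The following is a minimal sketch for the mongo shell; listFields is a hypothetical helper name introduced for illustration, not part of MongoDB.

```javascript
// Sketch: return the top-level field names of one email document.
// `listFields` is a hypothetical helper, not a MongoDB built-in.
function listFields(db) {
  var doc = db.emails.findOne();   // fetch a single document from the emails collection
  // findOne() returns null when the collection is empty
  return doc === null ? [] : Object.keys(doc);
}
```

After pasting the function into the shell, call it as listFields(db) to see which fields are available to query on.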
To find the number of emails in the database:
> db.emails.find().count()
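The same count can be restricted to emails that match a condition by passing a filter document to find(). The sketch below assumes nothing about the actual schema: countWhere is a hypothetical helper, and the field name and value are supplied by the caller.

```javascript
// Sketch: count emails whose `field` equals `value`.
// `countWhere` is a hypothetical helper; the field name is whatever your documents use.
function countWhere(db, field, value) {
  var filter = {};
  filter[field] = value;                    // build an equality filter, e.g. { someField: "someValue" }
  return db.emails.find(filter).count();    // same find().count() pattern as above
}
```

For example, countWhere(db, "sender", "someone@enron.com") would count the emails from that address, assuming the documents carry a field named sender (an illustrative name, not confirmed by this release).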