Kanishk Vashisht (kv2295)
Sambhav Anand (sa3433)

We implemented a MapReduce-based algorithm to compute word counts across a large corpus (400 million words).
There were around 200 thousand unique words across the entire corpus.
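The map/reduce structure can be sketched as follows (a minimal sketch; the function names are ours, not necessarily the ones used in the project):

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Map phase: count the words within a single chunk of the corpus.
countChunk :: String -> Map String Int
countChunk = Map.fromListWith (+) . map (\w -> (w, 1)) . words

-- Reduce phase: merge the per-chunk counts by summing.
mergeCounts :: [Map String Int] -> Map String Int
mergeCounts = Map.unionsWith (+)
```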

We used the Amazon CD and Vinyl review data from http://jmcauley.ucsd.edu/data/amazon/
To make sure the work is split evenly across parallel threads, we used the Unix split utility to break
the file into chunks of 10000 lines each. The complete uncompressed data is not attached because of its large
size. Instructions on how to run this:

1) Go to the link http://jmcauley.ucsd.edu/data/amazon/
2) Download any of the review data files.
3) Uncompress the file and change the resulting file's extension to .txt
4) Move the file into a directory called data at the root of the project.
5) Run the following Unix command: split -l 10000 DOWNLOADEDFILE.txt amazon_review
6) Delete DOWNLOADEDFILE.txt
7) Make sure the buildWordList part of the main function isn't commented out.
8) Run it using 'cabal run'
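Steps 4-6 above can be scripted, for example (DOWNLOADEDFILE.txt stands for whatever review file was downloaded and uncompressed):

```shell
#!/bin/sh
# Assumes DOWNLOADEDFILE.txt is the uncompressed review file from step 3.
mkdir -p data
mv DOWNLOADEDFILE.txt data/
cd data
# Split into 10000-line chunks named amazon_reviewaa, amazon_reviewab, ...
split -l 10000 DOWNLOADEDFILE.txt amazon_review
rm DOWNLOADEDFILE.txt
```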

In total, the data we used contains 400 million words (200k unique). We saw a large difference in run time
across different thread counts; more on this in the report.
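One way to spread the per-chunk work across threads is to fork one worker per chunk and merge the results; a minimal sketch using only forkIO and MVar from base (the project itself may use a different parallelism mechanism):

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Hypothetical helper: count the words within a single chunk.
countChunk :: String -> Map String Int
countChunk = Map.fromListWith (+) . map (\w -> (w, 1)) . words

-- Fan out one thread per chunk, then merge the per-thread results.
countInParallel :: [String] -> IO (Map String Int)
countInParallel chunks = do
  vars <- mapM (\chunk -> do
            v <- newEmptyMVar
            -- ($!) forces the map on the worker thread, not on the collector.
            _ <- forkIO (putMVar v $! countChunk chunk)
            pure v) chunks
  results <- mapM takeMVar vars
  pure (Map.unionsWith (+) results)
```

Run with GHC's threaded runtime (e.g. -threaded and +RTS -N) to use multiple cores.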

We save the word counts to a file so that the wordCompletion part can run without recalculating wordCounts each
time.
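The save/load step can be sketched like this (the one-pair-per-line file format and the function names are our assumptions, not necessarily the project's actual serialization):

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Write one "word count" pair per line.
saveCounts :: FilePath -> Map String Int -> IO ()
saveCounts path m =
  writeFile path (unlines [w ++ " " ++ show c | (w, c) <- Map.toList m])

-- Read the pairs back; malformed lines are silently skipped.
loadCounts :: FilePath -> IO (Map String Int)
loadCounts path = do
  contents <- readFile path
  pure (Map.fromList [(w, read c) | [w, c] <- map words (lines contents)])
```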
