## MapReduce Word Frequency Search (WFS)
COMS W4995 Parallel Functional Programming Project

Patricia Luc (pbl2116), Arush Sarda (as6785), Sarah Yang (sy3038)

### To Build and Run
Install dependencies:
```stack install parallel```

Build executables:
```stack ghc -- -O2 -Wall -rtsopts -threaded [program-name]```

To run wfsSequential, wfsParallel, wfsParallelSeparateSteps:
```./[program-name] 100-0.txt +RTS -N1 -s -ls```

To run wfsSequentialTest, wfsParallelTest:
```./[program-name] 100-0.txt test/[test-file] +RTS -N4 -s -ls```

Note: ```[program-name]``` can be: wfsSequential, wfsSequentialTest, wfsParallel,
wfsParallelTest, wfsParallelSeparateSteps

### Code Listing
- wfsSequential.hs
   - Sequential implementation; search words are entered by the user
- wfsSequentialTest.hs
   - Sequential implementation; search words are read from a test file
- wfsParallel.hs
   - Parallel implementation; search words are entered by the user
- wfsParallelTest.hs
   - Parallel implementation; search words are read from a test file
- wfsParallelSeparateSteps.hs
   - Alternative implementation of word frequency parallelization that yields worse performance
   - Each step of MapReduce (data cleaning, map, reduce) is parallelized separately

- test/testEasy.txt
   - contains 50 words that exist in the input file, tests MapReduce word count only
- test/testMixed.txt
   - contains 25 words that exist in the input file, and 25 words that do not
- test/testHardShort.txt
   - contains 50 words under 15 characters each that do not exist in the input file
- test/testHardLong.txt
   - contains 50 90-character words that do not exist in the input file
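
For readers new to the approach, here is a minimal sketch of a chunked MapReduce word count using `Control.Parallel.Strategies`. It is not the code in wfsParallel.hs: the names `clean`, `countChunk`, `chunksOf`, and `wordFreq` are hypothetical, the chunk size is arbitrary, and it assumes the `parallel` and `containers` packages are available.

```haskell
import Control.Parallel.Strategies (parMap, rdeepseq)
import Data.Char (isAlpha, toLower)
import qualified Data.Map.Strict as Map

-- Hypothetical cleaning step: lowercase letters, turn everything else into spaces.
clean :: String -> String
clean = map (\c -> if isAlpha c then toLower c else ' ')

-- Map phase for one chunk: count every word in a list of lines.
countChunk :: [String] -> Map.Map String Int
countChunk ls = Map.fromListWith (+) [(w, 1) | l <- ls, w <- words (clean l)]

-- Split a list into pieces of at most n elements (n is clamped to at least 1).
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs = let (h, t) = splitAt (max 1 n) xs in h : chunksOf n t

-- Clean and count each chunk in parallel, then reduce by merging the maps.
wordFreq :: Int -> String -> Map.Map String Int
wordFreq chunkSize =
  Map.unionsWith (+) . parMap rdeepseq countChunk . chunksOf chunkSize . lines

main :: IO ()
main = do
  let freqs = wordFreq 2 "To be or not to be\nthat is the question\nTo be"
  print (Map.findWithDefault 0 "be" freqs)     -- prints 3
  print (Map.findWithDefault 0 "banana" freqs) -- prints 0
```

In this sketch the cleaning and map steps run together inside each chunk, so every spark does a reasonable amount of work before the single merge at the end.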

### Known Bugs
When a search word does not exist in the input file, the sequential and parallel versions sometimes output different closest words. This happens because the two versions break Levenshtein distance ties differently.
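
One possible way to make the results agree is to break ties deterministically, for example by comparing candidates on the pair (distance, word) so that equal distances fall back to alphabetical order. The sketch below illustrates the idea only; `levenshtein` and `closestWord` are hypothetical names and are not taken from the project sources.

```haskell
import Data.List (foldl', minimumBy)
import Data.Ord (comparing)

-- Levenshtein distance via the standard row-by-row dynamic program.
levenshtein :: String -> String -> Int
levenshtein s t = last (foldl' nextRow [0 .. length t] s)
  where
    -- Build the next row from the previous one for source character c.
    nextRow prev@(p : ps) c = scanl step (p + 1) (zip3 t prev ps)
      where
        step left (tc, diag, up) =
          minimum [up + 1, left + 1, diag + (if tc == c then 0 else 1)]
    nextRow [] _ = []

-- Pick the closest word; ties on distance go to the alphabetically smaller
-- word, so the result does not depend on the order of the candidates.
closestWord :: String -> [String] -> Maybe String
closestWord _ [] = Nothing
closestWord target vocab =
  Just (minimumBy (comparing (\w -> (levenshtein target w, w))) vocab)

main :: IO ()
main = do
  print (levenshtein "kitten" "sitting")          -- prints 3
  print (closestWord "cat" ["hat", "cot", "bat"]) -- prints Just "bat"
  print (closestWord "cat" ["bat", "cot", "hat"]) -- prints Just "bat"
```

Because the comparison key is a total order on the candidates, merging per-chunk minima in any order would yield the same word as a single sequential scan.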
