NEW: Please note that you are expected to divide the message data yourself
into training and test sets (about 80% training/20% test is a good proportion).
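
One way to make such a split is sketched below. This is only an illustration: the function name split_messages, the fixed random seed, and the placeholder message list are assumptions, not part of the assignment, and any reasonable way of getting roughly 80%/20% is acceptable.

```python
import random


def split_messages(messages, train_fraction=0.8, seed=0):
    """Shuffle a copy of the messages and split it into (train, test)."""
    shuffled = list(messages)
    # A fixed seed (an assumption here) makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for the real message corpus.
messages = [f"message {i}" for i in range(10)]
train, test = split_messages(messages)
print(len(train), len(test))
```

Shuffling before cutting avoids a test set made up only of the most recent messages. Keep the test set untouched until the final evaluation.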

Homework Submission Guidelines

The submission for hw3 should include the following files:
     1. Your script for deriving unigram and bigram statistics.
     2. The top 20 unigrams and the top 20 bigrams by probability, both for your own corpus and for the class corpus.
     3. A readme containing a full explanation of what you did for problem 1. "See the script explanations" will be worth 0 points.
     4. As in 3., a discussion in the readme of the features you decided to use.
     5. The scripts for extracting the chosen features from the corpus, together with the files containing the Ripper features and the Ripper .hyp file.
     6. In the readme file, a short analysis of the results.
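
For items 1 and 2, the statistics could be derived along the following lines. This is a minimal sketch under stated assumptions: tokens come from a plain whitespace split, "probability" is taken to be relative frequency (the assignment may instead intend conditional bigram probabilities), and the names ngram_probabilities and top_k are hypothetical.

```python
from collections import Counter


def ngram_probabilities(tokens, n):
    """Return the relative frequency of each n-gram in a token list."""
    # zip over n shifted copies of the token list yields consecutive n-grams.
    grams = list(zip(*(tokens[i:] for i in range(n))))
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def top_k(probabilities, k=20):
    """Return the k most probable n-grams, ties broken alphabetically."""
    ranked = sorted(probabilities.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


# Toy corpus standing in for your own corpus or the class corpus.
tokens = "the cat sat on the mat and the cat slept".split()
unigrams = ngram_probabilities(tokens, 1)
bigrams = ngram_probabilities(tokens, 2)

for gram, prob in top_k(unigrams):
    print(" ".join(gram), round(prob, 3))
for gram, prob in top_k(bigrams):
    print(" ".join(gram), round(prob, 3))
```

Run the same script once on your own corpus and once on the class corpus, so that the two top-20 lists are directly comparable.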


To submit, run the following command from the directory containing your files:

     $ tar cvf - . | compress | uuencode temp_file | Mail -s "submit cs4705 hw3" ani@cs.columbia.edu

After a short time you will receive an automatic acknowledgement of your submission. Please note:
if you submit again, your second submission overwrites the first, and all files from your first submission are automatically deleted.


Late Homework Policy

No late submissions will be accepted for HW3.