
Spam Analysis and Reputation Project

2. Standalone Approach

Design

In the Standalone approach, email messages from existing mailboxes were used for analysis, and a graphical user interface (GUI) was used to run the tests. The user selects folders in the mailbox and categorizes them as spam folders, non-spam folders, or sent folders. The headers of all messages in the selected folders were then retrieved using IMAP, and the tests were run on those messages to obtain the corresponding results. The tests covered 22 different mailboxes from Gmail and Cubmail (Columbia University's email service), which together contained about 25,000 non-spam messages and about 3,000 spam messages.
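The retrieval step described above can be sketched as follows. This is a minimal illustration using Python's standard `imaplib` and `email` modules, not the project's actual tool. The folder names in `FOLDER_CATEGORIES`, the function names, and the `host`, `user`, and `password` parameters are all assumptions made for the example.

```python
import imaplib
from email.message import Message
from email.parser import BytesHeaderParser

# Hypothetical folder-to-category mapping, standing in for the choices
# the user makes in the GUI. Real folder names differ per mailbox.
FOLDER_CATEGORIES = {
    "INBOX": "non_spam",
    "[Gmail]/Spam": "spam",
    "[Gmail]/Sent Mail": "sent",
}


def parse_headers(raw: bytes) -> Message:
    """Parse a raw header block into a Message object."""
    return BytesHeaderParser().parsebytes(raw)


def fetch_headers(conn, folder: str) -> list:
    """Return the parsed headers of every message in one folder."""
    # Quote the name so folders containing spaces are accepted.
    status, _ = conn.select(f'"{folder}"', readonly=True)
    if status != "OK":
        raise RuntimeError(f"cannot select folder {folder!r}")
    status, data = conn.search(None, "ALL")
    if status != "OK":
        raise RuntimeError(f"cannot search folder {folder!r}")
    headers = []
    for num in data[0].split():
        # BODY.PEEK fetches only the headers and leaves messages unread.
        status, parts = conn.fetch(num, "(BODY.PEEK[HEADER])")
        if status == "OK" and parts and isinstance(parts[0], tuple):
            headers.append(parse_headers(parts[0][1]))
    return headers


def collect_headers(host: str, user: str, password: str,
                    categories=FOLDER_CATEGORIES) -> dict:
    """Group the headers of all selected folders by category."""
    result = {category: [] for category in set(categories.values())}
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        for folder, category in categories.items():
            result[category].extend(fetch_headers(conn, folder))
    return result
```

A call such as `collect_headers("imap.example.com", "user", "secret")` would return a dictionary mapping each category to a list of parsed headers, on which tests like those in the following sections could then be run. The host name here is a placeholder.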

The following table gives the statistics of the mailboxes that were used in the tests:

MailBox #  MailBox                 # of Mails  # of Non-Spam Mails  # of Spam Mails
1          aditi_columbia                1818                 1818                0
2          aditi_gmail                    593                  497               96
3          deepti_columbia               1174                 1174                0
4          deepti_gmail                   641                  576               65
5          dhrumin_gmail                 5105                 5002              103
6          pinank_gmail                  1682                 1418              264
7          Preetinarayan_columbia        1230                 1230                0
8          Preetinarayan_gmail           1992                 1788              204
9          sneha_gmail                    360                  133              227
10         spinank_gmail                  879                  524              355
11         vasa_columbia                  168                  168                0
12         dms2169_columbia              1322                 1301               21
13         nirav_gmail                   1408                 1360               48
14         nns_2108                       934                  934                0
15         manish_gmail                   459                  414               45
16         pragni_gmail                  2183                 1999              184
17         preetimalik_columbia           527                  527                0
18         preetimalik_gmail              380                  380                0
19         sak2144                        749                  749                0
20         shradha_columbia               140                  140                0
21         shradha_gmail                 1522                 1151              371
22         vasa_gmail                    3316                 2370              946
Total                                   28582                25653             2929
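As a quick consistency check, the totals in the table can be recomputed from the per-mailbox rows. The short sketch below copies the non-spam and spam counts from the table. The variable names are illustrative only, and the spam share it prints is derived here rather than stated in the original report.

```python
# (mailbox, non-spam count, spam count), copied from the table above.
MAILBOXES = [
    ("aditi_columbia", 1818, 0),
    ("aditi_gmail", 497, 96),
    ("deepti_columbia", 1174, 0),
    ("deepti_gmail", 576, 65),
    ("dhrumin_gmail", 5002, 103),
    ("pinank_gmail", 1418, 264),
    ("Preetinarayan_columbia", 1230, 0),
    ("Preetinarayan_gmail", 1788, 204),
    ("sneha_gmail", 133, 227),
    ("spinank_gmail", 524, 355),
    ("vasa_columbia", 168, 0),
    ("dms2169_columbia", 1301, 21),
    ("nirav_gmail", 1360, 48),
    ("nns_2108", 934, 0),
    ("manish_gmail", 414, 45),
    ("pragni_gmail", 1999, 184),
    ("preetimalik_columbia", 527, 0),
    ("preetimalik_gmail", 380, 0),
    ("sak2144", 749, 0),
    ("shradha_columbia", 140, 0),
    ("shradha_gmail", 1151, 371),
    ("vasa_gmail", 2370, 946),
]

total_non_spam = sum(non_spam for _, non_spam, _ in MAILBOXES)
total_spam = sum(spam for _, _, spam in MAILBOXES)
total_mails = total_non_spam + total_spam

# Percentage of all collected messages that were categorized as spam.
spam_share = 100 * total_spam / total_mails

# Mailboxes that contributed no spam messages at all.
no_spam_mailboxes = [name for name, _, spam in MAILBOXES if spam == 0]

print(f"{total_mails} mails: {total_non_spam} non-spam, {total_spam} spam")
print(f"spam share: {spam_share:.1f}%")
print(f"mailboxes with no spam messages: {len(no_spam_mailboxes)}")
```

The recomputed totals match the Total row of the table (28582, 25653, and 2929).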

The following tests were run on the above mailboxes. The tests and the resulting observations are discussed in this report:

2.1 Email Source Analysis

2.2 Attachment Analysis



Last updated: 2008-08-19 by Nirav Shah