E-mail: my first name [at] cs [dot] columbia [dot] edu
Office: 726 CEPSR Phone: 646-775-6027


Ranking in a Domain Specific Search Engine
CS6998-03 - NLP for the Web, Spring 2008, Semester Project
Sara Stolbach, ss3067 [at] columbia.edu

Final Report and Presentation

Report: [pdf]
Presentation: [pdf] [ppt]
Search Engine: web interface

Interim Report

Due: March 13th
Report: [pdf]
Data, Report, and Code: [tar.gz]

Corpus

Code

The code is included in the interim report archive (see above).
Javadocs: http://www1.cs.columbia.edu/~sara/nlpForWeb/doc/

Stats

Important Features

This is a sample of some of the important features in the clothing domain:

#   feature   frequency   #   feature   frequency   #   feature   frequency   #   feature   frequency
1   blue      990         2   button    858         3   pants     842         4   white     839
5   men       823         6   pink      661         7   girls     615         8   red       609
9   women     2054

Initial Project Proposal

Due: February 7th
Proposal: [doc]

Resources Used