Thumbs up? Sentiment Classification using Machine Learning Techniques
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan.
Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 79–86, 2002.

Abstract: We consider the problem of classifying documents not by topic, but by overall sentiment, e.g., determining whether a review is positive or negative. Using movie reviews as data, we find that standard machine learning techniques definitively outperform human-produced baselines. However, the three machine learning methods we employed (Naive Bayes, maximum entropy classification, and support vector machines) do not perform as well on sentiment classification as on traditional topic-based categorization. We conclude by examining factors that make the sentiment classification problem more challenging.
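To make the task concrete, here is a minimal sketch of one of the three methods named above, a Naive Bayes sentiment classifier. It assumes bag-of-words unigram presence features (each distinct token counted once per document), whitespace tokenization, and add-one smoothing. The function names `features`, `train_nb`, and `classify` and the four toy reviews are hypothetical illustrations. This is not the authors' implementation or experimental setup.

```python
import math
from collections import Counter

def features(text):
    # Hypothetical featurizer: unigram presence, each distinct
    # lowercase whitespace token counted once per document.
    return set(text.lower().split())

def train_nb(docs):
    """docs: list of (text, label) pairs. Returns the model as a tuple."""
    label_counts = Counter()
    word_counts = {}  # label -> Counter of documents containing each word
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        wc = word_counts.setdefault(label, Counter())
        for w in features(text):
            wc[w] += 1
            vocab.add(w)
    n = sum(label_counts.values())
    log_priors = {c: math.log(k / n) for c, k in label_counts.items()}
    totals = {c: sum(wc.values()) for c, wc in word_counts.items()}
    return log_priors, word_counts, totals, vocab

def classify(model, text):
    log_priors, word_counts, totals, vocab = model
    scores = {}
    for c in log_priors:
        score = log_priors[c]
        for w in features(text):
            if w in vocab:  # skip words never seen in training
                # Add-one (Laplace) smoothed log-likelihood of w given class c.
                score += math.log((word_counts[c][w] + 1) / (totals[c] + len(vocab)))
        scores[c] = score
    # Pick the class with the highest log-posterior (up to a constant).
    return max(scores, key=scores.get)

# Hypothetical toy training data (not from the movie-review corpus).
train = [
    ("a wonderful moving film", "pos"),
    ("great acting and a great script", "pos"),
    ("boring and predictable plot", "neg"),
    ("a waste of time terrible acting", "neg"),
]
model = train_nb(train)
print(classify(model, "wonderful script"))      # pos
print(classify(model, "boring terrible plot"))  # neg
```

Presence features discard how often a word repeats within a review. Swapping `set(...)` for a token list in `features` would turn this into a frequency-based multinomial model.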


Data: http://www.cs.cornell.edu/people/pabo/movie-review-data/

BibTeX entry:

@InProceedings{Pang+Lee+Vaithyanathan:02a,
  author =       {Bo Pang and Lillian Lee and Shivakumar Vaithyanathan},
  title =        {Thumbs up? {Sentiment} Classification using Machine Learning Techniques},
  booktitle =    {Proceedings of the 2002 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  pages =        {79--86},
  year =         2002
}

