Mona Talat Diab


Mona T. Diab
Center for Computational Learning Systems
Columbia University

Address: Interchurch Center
               475 Riverside Dr. MC 7717
               New York, NY 10115

office:    Room A
email:    mdiab AT cs DOT columbia DOT edu
phone:   +1 (212) 870 1290
fax:        +1 (212) 870 1285



Welcome to my NEW homepage!

I am a research scientist at Columbia University. I work at CCLS in the NLP group with Owen Rambow, Nizar Habash, and Martin Jansche.   

I finished my postdoctoral work at Stanford University in the linguistics department, working with Daniel Jurafsky. I was also part of the Natural Language Processing lab.

Before that, after graduating, I worked for five months as a research associate at the Center for Spoken Language Research (CSLR) at the University of Colorado at Boulder, alongside really nice people: Jim Martin, Kadri Hacioglu, and Wayne Ward. I then moved to Stanford, California, in January 2004.

I finished my PhD at the University of Maryland, College Park, where I was in the linguistics department and part of the CLIP lab in the University of Maryland Institute for Advanced Computer Studies. I worked under the supervision of a great advisor, Philip Resnik. My thesis, defended in May 2003, is titled Word Sense Disambiguation within a Multilingual Framework.

Earlier, from 1995 to 1997, I earned an MSc degree in Artificial Intelligence (Machine Learning) from the George Washington University under the supervision of Professor Peter Bock.

Here is my CV.

If you are interested, here are my research statement and my teaching statement.

Research Interests

My main research area is statistical natural language processing. I am specifically involved in computational semantics, Arabic computational linguistics, semantic processing and machine learning.

I am interested in cross-linguistic similarities and divergences in language use, and in how these relations can be exploited to solve some language processing problems.


  • Diab, Mona. Relieving the data acquisition bottleneck for Word Sense Disambiguation. Proceedings of ACL 2004. [pdf].
  • Diab, Mona, Kadri Hacioglu and Daniel Jurafsky. Automatic Tagging of Arabic Text: From raw text to Base Phrase Chunks. Proceedings of HLT-NAACL 2004. [pdf].
  • Diab, Mona. The Feasibility of Bootstrapping an Arabic WordNet leveraging Parallel Corpora and an English WordNet. Proceedings of the Arabic Language Technologies and Resources, NEMLAR, Cairo 2004. [pdf].
  • Diab, Mona. An Unsupervised Approach for bootstrapping Arabic Sense Tagging. Proceedings of Arabic Script Based Languages Workshop, Coling 2004. [pdf].
  • Diab, Mona and Philip Resnik. An Unsupervised Method for Word Sense Tagging using Parallel Corpora. Proceedings of ACL, 2002. [ps].
  • Diab, Mona. An Unsupervised Method for Word Sense Tagging using Parallel Corpora: A Preliminary Investigation. Special Interest Group in Lexical Semantics (SIGLEX) Workshop, Association for Computational Linguistics, 2000. [pdf].
  • Diab, Mona and Steven Finch. A Statistical Word-Level Translation Model for Comparable Corpora. Proc. of Conference on Content-based Multimedia Information Access (RIAO2000), 2000. [ps].
  • Resnik, Philip and Mona Diab. Measuring Verb Similarity. Cognitive Science Society (CogSci2000), 2000. [pdf].
  • Dorr, Bonnie, Gina Levow, Douglas Oard, Philip Resnik, Amy Weinberg, Mona Diab, Maria Katsova. MADLIBS: An Event Translingual Lexical Conceptual Structure Based Information Retrieval System. North American Chapter of the Association for Computational Linguistics (NAACL), 2000.
  • Resnik, Philip, Mari B. Olsen and Mona Diab. The Bible as a Parallel Corpus: Annotating the 'Book of 2000 Tongues'. Computers and the Humanities, 33(1-2), 1999.
  • Diab, Mona, John Schuster and Peter Bock. A Preliminary Statistical Investigation into the impact of an N-Gram Analysis Approach based on Word Syntactic Categories toward Text Author Classification. Proc. of 6th International Conference on Artificial Intelligence & Applications, Egypt, 1998. [ps].
  • Riopka, Terry, Mona Diab and Peter Bock. Quantifying and Interpreting the Effect of Intelligent Information. Proc. of 6th International Conference on Artificial Intelligence & Applications, Egypt, 1998. [ps].


  • When I was at Stanford, we developed a set of basic Arabic processing tools in conjunction with our NAACL'04 [paper].
  • The tools use the YamCha SVM toolkit to tokenize, POS tag, and base phrase chunk Arabic text.
  • You may download our tarred and compressed [package] (41.6 MB).
  • The tools are compiled for Linux. For questions or comments, contact me.

Last updated on August 8th, 2005 by Mona T. Diab
