Ph.D. Computer Science, September 1997. Stanford
University, Stanford, California.
Dissertation: Querying Multiple Document Collections
across the Internet (advisor: Hector Garcia-Molina).
M.S. Computer Science, March 1994. Stanford University,
Stanford, California.
B.S. Computer Science ("Licenciatura en Informática"), July 1991.
Escuela Superior Latinoamericana de Informática (ESLAI), Argentina.
Professional Employment
July 2002 - Present: Associate Professor. Computer Science Department,
Columbia University, New York City, New York
September 1997 - June 2002: Assistant Professor. Computer
Science Department, Columbia University, New York City,
New York
January - August 2001: Senior Research Scientist. Google
Inc., Mountain View, California (on leave from
Columbia University)
August 2000; July 2002: Consulting Researcher. Microsoft Research,
Redmond, Washington
June - July 2000: Academic Consultant. Google Inc.,
Mountain View, California
July 2000: Consultant. Gigabeat Inc., Palo Alto,
California
July - August 1999: Visiting Professor. Computer Science
Department, University of Buenos Aires, Argentina
June - July 1999: Consulting Researcher. Microsoft
Research, Redmond, Washington
1992 - 1997: Research Assistant. Computer Science
Department, Stanford University, Stanford, California
1995 - 1996: Research Intern. Hewlett-Packard
Laboratories, Palo Alto, California
1995: Supplemental Research Associate. IBM Almaden
Research Center, San Jose, California
July 1994: Instructor. Computer Science Department,
University of Buenos Aires, Argentina
1990 - 1992: Resident Researcher. IBM Argentina, Buenos
Aires, Argentina
1990 - 1992: Student Visitor. IBM Almaden Research
Center, San Jose, California
Honors and Awards
"Best Paper" Award, 2006 ACM SIGMOD International Conference
on Management of Data (SIGMOD 2006), 2006
"Best Paper" Award, 21st IEEE International Conference on Data
Engineering (ICDE 2005), 2005
CAREER Award, National Science Foundation (NSF),
1998
"Most Original Paper" Award, International
Conference on Parallel Processing (ICPP '92), 1992
Grants and Gifts
Google Research Award. Google Desktop Meets DejaView: Display-Centric
Desktop Search, with Jason Nieh. $70,000 for September 2009-August 2010.
Yahoo! Faculty Research and Engagement Gift. User-Specific Extraction
of Entity Lists and Attributes. $7,500 for 2009-2010.
NSF. III-COR-Small: Beyond Keyword Search: Enabling Diverse
Structured Query Paradigms over Text Databases, IIS-08-11038. $448,976
for September 2008-August 2011.
Google Research Award. Searching for Events of All Sizes, Everywhere,
with Hila Becker. $72,779 for September 2008-August 2009.
Microsoft Research, Data Management, Exploration, and
Mining Group. Unrestricted Cash Gift. $35,000 for 2007-2008; $35,000 for 2006-2007; $35,000 for 2005-2006; $35,000 for 2004-2005; $50,000 for 2003-2004; $50,000 for 2002-2003; $50,000 for 2001-2002;
$50,000 for 2000-2001.
NSF. KDD: Tools for Monitoring Online Information Sources, with
K. McKeown, J. Hirschberg, and O. Rambow. Started in June 2002 (part of
project through December 2004). $695,000 for Year 1, $500,000 for
Year 2.
NSF. DLI-Phase 2: A Patient Care Digital Library:
Personalized Retrieval and Summarization of Multimedia
Information, IIS-98-17434, with K. McKeown et al. $5,002,375
for 1999-2004.
Lucent Technologies. Research Grant. $20,000 for 2000-2001.
NSF. Digital Government: The CARDGIS Energy Data
Collection, EIA-98-76739, with S. Stolfo et al. $1,631,623
for 1999-2002 (part of project from September 1999
through August 2000).
Microsoft Research. Unrestricted Cash Gift. $24,000 for
1999-2000.
Microsoft Research, Database Group. Unrestricted Cash
Gift. $42,000 for 1999-2000.
NSF. CAREER: Querying Information Sources Across the
Internet, IIS-97-33880. $299,985 for September 1998-August
2002.
NSF. An Environment for Illustrated Briefing and
Follow-up Search Over Live Multimedia Information,
IRI-96-19124, with K. McKeown and S.-F. Chang. $732,056
for March 1997-February 2000 (joined on-going project in
September 1997).
Patents
String Predicate Selectivity Estimation, S. Chaudhuri, V. Ganti,
and L. Gravano, United States Patent 7,149,735, issued December 12, 2006
Systems and Methods for Using Anchor Text as Parallel Corpora for
Cross-Language Information Retrieval, L. Gravano and M. Henzinger,
United States Patent 7,146,358, issued December 5, 2006
Method of Building Multidimensional Workload-Aware Histograms, S.
Chaudhuri, N. Bruno, and L. Gravano, United States Patent 7,007,039, issued
February 28, 2006
Method for Cost-Based Optimization over Multimedia
Repositories, S. Chaudhuri and L. Gravano, United
States Patent 5,806,061, issued September 8, 1998
Method of Packet Routing in Torus Networks with Two
Buffers per Edge, R. Cypher and L. Gravano, United
States Patent 5,444,701, issued August 22, 1995
Editorships
Associate Editor, ACM Transactions on Database
Systems, 2004-Present
Co-editor, SIGMOD Record Special Issue on "Managing Information
Extraction" (Editor: AnHai Doan), vol. 37, no. 4, December 2008
Associate Editor, ACM Transactions on Information
Systems, 1997-2005
Associate Editor, IEEE Data Engineering Bulletin, 2000-2002
Editor, special issue on "Text and Databases," vol. 24, no. 4,
December 2001
Editor, special issue on "Next-Generation Web Search," vol. 23,
no. 3, September 2000
Associate Editor, ACM SIGMOD Digital Symposium
Collection (DiSC), 1998-2001
Program Committees
2010: 26th IEEE International Conference on Data Engineering (IEEE
ICDE 2010: Tutorials/Seminars Co-Chair)
2009: ACM SIGMOD International Conference on Management
of Data (ACM SIGMOD 2009), 18th International World Wide Web Conference
(WWW 2009, Search Track), 12th International Workshop on the Web and
Databases (WebDB 2009), VLDB 2009 Ph.D. Workshop
2008: 24th IEEE International Conference on Data Engineering (IEEE ICDE
2008: Program Committee Vice-Chair, for "Web Search and Deep Web" area;
member of Best Research Paper Selection Committee), 34th
International Conference on Very Large Data Bases (VLDB 2008)
2007: 33rd International Conference on Very Large Data Bases (VLDB 2007)
2006: ACM SIGMOD International Conference on Management
of Data (ACM SIGMOD 2006), Workshop on Information Integration (IIWorkshop)
2005: ACM SIGMOD International Conference on Management
of Data (ACM SIGMOD 2005)
2004: 7th International Workshop on the Web and Databases (WebDB 2004: Co-Chair), 13th ACM Conference on
Information and Knowledge Management (ACM CIKM 2004: Program Committee
Co-Chair, Databases), 20th IEEE International Conference on Data
Engineering (IEEE ICDE 2004), 13th International
World Wide Web Conference (WWW 2004), 27th International ACM SIGIR Conference on Research and
Development in Information Retrieval (ACM SIGIR 2004)
2003: 26th International ACM SIGIR Conference on Research and
Development in Information Retrieval (ACM SIGIR 2003), 12th ACM Conference on Information and Knowledge Management (ACM CIKM 2003), ACM
SIGIR 2003 Workshop on Distributed Information Retrieval
2002: 28th International Conference on Very Large Data Bases (VLDB 2002), ACM SIGMOD International Conference on Management
of Data (ACM SIGMOD 2002), 5th International Workshop on the Web and
Databases (WebDB 2002)
2001: 24th International ACM SIGIR Conference on Research
and Development in Information Retrieval (ACM SIGIR 2001),
17th IEEE International Conference on Data Engineering (IEEE ICDE 2001)
2000: 26th International Conference on Very Large
Data Bases (VLDB 2000), 16th IEEE International Conference
on Data Engineering (IEEE ICDE 2000), IEEE Advances in
Digital Libraries (IEEE ADL 2000)
1999: ACM SIGMOD International Conference on Management
of Data (ACM SIGMOD'99), 15th IEEE International
Conference on Data Engineering (IEEE ICDE'99), 22nd
International ACM SIGIR Conference on Research and
Development in Information Retrieval (ACM SIGIR'99), 4th
IFCIS Conference on Cooperative Information Systems (CoopIS'99),
Symposium on Software Technology (SoST'99)
1998: 3rd ACM International Conference on Digital
Libraries (ACM DL'98), Symposium on Software Technology (SoST'98)
1997: Symposium on Software Technology (SoST'97)
Invited Talks
2009: Querying Text Databases and the Web: Beyond Traditional Keyword
Search, June 2009, keynote talk at the First International Workshop on
Keyword Search on Structured Data (KEYS 2009), Providence, Rhode Island
2008: Information Extraction Over Text Databases: What's Ranking Got
To Do With It?, April 2008, keynote talk at the Second International
Workshop on Ranking in Databases (DBRank 2008), Cancun, Mexico;
Information Extraction Over Text Databases: What's Ranking Got To Do With
It?, July 2008, Pragma Consultores, Buenos Aires, Argentina
2004: Hidden-Web Databases: Classification and Search, March 2004, Polytechnic
University, Brooklyn, New York
2003: Hidden-Web Databases: Classification and Search, August 2003, IBM T. J.
Watson Research Center, Hawthorne, New York;
Hidden-Web Databases: Classification and Search, April 2003, University
of Waterloo, Waterloo, Ontario, Canada
2002: Hidden-Web Databases: Classification and Search, December
2002, Stern School of Business, New York University, New York City, New
York;
Hidden-Web Databases: Classification and Search, November 2002,
Lucent-Bell Labs, Murray Hill, New Jersey; Text- and Web-Database
Research at Columbia University, July 2002, Microsoft Research, Redmond,
Washington; Web Mining Meets Web Search,
June 2002, keynote talk at the ACM SIGMOD Workshop on Research Issues in
Data Mining and Knowledge Discovery (DMKD 2002), Madison, Wisconsin;
Hidden-Web Databases: Classification and Search,
March 2002, University of Pennsylvania, Philadelphia,
Pennsylvania
2001: Characterizing Web Resources for Improved
Searching and Browsing, April 2001, IBM Almaden
Research Center, San Jose, California; Characterizing
Web Resources for Improved Searching and Browsing,
March 2001, Hewlett-Packard Laboratories, Palo Alto,
California; Probe, Count, and Classify: Categorizing
Hidden-Web Databases, February 2001, Google Inc.,
Mountain View, California
2000: Characterizing Web Resources for Improved
Searching and Browsing, September 2000, Princeton
University, Princeton, New Jersey; Computing
Geographical Scopes of Web Resources, and Other
Classification Problems, June 2000, Google Inc.,
Mountain View, California
1999: Internet Query Processing, December 1999,
NEC Research Institute, Princeton, New Jersey; Query
Processing and Data Quality over the Internet, with Martina
Marré, ECI'99, July 1999, University of Buenos Aires,
Argentina
1997: Searching over Autonomous Text Sources,
FedWeb'97, October 1997, Bethesda, Maryland
Invited Panels and Working Groups
Information Fusion in Counter-Terrorism. Participant and invited
speaker in workshop organized by the National Research Council's Computer
Science and Telecommunications Board, Washington D.C., June 2002
How Agencies and Universities Are Collaborating on
Research to Solve a Federal Statistical Data Integration
Problem. Panel in FedWeb 2000: Meeting the Growing
Demand for Government E-Service, Bethesda, Maryland, May
2000
Integrating Information Retrieval and Databases in the
WWW, Internet, Wireless Era. Panel in the 1999 NSF
Information and Data Management Workshop: Research Agenda
for the 21st Century, Los Angeles, California, March 1999
Resource Indexing and Discovery In a Globally
Distributed Digital Library. Second meeting of one of
five NSF-EU Digital Library Collaboratory Working Groups,
Washington DC (one of five participants from the US),
February 1998
Resource Indexing and Discovery In a Globally
Distributed Digital Library. First meeting of one of
five NSF-EU Digital Library Collaboratory Working Groups,
Budapest, Hungary (one of five participants from the US),
November 1997
InfoBus: Experience in Linking Heterogeneous Systems.
Panel in the 2nd ACM International Conference on Digital
Libraries (ACM DL'97), Philadelphia, Pennsylvania, July
1997
Other Professional Activities
North East DB/IR Day Workshop, April 18, 2008: Chaired one-day workshop,
bringing together database and information retrieval researchers in
the Northeastern United States.
Grant Panels: NSF, 1998; NSF, 2001; NSF, 2005; NSF, 2009
Journal Article Reviews (in addition to editorial
boards): Information Retrieval, 2004; Journal of Computer and System Sciences, 2001;
International Journal on Digital Libraries, 2000; ACM
Computing Surveys, 2000, 2002; VLDB Journal, 1999; ACM
Transactions on Database Systems, 1997; ACM Transactions
on Information Systems, 1997.
Conference Paper Refereeing (in addition to
program committees): 25th International Conference on
Very Large Data Bases (VLDB'99), ACM 1997 SIGMOD
International Conference on Management of Data (ACM
SIGMOD'97), ACM 1996 SIGMOD International Conference on
Management of Data (ACM SIGMOD'96), 1st ACM International
Conference on Digital Libraries (ACM DL'96), 21st
International Conference on Very Large Data Bases (VLDB'95).
STARTS Informal Standards Effort, 1996-1997:
Coordinated an informal standards effort for Internet
searching, which involved Netscape, Microsoft Network,
Infoseek, Fulcrum, and Verity, among others, and produced
STARTS, the Stanford Protocol Proposal for Internet
Retrieval and Search. Organized a workshop at Stanford
with 40 representatives of the participating
organizations to agree on the final proposal.
Student Representative to the Stanford Computer Forum
Committee, 1995-1996. (The Stanford Computer Forum is
an industrial affiliate program at Stanford University.)
Co-chaired a poster session at the Forum's annual
meeting, showcasing Computer Science projects at Stanford.
Papers in Refereed Journals
Classification-Aware Hidden-Web Text Database Selection, P.
Ipeirotis and L. Gravano, in ACM Transactions on Information
Systems, vol. 26, no. 2, art. 6 (66 pages), Mar. 2008.
Towards a Query Optimizer for Text-Centric Tasks, P. Ipeirotis,
E. Agichtein, P. Jain, and L. Gravano, in ACM Transactions on
Database Systems, vol. 32, no. 4, art. 21 (46 pages), Nov. 2007.
Modeling and Managing Changes in Text Databases, P.
Ipeirotis, A. Ntoulas, J. Cho, and L. Gravano, in ACM Transactions
on Database Systems, vol. 32, no. 3, art. 14 (38 pages), Aug. 2007.
Optimizing Top-k Selection Queries over Multimedia Repositories,
S. Chaudhuri, L. Gravano, and A. Marian, in IEEE
Transactions on Knowledge and Data Engineering, vol. 16, no. 8, pages
992-1009, Aug. 2004.
Evaluating Top-k Queries over Web-Accessible Databases, A.
Marian, N. Bruno, and L. Gravano, in ACM
Transactions on Database Systems, vol. 29, no. 2, pages 319-362, June 2004.
Learning to Find Answers to Questions on the Web, E. Agichtein,
S. Lawrence, and L. Gravano, in ACM Transactions on
Internet Technology, vol. 4, no. 2, pages 129-162, May 2004.
QProber: A System for Automatic Classification of Hidden-Web
Databases, L. Gravano, P. Ipeirotis, and M. Sahami, in ACM Transactions on Information Systems,
vol. 21, no. 1, pages 1-41, Jan. 2003.
Top-k Selection Queries over Relational Databases: Mapping Strategies
and Performance Evaluation, N. Bruno, S. Chaudhuri, and L. Gravano,
in ACM Transactions on Database Systems, vol. 27, no. 2, pages 153-187, Jun. 2002.
GlOSS: Text-Source Discovery
over the Internet, L. Gravano, H. Garcia-Molina, and A.
Tomasic, in ACM Transactions on Database Systems, vol. 24,
no. 2, pages 229-264, Jun. 1999.
The Stanford Digital Library
Metadata Architecture, M. Baldonado, C.-C. K. Chang,
L. Gravano, and A. Paepcke, in International Journal on
Digital Libraries, vol. 1, no. 2, pages 108-121, Sep. 1997.
Data Structures for Efficient
Broker Implementation, A. Tomasic, L. Gravano, C.
Lue, P. Schwarz, and L. Haas, in ACM Transactions on
Information Systems, vol. 15, no. 3, pages 223-253, Jul. 1997.
Storage-Efficient, Deadlock-Free
Packet Routing Algorithms for Torus Networks, R.
Cypher and L. Gravano, in IEEE Transactions on Computers,
vol. 43, no. 12, pages 1376-1385, Dec. 1994.
Requirements for Deadlock-Free,
Adaptive Packet Routing, R. Cypher and L. Gravano, in
SIAM Journal on Computing, vol. 23, no. 6, pages 1266-1274, Dec. 1994.
Adaptive Deadlock- and Livelock-Free
Routing with All Minimal Paths in Torus Networks, L.
Gravano, G. Pifarre, P. Berman, and J. Sanz, in IEEE
Transactions on Parallel and Distributed Systems, vol. 5,
no. 12, pages 1233-1251, Dec. 1994.
Adaptive Deadlock- and Livelock-Free
Routing in the Hypercube Network, G. Pifarre, L.
Gravano, G. Denicolay, and J. Sanz, in IEEE Transactions on
Parallel and Distributed Systems, vol. 5, no. 11, pages 1121-1139, Nov.
1994.
Fully Adaptive Minimal Deadlock-Free
Packet Routing in Hypercubes, Meshes, and Other Networks:
Algorithms and Simulations, G. Pifarre, L. Gravano, S.
Felperin, and J. Sanz, in IEEE Transactions on Parallel
and Distributed Systems, vol. 5, no. 3, pages 247-263, Mar. 1994.
Book Chapter
XML & Data Streams, N. Bruno, L. Gravano, N. Koudas, and D.
Srivastava. Chapter 4 in "Stream Data Management," edited by N.
Chaudhry, K. Shaw, and M. Abdelguerfi, Series: Advances in Database Systems,
Volume 30, pages 59-81, Springer, 2005.
Papers in Refereed Conferences
Join Optimization of Information Extraction Output: Quality Matters!,
A. Jain, P. Ipeirotis, A. Doan, and L. Gravano, in Proc. of
the 25th IEEE International Conference on Data Engineering (ICDE 2009),
pages 186-197, 2009
(16.8% accepted in "long paper" category).
Answering General Time-Sensitive Queries, W. Dakka, L. Gravano,
and P. Ipeirotis, in Proc. of the 17th ACM Conference on Information and
Knowledge Management (CIKM 2008), pages 1437-1438, 2008 (short 2-page
"poster" paper; 16% accepted in "poster" paper category).
Optimizing SQL Queries over Text Databases, A. Jain, A. Doan, and
L. Gravano, in Proc. of the 24th IEEE International Conference on
Data Engineering (ICDE 2008), pages 636-645, 2008
(12.1% accepted in "full presentation" category).
Efficient Summarization-Aware Search for Online News Articles, W.
Dakka and L. Gravano, in Proc. of the 2007 ACM+IEEE Joint
Conference on Digital Libraries (JCDL 2007), pages 63-72, 2007.
Efficient Keyword Search Across Heterogeneous Relational Databases,
M. Sayyadian, H. LeKhac, A. Doan, and L. Gravano, in Proc. of the
23rd IEEE International Conference on Data Engineering (ICDE 2007), pages
346-355, 2007
(19% accepted).
SQL Queries Over Unstructured Text Databases, A. Jain, A. Doan,
and L. Gravano, in Proc. of the 23rd IEEE International Conference
on Data Engineering (ICDE 2007), pages 1255-1257, 2007 (short 3-page "poster" paper).
To Search or to Crawl? Towards a Query Optimizer for Text-Centric
Tasks, P. Ipeirotis, E. Agichtein, P. Jain, and L. Gravano, in
Proc. of the 2006 ACM SIGMOD International Conference on Management of Data,
pages 265-276, 2006 ("Best Paper" Award; 13% accepted).
Modeling and Managing Content Changes in Text Databases, P.
Ipeirotis, A. Ntoulas, J. Cho, and L. Gravano, in
Proc. of the 21st IEEE International Conference on Data Engineering (ICDE
2005), pages 606-617, 2005 ("Best Paper" Award; 13% accepted).
When one Sample is not Enough: Improving Text Database Selection
Using Shrinkage, P. Ipeirotis and L. Gravano, in Proc. of the 2004 ACM SIGMOD
International Conference on Management of Data, pages 767-778, 2004 (16%
accepted).
Selectivity Estimation for String Predicates: Overcoming the
Underestimation Problem, S. Chaudhuri, V. Ganti, and L. Gravano,
in Proc. of the 20th IEEE International Conference
on Data Engineering (ICDE 2004), pages 227-238, 2004 (14% accepted).
Categorizing Web Queries According to Geographical Locality, L.
Gravano, V. Hatzivassiloglou, and R. Lichtenstein, in Proc. of the 12th ACM Conference on Information and
Knowledge Management (CIKM 2003), pages 325-333, 2003
(15% accepted).
Efficient IR-Style Keyword Search over Relational Databases, V.
Hristidis, L. Gravano, and Y. Papakonstantinou, in
Proc. of the 29th International Conference on Very
Large Data Bases (VLDB 2003), pages 850-861, 2003 (15% accepted).
Text Joins in an RDBMS for Web Data Integration, L. Gravano, P. Ipeirotis, N. Koudas, and D. Srivastava,
in Proc. of the 12th International
World Wide Web Conference (WWW 2003), pages 90-101, 2003 (13% accepted).
Querying Text Databases for Efficient Information Extraction, E. Agichtein and L. Gravano, in Proc. of the 19th IEEE International
Conference on Data Engineering (ICDE 2003), pages 113-124, 2003 ("Best Student Paper"
Award; 14% accepted).
Navigation- vs. Index-Based XML Multi-Query Processing, N. Bruno,
L. Gravano, N. Koudas, and D. Srivastava, in Proc. of the 19th
IEEE International Conference on Data Engineering (ICDE 2003), pages
139-150, 2003 (14%
accepted).
Text Joins for Data Cleansing and Integration in an RDBMS, L.
Gravano, P. Ipeirotis, N. Koudas, and D. Srivastava, in Proc. of
the 19th IEEE International Conference on Data Engineering (ICDE 2003),
pages 729-731, 2003
(short 3-page "poster" paper).
Distributed Search over the Hidden-Web: Hierarchical Database
Sampling and Selection, P. Ipeirotis and L. Gravano, in
Proc. of the 28th International Conference on Very
Large Data Bases (VLDB 2002), pages 394-405, 2002 (16% accepted).
Evaluating Top-k Queries over
Web-Accessible Databases, N. Bruno, L. Gravano, and A.
Marian, in Proc. of the 18th IEEE International
Conference on Data Engineering (ICDE 2002), pages 369-380, 2002 (19%
accepted).
Extending SDARTS: Extracting
Metadata from Web Databases and Interfacing with the Open
Archives Initiative, P. Ipeirotis, T. Barry, and L.
Gravano, in Proc. of the Second ACM+IEEE Joint
Conference on Digital Libraries (JCDL 2002), pages 162-170, 2002 (33%
accepted).
Approximate String Joins in a
Database (Almost) for Free, L. Gravano, P. Ipeirotis,
H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D.
Srivastava, in Proc. of the 27th International Conference
on Very Large Data Bases (VLDB 2001), pages 491-500, 2001 (17%
accepted).
Probe, Count, and Classify:
Categorizing Hidden-Web Databases, P. Ipeirotis, L.
Gravano, and M. Sahami, in Proc. of the 2001 ACM SIGMOD
International Conference on Management of Data, pages 67-78, 2001 (15%
accepted).
STHoles: A Multidimensional
Workload-Aware Histogram, N. Bruno, S. Chaudhuri, and
L. Gravano, in Proc. of the 2001 ACM SIGMOD International
Conference on Management of Data, pages 211-222, 2001 (15% accepted).
SDLIP + STARTS = SDARTS: A
Protocol and Toolkit for Metasearching, N. Green, P.
Ipeirotis, and L. Gravano, in Proc. of the First ACM+IEEE
Joint Conference on Digital Libraries (JCDL 2001), pages 207-214, 2001.
PERSIVAL, a System for
Personalized Search and Summarization over Multimedia
Healthcare Information, K. McKeown, S.-F. Chang, J.
Cimino, S. Feiner, C. Friedman, L. Gravano, V.
Hatzivassiloglou, S. Johnson, D. Jordan, J. Klavans, A.
Kushniruk, V. Patel, and S. Teufel, in Proc. of the First
ACM+IEEE Joint Conference on Digital Libraries (JCDL 2001),
pages 331-340, 2001.
Learning Search Engine Specific
Query Transformations for Question Answering, E.
Agichtein, S. Lawrence, and L. Gravano, in Proc. of the
10th International World Wide Web Conference (WWW10),
pages 169-178, 2001 (20% accepted).
Computing Geographical Scopes of
Web Resources, J. Ding, L. Gravano, and N.
Shivakumar, in Proc. of the 26th International Conference
on Very Large Data Bases (VLDB'00), pages 545-556, 2000 (15% accepted).
An Investigation of Linguistic
Features and Clustering Algorithms for Topical Document
Clustering, V. Hatzivassiloglou, L. Gravano, and A.
Maganti, in Proc. of the 23rd ACM SIGIR Conference on
Research and Development in Information Retrieval (SIGIR'00),
pages 224-231, 2000 (25% accepted).
Snowball: Extracting Relations
from Large Plain-Text Collections, E. Agichtein and L.
Gravano, in Proc. of the 5th ACM International Conference
on Digital Libraries (DL'00), pages 85-94, 2000 (<33% accepted).
Evaluating Top-k Selection
Queries, S. Chaudhuri and L. Gravano, in Proc. of the
25th International Conference on Very Large Data Bases (VLDB'99),
pages 399-410, 1999 (15% accepted).
Merging Ranks from Heterogeneous
Internet Sources, L. Gravano and H. Garcia-Molina, in
Proc. of the 23rd International Conference on Very Large
Data Bases (VLDB'97), pages 196-205, 1997 (15% accepted).
Metadata for Digital Libraries:
Architecture and Design Rationale, M. Baldonado, C.-C.
K. Chang, L. Gravano, and A. Paepcke, in Proc. of the 2nd
ACM International Conference on Digital Libraries (DL'97),
pages 47-56, 1997 (27% accepted).
STARTS: Stanford Proposal for
Internet Meta-Searching, L. Gravano, C.-C. K. Chang,
H. Garcia-Molina, and A. Paepcke, in Proc. of the 1997
ACM SIGMOD International Conference on Management of
Data, pages 207-218, 1997 (21% accepted).
dSCAM: Finding Document Copies
across Multiple Databases, H. Garcia-Molina, L.
Gravano, and N. Shivakumar, in Proc. of the 4th
International Conference on Parallel and Distributed
Information Systems (PDIS'96), pages 68-79, 1996 (18% accepted).
Optimizing Queries over
Multimedia Repositories, S. Chaudhuri and L. Gravano,
in Proc. of the 1996 ACM SIGMOD International Conference
on Management of Data, pages 91-102, 1996 (16% accepted).
Generalizing GlOSS to Vector-Space
Databases and Broker Hierarchies, L. Gravano and H.
Garcia-Molina, in Proc. of the 21st International
Conference on Very Large Data Bases (VLDB'95), pages 78-89, 1995.
Precision and Recall of GlOSS
Estimators for Database Discovery, L. Gravano, H.
Garcia-Molina, and A. Tomasic, in Proc. of the 3rd
International Conference on Parallel and Distributed
Information Systems (PDIS'94), pages 103-106, 1994 (short paper).
The Effectiveness of GlOSS for
the Text-Database Discovery Problem, L. Gravano, H.
Garcia-Molina, and A. Tomasic, in Proc. of the 1994 ACM
SIGMOD International Conference on Management of Data,
pages 126-137, 1994 (15% accepted).
Requirements for Deadlock-Free,
Adaptive Packet Routing, R. Cypher and L. Gravano, in
Proc. of the 11th ACM Symposium on Principles of
Distributed Computing (PODC '92), pages 25-33, 1992.
Adaptive, Deadlock-Free Packet
Routing in Torus Networks with Minimal Storage, R.
Cypher and L. Gravano, in Proc. of the 1992 International
Conference on Parallel Processing (ICPP '92), pages 204-211, 1992
("Most Original Paper" Award; 13% accepted).
Adaptive Deadlock- and Livelock-Free
Routing with All Minimal Paths in Torus Networks, P.
Berman, L. Gravano, G. Pifarre, and J. Sanz, in Proc. of
the 4th Annual ACM Symposium on Parallel Algorithms and
Architectures (SPAA '92), pages 3-12, 1992.
Adaptive Deadlock-Free Worm-Hole
Routing in Hypercubes, L. Gravano, G. Pifarre, G.
Denicolay, and J. Sanz, in Proc. of the 6th International
Parallel Processing Symposium (IPPS '92), pages 512-515, 1992 (short
paper).
Fully-Adaptive Routing: Packet
Switching Performance and Worm-Hole Algorithms, S.
Felperin, L. Gravano, G. Pifarre, and J. Sanz, in Proc.
of Supercomputing '91, pages 654-663, 1991.
Fully-Adaptive Minimal Deadlock-Free
Packet Routing in Hypercubes, Meshes, and Other Networks,
G. Pifarre, L. Gravano, S. Felperin, and J. Sanz, in Proc.
of the 3rd Annual ACM Symposium on Parallel Algorithms
and Architectures (SPAA '91), pages 278-290, 1991 (19% accepted).
Papers in Refereed Workshops and
Demonstration Sessions
Event Identification in Social Media, H. Becker, M. Naaman, and
L. Gravano, in Proc. of the ACM SIGMOD Workshop on the Web and
Databases (WebDB 2009), 2009 (33% accepted).
Modeling Query-Based Access to Text Databases, E. Agichtein, P.
Ipeirotis, and L. Gravano, in Proc. of the
ACM SIGMOD
Workshop on the Web and Databases (WebDB 2003), pages 87-92, 2003 (25%
accepted).
QXtract: A Building Block for Efficient Information Extraction from
Text Databases (demonstration), E. Agichtein and L. Gravano, in Proc. of the 2003 ACM SIGMOD International Conference on Management of Data,
page 663, 2003 (30% accepted).
Snowball: A Prototype System for
Extracting Relations from Large Text Collections (demonstration),
E. Agichtein, L. Gravano, J. Pavel, V. Sokolova, and
A. Voskoboynik, in Proc. of the 2001 ACM SIGMOD
International Conference on Management of Data, page 612, 2001 (~50%
accepted).
PERSIVAL Demo: Categorizing
Hidden-Web Resources (demonstration), P. Ipeirotis, L.
Gravano, and M. Sahami, in Proc. of the First ACM+IEEE
Joint Conference on Digital Libraries (JCDL 2001), page 454, 2001.
Automatic Classification of Text
Databases through Query Probing, P. Ipeirotis, L.
Gravano, and M. Sahami, in Proc. of the ACM SIGMOD
Workshop on the Web and Databases (WebDB'00), pages 117-122, 2000 (29%
accepted). Also in LNCS Series no. 1997, Springer,
pages 245-255, 2001.
Combining Strategies for
Extracting Relations from Text Collections, E.
Agichtein, E. Eskin, and L. Gravano, in Proc. of the ACM
SIGMOD Workshop on Research Issues in Data Mining and
Knowledge Discovery (DMKD 2000), pages 86-95, 2000 (58% accepted).
Exploiting Geographical Location
Information of Web Pages, O. Buyukkokten, J. Cho, H.
Garcia-Molina, L. Gravano, and N. Shivakumar, in Proc.
of the ACM SIGMOD Workshop on the Web and Databases
(WebDB'99), pages 91-96, 1999 (29% accepted).
Invited Papers
Building Query Optimizers for Information Extraction: The SQoUT
Project, A. Jain, P. Ipeirotis, and L. Gravano, in SIGMOD
Record, Special Issue on "Managing Information Extraction," vol. 37, no. 4,
pages 28-34, December 2008.
Query- vs. Crawling-based Classification of
Searchable Web Databases, L. Gravano, P. Ipeirotis, and M. Sahami,
in IEEE Data Engineering Bulletin, vol. 25, no. 1, pages 43-50, March 2002.
Using q-grams in a DBMS for
Approximate String Processing, L. Gravano,
P. Ipeirotis, H. V. Jagadish, N. Koudas, S.
Muthukrishnan, L. Pietarinen, and D. Srivastava,
in IEEE Data Engineering Bulletin, vol. 24, no. 4,
pages 28-34, December 2001.
Simplifying Data Access: The
Energy Data Collection Project, J. L. Ambite, Y.
Arens, E. Hovy, A. Philpot, L. Gravano, V.
Hatzivassiloglou, and J. Klavans, in IEEE Computer, vol.
34, no. 2, pages 47-54, February 2001.
Database Research at Columbia
University, S.-F. Chang, L. Gravano, G. Kaiser, K.
Ross, and S. Stolfo, in SIGMOD Record, vol. 27, no. 3,
pages 75-80, September 1998.
Mediating and Metasearching on
the Internet, L. Gravano and Y. Papakonstantinou, in
IEEE Data Engineering Bulletin, vol. 21, no. 2, pages 28-36, June 1998.
The Stanford InfoBus and Its
Service Layers: Augmenting the Internet with Higher-Level
Information Management Protocols, M. Roscheisen, M.
Baldonado, C.-C. K. Chang, L. Gravano, S. Ketchpel, and A.
Paepcke, in Digital Libraries in Computer Science: The
MeDoc Approach, LNCS Series no. 1392, Springer, pages 213-230, 1998.
Optimizing Queries over
Multimedia Repositories, S. Chaudhuri and L. Gravano,
in IEEE Data Engineering Bulletin, vol. 19, no. 4,
pages 45-52, December 1996.
Routing Techniques for Massively
Parallel Communication, S. Felperin, L. Gravano, G.
Pifarre, and J. Sanz, in Proceedings of the IEEE, vol. 79,
no. 4, pages 488-503, April 1991.
Position Papers, Meeting Reports, and Miscellaneous
Publications
Characterizing Web Resources for Improved Search,
L. Gravano. Position paper for the First NSF-DELOS
Workshop on Information Seeking, Searching, and Querying
in Digital Libraries, Zurich, Switzerland, December 2000.
Resource Indexing and Discovery In a Globally
Distributed Digital Library, L. Gravano. Position
paper for the NSF-EU Digital Library Collaboratory
Working Group, Budapest, Hungary, November 1997.
Informal Internet Standards at Stanford, L.
Gravano, C.-C. K. Chang, H. Garcia-Molina,
and A. Paepcke. Position paper for the 1996 World-Wide Web
Consortium (W3C) Distributed Indexing/Searching Workshop,
May 1996.
Ph.D. Thesis Advising
Active Ph.D. Advisees: Pablo Barrio, Hila Becker, Matthew Solomon
Graduated Ph.D. Advisees:
Eugene Agichtein
Defended thesis: November 2004
Deposited thesis: May 2005
First employment: Postdoctoral Researcher, Microsoft Research,
Redmond, Washington
Current employment: Assistant Professor of Computer Science, Department
of Mathematics and Computer Science, Emory University, Atlanta, Georgia
Nicolás Bruno
Defended thesis: April 2003 (with distinction)
Deposited thesis: May 2003
First and current employment: Researcher, Microsoft Research, Redmond, Washington
Wisam Dakka
Defended thesis: May 2008
Deposited thesis: November 2008
First and current employment: Software Engineer, Search Quality, Google,
New York
Panagiotis Ipeirotis
Defended thesis: July 2004 (with distinction)
Deposited thesis: September 2004
First and current employment: Assistant Professor, Department of
Information, Operations, and Management Sciences, Stern School of Business,
New York University, New York
Alpa Jain
Defended thesis: May 2008
Deposited thesis: September 2008
First and current employment: Scientist, Yahoo! Labs, Santa Clara,
California
Amélie Marian
Defended thesis: June 2005
Deposited thesis: September 2005
First and current employment: Assistant Professor, Department of
Computer Science, Rutgers University, New Jersey
Teaching at Columbia University
COMS W3998, W4901, E6901 Projects in Computer Science, Fall
1997-Present (1 Ph.D. student, 29 M.S. students, and 10 undergraduates)
COMS E6111 Advanced Database Systems (graduate level),
Spring 2000 (58 students), Spring 2002 (66 students), Spring 2003 (46
students), Spring 2004 (52 students), Spring 2006 (27 students), Spring 2007
(34 students), Spring 2008 (41 students), Spring 2009 (46 students)
COMS W4111 Introduction to Databases (advanced undergraduate and
graduate level), Fall 2008 (108 students)
COMS W4111 Database Systems (advanced undergraduate and
graduate level), Fall 1999 (121 students), Fall 2000 (174
students), Fall 2001 (133 students), Fall 2002 (82 students), Fall 2003
(93 students), Fall 2005 (73 students), Fall 2006 (62 students), Fall 2007
(51 students)
COMS W3139 Data Structures and Algorithms (undergraduate
level), Fall 1998 (52 students)
COMS E6113 Advanced Database Systems (graduate level),
Spring 1998 (23 students), Spring 1999 (42 students)
COMS W3203 Discrete Mathematics (undergraduate level),
Spring 1998 (76 students)
COMS E6998 Topics in Digital Libraries (graduate level),
Fall 1997 (12 students), co-taught with Dragomir Radev
Other Educational Activities
Ph.D. Dissertation Committees: Julia Stoyanovich, Search and
Ranking in Semantically Rich Applications, Columbia University, Computer
Science Department, October 2009; John Cieslewicz,
Architecture-Sensitive Database Query Processing on Chip
Multiprocessors, Columbia University, Computer Science Department,
December 2008; Alpa Jain, Query Processing
over Relations Extracted from Text Databases, Columbia University,
Computer Science Department, May 2008; Wisam Dakka, Faceted Searching and
Browsing Over Large Collections of Textual and Text-Annotated Objects,
Columbia University, Computer Science Department, May 2008; Knarig Arabshian,
Ontology-based Context-aware Service Discovery in a Globally Distributed
Network, Columbia University, Computer Science Department, May 2008; Sameer Maskey, Automatic
Broadcast News Speech Summarization, Columbia University, Computer
Science Department, December 2007; Hassan Malik, Efficient
Algorithms for Clustering and Classifying High Dimensional Data Using
Interesting Patterns, Columbia University, Computer Science Department,
November 2007; Sasha Blair-Goldensohn,
Long-Answer Question Answering and Rhetorical-Semantic Relations,
Columbia University, Computer Science Department, January 2007; Luo Si, Federated Search of
Text Search Engines in Uncooperative Environments, Carnegie Mellon
University, School of Computer Science, May 2006; Noemie Elhadad, User-sensitive
Text Summarization, Columbia University, Computer Science Department,
January 2006; Amélie Marian, Evaluation of
Top-k Queries over Structured and Semi-structured Data, Columbia
University, Computer Science Department, June 2005; Eugene Agichtein, Extracting
Relations from Large Text Collections, Columbia University, Computer
Science Department, November 2004; Panagiotis Ipeirotis,
Classifying and Searching Hidden-Web Text Databases, Columbia
University, Computer Science Department, July 2004; Jingren Zhou,
Architecture-Sensitive Database Query Processing, Columbia University,
Computer Science Department, May 2004; Nicolás Bruno, Statistics on
Query Expressions in Relational Database Management Systems, Columbia
University, Computer Science Department, April 2003; Min-Yen Kan, Automatic Text
Summarization as Applied to Information Retrieval: Using Indicative and
Informative Summaries, Columbia University, Computer Science Department,
November 2002; Eleazar Eskin, Sparse Sequence
Modeling Applied to Computational Biology and Intrusion Detection,
Columbia University, Computer Science Department, April 2002; James Shaw, Clause
Combining: An Approach to Generating Concise Text,
Columbia University, Computer Science Department,
September 2001; Hongyan Jing, Cut and Paste Based Text
Summarization, Columbia University, Computer Science
Department, April 2001; David W. Fan, Cost-sensitive,
Scalable and Adaptive Learning Using Ensemble-based
Methods, Columbia University, Computer Science
Department, December 2000; Jun Rao, Advanced Query
Processing in Databases, Columbia University,
Computer Science Department, May 2000; Jingshuang Yang, Extensible
Transaction Service for WWW-based Collaborative Systems,
Columbia University, Computer Science Department,
November 1999; Andreas Prodromidis, Management of
Intelligent Learning Agents in Distributed Data Mining
Systems, Columbia University, Computer Science
Department, September 1999; Michelle Zhou, Automated
Generation of Visual Discourse, Columbia University,
Computer Science Department, October 1998; Akira
Kawaguchi, Implementation Techniques for Materialized
Views, Columbia University, Computer Science
Department, October 1997
Ph.D. Thesis Proposal Committees: Corey Goldfeder, March 2008; Julia Stoyanovich, December
2007; John Cieslewicz, December 2007; Sameer Maskey, June 2007; Wisam Dakka, February 2007;
Alpa Jain, January 2007; Hassan Malik, April 2006; Sasha Blair-Goldensohn,
February 2006; Luo Si, July 2004; Noemie Elhadad, March 2004; Amélie Marian, April 2003; Panagiotis Ipeirotis, May 2002;
Jingren Zhou, May 2002; Eugene Agichtein, April 2002; Nicolás Bruno, April 2002; Eleazar Eskin,
December 2000; Giuseppe Valetto, May 2000; Min-Yen Kan,
December 1999; Hongyan Jing, February 1999; David W. Fan,
September 1998; Jun Rao, May 1998; Andreas Prodromidis,
January 1998; Steve Dossick, December 1997; James Shaw,
December 1997
Ph.D. Candidacy Exam Committees: John Cieslewicz, April 2006; Julia Stoyanovich, October 2005; Alpa Shah, May 2004; Wisam Dakka, May 2004; Elena Filatova, December 2002; Amélie Marian, April 2002; Panagiotis
Ipeirotis, April 2001; Jingren Zhou, April 2001; Eugene
Agichtein, January 2001; Nicolás Bruno, December 2000;
Junyan Ding, May 2000; Junxin Zhang, May 2000; Kazi
Zaman, December 1998
External Student Supervision: Richard Lichtenstein (undergraduate
student at Harvard University), Summer and Fall 2002, towards Harvard
University's CS 91r (Supervised Reading and Research)
Short Courses: Universidad Torcuato Di Tella, Data Mining for
Business Intelligence, July 2008; Universidad de Buenos Aires, Distributed
Databases, July 1994 (approximately 100 students)