Luis Gravano's Publications and Patents
This material is
presented to ensure timely dissemination of scholarly and
technical work. Copyright and all rights therein are retained
by authors or by other copyright holders. All persons copying
this information are expected to adhere to the terms and
constraints invoked by each author's copyright. In most cases,
these works may not be reposted without the explicit
permission of the copyright holder.
Patents
- String Predicate Selectivity Estimation, S. Chaudhuri,
V. Ganti, and L. Gravano, United States Patent 7,149,735, issued
December 12, 2006
- Systems and Methods for Using Anchor Text as Parallel
Corpora for Cross-Language Information Retrieval, L.
Gravano and M. Henzinger, United States Patent 7,146,358, issued
December 5, 2006
- Method of Building Multidimensional Workload-Aware
Histograms, S. Chaudhuri, N. Bruno, and L. Gravano, United
States Patent 7,007,039, issued February 28, 2006
- Method for Cost-Based Optimization over Multimedia
Repositories, S. Chaudhuri and L. Gravano, United States
Patent 5,806,061, issued September 8, 1998
- Method of Packet Routing in Torus Networks with Two Buffers
per Edge, R. Cypher and L. Gravano, United States Patent
5,444,701, issued August 22, 1995
Papers in Refereed Journals
- Answering General Time-Sensitive Queries,
W. Dakka, L. Gravano, and P. Ipeirotis, in IEEE Transactions on
Knowledge and Data Engineering, vol. 24, no. 2, pages 220-235,
Feb. 2012.
- Hip and Trendy: Characterizing Emerging Trends on
Twitter, M. Naaman, H. Becker, and L. Gravano, in
Journal of the American Society for Information Science and
Technology, vol. 62, no. 5, pages 902–918, May 2011.
- Classification-Aware
Hidden-Web Text Database Selection, P. Ipeirotis and
L. Gravano, in ACM Transactions on Information Systems, vol. 26,
no. 2, art. 6, Mar. 2008.
- Towards a Query Optimizer
for Text-Centric Tasks, P. Ipeirotis, E. Agichtein,
P. Jain, and L. Gravano, in ACM Transactions on Database
Systems, vol. 32, no. 4, Nov. 2007.
- Modeling and Managing
Changes in Text Databases, P. Ipeirotis, A. Ntoulas,
J. Cho, and L. Gravano, in ACM Transactions on Database Systems,
vol. 32, no. 3, Aug. 2007.
- Optimizing Top-k Selection
Queries over Multimedia Repositories, S.
Chaudhuri, L. Gravano, and A. Marian, in IEEE Transactions on
Knowledge and Data Engineering, vol. 16, no. 8, Aug. 2004.
- Evaluating Top-k Queries
over Web-Accessible Databases, A. Marian, N. Bruno,
and L. Gravano, in ACM Transactions on Database Systems, vol.
29, no. 2, Jun. 2004.
- Learning to Find Answers
to Questions on the Web, E. Agichtein, S. Lawrence,
and L. Gravano, in ACM Transactions on Internet Technology, vol.
4, no. 2, May 2004.
- QProber: A System for
Automatic Classification of Hidden-Web Databases, L.
Gravano, P. Ipeirotis, and M. Sahami, in ACM Transactions on
Information Systems, vol. 21, no. 1, Jan. 2003.
- Top-k Selection Queries
over Relational Databases: Mapping Strategies and
Performance Evaluation, N. Bruno, S. Chaudhuri, and
L. Gravano, in ACM Transactions on Database Systems, vol. 27,
no. 2, Jun. 2002.
- GlOSS: Text-Source
Discovery over the Internet, L. Gravano, H.
Garcia-Molina, and A. Tomasic, in ACM Transactions on Database
Systems, vol. 24, no. 2, Jun. 1999.
- The Stanford Digital
Library Metadata Architecture, M. Baldonado, C.-C.
K. Chang, L. Gravano, and A. Paepcke, in International Journal
on Digital Libraries, vol. 1, no. 2, Sep. 1997.
- Data Structures for
Efficient Broker Implementation, A. Tomasic, L.
Gravano, C. Lue, P. Schwarz, and L. Haas, in ACM Transactions on
Information Systems, vol. 15, no. 3, Jul. 1997.
- Storage-Efficient,
Deadlock-Free Packet Routing Algorithms for Torus Networks,
R. Cypher and L. Gravano, in IEEE Transactions on Computers,
vol. 43, no. 12, Dec. 1994.
- Requirements for
Deadlock-Free, Adaptive Packet Routing, R. Cypher
and L. Gravano, in SIAM Journal on Computing, vol. 23, no. 6,
Dec. 1994.
- Adaptive Deadlock- and
Livelock-Free Routing with All Minimal Paths in Torus
Networks, L. Gravano, G. Pifarre, P. Berman, and J.
Sanz, in IEEE Transactions on Parallel and Distributed Systems,
vol. 5, no. 12, Dec. 1994.
- Adaptive Deadlock- and
Livelock-Free Routing in the Hypercube Network, G.
Pifarre, L. Gravano, G. Denicolay, and J. Sanz, in IEEE Transactions
on Parallel and Distributed Systems, vol. 5, no. 11, Nov. 1994.
- Fully Adaptive Minimal
Deadlock-Free Packet Routing in Hypercubes, Meshes, and
Other Networks: Algorithms and Simulations, G.
Pifarre, L. Gravano, S. Felperin, and J. Sanz, in IEEE
Transactions on Parallel and Distributed Systems, vol. 5, no. 3,
Mar. 1994.
Book Chapter
- XML & Data Streams, N. Bruno, L. Gravano, N.
Koudas, and D. Srivastava. Chapter 4 in "Stream Data
Management," edited by N. Chaudhry, K. Shaw, and M.
Abdelguerfi, Series: Advances in Database Systems, Volume 30,
pages 59-81, Springer, 2005.
Papers in Refereed Conferences
- Identifying
Content for Planned Events Across Social Media Sites, H. Becker, D. Iter, M. Naaman, and L.
Gravano, in Proc. of the 2012 ACM International Conference on
Web Search and Data Mining (WSDM 2012), 2012.
- Beyond Trending Topics: Real-World Event
Identification on Twitter, H. Becker, M. Naaman,
and L. Gravano, in Proc. of the Fifth International AAAI
Conference on Weblogs and Social Media (ICWSM 2011),
2011 (short 4-page "poster" paper).
- Selecting Quality Twitter Content for Events,
H. Becker, M. Naaman, and L. Gravano, in Proc. of the Fifth
International AAAI Conference on Weblogs and Social Media (ICWSM
2011), 2011 (short 4-page "poster" paper).
- Learning Similarity Metrics for Event
Identification in Social Media, H. Becker, M.
Naaman, and L. Gravano, in Proc. of the 2010 ACM International
Conference on Web Search and Data Mining (WSDM 2010), 2010.
- Join Optimization of
Information Extraction Output: Quality Matters!, A.
Jain, P. Ipeirotis, A. Doan, and L. Gravano, in Proc. of the
25th IEEE International Conference on Data Engineering (ICDE
2009), 2009.
- Answering General
Time-Sensitive Queries, W. Dakka, L. Gravano, and
P. Ipeirotis, in Proc. of the 17th ACM Conference on Information
and Knowledge Management (CIKM 2008), 2008 (short 2-page
"poster" paper).
- Optimizing SQL Queries
over Text Databases, A. Jain, A. Doan, and L.
Gravano, in Proc. of the 24th IEEE International Conference on
Data Engineering (ICDE 2008), 2008.
- Efficient
Summarization-Aware Search for Online News Articles,
W. Dakka and L. Gravano, in Proc. of the 2007 ACM+IEEE Joint
Conference on Digital Libraries (JCDL 2007), 2007.
- Efficient Keyword Search
Across Heterogeneous Relational Databases, M.
Sayyadian, H. LeKhac, A. Doan, and L. Gravano, in Proc. of the
23rd IEEE International Conference on Data Engineering (ICDE
2007), 2007.
- SQL Queries Over
Unstructured Text Databases, A. Jain, A. Doan, and
L. Gravano, in Proc. of the 23rd IEEE International Conference
on Data Engineering (ICDE 2007), 2007 (short 3-page "poster"
paper).
- To Search or to Crawl?
Towards a Query Optimizer for Text-Centric Tasks ("Best Paper" Award),
P. Ipeirotis, E. Agichtein, P. Jain, and L.
Gravano, in Proc. of the 2006 ACM SIGMOD International
Conference on Management of Data, 2006.
- Modeling and Managing
Content Changes in Text Databases ("Best Paper" Award),
P. Ipeirotis, A. Ntoulas, J. Cho, and L. Gravano, in Proc. of
the 21st IEEE International Conference on Data Engineering (ICDE
2005), 2005.
- When one Sample is not
Enough: Improving Text Database Selection Using Shrinkage,
P. Ipeirotis and L. Gravano, in Proc. of the 2004 ACM SIGMOD
International Conference on Management of Data, 2004.
- Selectivity Estimation for
String Predicates: Overcoming the Underestimation Problem,
S. Chaudhuri, V. Ganti, and L. Gravano, in Proc. of the 20th
IEEE International Conference on Data Engineering (ICDE 2004),
2004.
- Categorizing Web Queries
According to Geographical Locality, L.
Gravano, V. Hatzivassiloglou, and R. Lichtenstein, in Proc. of
the 12th ACM Conference on Information and Knowledge Management
(CIKM 2003), 2003.
- Efficient IR-Style Keyword
Search over Relational Databases, V. Hristidis, L.
Gravano, and Y. Papakonstantinou, in Proc. of the 29th
International Conference on Very Large Data Bases (VLDB 2003),
2003.
- Text Joins in an RDBMS for
Web Data Integration, L. Gravano, P. Ipeirotis, N.
Koudas, and D. Srivastava, in Proc. of the 12th
International World-Wide Web Conference (WWW 2003), 2003.
- Querying Text Databases
for Efficient Information Extraction ("Best Student
Paper" Award), E. Agichtein and L. Gravano, in Proc. of
the 19th IEEE International Conference on Data Engineering (ICDE
2003), 2003 [errata].
- Navigation- vs.
Index-Based XML Multi-Query Processing, N. Bruno, L.
Gravano, N. Koudas, and D. Srivastava, in Proc. of the 19th IEEE
International Conference on Data Engineering (ICDE 2003), 2003.
- Text Joins for Data
Cleansing and Integration in an RDBMS, L. Gravano,
P. Ipeirotis, N. Koudas, and D. Srivastava, in Proc. of the 19th
IEEE International Conference on Data Engineering (ICDE 2003),
2003 (short 3-page "poster" paper).
- Distributed Search over
the Hidden-Web: Hierarchical Database Sampling and Selection,
P. Ipeirotis and L. Gravano, in Proc. of the 28th International
Conference on Very Large Data Bases (VLDB 2002), 2002.
- Evaluating Top-k Queries
over Web-Accessible Databases, N. Bruno, L.
Gravano, and A. Marian, in Proc. of the 18th IEEE International
Conference on Data Engineering (ICDE 2002), 2002.
- Extending SDARTS:
Extracting Metadata from Web Databases and Interfacing with
the Open Archives Initiative, P. Ipeirotis, T.
Barry, and L. Gravano, in Proc. of the Second ACM+IEEE Joint
Conference on Digital Libraries (JCDL 2002), 2002.
- Approximate String Joins
in a Database (Almost) for Free, L. Gravano, P.
Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, and D.
Srivastava, in Proc. of the 27th International Conference on
Very Large Data Bases (VLDB 2001), 2001 [errata].
- Probe, Count, and
Classify: Categorizing Hidden Web Databases, P.
Ipeirotis, L. Gravano, and M. Sahami, in Proc. of the 2001 ACM
SIGMOD International Conference on Management of Data, 2001.
- STHoles: A
Multidimensional Workload-Aware Histogram, N. Bruno,
S. Chaudhuri, and L. Gravano, in Proc. of the 2001 ACM SIGMOD
International Conference on Management of Data, 2001.
- SDLIP + STARTS = SDARTS:
A Protocol and Toolkit for Metasearching, N. Green,
P. Ipeirotis, and L. Gravano, in Proc. of the First ACM+IEEE
Joint Conference on Digital Libraries (JCDL 2001), 2001.
- PERSIVAL, a System for
Personalized Search and Summarization over Multimedia
Healthcare Information, K. McKeown, S.-F. Chang, J.
Cimino, S. Feiner, C. Friedman, L. Gravano, V. Hatzivassiloglou,
S. Johnson, D. Jordan, J. Klavans, A. Kushniruk, V. Patel, and
S. Teufel, in Proc. of the First ACM+IEEE Joint Conference on
Digital Libraries (JCDL 2001), 2001.
- Learning Search Engine
Specific Query Transformations for Question Answering,
E. Agichtein, S. Lawrence, and L. Gravano, in Proc. of the 10th
International World-Wide Web Conference (WWW10), 2001.
- Computing Geographical
Scopes of Web Resources, J. Ding, L. Gravano, and N.
Shivakumar, in Proc. of the 26th International Conference on
Very Large Data Bases (VLDB'00), 2000.
- An Investigation of
Linguistic Features and Clustering Algorithms for Topical
Document Clustering, V. Hatzivassiloglou, L.
Gravano, and A. Maganti, in Proc. of the 23rd ACM SIGIR
Conference on Research and Development in Information Retrieval
(SIGIR'00), 2000.
- Snowball: Extracting
Relations from Large Plain-Text Collections, E.
Agichtein and L. Gravano, in Proc. of the 5th ACM International
Conference on Digital Libraries (DL'00), 2000.
- Evaluating Top-k Selection
Queries, S. Chaudhuri and L. Gravano, in Proc. of
the 25th International Conference on Very Large Data Bases
(VLDB'99), 1999.
- Merging Ranks from
Heterogeneous Internet Sources, L. Gravano
and H. Garcia-Molina, in Proc. of the 23rd International
Conference on Very Large Data Bases (VLDB'97), 1997.
- Metadata for Digital
Libraries: Architecture and Design Rationale, M.
Baldonado, C.-C. K. Chang, L. Gravano, and A. Paepcke, in Proc.
of the 2nd ACM International Conference on Digital Libraries
(DL'97), 1997.
- STARTS: Stanford Proposal
for Internet Meta-Searching, L. Gravano, C.-C. K.
Chang, H. Garcia-Molina, and A. Paepcke, in Proc. of the 1997
ACM SIGMOD International Conference on Management of Data, 1997.
- dSCAM: Finding Document
Copies across Multiple Databases, H.
Garcia-Molina, L. Gravano, and N. Shivakumar, in Proc. of the
4th International Conference on Parallel and Distributed
Information Systems (PDIS'96), 1996.
- Optimizing Queries over
Multimedia Repositories, S. Chaudhuri and L.
Gravano, in Proc. of the 1996 ACM SIGMOD International
Conference on Management of Data, 1996.
- Generalizing GlOSS to
Vector-Space Databases and Broker Hierarchies, L.
Gravano and H. Garcia-Molina, in Proc. of the 21st International
Conference on Very Large Data Bases (VLDB'95), 1995.
- Precision and Recall of
GlOSS Estimators for Database Discovery, L. Gravano,
H. Garcia-Molina, and A. Tomasic, in Proc. of the 3rd
International Conference on Parallel and Distributed Information
Systems (PDIS'94), 1994 (short paper).
- The Effectiveness of
GlOSS for the Text-Database Discovery Problem, L.
Gravano, H. Garcia-Molina, and A. Tomasic, in Proc. of the 1994
ACM SIGMOD International Conference on Management of Data, 1994.
- Requirements for
Deadlock-Free, Adaptive Packet Routing, R. Cypher
and L. Gravano, in Proc. of the 11th ACM Symposium on Principles
of Distributed Computing (PODC '92), 1992.
- Adaptive, Deadlock-Free Packet Routing in Torus Networks
with Minimal Storage, R. Cypher and L. Gravano, in Proc.
of the 1992 International Conference on Parallel Processing
(ICPP '92), 1992.
- Adaptive Deadlock- and
Livelock-Free Routing with All Minimal Paths in Torus
Networks, P. Berman, L. Gravano, G. Pifarre, and J.
Sanz, in Proc. of the 4th Annual ACM Symposium on Parallel
Algorithms and Architectures (SPAA '92), 1992.
- Adaptive Deadlock-Free
Worm-Hole Routing in Hypercubes, L. Gravano, G.
Pifarre, G. Denicolay, and J. Sanz, in Proc. of the 6th
International Parallel Processing Symposium (IPPS '92), 1992
(short paper).
- Fully-Adaptive Routing:
Packet Switching Performance and Worm-Hole Algorithms,
S. Felperin, L. Gravano, G. Pifarre, and J. Sanz, in Proc. of
Supercomputing '91, 1991.
- Fully-Adaptive Minimal
Deadlock-Free Packet Routing in Hypercubes, Meshes, and
Other Networks, G. Pifarre, L. Gravano, S. Felperin,
and J. Sanz, in Proc. of the 3rd Annual ACM Symposium on
Parallel Algorithms and Architectures (SPAA '91), 1991.
Papers in Refereed Workshops and Demonstration Sessions
- Quality
Impact of Value Matching and Scoring in Top-k Entity Attribute
Extraction, M. Solomon, L. Gravano, and C. Yu, in Proc. of
the 5th International Workshop on Ranking in Databases (DBRank
2011), 2011.
- Automatic Identification and Presentation of
Twitter Content for Planned Events (demonstration),
H. Becker, F. Chen, D. Iter, M. Naaman, and L. Gravano, in
Proc. of the Fifth International AAAI Conference on Weblogs
and Social Media (ICWSM 2011), 2 pages, 2011.
- Popularity-Guided Top-k Extraction of Entity
Attributes, M. Solomon, C. Yu, and L. Gravano, in
Proc. of the ACM SIGMOD Workshop on the Web and Databases (WebDB
2010), 2010.
- Exploiting Social Links for
Event Identification in Social Media (poster), H.
Becker, B. Xiao, M. Naaman, and L. Gravano, in Proc. of the 3rd
Annual Workshop on Search in Social Media (SSM 2010), 2010.
- Event Identification in
Social Media, H. Becker, M. Naaman, and L. Gravano,
in Proc. of the ACM SIGMOD Workshop on the Web and Databases
(WebDB 2009), 2009.
- Modeling Query-Based
Access to Text Databases, E. Agichtein, P.
Ipeirotis, and L. Gravano, in Proc. of the ACM SIGMOD Workshop
on the Web and Databases (WebDB 2003), 2003.
- QXtract: A Building
Block for Efficient Information Extraction from Text
Databases (demonstration), E. Agichtein and L.
Gravano, in Proc. of the 2003 ACM SIGMOD International
Conference on Management of Data, 2003.
- Snowball: A Prototype
System for Extracting Relations from Large Text Collections
(demonstration), E. Agichtein, L. Gravano, J. Pavel,
V. Sokolova, and A. Voskoboynik, in Proc. of the 2001 ACM SIGMOD
International Conference on Management of Data, 2001.
- PERSIVAL Demo:
Categorizing Hidden-Web Resources (demonstration),
P. Ipeirotis, L. Gravano, and M. Sahami, in Proc. of the First
ACM+IEEE Joint Conference on Digital Libraries (JCDL 2001),
2001.
- Automatic Classification
of Text Databases through Query Probing, P.
Ipeirotis, L. Gravano, and M. Sahami, in Proc. of the ACM SIGMOD
Workshop on the Web and Databases (WebDB'00), 2000. Also in
LNCS Series no. 1997, Springer, 2001.
- Combining Strategies for
Extracting Relations from Text Collections,
E. Agichtein, E. Eskin, and L. Gravano, in Proc. of the ACM
SIGMOD Workshop on Research Issues in Data Mining and Knowledge
Discovery (DMKD 2000), 2000.
- Exploiting Geographical
Location Information of Web Pages, O. Buyukkokten,
J. Cho, H. Garcia-Molina, L. Gravano, and N. Shivakumar, in
Proc. of the ACM SIGMOD Workshop on the Web and Databases
(WebDB'99), 1999.
Invited Papers
- Building Query
Optimizers for Information Extraction: The SQoUT Project,
A. Jain, P. Ipeirotis, and L. Gravano, in SIGMOD Record,
Special Issue on "Managing Information Extraction," vol. 37, no.
4, December 2008.
- Query- vs. Crawling-based
Classification of Searchable Web Databases, L.
Gravano, P. Ipeirotis, and M. Sahami, in IEEE Data Engineering
Bulletin, vol. 25, no. 1, March 2002.
- Using q-grams in a DBMS for
Approximate String Processing, L. Gravano,
P. Ipeirotis, H. V. Jagadish, N. Koudas, S. Muthukrishnan, L.
Pietarinen, and D. Srivastava, in IEEE Data Engineering
Bulletin, vol. 24, no. 4, December 2001 [errata].
- Simplifying Data
Access: The Energy Data Collection Project, J.
L. Ambite, Y. Arens, E. Hovy, A. Philpot, L. Gravano, V.
Hatzivassiloglou, and J. Klavans, in IEEE Computer, vol. 34, no.
2, February 2001.
- Database Research at
Columbia University, S.-F. Chang, L.
Gravano, G. Kaiser, K. Ross, and S. Stolfo, in SIGMOD Record,
vol. 27, no. 3, September 1998.
- Mediating and
Metasearching on the Internet, L. Gravano and Y.
Papakonstantinou, in IEEE
Data Engineering Bulletin, vol. 21, no. 2, June 1998.
- The Stanford InfoBus
and Its Service Layers: Augmenting the Internet with
Higher-Level Information Management Protocols, M.
Roscheisen, M. Baldonado, C.-C. K. Chang, L. Gravano, S.
Ketchpel, and A. Paepcke, in Digital Libraries in Computer
Science: The MeDoc Approach, LNCS Series no. 1392,
Springer, 1998.
- Optimizing Queries over
Multimedia Repositories, S. Chaudhuri and L.
Gravano, in IEEE
Data Engineering Bulletin, vol. 19, no. 4, December 1996.
- Routing Techniques for
Massively Parallel Communication, S. Felperin, L.
Gravano, G. Pifarre, and J. Sanz, in Proceedings of the IEEE,
vol. 79, no. 4, April 1991.
Position Papers, Meeting Reports, and Miscellaneous Publications
- Characterizing
Web Resources for Improved Search, L. Gravano.
Position paper for the First NSF-DELOS Workshop on Information Seeking,
Searching, and Querying in Digital Libraries, Zurich, Switzerland,
December 2000.
- Resource Indexing and
Discovery in a Globally Distributed Digital Library,
L. Gravano. Position paper for the NSF-EU Digital Library
Collaboratory Working Group, Budapest, Hungary, November 1997.
- Querying Multiple Document
Collections across the Internet, L. Gravano. Ph.D.
Dissertation, Stanford University (advisor: H. Garcia-Molina),
August 1997.
- Informal Internet Standards
at Stanford, L. Gravano, C.-C. K. Chang, H.
Garcia-Molina, and A. Paepcke. Position paper for the 1996 World-Wide
Web Consortium (W3C) Distributed Indexing/Searching Workshop,
May 1996.
Luis Gravano
gravano@cs.columbia.edu