Executive Summary

The Digital Library program at Columbia University spans a wide range of activities, from cutting-edge research projects to production-level systems now in use on campus and beyond. Columbia has one of the most integrated digital library programs in the country, owing to coordination among three parts of the University: the service components, including the Libraries and the Academic Information Systems division; instructional components across disciplines; and research components focusing on new technologies and content areas for the digital library. This integration permits the Columbia University Digital Library initiative to achieve superior results by harnessing the intellectual power of the entire University community, across disciplines and across functions. This document presents several aspects of the Digital Library program, including the infrastructure, the production system, the research and development that will lead to the Columbia Digital Library advanced prototype, and the impact of the digital library on the University and the community.

The Center for Research on Information Access (CRIA): Multidisciplinary Coordination

To coordinate the research, testbed, and evaluation components of the digital library, and to ensure that activities are carried out in accordance with the vision of the Columbia Digital Library, Columbia University has established the Center for Research on Information Access (CRIA). The Center, housed in the Libraries, is also closely associated with Academic Information Systems (AcIS) and with the Department of Computer Science. The director of CRIA identifies opportunities for new projects, initiates and develops projects, coordinates relations among project partners, oversees financial and project-development management, and, guided by the Center's advisory committees, ensures that the research vision is pursued. CRIA has several advisory committees. These include the research advisory committee, which oversees research aspects of projects, carries out regular reviews of the research, and identifies new directions to explore, and the intellectual property committee, which coordinates legal experts, publishers, and researchers in exploring new approaches to online distribution of material.

The Columbia Digital Library Infrastructure

Over the past several years, Columbia University has been creating and assembling an increasingly rich and diverse set of networked information resources to serve and empower its community. These include high-speed network connections, electronic classrooms, campus-wide public terminals, and wiring of all dormitory rooms. Current investments build upon other technology initiatives that led to installation of the fiber-optic network, authentication systems, and network management systems. Enabled by timely investments in the campus network, enhanced by rapid developments in new information delivery technologies, and joined together by the ColumbiaNet infrastructure, these interconnecting services have become the components of a larger, and increasingly integrated, program of electronic information delivery. Viewed as an integrated program, these services form the nucleus of a digital library at Columbia.

The Current Digital Library

Columbia's digital library is a collection of information, tools, and resources made available to the Columbia community over the network in an organized, well-managed, and well-supported manner. The contents of the digital library include research, instructional, administrative, and student information in a variety of forms, including full text, images, indexes, catalogs, databases, multimedia resources, geographic and numeric datasets, and links to selected services on the Internet. The digital library organizes and delivers this content through search and retrieval mechanisms, and it provides a wide array of tools and applications for users to manipulate and employ the content they retrieve. The digital library in its broadest sense is a key component of the digital university, permitting access to material ranging from Dante to grades to weather measurements, using tools tailored to the task, with proper intellectual property protection and security.

Research on Digital Library Technologies

The development of a digital library is a highly complex process and requires simultaneous advances in discrete domains of technical inquiry including user interfaces, search and retrieval techniques, representation and presentation of information, and management of intellectual property. It requires combining very large-scale networks with very large-scale file storage and creating digital collections of sufficient depth and breadth to be of compelling interest to working user groups. The Digital Library research program brings together an interdisciplinary team of experts to address these issues, resulting in a research prototype digital library.

Research and evaluation teams are working toward this goal by addressing several domains of technical inquiry in a highly interactive manner. In developing a prototype advanced digital library, our focus is on creating a user interface that can meet the needs of a wide range of end users, on developing integrated search and retrieval that efficiently finds documents of interest, whether textual or visual, and on enhancing the representation of information to facilitate both search and presentation. An important feature of our research is the integrated search of textual and image documents, allowing a user to retrieve text, images, video, or any combination of the three in response to a single query. Our research addresses questions such as how to integrate information from both text and image in order to find a desired document more effectively. For example, someone wishing to find out about a news topic such as the World Trade Center bombing could get summarized newspaper articles, relevant images, and maps, all presented in a concise, organized fashion. Our goal is to present large amounts of available information to users in a compact way, using both natural language summaries and graphical visualization, to alleviate the growing problem of information overload.
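
The idea of a single query returning results across media types can be illustrated with a small sketch. This is not the prototype's method: it assumes, purely for illustration, that images and video are matched through their captions, and that relevance is a simple count of shared terms. The names Item, tokenize, and integrated_search are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One hypothetical library item; non-text media are matched via captions."""
    title: str
    media: str        # "text", "image", or "video"
    description: str  # full text, or a caption for non-text media


def tokenize(text):
    """Lowercase the text, split on whitespace, and strip simple punctuation."""
    return {word.strip(".,;:!?").lower() for word in text.split()} - {""}


def integrated_search(query, items):
    """Return items sharing at least one term with the query, grouped by media type.

    Within each group, items are ordered by the number of shared terms,
    highest first. This term-overlap score is an assumption of the sketch.
    """
    query_terms = tokenize(query)
    scored = defaultdict(list)
    for item in items:
        overlap = len(query_terms & tokenize(item.description))
        if overlap > 0:
            scored[item.media].append((overlap, item))
    return {
        media: [item for _, item in sorted(pairs, key=lambda p: -p[0])]
        for media, pairs in scored.items()
    }


# A made-up collection: one query returns both a text and an image result.
collection = [
    Item("Newspaper article", "text", "Report on the bridge reopening downtown"),
    Item("Photograph", "image", "Bridge at dusk"),
    Item("Lecture recording", "video", "Introduction to medieval poetry"),
]
results = integrated_search("bridge reopening", collection)
```

Grouping results by media type is one simple way to support the concise, organized presentation described above; a real system would also need content-based image analysis and summarization, which this sketch does not attempt.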

Content Development

Content development is of particular importance to the research program because relatively few scholarly resources are currently available in digital form. Creation of a content testbed provides the means to validate research carried out on digital libraries, by developing a very large multimedia test data set that will also be made available to other researchers. Through the strength of Columbia University Libraries and partnerships with a variety of publishers, the Columbia Digital Library provides access to collections in a growing range of areas. Collections continue to grow through the efforts of both Columbia University Libraries and individual departments. For example, the Libraries and Academic Information Systems have embarked on a project to put online all of the texts and art history images required for the core humanities courses in Columbia College. In addition, Columbia's Libraries are undertaking a study of the use of online books, with support from the Mellon Foundation and the participation of Oxford University Press, Columbia University Press, Garland Publishing, and Simon & Schuster Higher Education Division.

This study will answer questions such as which books are most useful in digital format (reference, high use, low use), how usage patterns differ between digital and paper formats, and how to incorporate copyright and fair use restrictions into digitally available data. In the instructional area, individual departments such as geology, chemistry, and medicine are creating online curricula, which also provide access to large collections of relevant data and documents. In addition to drawing on collections developed and in use here at Columbia, the Digital Library also draws on the large and growing amount of material available on the Internet. Both the Columbia Digital Library production system and the Columbia Digital Library advanced research prototype aim to provide access to a broad set of topic areas.

Access to Content

The broadly defined scope of the digital library creates new challenges for organizing its contents, beyond those for managing and providing access to traditional information media. The nature of digital information presents both new opportunities and new complexities for those devising schemes of access. Although reliable access to this data is provided for all members of the University as well as for outside researchers, the richness and diversity of resources on ColumbiaWeb already push the limits of menu systems and keyword searching as access mechanisms. New retrieval technologies that improve on existing systems are a focus of the research program for the research prototype, but these need to be effectively integrated with efforts to structure and catalog digital information. As components of the digital library prototype become available to the user community, they will be incorporated into the collection testbed and used to facilitate access. Members of Academic Information Systems (AcIS) are already working on the production of the testbed, assuring reliability, access, and connectivity across campus.

Collaborations to Increase Impact

Our collaboration with public and university libraries, along with primary and secondary schools, ensures that our work remains at the forefront of participatory design. As part of its membership in the National Digital Library Federation (which includes the New York Public Library and the Library of Congress as well as several large university research libraries), Columbia collaborates with other institutions to provide a varied audience of library patrons to evaluate its systems. In addition, we plan to make the Columbia Digital Library advanced prototype available to various intermediate schools with which we have established connections. In particular, Columbia is involved in establishing Internet connections to a variety of schools within Harlem and in providing expertise to educators in these schools so that they can make effective use of the online connection. Through this infrastructure, which is being funded at Teachers College, we will be able to provide access to the Digital Library advanced prototype for controlled, educational problem-solving tasks. These collaborations allow us to test our ideas with actual users, using initial prototypes of the final system.

The Potential of the Digital Library

The Digital Library provides the framework in which to design and implement the continued expansion and enrichment of online information services for the Columbia community. Local, national, and international developments in digital library technology have provided a focus for realizing the potential of digital information to be used in service of the University's mission. Columbia University is in the lead both in providing digital text, image, and sound to the community, and in developing forward-thinking technologies for the advancement of the Digital Library.

This page is located at http://www.cs.columbia.edu/~klavans/Cria/digital-library-desc.html