SMBlog -- 8 March 2009

Access to Old Information


There has been a lot of controversy about Google’s book-scanning project, much of it concerning payments to authors of copyrighted works. While I’m all in favor of copyright (I earned a non-trivial amount of money from a book), some recent experiences have left me strongly in favor of the project, but worried about abuse of ownership of physical media.

I’ve recently been doing some research that has required access to some very old printed works. (Access to the Columbia University Libraries, definitely a world-class research resource, has been invaluable to this project. I owe a great debt of gratitude to the generations of librarians who have built and maintained such organizations.) As a result, I very much appreciate the efforts to scan in books. While there are very valid concerns about the permanence of digital file formats and media, books also decay over time. I’ve been working with 85-year-old books that are very fragile; they clearly will not be usable 85 years from now. The pages are yellow and crumbling, the bindings are failing, and any use is a threat to the volume’s existence. (Aside: there is a certain undeniable thrill to checking out a book for the first time in more than 70 years, judging from the dates stamped in the back. It brings to mind Gandalf’s comment to Boromir in Lord of the Rings: “there lies in Minas Tirith still, unread, I guess, by any save Saruman and myself since the kings failed, a scroll that Isildur made himself.”) But of what use is a physical volume of knowledge if no one can access it?

Sometimes, the scanning has already been done. I visited Columbia’s Rare Book and Manuscript Library, and felt privileged to be able to handle Smith’s Secret Corresponding Vocabulary, an 1845 publication by Samuel Morse’s business partner. But I was even happier when I found that Google had already scanned a copy. There is a danger, however: who owns the material? The following text appears in public domain books downloaded from Google’s archive:

We designed Google Book Search for use by individuals, and we request that you use these files for personal, non-commercial purposes.

Is scholarly research a “personal, non-commercial purpose”? Note that this statement explicitly applies to works that are in the public domain.

The issue isn’t simply electronic documents. Consider this work, which is a microfilm of a U.S. government document and hence never subject to copyright. The catalog entry bears this note:

Reuse of record except for individual research requires license from Congressional Information Services, Inc.

What is “individual research”?

What we are seeing is the use of contract law to obtain rights not granted by copyright. If we are not careful, we will see public information locked up. Worse yet, digital records can be protected by so-called Digital Rights Management (DRM) technology, making them inaccessible except on terms dictated by the physical record’s owner.

To be sure, a company that goes to the expense of scanning or photographing old books and documents is entitled to make a profit. That raises questions, though, at least two of them. We need to ask about the fate of public documents (such as government records) and about the role of libraries.

Government documents, in a very strong sense, belong to the people. It is perhaps reasonable to let a private company reproduce such documents and be compensated for its efforts, though I prefer the attitude that since public documents do belong to the people, they should be made available by the government itself. That said, if a private company is going to be the designated publisher, it should not control how the documents are used. Its profit should come from the collection (a collection copyright, if you will) rather than from the individual documents. It is certainly within the government’s power to impose that restriction in any contracts it signs for such reproductions.

Libraries have a different problem. They frequently don’t own the originals; however, their mission is to make information broadly available. By agreeing to stringent restrictions, above and beyond what would be permitted under the Fair Use doctrine of copyright law, they undermine their own goals. If libraries, as a matter of principle, decline to purchase works that are unduly encumbered, I suspect that many publishers will back off. (I doubt that there are many purchasers of the collected works of the War Department from a hundred years ago other than research libraries.)

To sum up: there are two ways we can lose access to information, physical and legal. In avoiding the first problem, we have to take care that we do not create the second. This is especially serious for digitized materials, where technology can enforce restrictive contracts.