2 March 2011
I know I've been silent lately — among other things, I've been frantically working on a cryptologic history paper. If you're interested, I stumbled on a telegraph codebook that showed that the one-time pad — the only provably secure algorithmic cryptosystem — was actually invented 35 years earlier than had been thought. I think that that's a neat story, but there's another story: even 5 years ago, I don't think I could have done the necessary research; 10 years ago, it would have been impossible. The difference is not just the Internet, but also the older documents that have been digitized and made available. There are, however, a couple of caveats and warnings for the future.
I started with what seemed like an extremely hard problem: how could I learn about someone with a common name, Frank Miller, who published his book in 1882? From the book itself, I had exactly two hard facts: he lived in Sacramento, and he had 16 years of banking experience. To this, I added some background knowledge (Sacramento wasn't a large city then), some luck (he was in Sacramento and not, say, New York), some educated guesswork (16 years of experience in 1882 sounds like a post-(U.S.) Civil War career), and the Internet. I was able to find an old, out-of-copyright book on the financial history of California. I was able to find another old, out-of-copyright book on the history of Sacramento. These, in turn, gave me the names of a few banks that I could put into my search queries. I was able to consult the online 1880 census database at FamilySearch.org, which told me that there were only two Frank Millers in Sacramento then, and only one was a banker. I was able to engage in rapid communication with assorted librarians and historians. I consulted a relatively recent book on the family's genealogy; though it was still in copyright, the author gave Google Books permission to scan it and make it available. All of this was extremely useful, but was in some sense "just" an accelerator; in principle, I could have done all of that manually, albeit much more slowly. It's also questionable whether I'd have expended the effort; I'm primarily a computer scientist, not a historian, and probably couldn't have justified the time and travel it would have taken.
More interestingly, though, I was able to do things that would have been effectively impossible without search engines. For example, Miller used the phrase "test word" for what I would call an "authenticator". Where did this come from? Via Google Books, I was able to establish that it once referred to a "password" people used to prove their membership in secretive social organizations like the Freemasons. I would never have thought of consulting an 1824 anti-Masonic screed. I also learned, the same way, that a rare 1876 codebook used it the way Miller did. I might have wanted to check that book, but as far as I can tell the only copy is at Oxford University; neither the Library of Congress nor the NSA museum's library has one. By World War I, there were special provisions in the censorship regulations for test words, and banks had special departments for generating and verifying them; again, I would not have found those references. (I should also note that the Oxford English Dictionary does not have either meaning listed, despite both meanings being well-attested in 19th- and early 20th-century books. I've emailed them.)
Indexing doesn't have to be free to be useful, though it helps. I was looking for connections between Miller and local military officers; I conjectured that they might have met at some social occasion. A military historian I contacted agreed that generally speaking, such contacts occurred in that era, and that consulting the Society pages of a San Francisco newspaper might answer the question. Fine — but there were literally hundreds and possibly thousands of archived papers I'd have to consult, probably on blurry microfiches or microfilms. Fortunately, the papers I wanted had been scanned and indexed, and the Columbia University library was able to get a trial subscription from the vendor. The ability to do that search in finite time was crucial to one point my paper makes: that Miller virtually certainly met and chatted with Parker Hitt, an important American cryptologist of the era.
There's a warning here, though: information that isn't indexed is much harder to find and use, and will tend to be ignored. I was in Boston recently and thought to check what codebooks the Boston Public Library had. Unfortunately, their electronic catalog only includes works acquired since 1974. The older card catalog had everything, of course, and they did microfilm it — but that catalog now appears to reside in two dusty (and as best I can tell, little used) volumes.
Information that one has to pay for is often only slightly more accessible. I checked the IEEE price for a recent short paper of mine: $30. If you need many such references, the cost is prohibitive. I'm lucky; I'm at a university with a superb library system and institutional subscriptions to very many journals. If I need other resources, the New York Public Library is just a short subway ride away. Independent researchers or those at smaller schools would have much more difficulty doing such research, and therein lies the danger: information is often becoming less accessible, not more. What's worse, professional societies such as the ACM and IEEE prohibit authors from posting papers on their own web sites, because their traditional business model can't flourish that way.
It's also worth noting that most of the publications I consulted electronically were from before 1923. Partly, that's because of the era I was researching; however, it's also because of copyright restrictions: in general, works published before then are in the public domain. Google is scanning some later works, but the issue is tremendously controversial, especially for academic works.
In any field, people with more resources will, on the whole, do better; that's more or less by definition. However, we are stumbling towards a future where the difference is increasing, not decreasing. As libraries put more of their money into electronic resources, they'll have less to spend on traditional archival material. The information will be inaccessible to all but a privileged few, and we'll all be poorer.