March 2009
The White House Removes Videos from YouTube (2 March 2009)
EFF's Surveillance Self-Defense Website (3 March 2009)
Access to Old Information (8 March 2009)
Internet Records Retention Bill (19 March 2009)

The White House Removes Videos from YouTube

2 March 2009

In January, I wrote twice about the government and privacy when using YouTube videos. Chris Soghoian now reports that White House videos have been moved to Akamai. This is indeed a very good move from a privacy perspective.

Of course, now that that's settled, the Supreme Court is at least viewing YouTube videos as part of its deliberations. Does this allow Google to track the viewing habits of Supreme Court justices? Or do they take proper privacy precautions? (The video in question was posted to the Court's own web page — but hosted by the Court itself, rather than any outside provider.)

Update: the White House denies any change in policy. That's too bad — they did the right thing; why apologize for it? Are they afraid they'll be seen as having caved to "pressure"? Or is the concern that the new scheme won't work as well, so they want to be prepared in advance in case they have to regress?

EFF's Surveillance Self-Defense Website

3 March 2009

The Electronic Frontier Foundation (EFF) has a new website on Surveillance Self-Defense. (Not surprisingly, it — and as best I can tell the rest of the EFF's site — can be accessed via an encrypted channel. In fact, if you visit the SSD web site via unencrypted HTTP, you'll be redirected to the encrypted version.) It's a very good guide to the legal and technical issues surrounding surveillance; while I have a few minor quibbles, overall it's an excellent guide. (Caveat: while the technical information is broadly applicable, the legal discussion is specific to US law.)

Highly recommended.

Access to Old Information

8 March 2009

There has been a lot of controversy about Google's book-scanning project. A lot of the controversy concerns payments to authors of copyrighted works. While I'm all in favor of copyright — I earned a non-trivial amount of money from a book — some recent experiences have left me strongly in favor of the project, but worried about abuse of ownership of physical media.

I've recently been doing some research that has required access to some very old printed works. (Access to the Columbia University Libraries — definitely a world-class research resource — has been invaluable to this project. I owe a great debt of gratitude to the generations of librarians who have built and maintained such organizations.) As a result, I very much appreciate the efforts to scan in books. While there are very valid concerns about the permanence of digital file formats and media, books also decay over time. I've been working with 85-year-old books that are very fragile; they clearly will not be usable 85 years from now. The pages are yellow and crumbling, the bindings are failing, any use is a threat to the volume's existence. (Aside: there is a certain undeniable thrill to checking out a book for the first time in more than 70 years, judging from the dates stamped in the back. It brings to mind Gandalf's comment to Boromir in Lord of the Rings: ``there lies in Minas Tirith still, unread, I guess, by any save Saruman and myself since the kings failed, a scroll that Isildur made himself.'') But of what use is a physical volume of knowledge if no one can access it?

Sometimes, this has already been done. I visited Columbia's Rare Book and Manuscript Library, and felt privileged to be able to handle Smith's Secret Corresponding Vocabulary, an 1845 publication by Samuel Morse's business partner. But I was even happier when I found that Google had already scanned a copy. There is a danger, however: who owns the material? The following text appears in public domain books downloaded from Google's archive:

We designed Google Book Search for use by individuals, and we request that you use these files for personal, non-commercial purposes.
Is scholarly research a "personal, non-commercial purpose"? Note that this statement explicitly applies to works that are in the public domain.

The issue isn't simply electronic documents. Consider this work, which is a microfilm of a U.S. government document and hence never subject to copyright. The catalog entry bears this note:

Reuse of record except for individual research requires license from Congressional Information Services, Inc.
What is ``individual research''?

What we are seeing is the use of contract law to obtain rights not granted by copyright. If we are not careful, we will see public information locked up. Worse yet, digital records can be protected by so-called Digital Rights Management (DRM) technology, making them inaccessible except on terms dictated by the physical record's owner.

To be sure, a company that goes to the expense of scanning or photographing old books and documents is entitled to make a profit. That begs the question — at least two of them, in fact. We need to ask about the fate of public documents (such as government records) and about the role of libraries.

Government documents, in a very strong sense, belong to the people. It is perhaps reasonable to let a private company reproduce such documents and be compensated for its efforts, though I prefer the attitude that since public documents do belong to the people, they should be made available by the government itself. That said, if a private company is going to be the designated publisher, it should not control how the documents are used. Their profit should be from the collection — a collection copyright, if you will — rather than from the individual documents. It is certainly within the government's power to impose that restriction in any contracts it signs for such reproductions.

Libraries have a different problem. They frequently don't own the originals; however, their mission is to make information broadly available. By agreeing to stringent restrictions, above and beyond what would be permitted under the Fair Use doctrine of copyright law, they undermine their own goals. If libraries, as a matter of principle, decline to purchase works that are unduly encumbered, I suspect that many publishers will back off. (I doubt that there are many purchasers of the collected works of the War Department from a hundred years ago other than research libraries.)

To sum up: there are two ways we can lose access to information, physical and legal. In avoiding the first problem, we have to take care that we do not create the second. This is especially serious for digitized materials, where technology can enforce restrictive contracts.

Internet Records Retention Bill

19 March 2009

A lot of pixels have been spilled lately over an Internet records retention bill recently introduced in both the House and the Senate. The goal is to fight child pornography. That's a worthwhile goal; however, I think these bills will do little to further it. Worse yet, I think that at least two of the provisions of the bill are likely to have bad side effects. Unfortunately, the text is quite bad; we will have to wait for regulatory action and/or overzealous prosecutors to see just how far the language will stretch.

The first troublesome provision is Section 3, which amends Chapter 95 of Title 18 of the U.S. Code to add

(a) Offense— Whoever, being an Internet content hosting provider or email service provider, knowingly engages in any conduct the provider knows or has reason to believe facilitates access to, or the possession of, child pornography (as defined in section 2256) shall be fined under this title or imprisoned not more than 10 years, or both.
This might criminalize things like onion routing, an important privacy-preserving technology. (Ironically, onion routing got its start at a government agency, the Navy Research Lab.) Since the clause is limited to "Internet content hosting providers" and "email service providers", most Tor nodes won't be affected. Besides, very many Tor nodes are outside the country, so this provision likely won't hinder any would-be viewers of child pornography.

There are other infelicities in the definitions. "Internet content hosting provider" is defined broadly enough to include web caches; "Email service provider" requires that the site provide "transmission" and "retrieval" services, which excludes companies that offer only one. Besides, any networking technology "facilitates access to" all sorts of content, good and bad. Is the Internet being outlawed?

The records retention provision adds

(h) Retention of Certain Records and Information— A provider of an electronic communication service or remote computing service shall retain for a period of at least two years all records or other information pertaining to the identity of a user of a temporarily assigned network address the service assigns to that user.'.
to the end of 18 U.S.C. 2703. The problems start with the definitions. Given the location of the clause, the relevant definitions are in 18 U.S.C. 2510 and 18 U.S.C. 2711. These define "electronic communication service" and "remote computing service" but not a "provider" of those services. What does the clause mean? Is it intended just to cover ISPs? Probably not; elsewhere in that part of the law, there are explicit references to "providers … to the public". Who else is covered? Employers? Hotels? Universities? WiFi hotspots, free or not? Almost certainly, all of the above are included.

Home users? Many people (myself included) have wireless routers or access points in our houses; clearly, any guest of mine or my family's is more than welcome to use our Internet connection. How am I supposed to "retain" logs of what IP addresses they get? By chance, I happened to need information just two days ago on what machines were associated with which access points. The logs kept by the devices were utterly useless for this purpose. Am I required to install a (currently non-existent) newer firmware release for this purpose? Does anyone believe that the average home user is even slightly capable of finding such a thing, let alone installing it? By the way, as best I can tell, installing new firmware erases all of the current log information on my boxes… And of course, even if I do have such logs, all they would include are timestamps and MAC addresses. I do not retain records on who visits my house when (do I need a log book at the front door?); I have no idea what their devices' MAC addresses are. These people are my friends; I just give them the SSID and encryption key.

Of course, the law requires providers to "retain" records. Does that imply "create"? What if providers have no records? Must they start creating them? If not, home users would be excluded, but some ISPs may decide they don't need to create them, either.

The most troublesome provision of this clause is the restriction to "temporarily assigned network address the service assigns to that user". Presumably, this is intended to cover IP addresses assigned by DHCP or PPP. From a technical perspective, however, that clause is often useless, technically infeasible, or both. Many ISPs, some employers, and virtually all hotels and home WiFi routers use a technology known as Network Address Translation (NAT). When NAT is in use, the address assigned by DHCP is visible only within the provider's network. Law enforcement officials would have no access to this network until after they had identified a suspect. Log files or wiretaps from a child pornographer's site would reveal the external address of the client downloading the material, but that address implicates all users of the provider during that period. The internal IP address is not visible on the outside.
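To make the NAT point concrete, here is a minimal Python sketch of a toy port-translating NAT. The class name ToyNat, the addresses (drawn from documentation and private ranges), and the starting port number are all invented for illustration; real NAT devices are considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToyNat:
    """Toy port-translating NAT: many internal hosts share one public address."""
    public_ip: str
    next_port: int = 40000  # arbitrary starting point, an assumption
    # Maps (internal_ip, internal_port) to the external port chosen for that flow.
    table: Dict[Tuple[str, int], int] = field(default_factory=dict)

    def translate(self, internal_ip: str, internal_port: int) -> Tuple[str, int]:
        """Return the (address, port) pair an outside server would see."""
        key = (internal_ip, internal_port)
        if key not in self.table:
            self.table[key] = self.next_port
            self.next_port += 1
        return self.public_ip, self.table[key]


nat = ToyNat(public_ip="203.0.113.7")
internal_hosts = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]

# What a remote server's log would record for one connection from each host.
seen_outside = [nat.translate(ip, 51000) for ip in internal_hosts]
outside_addresses = {address for address, _port in seen_outside}
print(outside_addresses)  # a single address, shared by all three hosts
```

The DHCP-assigned addresses never appear in what the remote server sees; only the external port distinguishes one guest from another.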

Now, for every connection, a unique combination of IP address and port number is allocated by the NAT box. This, coupled with accurate timestamps and DHCP records, would indeed identify the particular MAC address involved. However, this would require tracking every TCP connection of that user. Apart from being a gross invasion of privacy (does every hotel I stay at need to know every web site I visit?), it is probably infeasible to collect this data; there would be far too much of it. The Internet was never designed for this sort of fine-grained record-keeping.
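A rough sketch of what that correlation would involve, in Python. The record layouts (NatRecord, DhcpLease), the find_mac helper, and the sample data are all hypothetical; nothing here reflects the log format of any real NAT box or DHCP server.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class NatRecord(NamedTuple):
    start: datetime
    end: datetime
    external_port: int
    internal_ip: str


class DhcpLease(NamedTuple):
    start: datetime
    end: datetime
    internal_ip: str
    mac: str


def find_mac(when: datetime, external_port: int,
             nat_log: List[NatRecord], leases: List[DhcpLease]) -> Optional[str]:
    """Map (timestamp, external port) to a MAC address, if the logs allow it."""
    for record in nat_log:
        if record.external_port == external_port and record.start <= when <= record.end:
            for lease in leases:
                if lease.internal_ip == record.internal_ip and lease.start <= when <= lease.end:
                    return lease.mac
    return None


def at(hour: int, minute: int) -> datetime:
    """Helper for building sample timestamps on one (arbitrary) day."""
    return datetime(2009, 3, 19, hour, minute)


# Invented sample data: one NatRecord per connection, one DhcpLease per device.
nat_log = [
    NatRecord(at(9, 0), at(9, 5), 40001, "192.168.1.10"),
    NatRecord(at(9, 2), at(9, 3), 40002, "192.168.1.11"),
    NatRecord(at(14, 0), at(14, 1), 40001, "192.168.1.11"),  # port reused later
]
leases = [
    DhcpLease(at(8, 0), at(12, 0), "192.168.1.10", "00:00:5e:00:53:01"),
    DhcpLease(at(8, 30), at(20, 0), "192.168.1.11", "00:00:5e:00:53:02"),
]

morning = find_mac(at(9, 1), 40001, nat_log, leases)
afternoon = find_mac(at(14, 0), 40001, nat_log, leases)
unknown = find_mac(at(9, 1), 49999, nat_log, leases)
```

Note that nat_log needs one entry per connection, and that the same external port maps to different devices at different times, which is why the timestamps have to be accurate. Even a successful lookup yields only a MAC address, not a person.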

There's another problem: what if the offender is using a Web proxy service? Many hotels and ISPs require use of these; any corporation with an application-level firewall does as well. In that case, the "guilty" IP address would be that of the proxy. Must proxies keep logs? No — the bill applies to "temporarily assigned network addresses", not proxy devices.

Then there's IPv6. It has a feature known as the Privacy Extensions for Stateless Address Autoconfiguration in IPv6. If this extension is in use, a computer can generate new IP addresses any time it wants to, precisely to avoid linkage across different transactions. This feature is not covered by the law, since the addresses are self-generated and not temporarily assigned by a provider.
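As a simplified illustration of self-generated addresses, the Python sketch below picks a random 64-bit interface identifier under a given /64 prefix. It is not the exact procedure from the Privacy Extensions specification, which has additional details (address lifetimes, duplicate detection, and so on); the function name temporary_address and the documentation prefix are assumptions made for the example.

```python
import ipaddress
import secrets


def temporary_address(prefix: str) -> ipaddress.IPv6Address:
    """Combine a /64 network prefix with a freshly chosen random interface ID."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch expects a /64 prefix")
    interface_id = secrets.randbits(64)  # chosen by the host, not by the provider
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


prefix = "2001:db8:0:1::/64"  # documentation prefix, used here as a stand-in
first = temporary_address(prefix)
second = temporary_address(prefix)
print(first)
print(second)
```

The provider supplies only the prefix; the host picks the remaining bits itself and can pick new ones whenever it likes, so there is no per-address assignment for the provider to record.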

I could go on, but I think the point is clear: the bill is poorly drafted, affects legitimate users, creates impossible requirements, leaves too much wiggle room for law enforcement mission creep — and doesn't do what it's intended to do.