The Early History of Usenet, Part VIII: Usenet Growth and B-news

30 November 2019

For quite a while, it looked like my prediction—one to two articles per day—was overly optimistic. By summer, there were only four new sites: Reed College, University of Oklahoma (at least, I think that that's what uucp node uok is), vax135, another Bell Labs machine—and, crucially, U.C. Berkeley, which had a uucp connection to Bell Labs Research and was on the ARPANET.

In principle, even a slow rate of exponential growth can eventually take over the world. But that assumes that there are no “deaths” that will drive the growth rate negative. That isn't a reasonable assumption, though. If nothing else, Jim Ellis, Tom Truscott, Steve Daniel, and I all planned to graduate. (We all succeeded in that goal.) If Usenet hadn't shown its worth to our successors by then, they'd have let it wither. For that matter, university faculty or Bell Labs management could have pulled the plug, too. Usenet could easily have died aborning. But the right person at Berkeley did the right thing.

Mary Horton was then a PhD student there. (After she graduated, she joined Bell Labs; she and I were two of the primary people who brought TCP/IP to the Labs, where it was sometimes known as the “datagram heresy”. The phone network was, of course, circuit-switched…) Known to her but unknown to us, there were two non-technical ARPANET mailing lists that would be of great interest to many potential Usenet users, HUMAN-NETS and SF-LOVERS. She set up a gateway that relayed these mailing lists into Usenet groups; these were at some point moved to the fa (“From ARPANET”) hierarchy. (For a more detailed telling of this part of the story, see Ronda Hauben's writings.) With an actual traffic source, it was easy to sell folks on the benefits of Usenet. People would have preferred a real ARPANET connection but that was rarely feasible and never something that a student could set up: ARPANET connections were restricted to places that had research contracts with DARPA. The gateway at Berkeley was, eventually, bidirectional for both Usenet and email; this enabled Usenet-style communication between the networks.

SF-LOVERS was, of course, for discussing science fiction; then as now, system administrators were likely to be serious science fiction fans. HUMAN-NETS is a bit harder to describe. Essentially, it dealt with the effect on society of widespread networking. If it still existed today, it would be a natural home for discussions of online privacy, fake news, and hate speech, as well as the positive aspects: access to much of the world's knowledge, including primary source materials that years ago were often hard to find, and better communications between people.

It is, in fact, unclear if the gateway was technically permissible. The ARPANET was intended for use by authorized ARPANET sites only; why was a link to another network allowed? The official reason, as I understand it, is that it was seen as a use by Berkeley, and thus passed muster; my actual impression is that it was viewed as an interesting experiment. The reason for the official restriction was to prevent a government-sponsored network from competing with then-embryonic private data networks; Usenet, being non-commercial, wasn't viewed as a threat.

Uucp email addresses, as seen on the ARPANET, were a combination of a uucp explicit path and an ARPANET hostname. This was before the domain name system; the ARPANET had a flat name space back then. My address would have been something like

research!duke!unc!smb@BERKELEY
but also
research!duke!unc!smb at BERKELEY
—in this era, " at " was accepted as a synonym for "@"…
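Mechanically, the ARPANET side saw only the host after the @ (or " at "); everything to its left was a uucp path for that host to act on. A minimal C sketch of that split follows; split_mixed_address is a hypothetical helper written for illustration, not code from any mailer of the era.

```c
#include <string.h>

/* Hypothetical helper: split a mixed-mode address such as
   "research!duke!unc!smb@BERKELEY" into the ARPANET relay host
   ("BERKELEY") and the uucp bang path that host should use
   ("research!duke!unc!smb"). The older " at " spelling is accepted too.
   Returns 0 on success, -1 if there is no relay host or a buffer is too small. */
int split_mixed_address(const char *addr, char *path, size_t pathsz,
                        char *host, size_t hostsz)
{
    const char *sep = strrchr(addr, '@');
    size_t seplen = 1;

    if (sep == NULL) {               /* fall back to "user at HOST" */
        sep = strstr(addr, " at ");
        seplen = 4;
    }
    if (sep == NULL)
        return -1;

    size_t plen = (size_t)(sep - addr);
    const char *h = sep + seplen;
    if (plen >= pathsz || strlen(h) >= hostsz)
        return -1;

    memcpy(path, addr, plen);        /* everything left of the separator */
    path[plen] = '\0';
    strcpy(host, h);                 /* everything right of it */
    return 0;
}
```

Berkeley's mailer would then hand research!duke!unc!smb to uucp, with research as the first hop.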

With the growth in the number of sites came more newsgroups and more articles. This made the limitations of the A-news user interface painfully apparent. Mary designed a new scheme; a high school student, Matt Glickman, implemented what became B-news. There were many improvements.

The most important change was the ability to read articles by newsgroup, and to read them out of order. By contrast, A-news presented articles in order of arrival, and only stored the high-water mark of contiguous articles read. The article format changed, too, to one much more like email. Here's the sample from RFC 1036:

From: jerry@eagle.ATT.COM (Jerry Schwarz)
Path: cbosgd!mhuxj!mhuxt!eagle!jerry
Newsgroups: news.announce
Subject: Usenet Etiquette — Please Read
Message-ID: <642@eagle.ATT.COM>
Date: Fri, 19 Nov 82 16:14:55 GMT
Followup-To: news.misc
Expires: Sat, 1 Jan 83 00:00:00 -0500
Organization: AT&T Bell Laboratories, Murray Hill

The body of the message comes here, after a blank line.
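Since B-news headers are email-style "Name: value" lines ending at the first empty line, pulling out a field is a short scan. This is a hedged sketch: header_value is a hypothetical name, it matches field names case-insensitively as RFC 822 does, and it ignores folded continuation lines for brevity.

```c
#include <string.h>
#include <strings.h>

/* Copy the value of header `name` (e.g. "Newsgroups") from the article
   text into out[], which must have room for at least one byte.
   Headers end at the first empty line. Returns 0 if found, -1 if not. */
int header_value(const char *article, const char *name, char *out, size_t outsz)
{
    size_t nlen = strlen(name);
    const char *line = article;

    while (*line != '\0' && *line != '\n') {      /* empty line ends headers */
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);

        if (len > nlen && line[nlen] == ':' &&
            strncasecmp(line, name, nlen) == 0) {
            const char *v = line + nlen + 1;
            while (v < line + len && *v == ' ')     /* skip space after colon */
                v++;
            size_t vlen = (size_t)(line + len - v);
            if (vlen >= outsz)
                vlen = outsz - 1;                   /* truncate to fit */
            memcpy(out, v, vlen);
            out[vlen] = '\0';
            return 0;
        }
        if (eol == NULL)
            break;
        line = eol + 1;
    }
    return -1;
}
```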

The most interesting change was the existence of both From: and Path: lines. The former was to be used for sending email; the latter was used to track which sites had already seen an article. There is also the implicit assumption that there would be a suitable ARPANET-to-uucp gateway, identified by a DNS MX record, to handle email relaying; at this time, such gateways were largely aspirational and mixed-mode addresses were still the norm.
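The Path: line is what kept the flood from looping: a site does not send an article to a neighbor already named in it, and it prepends its own name before relaying. The C below sketches that rule under simplifying assumptions; path_contains and path_prepend are hypothetical helpers, and the trailing user name (jerry in the sample) is treated like any other element.

```c
#include <stdio.h>
#include <string.h>

/* 1 if `site` is one of the '!'-separated elements of a Path: value. */
int path_contains(const char *path, const char *site)
{
    size_t slen = strlen(site);
    const char *p = path;

    for (;;) {
        size_t len = strcspn(p, "!");
        if (len == slen && strncmp(p, site, slen) == 0)
            return 1;
        if (p[len] == '\0')
            return 0;
        p += len + 1;                 /* step past the '!' */
    }
}

/* Before relaying, a site puts its own name at the front of the path.
   Returns 0 on success, -1 if out[] is too small. */
int path_prepend(const char *path, const char *self, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s!%s", self, path);
    return (n >= 0 && (size_t)n < outsz) ? 0 : -1;
}
```

A site would then relay the article only to those neighbors for which path_contains returns 0.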

B-news also introduced control messages. As noted, these were unauthenticated; mischief could and did result. Other than canceling messages, the primary use was for the creation of new newsgroups—allowing them to be created willy-nilly didn't scale.

There was also control message support for mapping the network, which did not work as well as we expected. Briefly, the purpose of the senduuname message was to allow a site to calculate the shortest uucp path to a destination, both to relieve users of the mental effort of remembering long paths and to allow a shorter email path than simply retracing the Usenet path. (This was also a reliability feature; uucp email, especially across multiple hops, was not very reliable.) My code worked (and, after a 100% rewrite by Peter Honeyman, became my first published paper), but it was never properly integrated into mailers, and the shorter paths were even less reliable than the long ones.
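Given connectivity data of the kind senduuname was meant to collect, the shortest route is a breadth-first search over the site graph. The sketch assumes a small adjacency matrix (adj, MAXSITES, and the site names in the usage below are illustrative, not real map data) and is not the code from the paper mentioned above.

```c
#include <stdio.h>
#include <string.h>

#define MAXSITES 16   /* assumption: nsites <= MAXSITES */

/* Index of a site name in names[], or -1 if unknown. */
int site_index(const char *names[], int nsites, const char *name)
{
    for (int i = 0; i < nsites; i++)
        if (strcmp(names[i], name) == 0)
            return i;
    return -1;
}

/* adj[i][j] != 0 means site i can call site j. Breadth-first search finds
   a route with the fewest uucp hops. route[] receives the sites after
   `from`, ending with `to`; returns the hop count, or -1 if unreachable. */
int shortest_route(int nsites, int adj[][MAXSITES], int from, int to, int route[])
{
    int prev[MAXSITES], queue[MAXSITES], head = 0, tail = 0;

    for (int i = 0; i < nsites; i++)
        prev[i] = -2;                        /* -2 marks "not yet reached" */
    prev[from] = -1;
    queue[tail++] = from;
    while (head < tail) {
        int s = queue[head++];
        for (int t = 0; t < nsites; t++)
            if (adj[s][t] && prev[t] == -2) {
                prev[t] = s;
                queue[tail++] = t;
            }
    }
    if (prev[to] == -2)
        return -1;

    int hops = 0;
    for (int s = to; s != from; s = prev[s])  /* walk back to the origin */
        route[hops++] = s;
    for (int i = 0; i < hops / 2; i++) {      /* then reverse into order */
        int tmp = route[i];
        route[i] = route[hops - 1 - i];
        route[hops - 1 - i] = tmp;
    }
    return hops;
}

/* Render a route as a bang path for the mailer, e.g. "duke!ucbvax!smb". */
void bang_path(const char *names[], const int route[], int hops,
               const char *user, char *buf, size_t size)
{
    size_t used = 0;
    buf[0] = '\0';
    for (int i = 0; i < hops && used < size; i++)
        used += (size_t)snprintf(buf + used, size - used, "%s!", names[route[i]]);
    if (used < size)
        snprintf(buf + used, size - used, "%s", user);
}
```

With links unc-duke, duke-research, research-ucbvax, and duke-ucbvax, the search from unc to ucbvax yields duke!ucbvax!smb, two hops, rather than a longer path through research.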

Finally, there were internal changes. A-news had used a single directory for all messages, but as the number of messages increased, that became a serious performance bottleneck. B-news used a directory per newsgroup, and eventually subdirectories that reflected the hierarchical structure.
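The change from one directory to a tree amounts to turning each dot in a newsgroup name into a path separator. In this sketch the spool location /usr/spool/news is only an example, and group_dir is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/* Map a newsgroup to its spool directory, one level per dotted component:
   ("/usr/spool/news", "net.unix-wizards") -> "/usr/spool/news/net/unix-wizards".
   Returns 0 on success, -1 if out[] is too small. */
int group_dir(const char *spool, const char *group, char *out, size_t outsz)
{
    int n = snprintf(out, outsz, "%s/%s", spool, group);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    /* Only rewrite the newsgroup part; the spool path is left alone. */
    for (char *p = out + strlen(spool) + 1; *p != '\0'; p++)
        if (*p == '.')
            *p = '/';
    return 0;
}
```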

The growth of Usenet had negative consequences, too: some sites became less willing to carry the load. Bell Labs Research had been a major forwarding site, but Doug McIlroy, then a department head, realized that the exponent in Usenet's growth rate was, in fact, significant, and that the forwarding load was threatening to overload the site—star networks don't scale. He ordered an end to email relaying. This could have been very, very serious; fortunately, there were a few other sites that had started to pick up the load, most notably decvax at Digital Equipment Corporation's Unix Engineering Group. This effort, spearheaded by Bill Shannon and Armando Stettner, was quite vital. Another crucial relay site was seismo, run by Rick Adams at the Center for Seismic Studies; Rick later went on to found UUNET, which became the first commercial ISP in the United States. At Bell Labs, ihnp4, run by Gary Murakami, became a central site, too. (Amusingly enough, even though I joined the Labs in late 1982, I did not create another hub: as a very junior person, I didn't feel that I could. But it wasn't because management didn't know about Usenet; indeed, on my first day on the job, my center's director (three levels up from me) greeted me with, “Hi, Steve—I've seen your flames on Netnews.” I learned very early that online posts can convey one's reputation…)

More on load issues in the next post.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata…

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part VII: The Public Announcement

25 November 2019

Our goal was to announce Usenet at the January, 1980 Usenix meeting. In those days, Usenix met at universities; it was a small, comparatively informal organization, and didn't require hotel meeting rooms and the like. (I don't know just when Usenix started being a formal academic-style conference; I do know that it was no later than 1984, since I was on the program committee that year for what would later be called the Annual Technical Conference.) This meeting was in Boulder; I wasn't there, but Tom Truscott and Jim Ellis were.

Apart from the announcement itself, we of course needed non-experimental code—and my prototype was not going to cut it. Although I no longer remember precisely what deficiencies were in my C version, one likely issue was the ability to configure which neighboring sites would receive which newsgroups. Stephen Daniel, also at Duke CS, wrote the code that became known as “A-news”. One important change was the ability to have multiple hierarchies, rather than just the original “NET” or “NET.*”. (Aside: I said in a previous note that my C version had switched to “NET.*” for distributed groups, rather than the single NET. I'm now no longer sure of when that was introduced, in my C version or in Steve Daniel's version. He certainly supported other hierarchies; I certainly did not.) It was also possible in the production version to configure which groups or hierarchies a site would receive. For sanity's sake, this configuration would have to be in a file, rather than in an array built into the code.

That latter point was not always obvious. Uucp, as distributed, used an array to list the commands remote sites were permitted to execute:

char *Cmds[] = {
	"mail",
	"rmail",
	"lpr",
	"opr",
	"fsend",
	"fget",
	NULL
	};
/*  to remove restrictions from uuxqt
 *  redefine CMDOK 0
 *
 *  to add allowable commands, add to the list under Cmds[]
 */
To permit rnews to execute, a system administrator would have to change the source code (and most people had source code to Unix in those days) and recompile. This was, in hindsight, an obviously incorrect decision, but it was arguably justifiable in those days: what else should you be allowed to do? There were many, many fewer commands. (I should note: I no longer remember for certain what fsend, fget, or opr were. I think they were for sending and receiving files, and for printing to a Honeywell machine at the Bell Labs Murray Hill comp center. Think of the ancient GCOS field in the /etc/passwd file.)
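The check that list implies is a simple membership test on the first word of the requested command. The sketch below is illustrative C, not uuxqt's actual source; cmd_ok is a hypothetical name, and the list is the one quoted above with rnews added, which is exactly the local edit a site had to make.

```c
#include <string.h>

/* The quoted allowlist plus "rnews", the entry each site had to add. */
static const char *Cmds[] = {
    "mail", "rmail", "lpr", "opr", "fsend", "fget",
    "rnews",
    NULL
};

/* 1 if the first word of a remotely requested command line is permitted. */
int cmd_ok(const char *cmdline)
{
    size_t len = strcspn(cmdline, " \t");   /* length of the command name */
    for (int i = 0; Cmds[i] != NULL; i++)
        if (strlen(Cmds[i]) == len && strncmp(Cmds[i], cmdline, len) == 0)
            return 1;
    return 0;
}
```

Moving the list into a file read at startup, rather than an array compiled into the program, is what would have spared administrators the edit-and-recompile step.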

To work around this problem, we supplied a mail-to-rnews program: a sending site could email articles, rather than try to execute rnews directly. A clock-driven daemon would retrieve the email messages and pass them to rnews. And it had to be clock-driven: in those days, there was no way to have email delivered directly to a program or file. (A security feature? No, just the simplicity that was then the guiding spirit of Unix. But yes, it certainly helped security.) The remote site configuration file in A-news therefore needed to know a command to execute, too.

The formal announcement can be seen here. The HTML is easier on the eyes, but there are a few typos and even some missing text, so you may want to look at the scanned version linked to at the bottom. A few things stand out. First, as I noted in Part III, there was a provision for Duke to recover phone charges from sites it polled. There was clearly faculty support at Duke for the project. For that matter, faculty at UNC knew what I was doing.

A more interesting point is what we thought the wide-area use would be: "The first articles will probably concern bug fixes, trouble reports, and general cries for help." Given how focused on the system aspects we were, what we really meant was something like the eventual newsgroup comp.sys.unix-wizards. There was, then, a very strong culture of mutual assistance among programmers, not just in organizations like Usenix (which was originally, as I noted, the Unix Users' Group), but also in the IBM mainframe world. The Wikipedia article on SHARE explains this well:

A major resource of SHARE from the beginning was the SHARE library. Originally, IBM distributed what software it provided in source form and systems programmers commonly made small local additions or modifications and exchanged them with other users. The SHARE library and the process of distributed development it fostered was one of the major origins of open source software.

Another proposed use was locating interesting source code, but not flooding it to the network. Why not? Because software might be bulky, and phone calls then were expensive. The announcement estimates that nighttime phone rates were about US$0.50 for three minutes; that sounds about right, though even within the US rates varied with distance. In that time, at 300 bps—30 bytes per second—you could send at most 5400 bytes; given protocol overhead, we conservatively estimated 3000 bytes, or a kilobyte per minute. To pick an arbitrary point of comparison, the source to uucp is about 120KB; at 1KB per minute, that's two hours, or US$20. Adjusting for inflation, that's over US$60 in today's money—and most people don't want most packages. And there was another issue: Duke only had two autodialers; there simply wasn't the bandwidth to send big files to many places, and trying to do so would block all news transfers to other sites. Instead, the proposal was for someone—Duke?—to be a central repository; software could then be retrieved on demand. This was a model later adopted by UUNET; more on it in the next installment of this series.
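The arithmetic is easy to check. The functions below just restate the paragraph's numbers; they assume the usual asynchronous framing of 10 bits per byte on the wire (a start bit, 8 data bits, a stop bit), treat 120KB as 120,000 bytes, and do not round connect time up to whole rate periods. The function names are illustrative.

```c
/* Raw capacity of a serial line, assuming 10 bits on the wire per byte:
   300 bps carries 30 bytes per second. */
long raw_bytes(long bps, long seconds)
{
    return bps / 10 * seconds;
}

/* Minutes to move `bytes` at a conservative usable rate. */
double transfer_minutes(long bytes, double usable_bytes_per_minute)
{
    return bytes / usable_bytes_per_minute;
}

/* Cost at `dollars` per `period_minutes` of connect time
   (no rounding up to whole periods, a simplification). */
double transfer_cost(double minutes, double dollars, double period_minutes)
{
    return minutes / period_minutes * dollars;
}
```

So raw_bytes(300, 180) gives the 5400-byte ceiling for a three-minute call, and 120,000 bytes at 1000 bytes per minute is 120 minutes, or 40 three-minute periods at US$0.50 each.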

The most interesting thing, though, is what the announcement didn't talk about: any non-technical use. We completely missed social discussions, hobby discussions, political discussions, or anything else like that. To the extent we considered it at all, it was for local use—after all, who would want to discuss such things with someone they'd never met?


The Early History of Usenet, Part VI: Authentication and Norms

22 November 2019

We knew that Usenet needed some sort of management system, and we knew that that would require some sort of authentication, for users, sites, and perhaps posts. We didn't add any, though—and why we didn't is an interesting story. (Note: much of this blog post is taken from an older post.)

The obvious solution was something involving public key cryptography, which we (the original developers of the protocol: Tom Truscott, the late Jim Ellis, and myself) knew about: all good geeks at the time had seen Martin Gardner's "Mathematical Games" column in the August 1977 issue of Scientific American (paywall), which explained both the concept of public key cryptography and the RSA algorithm. For that matter, Rivest, Shamir, and Adleman's technical paper had already appeared; we'd seen that, too. In fact, we had code available for trapdoor knapsack encryption: the xsend command for public key encryption and decryption, which we could have built upon, was part of 7th Edition Unix, which is what Usenet ran on.

What we did not know was how to authenticate a site's public key. Today, we'd use a certificate issued by a certificate authority. Certificates had been invented by then, but we didn't know about them, and of course there were no search engines to come to our aid. (Manual finding aids? Sure—but apart from the question of whether or not any accessible to us would have indexed bachelor's theses, we'd have had to know enough to even look. The RSA paper gave us no hints; it simply spoke of a "public file" or something like a phone book. It did speak of signed messages from a "computer network"—scare quotes in the original!—but we didn't have one of those except for Usenet itself. And a signed message is not a certificate.) Even if we had known, there were no certificate authorities, and we certainly couldn't create one along with creating Usenet.

Going beyond that, we did not know the correct parameters: how long a key to use (the estimates in the early papers were too low), what was secure (the xsend command used an algorithm that was broken a few years later), etc. Maybe some people could have made good guesses. We did not know and knew that we did not know.

The next thing we considered was neighbor authentication: each site could, at least in principle, know and authenticate its neighbors, due to the way the flooding algorithm worked. That idea didn't work, either. For one thing, it was trivial to impersonate a site that appeared to be further away. Every Usenet message contains a Path: line; someone trying to spoof a message would simply have to claim to be a few hops away. (This is how the famous kremvax prank worked.)

It was possible, barely, to have a separate uucp login for different sites, but apart from overhead for managing separate logins, it isn't clear that rnews could have handled it properly.
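Even with a separate uucp login per neighbor, the only Path: element a site could vouch for is the leftmost one, and only by comparing it with the login that delivered the article. The sketch assumes such per-neighbor logins exist; first_hop_matches is a hypothetical helper.

```c
#include <string.h>

/* The leftmost Path: element names the site that claims to have just
   sent the article. Return 1 if it matches `neighbor`, the site whose
   (assumed, per-neighbor) uucp login actually delivered it. Nothing to
   the right of the first '!' can be checked this way. */
int first_hop_matches(const char *path, const char *neighbor)
{
    size_t len = strcspn(path, "!");
    return strlen(neighbor) == len && strncmp(path, neighbor, len) == 0;
}
```

A forged article whose Path: reads duke!kremvax!user passes this check whenever it really does arrive from duke: kremvax sits past the first hop, where nothing can be verified.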

But there's a more subtle issue. Usenet messages were transmitted via a generic remote execution facility. The Usenet program on a given computer executed the Unix command

uux neighborsite!rnews
where neighborsite is the name of the next-hop computer on which the rnews command would be executed. (Before you ask: yes, the list of allowable remotely requested commands was very small; no, the security was not perfect. But that's not the issue I'm discussing here.) The trouble is that any knowledgeable user on a site could issue the uux command; it wasn't and couldn't easily be restricted to authorized users. Anyone could have generated their own fake control messages, without regard to authentication and sanity built in to the Usenet interface. And yes, we knew that at the time.

Could uux have been secured? This is itself a complex question that I don't want to go into now; please take it on faith and don't try to argue about setgid(), wrapper programs, and the like. It was our judgment then—and my judgment now—that such solutions would not be adopted. The minor configuration change needed to make rnews an acceptable command for remote execution was a sufficiently high hurdle that we provided alternate mechanisms for sites that wouldn't do it.

That left us with no good choices. The infrastructure for a cryptographic solution was lacking. The uux command rendered illusory any attempts at security via the Usenet programs themselves. We chose to do nothing. That is, we did not implement fake security that would give people the illusion of protection but not the reality.

This was the right choice.

But the story is more complex than that. It was the right choice in 1979 but not necessarily right later, for several reasons. The most important is that the online world in 1979 was very different than it is now. For one thing, since only a very few people had access to Usenet (mostly CS students and tech-literate employees of large, sophisticated companies), the norms were to some extent self-enforcing: if someone went too far astray, their school or employer could come down on them. And we did anticipate that some people would misbehave.

As I mentioned in the previous post, our projections of participation and volume were very low. On the one hand, a large network has much more need for management, including ways to deal with people and traffic that violate the norms. On the other, simply as a matter of statistics, a large network will have at least proportionately more malefactors. Furthermore, the increasing democratization of access meant that there were people who were not susceptible to school or employer pressure.

B-news (which I'll get to in a few days) did have control messages. They were necessary, useful—and abused. Spam messages were often countered by cancelbots, but of course cancelbots were not available only to the righteous. And online norms are not always what everyone wants them to be. The community was willing to act technically against the first large-scale spam outbreak, but other issues (a genuine neo-Nazi, posts to the misc.kids newsgroup by a member of NAMBLA, trolls on the soc.motss newsgroup, and more) were dealt with by social pressure. (I should note: the first neo-Nazi appeared on Usenet very early on. And no, I'm not being even slightly hyperbolic when I call him that, but I won't give him more publicity by mentioning his name.)

There are several lessons here. One, of course, is that technical honesty is important. A second, though, is that the balance between security and functionality is not fixed—environments and hence needs change over time. B-news was around for a long time before cancel messages were used or abused on a large scale, and this mass good behavior was not because the insecurity wasn't recognized: when I had a job interview at Bell Labs in 1982, the first thing Dennis Ritchie said to me was "[B-news] is a tool of the devil!" A third lesson is that norms can matter, but that the community as a whole has to decide how to enforce them.

There's an amusing postscript to the public key cryptography issue. In 1979-1981, when the Usenet software was being written, there were no patents on public key cryptography nor had anyone heard about export licenses for cryptographic technology. If we'd been a bit more knowledgeable or a bit smarter, we'd have shipped software with such functionality. The code would have been very widespread before any patents were issued, making enforcement very difficult. On the other hand, Tom, Jim, Steve Daniel (who wrote the first released version of the software) and I might have had some very unpleasant conversations with the FBI. But the world of online cryptography would almost certainly have been very different. It's interesting to speculate on how things would have transpired if cryptography had been widely used in the early 1980s.

As I alluded to above, we did anticipate possible trouble. In fact, the original public announcement warned about this:

  1. What about abuse of the network?
    In general, it will be straightforward to detect when abuse has occurred and who did it. The uucp system, like UNIX, is not designed to prevent abuses of overconsumption. Experience will show what uses of the net are in fact abuses, and what should be done about them.

    Certain abuses of the net can be serious indeed. As with ordinary abuses, they can be thought about, looked for, and even programmed against, but only experience will show what matters. Uucp provides some measure of protection. It runs as an ordinary user, and has strict access controls. It is safe to say that it poses no greater threat than that inherent in a call-in line.

  2. Who would be responsible when something bad happens?
    Not us! And we do not intend that any innocent bystander be held liable either. We are looking into this matter. Suggestions are solicited.
Clearly, we were mostly worried about hacking via uucp connections. (It's equally clear to those of us who dealt with both uucp and security that we were quite naive in thinking that uucp was safe. There's a certain irony in the fact that both Jim Ellis and I ended up specializing in security…)

It seems, though, that we were worried about other abuses as well. The announcement mentions overconsumption of resources as a risk; we knew of that from an article we had seen by Dennis Ritchie in the Bell System Technical Journal. Quoting him:

The weakest area is in protecting against crashing, or at least crippling, the operation of the system. Most versions lack checks for overconsumption of certain resources, such as file space, total number of files, and number of processes (which are limited on a per-user basis in more recent versions). Running out of these things does not cause a crash, but will make the system unusable for a period. When resource exhaustion occurs, it is generally evident what happened and who was responsible, so malicious actions are detectable, but the real problem is the accidental program bug.
Note the similarity between our "it will be straightforward…" and Ritchie's conclusion.

The bottom line, though, was that we really did not know what to do, nor even what sorts of problems would actually occur. I personally did worry about security to some extent—I actually caught my first hackers around 1971, when some activity generated a console message and I went and examined the punch cards(!) for the program involved—but it wasn't in any sense my primary focus. That said, when Morris and Thompson's famous paper on passwords appeared, I coded up a quick-and-dirty password guesser and informed some people about how bad their passwords were. (One answer I received: "It's not a problem that my password is 'abscissa'; no one else can spell it." Umm…) We would have received that issue of Communications of the ACM around the time that Usenet was being invented, but I do not recall when we saw it.

Were we worried about trolling and other forms of online misbehavior? From a vantage point of 40 years, it's hard to say. As mentioned earlier, we did anticipate people posting things like used car ads to inappropriate places. Of course, there was no way to anticipate what Usenet would become in just a few short years.


The Early History of Usenet, Part V: Implementation and User Experience

21 November 2019

To understand some of our implementation choices, it's important to remember two things. First, the computers of that era were slow. The Unix machine at UNC's CS department was slower than most timesharing machines even for 1979—we had a small, slow disk, a slow CPU, and—most critically—not nearly enough RAM. Duke CS had a faster computer—they had an 11/70; we had an 11/45—but since I was doing the first implementation, I had to use what UNC had. (Log in remotely? How? There was no Internet then, neither department was on the ARPANET, and dialing up would have meant paying per-minute telephone charges, probably at daytime rates. Besides, a dial-up connection would have been at 300 bps, but if I stayed local I could do 9600 bps via the local Gandalf port selector.)

The second important point is that we knew we had to experiment to get things right. To quote from the first public announcement of Usenet, "Yes, there are problems. Several amateurs collaborated on this plan. But let's get started now. Once the net is in place, we can start a committee. And they will actually use the net, so they will know what the real problems are." None of us had designed a network protocol before; we knew that we'd have to experiment to get things even approximately right. (To be clear, we were not first-time programmers. We were all experienced system administrators, and while I don't know just how much experience Tom and Jim (and Dennis Rockwell, whose name I've inadvertently omitted in earlier posts) had, I'd been programming for about 14 years by this time, with much of my work at kernel level and with a fair amount of communications software experience.)

My strategy for development, then, was what today we would call rapid prototyping: I implemented the very first version of Netnews software as a Bourne Shell script. It was about 150 lines long, but implemented such features as multiple newsgroups and cross-posting.

Why did I use a shell script? First, simply compiling a program took a long time, longer than I wanted to wait each time I wanted to try something new. Second, a lot of the code was string-handling, and as anyone who has ever programmed in C knows, C is (to be charitable) not the best language for string-handling. Write a string library? I suppose I could have, but that's even more time spent enduring a very slow compilation process. Using a shell script let me try things quickly, and develop the code incrementally. Mind you, the shell script didn't execute that quickly, and it was far too slow for production use—but we knew that and didn't care, since it was never intended as a production program. It was a development prototype, intended for use when creating a suitable file format. Once that was nailed down, I recoded it in C. That code was never released publicly, but it was far more usable.

Unfortunately, the script itself no longer exists. (Neither does the C version.) I looked for it rather hard in the early 1990s but could not find a copy. However, I do remember a few of the implementation details. The list of newsgroups a user subscribed to was an environment variable set in their .profile file:

export NETNEWS="*"
or
export NETNEWS="NET admin social cars tri.*"
Why? It made it simple for the script to do something like
cd $NEWSHOME
groups=`echo $NETNEWS`
That is, it would emit the names of any directories whose names matched the shell pattern in the NETNEWS environment variable.
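The script relied on the shell expanding the patterns against directory names. A C version of the same subscription test can apply POSIX fnmatch() to each whitespace-separated pattern; subscribed is a hypothetical name, and unlike the shell it tests a given group name rather than listing directories.

```c
#include <fnmatch.h>
#include <string.h>

/* 1 if `group` matches any whitespace-separated shell pattern in a
   NETNEWS-style list such as "NET admin social cars tri.*". */
int subscribed(const char *netnews, const char *group)
{
    char pat[64];
    const char *p = netnews;

    for (;;) {
        p += strspn(p, " \t");            /* skip separators */
        size_t len = strcspn(p, " \t");   /* length of next pattern */
        if (len == 0)
            return 0;                     /* end of list, no match */
        if (len < sizeof pat) {           /* overlong patterns are ignored */
            memcpy(pat, p, len);
            pat[len] = '\0';
            if (fnmatch(pat, group, 0) == 0)
                return 1;
        }
        p += len;
    }
}
```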

To find unread articles, the script did (the equivalent of)

newsitems=`find $groups -type f -newer $HOME/.netnews -print`
This would find all articles received since the last time the user had read news. Before exiting, the script would touch $HOME/.netnews to mark it with the current time. (More on this aspect below.)
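The -newer test and the final touch are both just file modification times. A minimal C equivalent, assuming the marker file's mtime is the last-read time and that a missing marker means nothing has been read yet; all three function names are hypothetical.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>
#include <utime.h>

/* Last-read time: the modification time of a marker such as
   $HOME/.netnews. A missing marker yields 0, so everything is unread. */
time_t last_read_time(const char *marker)
{
    struct stat st;
    if (stat(marker, &st) != 0)
        return 0;
    return st.st_mtime;
}

/* Mirrors find's -newer: 1 if `path` was modified strictly after
   `since`, 0 otherwise (including when it cannot be stat'ed). */
int modified_after(const char *path, time_t since)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;
    return st.st_mtime > since;
}

/* Like "touch": create the marker if needed, then set its times to now.
   Returns 0 on success. */
int touch_marker(const char *marker)
{
    FILE *f = fopen(marker, "a");
    if (f == NULL)
        return -1;
    fclose(f);
    return utime(marker, NULL);
}
```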

There were a few more tricks. I didn't want to display cross-posted articles more than once, so the script did

ls -tri $newsitems | sort -n | uniq
to list the "i-node" numbers of each file and delete all but the first copy of duplicates: cross-posted articles appeared as single files linked to from multiple directories. Apart from enabling this simple technique for finding duplicates, it saved disk space, at a time when disk space was expensive. (Those who know Unix shell programming are undoubtedly dying to inform me that the uniq command as shown above would not do quite what I just said. Yup, quite right. How I handled that—and handle it I did—I leave as an exercise for the reader. And to be certain that your solution works, make sure that you stick to the commands available to me in 1979.)
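Because a cross-posted article was one file with several links, its identity is its (device, i-node) pair, not its name. This C sketch keeps the first name seen for each pair; it illustrates the idea in a modern setting rather than answering the 1979 shell exercise, and unique_by_inode is a hypothetical name.

```c
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Copy into out[] the first path seen for each distinct (device, i-node)
   pair and return how many were kept. Paths that cannot be stat'ed are
   skipped. Quadratic, which is fine for one reading session. */
size_t unique_by_inode(const char *paths[], size_t n, const char *out[])
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        struct stat si;
        if (stat(paths[i], &si) != 0)
            continue;
        int dup = 0;
        for (size_t j = 0; j < kept && !dup; j++) {
            struct stat sj;
            if (stat(out[j], &sj) == 0 &&
                sj.st_dev == si.st_dev && sj.st_ino == si.st_ino)
                dup = 1;                  /* another link to the same file */
        }
        if (!dup)
            out[kept++] = paths[i];
    }
    return kept;
}
```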

I mentioned above that the script would only update the last-read time on successful exit. That's right: there was no way in this version to read things out of order, skip articles and come back to them later, or even stop reading part-way through the day's news feed without seeing everything you just read again. It seems like a preposterous decision, but it was due to one of my most laughable errors: I predicted that the maximum Usenet volume, ever, would never exceed 1-2 articles per day. Traffic today seems to be over 60 tebibytes per day, with more than 100,000,000 posts per day. As I noted last year, I'm proud to have had the opportunity to co-create something successful enough that my volume prediction could be off by so many orders of magnitude.

There are a couple of other points worth noting. The first line of each message contained the article-ID: the sitename, a period, and a sequence number. This guaranteed uniqueness: each site would number its own articles, with no need for global coordination. (There is the tacit assumption that each site would have a unique name. Broadly speaking, this was true, although the penchant among system administrators for using science fiction names and places meant that there were some collisions, albeit generally not among the machines that made up Usenet.) We also used the article-ID as the filename to store it. However, this decision had an implicit limitation. Site names were, as I recall, limited to 8 characters; filenames in that era were limited to 14. This meant that the sequence number was limited to five digits (14 characters, minus 8 for the site name and 1 for the period), right? No, not quite. The "site.sequence" format is a transfer format; using it as a filename is an implementation decision. Had I seen the need, I could easily have created a per-site directory. Given my traffic volume assumptions, there was obviously no need, especially for a prototype.
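The arithmetic behind the five-digit question, and the per-site-directory escape hatch, fit in a few lines of Python. The 8- and 14-character limits are the ones recalled above; `storage_path` and its `per_site_dirs` flag are hypothetical names for the two layouts, not anything from the actual code.

```python
SITE_NAME_MAX = 8   # site-name limit, as recalled in the text
FILENAME_MAX = 14   # file-name limit of the era's file system


def max_sequence_digits(site_max=SITE_NAME_MAX, name_max=FILENAME_MAX):
    """Digits left for the sequence number when the whole article-ID
    "site.sequence" is used as a single file name."""
    return name_max - site_max - len(".")


def storage_path(article_id, per_site_dirs=False):
    """Map an article-ID to a relative storage path.

    Flat layout: the article-ID itself is the file name.
    Per-site layout: the site becomes a directory and the sequence
    number alone is the file name.
    """
    site, _, seq = article_id.partition(".")
    if per_site_dirs:
        return f"{site}/{seq}"
    return article_id
```

With per-site directories, the sequence number can use all 14 characters, since the site name and the period move into the directory name.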

There was another, more subtle, reason for simply using the article-ID as the filename and using the existence of the file and its last-modified time as the sole metadata about the articles. The alternative—obviously more powerful and more flexible—would be to have some sort of database. However, having a single database of all news items on the system would have required some sort of locking mechanism to avoid race conditions, and locking wasn't easy on 7th Edition Unix. There was also no mechanism for inter-process communication except via pipes, and pipes are only useful for processes descended from the same parent, which would have had to create the pipes before creating the processes. We chose to rely on the file system, which had its own, internal locks to guarantee consistency. There are a number of disadvantages to that approach—but they don't matter if you're only receiving 1-2 articles per day.
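One way to see how the existence of a file can serve as the only metadata is a store-if-absent routine. This is a modern Python sketch under stated assumptions, not the A-news code. It writes the article to a temporary file in the spool directory and then hard-links it to its article-ID name. Because `link()` fails when the target name already exists, the duplicate check and the store cannot race with another process doing the same thing. The function name `store_article` is hypothetical.

```python
import os
import tempfile


def store_article(spool_dir, article_id, text):
    """Store an article under its article-ID unless it is already there.

    Returns True if stored, False if it was a duplicate. The text goes to
    a temporary file first, which is then hard-linked to the final name;
    os.link refuses to replace an existing name, so checking and storing
    happen in one step.
    """
    final = os.path.join(spool_dir, article_id)
    fd, tmp = tempfile.mkstemp(dir=spool_dir)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        try:
            os.link(tmp, final)
        except FileExistsError:
            return False
        return True
    finally:
        os.unlink(tmp)  # remove only the temporary name
```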

The path to the creator served two purposes. First, it indicated which sites should not be sent a copy of newly received articles. Second, by intent it was a valid uucp email address; this permitted email replies to the poster.

I had noted a discrepancy between the date format in the announcement and that in RFC 850. On further reflection, I now believe that the announcement got it right and the RFC is wrong. Why? The format the announcement shows is exactly what was emitted by the date command; in a shell script, that would have been a very easy string to use, while deleting the time zone would have been more work. I do not think I would have gone to any trouble to delete something that was pretty clearly important.

The user interface was designed to resemble that of the 7th Edition mail command. It was simple, and worked decently for low-volume mail environments. We also figured that almost all Usenet readers would already be familiar with it.

To send an article to a neighboring system, we used the remote execution component of uucp.

uux $dest_site!rnews <$article_file_name
That is, on the remote system whose name is in the shell variable dest_site, execute the rnews command after first transferring the specified file to use as its standard input. This required that the receiving site allow execution of rnews, which in turn required that they recompile uucp (configuration files? What configuration files?), which was an obstacle for some; more on that in Part VI.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata…

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part IV: File Format

17 November 2019

When we set out to design the over-the-wire file format, we were certain of one thing: we wouldn't get it perfectly right. That led to our first decision: the very first character of the transmitted file would be the letter "A", for the version. Why not a number on the first line, including perhaps a decimal point? If we ever considered that, I have no recollection of it.

A more interesting question is why we didn't use email-style headers, a style later adopted for HTTP. The answer, I think, is that few, if any, of us had any experience with those protocols at that time. My own personal awareness of them started when I requested and received a copy of the Internet Protocol Transition Workbook a couple of years later—but I was only aware of it because of Usenet. (A few years earlier, I gained a fair amount of knowledge of the ARPANET from the user level, but I concentrated more on learning Multics.)

Instead, we opted for the minimalist style epitomized by 7th Edition Unix. In fact, even if we had known of the Internet (in those days, ARPANET) style, we might have eschewed it anyway. Per a later discussion of implementation, the very first version of our code was a shell script. Dealing with entire lines as single units, rather than trying to parse headers that allowed arbitrary case, optional white space, and continuation lines, was certainly simpler!

The next question was what to do about duplicate articles. One obvious necessity is an article ID, since that would allow duplicate detection. In our design, the article ID was the rest of the first line, after the A. (Note: it's been 40 years and I no longer remember exactly what we decided at that meeting. Per the implementation discussion, there was experimentation and change. The details I'm giving here are taken from the final format as documented in RFC 850, but there is no doubt that there were changes during development.)

We also wanted to minimize transfer costs. As I noted in the previous post, article transmission was by expensive, dial-up connections; sending something that wasn't needed would cost real money. Accordingly, articles had to include a list of systems known to have already seen the article. This consisted of a series of hostnames separated by exclamation points, with the last element being the login name of the user who posted it. Thus, an article created by me at UNC Chapel Hill and relayed through Duke and alice, a computer at Bell Labs Research, would contain "alice!duke!unc!smb". If a possible next hop appeared in the path, the duplicate copy would not be sent. (Yes, that meant that it was easy to ensure that some sites would never see some articles. To my recollection, we did not worry about that issue and perhaps didn't even notice it.)
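The path rule can be sketched in a few lines of Python. This is an illustration, not the original shell code: `next_hops` and `relay_path` are hypothetical names, the neighbor name `research` is made up, and the sketch assumes each site prepends its own name when it passes the article on, which matches the "alice!duke!unc!smb" example. The last element is the poster's login name, so it is never treated as a site.

```python
def relay_path(local_site, path):
    """Path as passed on by local_site: prepend our own name."""
    return f"{local_site}!{path}"


def next_hops(path, neighbors):
    """Neighbors that should get a copy: those not already in the path.

    The final element is the poster's login name, not a site name,
    so it is excluded from the set of sites that have seen the article.
    """
    sites_seen = set(path.split("!")[:-1])
    return [n for n in neighbors if n not in sites_seen]
```

Because the resulting string is also a uucp bang address, the same path could be handed to mail to reply to the poster.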

Why did we pick that format, instead of something like commas or blanks as separators? The format we chose was that used by uucp for email relaying; someone at some computer that alice talked to could type

mail alice!duke!unc!smb
and it would be relayed through alice and duke before reaching my department's computer and then me. (That sort of email relaying was to prove problematic; again, more on that later.)

Today, with full connectivity over the Internet, we wouldn't do things the same way. Instead, one party would send the next a list of article IDs; that party would then request the ones it had not yet seen. We did consider something like that, but rejected it. Why? Because we were using infrequent, dial-up connections to relay articles, and the number of loops (and hence duplicate articles received) seemed unlikely to be high.
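For contrast, here is a minimal Python sketch of the offer-then-request style described above, similar in spirit to what NNTP's IHAVE command later provided. Each site's articles are modeled as a dictionary from article-ID to text. The function name `exchange` is hypothetical, and a real protocol would spread the three steps across separate messages (and, over nightly dial-up calls, across separate nights).

```python
def exchange(sender_store, receiver_store):
    """One offer/request round between two sites.

    Both stores map article-ID -> article text. Returns the IDs whose
    bodies were actually transferred.
    """
    offered = list(sender_store)                  # step 1: send IDs only
    wanted = [aid for aid in offered
              if aid not in receiver_store]       # step 2: ask for missing ones
    for aid in wanted:                            # step 3: send just those bodies
        receiver_store[aid] = sender_store[aid]
    return wanted
```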

Consider: in our original scheme, many sites would be polled once per night by Duke. If, during that call, Duke sent them a list of articles, they couldn't request them until the next night, and wouldn't receive them until the following night. That amount of delay was unacceptable. Instead, we accepted the chance of sending unnecessary text. While there certainly would be extra transmissions some of the time, we felt that the amount would not be prohibitive—this was before JPG and before MP3, so articles were entirely text and hence would be relatively small and thus cheap.
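To get a feel for how many unnecessary copies path-based suppression lets through, here is a small Python simulation. It floods one article over a topology, with each site prepending its name and forwarding to every neighbor not already in the path. It assumes copies move in breadth-first order, which ignores real call schedules; the function name `flood` is hypothetical. The test topologies are the duke-phs-unc loop described in Part III and, for comparison, a pure hub-and-spoke arrangement.

```python
from collections import deque


def flood(links, origin, poster):
    """Flood one article with path-based suppression.

    links maps each site to its list of neighbors. Returns
    (transmissions, duplicates): every copy sent, and how many of those
    arrived at a site that already had the article.
    """
    have = {origin}
    queue = deque([(origin, f"{origin}!{poster}")])
    transmissions = duplicates = 0
    while queue:
        site, path = queue.popleft()
        seen = set(path.split("!")[:-1])   # sites named in the path
        for nbr in links[site]:
            if nbr in seen:
                continue                    # path says it already has a copy
            transmissions += 1
            if nbr in have:
                duplicates += 1             # the loop case: sent anyway
                continue
            have.add(nbr)
            queue.append((nbr, f"{nbr}!{path}"))
    return transmissions, duplicates
```

In the three-site loop, duke and phs each receive one extra copy (four transmissions instead of two); with no loop there are none. That is the kind of modest overhead the design accepted in exchange for avoiding a multi-night round trip.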

The need to send a date and an article title was obvious enough that it didn't even merit much discussion. The date and time line used the format generated by the ctime() or asctime() library routines. I do not recall if we normalized the date and time to UTC or just ignored the question; clearly, the former would have been the proper choice. (There is an interesting discrepancy here. A reproduction of the original announcement clearly shows a time zone. Neither the RFC nor the ctime() routine had one. I suspect that the announcement was correct.) The most interesting question, though, was about what came to be called newsgroups.
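The discrepancy is easy to see side by side. This Python sketch parses a date line in the ctime() layout and also accepts the variant with a time-zone token before the year, which is the layout the date command emits. The function name is hypothetical, and the zone abbreviation "EST" in the test is only an example; the sketch returns the zone as text and does not convert it to an offset.

```python
from datetime import datetime

CTIME_FORMAT = "%a %b %d %H:%M:%S %Y"   # e.g. Fri Nov 19 16:14:55 1982


def parse_article_date(line):
    """Parse a date line with or without a time-zone token.

    ctime()/asctime() output has five fields; the date command adds a
    zone abbreviation before the year, giving six. Returns
    (naive datetime, zone abbreviation or None).
    """
    fields = line.split()
    zone = None
    if len(fields) == 6:
        zone = fields.pop(4)
    return datetime.strptime(" ".join(fields), CTIME_FORMAT), zone
```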

We decided, from the beginning, that we needed multiple categories of articles—newsgroups. For local use, there might be one for academic matters ("Doctoral orals start two weeks from tomorrow"), social activities ("Reminder: the spring picnic is Sunday!"), and more. But what about remote sites? The original design had one relayed newsgroup: NET. That is, there would be no distinction between different categories of non-local articles.

This approach was hotly debated. Was it really the case that there would be so little traffic of interest beyond the local machine that no further categorization was needed? (Our estimates of traffic volume were very, very wrong, and this error affected several implementation decisions.) The objection that carried the day: "What if someone wants to sell their car? They want it to reach other computers in the geographical area, but not beyond." We instead decided that anything in newsgroups beginning "NET." would be relayed. This, though, created a problem that is still not resolved: we conflated the notions of interest with the scope of relaying. That is, suppose that instead of duke and unc being directly connected, both sites spoke to alice. Material of regional interest—the two schools were only about 16 km apart—should be seen on both sites, but there would be no reason to send such items as used car ads to a Bell Labs machine in New Jersey. (Aside: years later, when Usenet was already reasonably widespread, someone posted a used car ad to a group with world-wide distribution. The author was rather confused when several of the original designers sent him congratulatory notes…)
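The rule that carried the day reduces to a one-line test, sketched here in Python. It assumes a comma-separated newsgroups line and compares case-insensitively, because the text writes the prefix as "NET." while the RFC example uses "net."; the group name `forsale` is made up. The function can answer only yes or no, which is the conflation described above: there is no way to say "relay to nearby sites only".

```python
def relayed(newsgroups_line, prefix="net."):
    """Return True if an article should leave the local machine.

    An article is relayed if any of its groups begins with the relay
    prefix; otherwise it stays local. There is no "regional" answer.
    """
    groups = [g.strip().lower() for g in newsgroups_line.split(",") if g.strip()]
    return any(g.startswith(prefix) for g in groups)
```

Note the last case in practice: cross-posting a local group together with a net. group sends the whole article everywhere.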

There was one more interesting point. From the very beginning, we knew that some articles belonged in more than one category, so we supported cross-posting to multiple newsgroups. Cross-posting later came to be seen as impolite, but it was an intentional feature from the start.

Using the example from RFC 850, the final format of a news article looked like this:

Aeagle.642
net.general
cbosgd!mhuxj!mhuxt!eagle!jerry
Fri Nov 19 16:14:55 1982
Usenet Etiquette - Please Read
The body of the article comes here, with no blank line.
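A parser for this layout needs nothing more than line positions, which was the point of avoiding email-style headers. Here is a minimal Python sketch; the `Article` class and `parse_a_format` are hypothetical names, and splitting the newsgroups line on commas is an assumption about how multiple groups were listed.

```python
from dataclasses import dataclass


@dataclass
class Article:
    article_id: str
    newsgroups: list
    path: str
    date: str
    title: str
    body: str


def parse_a_format(text):
    """Parse an A-format article: 'A' plus the article-ID on line 1,
    then the newsgroups, path, date, and title lines, then the body."""
    lines = text.split("\n")
    if len(lines) < 5 or not lines[0].startswith("A"):
        raise ValueError("not an A-format article")
    first, groups, path, date, title = lines[:5]
    return Article(
        article_id=first[1:],                       # drop the version letter
        newsgroups=[g for g in groups.split(",") if g],
        path=path,
        date=date,
        title=title,
        body="\n".join(lines[5:]),
    )
```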

We decided on one last issue at the meeting: the name of our system. We called the technology "Netnews"—Network News—and the particular instantiation we hoped for was "Usenet". Why Usenet? The Wikipedia article (as of the 17 November 2019 version) has it almost right: "The name 'Usenet' emphasizes its creators' hope that the USENIX organization would take an active role in its operation." However, there was a bit more. Until some time in 1979, the organization now known as Usenix was called the Unix User's Group. But Bell Labs' lawyers took exception to this use of their trademark, so a new name was chosen: Usenix. The technical folks, being innocent in the ways of lawyers, were bemused by this. Part of our reason for the name "Usenet" was as a gentle tease about this forced renaming.


The Early History of Usenet, Part III: Hardware and Economics

15 November 2019

There was a planning meeting for what became Usenet at Duke CS. We knew three things, and three things only: we wanted something that could be used locally for administrative messages, we wanted a networked system, and we would use uucp for intersite communication. This last decision was more or less by default: there were no other possibilities available to us or to most other sites that ran standard Unix. Furthermore, all you needed to run uucp was a single dial-up modem port. (I do not remember who had the initial idea for a networked system, but I think it was Tom Truscott and the late Jim Ellis, both grad students at Duke.)

There was a problem with this last option, though: who would do the dialing? The problems were both economic and technical-economic. The latter issue was rooted in the regulatory climate of the time: hard-wired modems were quite unusual, and ones that could automatically dial were all but non-existent. (The famous Hayes Smartmodem was still a few years in the future.) The official solution was a leased Bell 801 autodialer and a DEC DN11 peripheral as the interface between the computer and the Bell 801. This was a non-starter for a skunkworks project; it was hard enough to manage one-time purchases like a modem or a DN11, but getting faculty to pay monthly lease costs for the autodialer just wasn't going to happen. Fortunately, Tom and Jim had already solved that problem.

There was one type of modem that was affordable and purchasable: the acoustic coupler. It works just as the name suggests: you put the handset of a phone into tight-fitting cups, while connecting the electronic side to the computer. So: when the computer sent bits, the acoustic coupler emitted actual sounds via a small speaker and sent them into the handset's microphone. Similarly, a microphone in the acoustic coupler listened to noises corresponding to 0 or 1 bits and sent the appropriate voltage signal to the computer. Since the only connection to the phone network was via sounds, the phone company couldn't object. (Well, AT&T had tried to, several years earlier, but was slapped down.)

This worked great for manual dialing—you picked up the handset, dialed the number, and put the handset into the coupler—but how could a computer dial using one of these? That's where the cleverness came in. The computer was connected to the acoustic coupler via a standard known as RS-232. There were five pins of major interest: ground, transmit, receive, carrier detect (CD), and data terminal ready (DTR). (Aside: the full set of pins was far more complex, and back in the mists of time I had to deal with such arcana regularly. I won't go into details, but I became intimately familiar with things like breakout boxes and null modems. You do not want details on these…) Briefly, when the computer wanted to use the modem—that is, when a program opened the serial port—the computer would assert the DTR signal; when the modem was connected, it would send CD back to the computer. If the far end dropped the connection, the modem would drop CD; this signal was passed back to the program. And if you're a Unix programmer, you now know where the SIGHUP signal came from. Playing clever, hardware-assisted games with the DTR signal was the solution.

Duke implemented it first, but I thought the idea was sufficiently clever that I took it back to Chapel Hill and designed my own variant. The two solutions differed in detail, but since I know mine better and it was cleaner in a number of respects, I'll describe mine.

When a landline phone is not in use (that is, when it's "on-hook"), it presents an open circuit to the phone line. (That's not strictly true, since it had to be able to ring, but rings are AC signals, with a capacitor in series with the ringer; it thus appeared as an open circuit to the DC line signal.) To simulate on-hook, we put a normally open relay in series with the phone line. (It seems to be distinctly possible that the phone company would have regarded this as an improper way of connecting to the phone network, but I wouldn't actually know that, would I?) When the computer wanted to use the modem, it asserted DTR; we wired the DTR line to close the relay and thus take the phone line off-hook. In other words, when the computer opened the device (for us, /dev/ttyz7, since we had a DZ11 terminal adapter and put the modem on port 7), the phone would go off-hook; when the device was closed, it went on-hook.

But how to dial? It turns out that old-fashioned rotary dial phones worked by interrupting the circuit briefly. When you dial, say, a 3, there are 3 momentary on-hook signals. This is called pulse dialing. The pulses are sent at 10 per second, with (in the US) a 2:3 make-break ratio. That is, for each dial pulse, the circuit would be interrupted for .06 seconds and then back on for .04 seconds before the next pulse. It turned out to be feasible to do this via software control of the DTR signal. We couldn't match the exact timing specs—the clock on the Unix systems of the time interrupted at 60 Hz—but we could do 1:2 (four timer ticks on-hook to two off-hook, still 10 pulses per second) and that was good enough. (Some of you may wonder if it was possible to dial calls by tapping the hook switch at the right frequency for the right number of pulses, a technique that could prove useful if there was, say, a dial lock in place. That is, umm, possible.)
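The timing can be checked with a little arithmetic. At 10 pulses per second each pulse lasts 100 ms; the 2:3 make-break specification is 40 ms closed and 60 ms open. A 60 Hz clock ticks every 16.7 ms, so six ticks make one pulse period, and the closest split is four ticks open and two closed (about 67 ms and 33 ms). The Python sketch below turns a number into a schedule of DTR states. The names and the 42-tick pause between digits are assumptions for illustration, not values from the original driver.

```python
HZ = 60                 # clock interrupt rate of the Unix systems of the time
BREAK_TICKS = 4         # on-hook (relay open, DTR dropped) part of each pulse
MAKE_TICKS = 2          # off-hook (relay closed, DTR asserted) part of each pulse
INTERDIGIT_TICKS = 42   # hypothetical pause between digits, about 0.7 s


def pulses(digit):
    """Number of dial pulses for a digit; 0 is sent as ten pulses."""
    d = int(digit)
    return 10 if d == 0 else d


def dtr_schedule(number):
    """List of (dtr_asserted, ticks) steps to pulse-dial `number`.

    Assumes the line is already off-hook with dial tone. Characters
    other than 0-9 (such as '-') are skipped.
    """
    steps = []
    for digit in number:
        if digit not in "0123456789":
            continue
        for _ in range(pulses(digit)):
            steps.append((False, BREAK_TICKS))
            steps.append((True, MAKE_TICKS))
        steps.append((True, INTERDIGIT_TICKS))
    return steps
```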

That solved the hardware side of the dialing problem: software control of the DTR line could, via this relay, take a phone off-hook and pulse-dial desired numbers. However, we still needed a software interface. I wrote a driver that was compatible with the official DN11 driver, but instead talked to the DZ11 driver's routines that controlled the DTR line. This strategy thus presented the modem and dialer as two separate devices, which is what the application software, e.g., uucp, expected. (Again, Duke had a similar solution first. I worked with an excellent electronics tech at UNC Chapel Hill for the hardware side of ours. When, a year or so later, we bought a 1200 bps hardwired modem, he redesigned the funky dialer relay to handle the more complex signaling and timing requirements of this modem.)

Having solved the autodialer problem, we had to confront another problem: who was going to pay for the phone calls? Phone calls back then were remarkably expensive, and (even domestically) depended on distance and time of day. Calls during business hours cost the most; evening calls cost less, and late-night calls cost the least.

The solution agreed upon was simple: Duke had one of the few autodialers, so it would have to make the calls. Sites that wanted to join our network would have to set up an auto-answer modem (not exactly common, but much more common than dialers) and reimburse Duke for the phone calls. The calls would be made once or twice per night, depending on desire (i.e., willingness to pay) and traffic. Propagation of articles would be slow, but since Duke was the hub of this network they'd see all news articles sooner than anyone else.

We also knew that the system would not be strictly hub-and-spoke. In fact, the original network had four nodes (duke; duke34, a PDP-11/34; phs, the Physiology Department; and unc) with a loop: duke-phs, duke-unc, and phs-unc. We thus needed a protocol that could handle loops, but that's the topic of the next post. In addition, Tom had interned at the Computer Science Research Group at Bell Labs (the organization from which Unix had come originally) and believed that his contacts there would be willing to call Duke to send and receive traffic. This solution worked for a while, but more on that later.

Note carefully that the original scheme involved money changing hands. In other words, management (i.e., faculty members) had to be very aware of this activity. People in the Duke and UNC CS department's business offices were going to see sudden spikes in phone bills, and Duke was going to have to receive and process payments that other sites were going to have to make. Usenet was a skunkworks project, but one that had official sanction: we had faculty members who were wise enough to value innovation by grad students.

Next up: the original protocol design.


The Early History of Usenet, Part II: The Technological Setting

14 November 2019

Usenet—Netnews—was conceived almost exactly 40 years ago this month. To understand where it came from and why certain decisions were made the way they were, it's important to understand the technological constraints of the time.

Metanote: this is a personal history as I remember it. None of us were taking notes at the time; it's entirely possible that errors have crept in, especially since my brain cells do not even have parity checking, let alone ECC. Please send any corrections.

In 1979, mainframes still walked the earth. In fact, they were the dominant form of computing. The IBM PC was about two years in the future; the microcomputers of the time, as they were known, had too little capability for more or less anything serious. For some purposes, especially in research labs and process control systems, so-called minicomputers—which were small, only the size of one or two full-size refrigerators—were used. So-called "super-minis", which had the raw CPU power of a mainframe though not the I/O bandwidth, were starting to become available.

At the time, Unix ran on a popular line of minicomputers, the Digital Equipment Corporation (DEC) PDP-11. The PDP-11 had a 16-bit address space (though with the right OS, you could quasi-double that by using one 16-bit address space for instructions and a separate one for data); depending on the model, memory ranged from tens of kilobytes (yes, kilobytes) to a very few megabytes. No one program could access more than 64K at a time, but the extra physical memory meant that a context switch could often be done without swapping, since other processes might still be memory-resident. (Note well: I said "swapping", not "paging"; the Unix of the time did not implement paging. There was too little memory per process to make it worthwhile; it was easier to just write the whole thing out to disk…)

For most people, networking was non-existent. The ARPANET existed (and I had used it by then), but to be on it you had to be a defense contractor or a university with a research contract from DARPA. IBM had assorted forms of networking based on leased synchronous communications lines (plus some older mechanisms for dial-up batch remote job entry), and there was at least one public packet-switched network, but very, very few places had connections to it. The only thing that was halfway common was the dial-up modem, which ran at 300 bps. The Bell 212A, a 1200 bps full-duplex dial-up modem, had just been introduced but was rare. Why? Because you more or less had to lease it from the phone company: Ma Bell, more formally known as AT&T. It was technically legal to buy your own modems, but to hardwire them to the phone network required going through a leased adapter known as a DAA (data access arrangement) to "protect the phone network". (Explaining that would take a far deeper dive into telephony regulation than I have the energy for tonight.) Usenet originated in a slightly different regulatory environment, though: Duke University was served by Duke Telecom, a university entity (and Durham was GTE territory), while UNC Chapel Hill, where I was a student, was served by Chapel Hill Telephone—the university owned the phone, power, water, and sewer systems, though around this time the state legislature ordered that the utilities be divested.

There was one more piece to the puzzle: the computing environments at UNC and Duke computer science. Duke had a PDP-11/70, then the high-end model, running Unix. We had a PDP-11/45 intended as a dedicated machine for molecular graphics modeling; it ran DOS, a minor DEC operating system. It had a few extra terminal ports, but these didn't even have modem control lines, i.e., the ports couldn't tell if the line had dropped. We hooked these to the university computer center's Gandalf port selector. With assistance from Duke, I and a few others brought up 6th Edition Unix on our PDP-11, as a part-time OS. Some of the faculty were interested enough that they scrounged enough money to buy a better 8-port terminal adapter and some more RAM (which might have been core storage, though around that time semiconductor RAM was starting to become affordable). We got a pair of VAX-11/780s soon afterwards, but Usenet originated on this small, slow 11/45.

The immediate impetus for Usenet was the desire to upgrade to 7th Edition Unix. On 6th Edition Unix, Duke had used a modification they got from elsewhere to provide an announcement facility to send messages to users when they logged in. It wasn't desirable to always send such messages; at 300 bps—30 characters a second—a five-line message took annoyingly long to print (and yes, I do mean "print" and not "display"; hardcopy terminals were still very, very common). This modification was not even vaguely compatible with the login command on 7th Edition; a completely new implementation was necessary. And 7th Edition had uucp (Unix-to-Unix Copy), a dial-up networking facility. This set the stage for Usenet.

To be continued…


The Early History of Usenet, Part I: Prologue

14 November 2019

November 2019 is, as best I can recall, the 40th anniversary of the conception of Usenet. (What's Usenet? The Wikipedia article is ok but not perfect.) I should have written a proper paper; instead, there will (probably) be an irregular series of blog posts. I'll do Part I of N tonight.


The Crypto Wars Resume

7 October 2019

For decades, the US government has fought against widespread, strong encryption. For about as long, privacy advocates and technologists have fought for widespread, strong encryption, to protect not just privacy but also as a tool to secure our computers and our data. The government has proposed a variety of access mechanisms and mandates to permit them to decrypt (lawfully) obtained content; technologists have asserted that "back doors" are inherently insecure. (James Comey used the phrase "golden key"; the neutral term is "exceptional access".)

I personally have been involved with this issue for more than 25 years, and in a fairly strong sense I have nothing new to say—as I and others explained four years ago, from a technical perspective exceptional access is a thoroughly bad idea: it will create insecurity. Cryptography is a complex, subtle discipline; it's really, really hard to get even the basics right. Adding new, unusual requirements creates a high likelihood that there will be new vulnerabilities.

Despite all that, U.S. Attorney-General William Barr has now issued a new call for Facebook to add exceptional access features to its WhatsApp encrypted communications platform. The evils he cites—terrorism, organized crime and child pornography—are indeed evils; I don't think most people would dispute that. But his focus on Facebook is a significant change in direction and, arguably, an escalation of the battle over cryptography.

There is, broadly speaking, a consensus that the exceptional access problem is easier (note: I did not say easy) for devices, and in particular for phones, than for communications. Many reasons are given in the excellent Carnegie Foundation report on the problem; I'll note one more: because secure communications generally require interaction between the parties, there are many more opportunities to get things wrong. By contrast, when law enforcement presents an encrypted phone, all of the cryptography has already taken place. Encrypting objects still isn't easy—witness these new attacks on encrypted PDF files—but the attack surface is smaller.

Why, then, the escalation? Why is Barr going for everything, rather than seeing if there is a feasible solution for encrypted phones? Does he judge that the political moment is right? Is it because Facebook is politically weak right now? Or is it because law enforcement can read devices now?

What is a Security Mechanism?

12 September 2019

Orin Kerr recently blogged about a 9th Circuit decision that held that scraping a public web site (probably) doesn't violate the Computer Fraud and Abuse Act (CFAA). Quoting the opinion (and I copied the quote from that blog post):

For all these reasons, it appears that the CFAA's prohibition on accessing a computer "without authorization" is violated when a person circumvents a computer's generally applicable rules regarding access permissions, such as username and password requirements, to gain access to a computer. It is likely that when a computer network generally permits public access to its data, a user's accessing that publicly available data will not constitute access without authorization under the CFAA.
On its surface, it makes sense—you can't steal something that's public—but I think the simplicity of the rule is hiding some very deep questions. One, I think, can most easily be expressed as "what is the cost of the 'attack'"? That is, how much effort must someone expend to get the data? Does that matter? Should it?

Let's start with the Court's example: it is hacking (more precisely, a CFAA violation) if someone bypasses a username and password requirement. But what is the role of the username and password? Is it intended as an actual barrier or as a sign saying "Authorized Personnel Only"? Does it matter if the site has trivial password limitations, e.g., 2 digits only?

More concretely, imagine a badly coded website, where you're prompted for a login and password if you visit the home page, but not if you go directly to some internal page. (For the record, it's really easy for a neophyte to implement something this badly.) Is that a suitable barrier or warning? What if someone else links to an internal page (as I've done, above, to a blog post)? Is clicking on that link, and thus never even seeing the password prompt, a CFAA violation? It's hard to see how the answer could be "yes", but if you think that that example is too contrived, what about a misconfigured firewall that inadvertently permits access to the interior of a corporate net—is someone who stumbles on that access liable? That's a very subtle kind of error, and one that's easy to make.
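The mistake is easy to reproduce. In this hypothetical Python request handler, the login check guards only the home page, so a direct link to an internal page is served without any prompt. The page names and the handler are invented for illustration.

```python
PAGES = {
    "/": "home page",
    "/internal/report.html": "internal report",
}


def handle(path, logged_in):
    """Return (status, body). Deliberately buggy sketch: the login
    check guards only the home page."""
    if path not in PAGES:
        return 404, "not found"
    if path == "/" and not logged_in:
        return 401, "please log in"
    return 200, PAGES[path]   # internal pages are served to anyone
```

The repair is to perform the check for every non-public page rather than at a single entry point.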

There are, of course, other forms of access control. One of the simplest is address-based access control: only certain IP addresses may access a certain resource. It's long been known to be weak, but it's still used quite frequently, especially on intranets. Is this a "generally applicable rule"? Is there a difference between an address rule that says "these three IP addresses may have access" and one that says "anyone but these three may have access"? Mathematically, each is simply the complement of the other, and it's actually not harder to specify the latter than the former; one doesn't have to write 4,294,967,293 separate "allow" rules. Does it matter if a blocked party changes their IP address to evade the block? What if their ISP happens to change it, as some consumer ISPs do quite regularly?
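The two rules can be written with equal ease. This is a minimal sketch using Python's standard `ipaddress` module. The three listed addresses are hypothetical, taken from the RFC 5737 documentation range. Each rule is a single membership test, and for any address exactly one of them admits it.

```python
import ipaddress

# Three hypothetical addresses from the 192.0.2.0/24 documentation range.
LISTED = {ipaddress.ip_address(a) for a in ("192.0.2.1", "192.0.2.2", "192.0.2.3")}


def allow_only_listed(addr: str) -> bool:
    """Rule A: these three addresses may have access."""
    return ipaddress.ip_address(addr) in LISTED


def allow_all_but_listed(addr: str) -> bool:
    """Rule B: anyone except these three may have access."""
    return ipaddress.ip_address(addr) not in LISTED


# Spelling Rule B out as individual IPv4 "allow" entries would take this many:
print(2**32 - len(LISTED))  # 4294967293
```

Note that both rules key only on the source address. A client whose address changes, whether deliberately or because the ISP reassigned it, simply falls on the other side of the rule.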

I should note that one common use for such restrictions is geoblocking: excluding certain locations from access to content. The blocked content may be Major League Baseball videos (they're blacked out in areas where there is a local TV channel that carries those games), movies for which a site does not have a worldwide license, or even online gambling if it violates local laws (as in the US). If someone uses a VPN to evade such a restriction, is that a CFAA offense? What if they use Tor, not to evade the restriction but because they value their privacy, and just happen to gain access?
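Mechanically, geoblocking is just address-based access control keyed on a location lookup. This is a minimal sketch under loudly stated assumptions: the prefix-to-region table and region names are hypothetical (real services use commercial geolocation databases), and the prefixes come from the RFC 5737 documentation ranges. The server only ever sees the source address of the connection, so a VPN or Tor exit in another region looks like an ordinary visitor from that region.

```python
import ipaddress
from typing import Optional

# Hypothetical prefix-to-region table; not real geolocation data.
GEO_TABLE = [
    (ipaddress.ip_network("198.51.100.0/24"), "blackout-area"),
    (ipaddress.ip_network("203.0.113.0/24"), "elsewhere"),
]
BLOCKED_REGIONS = {"blackout-area"}


def region_of(addr: str) -> Optional[str]:
    """Return the (hypothetical) region for an address, or None if unknown."""
    ip = ipaddress.ip_address(addr)
    for net, region in GEO_TABLE:
        if ip in net:
            return region
    return None


def is_blocked(source_addr: str) -> bool:
    """Decide purely on the connection's apparent source address."""
    return region_of(source_addr) in BLOCKED_REGIONS


print(is_blocked("198.51.100.20"))  # True: viewer connecting from home
print(is_blocked("203.0.113.5"))    # False: same viewer via a VPN exit elsewhere
```

Nothing in the check distinguishes a user who chose the exit point to evade the block from one who used it for privacy and never knew the block existed.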

There have also been systems that relied on, more or less, just a username or equivalent, and not a password. One of the best-known cases is that of Andrew "weev" Auernheimer; he and a colleague noticed that a database of AT&T customers could be accessed just by knowing the ICCID from an iPad's SIM. In that particular situation, it was possible to enumerate the namespace. Was that hacking? In a controversial move, the Justice Department prosecuted; his conviction was eventually overturned on rather legalistic grounds (improper venue), and the underlying CFAA issue was never squarely addressed.

Does it matter how hard it is to enumerate the namespace? Suppose the account numbers were sequential, in which case, given a single valid number, it's trivial to find the others. What if the odds of a random number being valid were 1:1,000,000? 1:1,000,000,000,000? Does it matter? Should it?
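The difference in effort can be made concrete. If IDs are sequential, one valid ID yields its neighbors with no search at all. If valid IDs are scattered at random with density p, each random guess independently succeeds with probability p, so the expected number of guesses to find one valid ID is 1/p (the mean of a geometric distribution). This is a minimal sketch; the guessing rate is a hypothetical figure chosen for illustration.

```python
from fractions import Fraction


def sequential_neighbors(known_id: int, count: int) -> list[int]:
    """With sequential IDs, one valid ID reveals the next `count` candidates."""
    return [known_id + i for i in range(1, count + 1)]


def expected_guesses(valid_fraction: Fraction) -> Fraction:
    """Expected random guesses (with replacement) to hit one valid ID: 1/p."""
    return 1 / valid_fraction


def expected_seconds(valid_fraction: Fraction, guesses_per_second: int) -> Fraction:
    """Expected time to find one valid ID at a fixed (hypothetical) guess rate."""
    return expected_guesses(valid_fraction) / guesses_per_second


print(sequential_neighbors(1000, 3))  # [1001, 1002, 1003]
print(expected_seconds(Fraction(1, 10**6), 1000))   # 1000
print(expected_seconds(Fraction(1, 10**12), 1000))  # 1000000000
```

At an assumed 1,000 guesses per second, one-in-a-million odds cost about 17 minutes per valid ID; one-in-a-trillion odds cost roughly 32 years.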

What all of these scenarios have in common is that they reflect a different degree of effort to gain access to some resource. Sometimes, the effort necessary is known to or knowable by the defender; other times, it may not be. My questions, then, are these:

I don't know the answers to any of these questions, but I think that they're important. Some situations, e.g., intentionally working around a password requirement, are pretty clearly (all other things being equal, which they may not be; see Orin's blog post for that) on the wrong side of the law. An address block where an "access unauthorized" message is displayed may also be clear, which suggests that the real issues in access control are intent and warning. But even there, there are numerous subtleties that are beyond the control of the defender.

Consider a situation where a firewall implements an address-based access control mechanism. Furthermore, the firewall is configured to return an ICMP Administratively Prohibited packet when it sees an unauthorized IP address attempting to connect. How will the requester's software display the error? Will it even know about the prohibition, as opposed to the simple fact that the destination isn't reachable? Does the exact language of the technical specification matter? It says

A Destination Unreachable message that is received MUST be reported to the transport layer. The transport layer SHOULD use the information appropriately
In standards-speak, "SHOULD" is defined as follows:
This word or the adjective "RECOMMENDED" means that there may exist valid reasons in particular circumstances to ignore this item, but the full implications should be understood and the case carefully weighed before choosing a different course.
In other words, perhaps the implementer of some network stack chose not to pass the code on, in which case the application couldn't know.
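To see what can be lost, here are the ICMP Destination Unreachable (type 3) codes as defined in RFC 792, RFC 1122, and RFC 1812. Codes 9, 10, and 13 are the "administratively prohibited" ones. This is a minimal sketch contrasting a stack that passes the specific reason through with a hypothetical one that collapses every code into a generic error. Both report functions are illustrative and do not model any particular operating system.

```python
# ICMP type 3 codes (RFC 792, RFC 1122, RFC 1812).
DEST_UNREACHABLE_CODES = {
    0: "net unreachable",
    1: "host unreachable",
    2: "protocol unreachable",
    3: "port unreachable",
    4: "fragmentation needed and DF set",
    5: "source route failed",
    6: "destination network unknown",
    7: "destination host unknown",
    8: "source host isolated",
    9: "communication with destination network administratively prohibited",
    10: "communication with destination host administratively prohibited",
    11: "network unreachable for type of service",
    12: "host unreachable for type of service",
    13: "communication administratively prohibited",
    14: "host precedence violation",
    15: "precedence cutoff in effect",
}
ADMIN_PROHIBITED = {9, 10, 13}


def report_faithfully(code: int) -> str:
    """Pass the specific reason through to the application."""
    return DEST_UNREACHABLE_CODES.get(code, f"unknown code {code}")


def report_lossily(code: int) -> str:
    """Hypothetical stack: the code is received but never surfaced."""
    return "destination unreachable"


def user_sees_prohibition(report, code: int) -> bool:
    """Could the person at the keyboard tell they were explicitly refused?"""
    return "prohibited" in report(code)


print(user_sees_prohibition(report_faithfully, 13))  # True
print(user_sees_prohibition(report_lossily, 13))     # False
```

The firewall sends the same packet in both cases; whether it functions as a warning depends entirely on software the defender doesn't control.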

We seem, then, to be stuck. The court's decision seems to treat warning as the crucial element, but sites can't always warn people. And why is a password more of a warning than an explicit communication, as was in fact the case here?