Y2038: It's a Threat

19 January 2020

Last month, for the 20th anniversary of Y2K, I was asked about my experiences. (Short answer: there really was a serious potential problem, but disaster was averted by a lot of hard work by a lot of unsung programmers.) I joked that, per this T-shirt I got from a friend, the real problem would be on January 19, 2038, at 03:14:08 GMT.

Picture of a T-shirt saying that Y2K
is harmless but that 2038 is dangerous

Why might that date be such a problem?

On Unix-derived systems, including Linux and MacOS, time is stored internally as the number of seconds since midnight GMT, January 1, 1970, a time known as “the Epoch”. Back when Unix was created, timestamps were stored in a 32-bit number. Well, as with any fixed-size value, only a limited range of numbers can be stored in 32 bits: -2,147,483,648 to 2,147,483,647. (Without going into technical details, the first of those 32 bits is used to denote a negative number. The asymmetry in range is to allow for zero.)
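
To make that limit concrete, here is a tiny C sketch (mine, not code from any of the systems discussed) that asks gmtime() which calendar date the largest 32-bit value corresponds to:

/* A minimal sketch (not from any system discussed here): the calendar
 * date of the largest timestamp a signed 32-bit counter can hold.
 * Assumes a compiler and library with 64-bit time_t, as on current systems. */
#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t last32 = 2147483647;     /* 2^31 - 1, the 32-bit maximum */
    char buf[64];

    strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S GMT", gmtime(&last32));
    printf("last 32-bit second: %s\n", buf);
    /* prints 2038-01-19 03:14:07 GMT; one second later, the count wraps */
    return 0;
}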

I immediately got pushback: did I really think that 18 years hence, people would still be using 32-bit systems? Modern computers use 64-bit integers, which can allow for times up to 9,223,372,036,854,775,807 seconds since the Epoch. (What date is that? I didn't bother to calculate it, but it's about 292,271,023,045 years, a date that's well beyond when it is projected that the Sun will run out of fuel. I don't propose to worry about computer timestamps after that.)

It turns out, though, that just as with Y2K, the problems don't start when the magic date hits; rather, they start when a computer first encounters dates after the rollover point, and that can be a lot earlier. In fact, I just had such an experience.

A colleague sent me a file from his Windows machine; looking at the contents, I saw this.

$ unzip -l zipfile.zip
Archive: zipfile.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  2411339  01-01-2103 00:00   Anatomy…
---------                     -------

Look at that date: it's in the next century! (No, I don't know how that happened.) But when I looked at it after extracting on my computer, the date was well in the past:

$ ls -l Anatomy…
-rw-r--r--@ 1 smb staff 2411339 Nov 24 1966 Anatomy…

Huh?

After a quick bit of coding, I found that the on-disk modification time of the extracted file was 4,197,067,200 seconds since the Epoch. That's larger than the limit! But it's worse than that. I translated the number to hexadecimal (base 16), which computer programmers use as an easy way to display the binary values that computers use internally. It came to FA2A29C0. (Since base 16 needs six more digits than our customary base 10, we use the letters A–F to represent them.) The first “F”, in binary, is 1111. And the first of those bits is the so-called sign bit, the bit that tells whether or not the number is negative. The value of FA2A29C0, if treated as a signed, 32-bit number, is -97,900,096, or about 3.1 years before the Epoch. Yup, that corresponds exactly to the Nov 24, 1966 date my system displayed. (Why should +4,197,067,200 come out to -97,900,096? As I indicated, that's moderately technical, but if you want to learn the gory details, the magic search phrase is “2's complement”.)
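
Here is a small C illustration of that wraparound (my own sketch, not the quick bit of coding mentioned above): store the 64-bit count of seconds into a signed 32-bit variable and let gmtime() render the result.

/* A small sketch (not the code referred to above) of the truncation:
 * the file's 64-bit modification time forced into a signed 32-bit variable. */
#include <stdio.h>
#include <stdint.h>
#include <time.h>

int main(void)
{
    int64_t mtime   = 4197067200LL;       /* 0xFA2A29C0 seconds since the Epoch */
    int32_t clipped = (int32_t)mtime;     /* what a 32-bit signed variable keeps */
    time_t  t       = clipped;
    char buf[64];

    printf("stored value: %lld (hex %llX)\n",
           (long long)mtime, (unsigned long long)mtime);
    printf("as signed 32 bits: %d\n", clipped);          /* -97900096 */
    strftime(buf, sizeof buf, "%b %e %Y, %H:%M:%S GMT", gmtime(&t));
    printf("which displays as: %s\n", buf);              /* Nov 24 1966 */
    return 0;
}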

So what happened? MacOS does use 64-bit time values, so there shouldn't have been a problem. But the “ls” command (and the Finder graphical application) do perform some date arithmetic. I suspect that there is old code that is using a 32-bit variable, thus causing the incorrect display.

For fun, I copied the zip file to a Linux system. It got it right, on extraction and display:

$ ls -l Anatomy…
-rw-r--r-- 1 smb faculty 2411339 Jan 2 2103 Anatomy…

(Why January 2 instead of January 1? I don't know for sure; my guess is time zones.)

So: there are clearly some Y2038 bugs in MacOS, today. In other words, we already have a problem. And I'm certain that these aren't the only ones, and that we'll be seeing more over the next 18 years.


Update: I should have linked to this thread about a more costly Y2038 incident.

The Early History of Usenet, Part XI: Errata

9 January 2020

I managed to conflate RFCs 733 and 822, and wrote 722 in the last post. That's now been fixed.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part X: Retrospective Thoughts

9 January 2020

Usenet is 40 years old. Did we get it right, way back when? What could/should we have done differently, with the technology of the time and with what we should have known or could feasibly have learned? And what are the lessons for today?

A few things were obviously right, even in retrospect. For the expected volume of communications and expected connectivity, a flooding algorithm was the only real choice. Arguably, we should have designed a have/want protocol, but that was easy enough to add on later—and was, in the form of NNTP. There were discussions even in the mid- to late-1980s about how to build one, even for dial-up links. For that matter, the original announcement explicitly included a variant form:

Traffic will be reduced further by extending news to support "news on demand." X.c would be submitted to a newsgroup (e.g. "NET.bulk") to which no one subscribes. Any node could then request the article by name, which would generate a sequence of news requests along the path from the requester to the contributing system. Hopefully, only a few requests would locate a copy of x.c. "News on demand" will require a network routing map at each node, but that is desirable anyway.

Similarly, we were almost certainly right to plan on a linked set of star nodes, including of course Duke. Very few sites had autodialers, but most had a few dial-in ports.

The lack of a cryptographic authentication and hence control mechanisms is a somewhat harder call, but I still think we made the right decision. First, there really wasn't very much academic cryptographic literature at the time. We knew of DES, we knew of RSA, and we knew of trapdoor knapsacks. We did not know the engineering parameters for either of the latter two, and, as I noted in an earlier post, we didn't even know to look for a bachelor's thesis that might or might not have solved the problem. Today, I know enough about cryptography that I could, I think, solve the problem with the tools available in 1979 (though remember that there were no cryptographic hash functions then), but I sure didn't know any of that back then.

There's a more subtle problem, though. Cryptography is a tool for enforcing policies, and we didn't know what the policies should be. In fact, we said that, quite explicitly:

  1. What about abuse of the network?
    In general, it will be straightforward to detect when abuse has occurred and who did it. The uucp system, like UNIX, is not designed to prevent abuses of overconsumption. Experience will show what uses of the net are in fact abuses, and what should be done about them.
  2. Who would be responsible when something bad happens?
    Not us! And we do not intend that any innocent bystander be held liable either. We are looking into this matter. Suggestions are solicited.
  3. This is a sloppy proposal. Let's start a committee.
    No thanks! Yes, there are problems. Several amateurs collaborated on this plan. But let's get started now. Once the net is in place, we can start a committee. And they will actually use the net, so they will know what the real problems are.

This is a crucial point: if you don't know what you want the policies to be, you can't design suitable enforcement mechanisms. Similarly, you have to have some idea who is charged with enforcing policies in order to determine who should hold, e.g., cryptographic keys.

Today's online communities have never satisfactorily answered either part of this. Twitter once described itself as the “free speech wing of the free speech party”; today, it struggles with how to handle things like Trump's tweets and there are calls to regulate social media. Add to that the international dimension and it's a horribly difficult problem—and Usenet was by design architecturally decentralized.

Original Usenet never tried to solve the governance problem, even within its very limited domain of discourse. It would be simple, today, to implement a scheme where posters could cancel their own articles. Past that, it's very hard to decide in whom to vest control. The best Usenet ever had were the Backbone Cabal and a voting scheme for creation of new newsgroups, but the former was dissolved after the Great Renaming because it was perceived to lack popular legitimacy and the latter was very easily abused.

Using threshold cryptography to let M out of N chosen “trustees” manage Usenet works technically but not politically, unless the “voters”—and who are they, and how do we ensure one Usenet user, one vote?—agree on how to choose the Usenet trustees and what their powers should be. There isn't even a worldwide consensus on how governments should be chosen or what powers they should have; adding cryptographic mechanisms to Usenet wouldn't solve it, either, even for just Usenet.

We did make one huge mistake in our design: we didn't plan for success. We never asked ourselves, “What if our traffic estimates are far too low?”

There were a number of trivial things we could have done. Newsgroups could always have been hierarchical. We could have had more hierarchies from the start. We wouldn't have gotten the hierarchy right, but computers, other sciences, humanities, regional, and department would have been obvious choices and not that far from what eventually happened.

A more substantive change would have been a more extensible header format. We didn't know about RFC 733, the then-current standard for ARPANET email, but we probably could have found it easily enough. But we did know enough to insist on having “A” as the first character of a post, to let us revise the protocol more easily. (Aside: tossing in a version indicator is easy. Ensuring that it's compatible with the next version is not easy, because you often need to know something of the unknowable syntax and semantics of the future version. B-news did not start all articles with a “B”, because that would have been incompatible with its header format.)

The biggest success-related issue, though, was the inability to read articles by newsgroup and out of order within a group. Ironically, Twitter suffers from the same problem, even now: you see a single timeline, with no easy way to flag some tweets for later reading and no way to sort different posters into different categories (“tweetgroups”?). Yes, there are lists, but seeing something in a list doesn't mean you don't see it again in your main timeline. (Aside: maybe that's why I spend too much time on Twitter, both on my main account and on my photography account.)

Suppose, in a desire to relive my technical adolescence, I decided to redesign Usenet. What would it look like?

Nope, not gonna go there. Even apart from the question of whether the world needs another social noise network, there's no way the human attention span scales far enough. The cognitive load of Usenet was far too high even at a time when very few people, relatively speaking, were online. Today, there are literally billions of Internet users. I mean, I could specify lots of obvious properties for Usenet: The Next Generation—distributed, peer-to-peer, cryptographically authenticated, privacy-preserving—but people still couldn't handle the load and there are still the very messy governance problems like illegal content, Nazis, trolls, organization, and more. The world has moved on, and I have, too, and there is no shortage of ways to communicate. Maybe there is a need for another, but Usenet—a single infrastructure intended to support many different topics—is probably not the right model.

And there's a more subtle point. Usenet was a batch, store-and-forward network, because that's what the available technology would support. Today, we have an always-online network with rich functionality. The paradigm for how one interacts with a network would and should be completely different. For example: maybe you can only interact with people who are online at the same time as you are—and maybe that's a good thing.

Usenet was a creation of its time, but around then, something like it was likely to happen. To quote Robert Heinlein's The Door into Summer, “you railroad only when it comes time to railroad.” The corollary is that when it is time to railroad, people will do so. Bulletin Board Systems started a bit earlier, though it took the creation of the Hayes SmartModem to make them widespread in the 1980s. And there was CSnet, an official email gateway between the ARPANET and dial-up sites, started in 1981, with some of the same goals. We joked that when professors wanted to do something, they wrote a proposal and received lots of funding, but we, being grad students, just went and did it, without waiting for paperwork and official sanction.

Usenet, though, was different. Bulletin Board Systems were single-site, until the rise of Fidonet a few years later; Usenet was always distributed. CSnet had central administration; Usenet was, by intent, laissez-faire and designed for organic growth at the edges, with no central site that in some way needed money. Despite its flaws, it connected many, many people around the world, for more than 20 years until the rise of today's social networks. And, though the user base and usage patterns have changed, it's still around, 40 years later.


This concludes my personal history of Usenet. I haven't seen any corrections, but I'll keep that link live in case I get some.


Correction: This post erroneously referred to RFC 722, by conflating 733 with 822, the revision.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part IX: The Great Renaming

26 December 2019

The Great Renaming was a significant event in Usenet history, since it involved issues of technology, money, and governance. From a personal perspective—and remember that this series of blog posts is purely my recollections—it also marked the end of my “official” involvement in “running” Usenet. I put “running” in quotation marks in the previous sentence because of the difficulty of actually controlling a non-hierarchical, distributed system with no built-in, authenticated control mechanisms.

As with so many other major changes in Usenet, the underlying problem was volume. Here, it wasn't so much the volume that individuals could consume as it was volume for sites to send, receive, and store. There was simply too much traffic. The problem was exacerbated by the newsgroup naming structure: it was too flat, and the hierarchy that did exist—net, fa (for “from ARPA”, ARPANET mailing lists that were gatewayed into Usenet newsgroups), and mod, for moderated newsgroups—wasn't very helpful for managing load. The hierarchy was not semantic, it was based on how content could appear: posted by anyone (net), relayed from a mailing list (fa), or controlled by a moderator (mod). Clearly, something had to be done to aid manageability. But who had both the authority and the power to make such decisions?

Although in theory, all Usenet nodes were equal, in practice some were more equal than others. In technical terms, though Usenet connectivity is considered a graph, in practice it was more like a set of star networks: a very few nodes had disproportionately high connectivity. These few nodes fed many end-sites, but they also talked to each other. In effect, those latter links were the de facto network backbone of Usenet, and the administrators of these major nodes wielded great power. They, together with a few Usenet old-timers, including me and Gene Spafford, constituted what became known as the “Backbone Cabal”. The Backbone Cabal had no power de jure; in practice, though, any newsgroups excluded by the entire cabal would have seen very little distribution outside of the originating region.

The problem had been recognized for quite a while before anything was actually done; see, e.g., this post by Chuq von Rospach, which is arguably the first detailed proposal. The essence of it and the scheme that was finally adopted were the same: organize groups into hierarchies that reflected both subject matter and signal-to-noise ratio. The latter was a significant problem; the volume of shouting in some newsgroups compares unfavorably to the “Comments” section of many web pages. The result was the same, though: sites could easily select what they wanted to receive, via broad categories rather than a long, long list of desired or undesired groups.

Contrary to what some, e.g., the Electronic Frontier Foundation, have said, the issue was not censorship, even censorship designed to ensure that Usenet never created the kind of scandal that would lead to public outcry that would threaten the project. And the backbone sites never had to hide from immediate management; as I have indicated, management was very well aware of Usenet and—for backbone sites—was willing to absorb the phone bills. (“Companies so big that their Usenet-related long distance charges were lost in the dictionary-sized bills the company generated every month”—sorry, it doesn't work that way in any organization I've ever been associated with. Every sub-organization had its own budget and had to cover its own phone bills.) There were budget issues and there were worries about scandal, but to the best of my recollection these were more of a concern at some non-backbone sites. But the backbone sites had to administer their feeds, and that demanded hierarchy.

To be sure, the assignment of some newsgroups to top-level hierarchies was political. It couldn't help being political, because everyone knew that moving something to the talk hierarchy would sharply curtail its distribution. And yes, members of the Cabal (including, of course, me) had their own particular interests. But that notwithstanding, trying to impose a hierarchical classification system on knowledge is hard—ask any librarian. (Thought experiment: how would you classify Apollo 11? Under “rocketry”? The “space race”? The “Cold War”? What about Wernher von Braun's contribution to the project? Is he a subcategory of the Apollo Project? Or of the history of rocketry, or of World War II?) There was not and could not be a perfect solution.

(The Wikipedia article on the Great Renaming says that the two immediate drivers were the complexity of listing which groups which sites would receive, and/or the cost of the overseas links from seismo to Europe. That may very well be; I simply do not remember specific issues other than load writ broadly.)

The ultimate renaming scheme was the subject of a lot of discussion, and changes were made to the original proposals. Ultimately, it was adopted—and there was rapid counter-action. The alt hierarchy was created as a set of newsgroups explicitly outside the control of the Backbone Cabal. And it succeeded, because technology had changed. For one thing, the cost of phone calls was dropping. For another, the spread of the Internet to many sites meant that Usenet didn't have to flow via phone calls billed by the minute: RFC 977, which proposed a standard for transmitting Usenet over the Internet, came out in early 1986. In other words, the notional control of the Backbone Cabal over content and distribution was just that: notional. The success of the alt hierarchy showed that Usenet had passed a critical point, where the disappearance of a very few nodes could have killed the whole idea of Usenet. At least partially in reaction to this, the Backbone Cabal disappeared—but it left unanswered the question of governance: who could or should control the net?

Newsgroup creation was one early topic. Creation was approved by voting: rough, imperfect voting, which gave rise to proposals for change. There was also the issue of unwanted or improper content, the creation of cancelbots, and more. People worried about liability, jurisdiction, copyright, and more, very early on. These issues are still largely unresolved. Fundamentally, the debate then was between a purely hands-off approach and some form of control; the latter, though, required both consensus on who should have the right to exercise authority and also the creation of appropriate technical mechanisms. Both of these issues are still with us today. I'll have more to say on them in the next (and final substantive) installment of this series.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part VIII: Usenet Growth and B-news

30 November 2019

For quite a while, it looked like my prediction—one to two articles per day—was overly optimistic. By summer, there were only four new sites: Reed College, University of Oklahoma (at least, I think that that's what uucp node uok is), vax135, another Bell Labs machine—and, crucially, U.C. Berkeley, which had a uucp connection to Bell Labs Research and was on the ARPANET.

In principle, even a slow rate of exponential growth can eventually take over the world. But that assumes that there are no “deaths” that will drive the growth rate negative. That isn't a reasonable assumption, though. If nothing else, Jim Ellis, Tom Truscott, Steve Daniel, and I all planned to graduate. (We all succeeded in that goal.) If Usenet hadn't shown its worth to our successors by then, they'd have let it wither. For that matter, university faculty or Bell Labs management could have pulled the plug, too. Usenet could easily have died aborning. But the right person at Berkeley did the right thing.

Mary Horton was then a PhD student there. (After she graduated, she joined Bell Labs; she and I were two of the primary people who brought TCP/IP to the Labs, where it was sometimes known as the “datagram heresy”. The phone network was, of course, circuit-switched…) Known to her but unknown to us, there were two non-technical ARPANET mailing lists that would be of great interest to many potential Usenet users, HUMAN-NETS and SF-LOVERS. She set up a gateway that relayed these mailing lists into Usenet groups; these were at some point moved to the fa (“From ARPANET”) hierarchy. (For a more detailed telling of this part of the story, see Ronda Hauben's writings.) With an actual traffic source, it was easy to sell folks on the benefits of Usenet. People would have preferred a real ARPANET connection but that was rarely feasible and never something that a student could set up: ARPANET connections were restricted to places that had research contracts with DARPA. The gateway at Berkeley was, eventually, bidirectional for both Usenet and email; this enabled Usenet-style communication between the networks.

SF-LOVERS was, of course, for discussing science fiction; then as now, system administrators were likely to be serious science fiction fans. HUMAN-NETS is a bit harder to describe. Essentially, it dealt with the effect on society of widespread networking. If it still existed today, it would be a natural home for discussions of online privacy, fake news, and hate speech, as well as the positive aspects: access to much of the world's knowledge, including primary source materials that years ago were often hard to find, and better communications between people.

It is, in fact, unclear if the gateway was technically permissible. The ARPANET was intended for use by authorized ARPANET sites only; why was a link to another network allowed? The official reason, as I understand it, is that it was seen as a use by Berkeley, and thus passed muster; my actual impression is that it was viewed as an interesting experiment. The reason for the official restriction was to prevent a government-sponsored network from competing with then-embryonic private data networks; Usenet, being non-commercial, wasn't viewed as a threat.

Uucp email addresses, as seen on the ARPANET, were a combination of a uucp explicit path and an ARPANET hostname. This was before the domain name system; the ARPANET had a flat name space back then. My address would have been something like

research!duke!unc!smb@BERKELEY

but also

research!duke!unc!smb at BERKELEY

—in this era, " at " was accepted as a synonym for "@"…

With the growth in the number of sites came more newsgroups and more articles. This made the limitations of the A-news user interface painfully apparent. Mary designed a new scheme; a high school student, Matt Glickman, implemented what became B-news. There were many improvements.

The most important change was the ability to read articles by newsgroup, and to read them out of order. By contrast, A-news presented articles in order of arrival, and only stored the high-water mark of continuous articles read. The input file format changed, too, to one much more like email. Here's the sample from RFC 1036:

From: jerry@eagle.ATT.COM (Jerry Schwarz)
Path: cbosgd!mhuxj!mhuxt!eagle!jerry
Newsgroups: news.announce
Subject: Usenet Etiquette -- Please Read
Message-ID: <642@eagle.ATT.COM>
Date: Fri, 19 Nov 82 16:14:55 GMT
Followup-To: news.misc
Expires: Sat, 1 Jan 83 00:00:00 -0500
Organization: AT&T Bell Laboratories, Murray Hill

The body of the message comes here, after a blank line.

The most interesting change was the existence of both From: and Path: lines. The former was to be used for sending email; the latter was used to track which sites had already seen an article. There is also the implicit assumption that there would be a suitable ARPANET-to-uucp gateway, identified by a DNS MX record, to handle email relaying; at this time, such gateways were largely aspirational and mixed-mode addresses were still the norm.
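
For readers who have never handled this style of header, a minimal parsing sketch (mine, not B-news source) is below: read lines until the first blank one and split each at the first colon.

/* A rough sketch (not B-news source) of parsing email-style article
 * headers: read header lines until the first blank line, splitting each
 * at the first colon into a name and a value. */
#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[1024];

    while (fgets(line, sizeof line, stdin) != NULL) {
        line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */
        if (line[0] == '\0')
            break;                          /* blank line: body follows */

        char *colon = strchr(line, ':');
        if (colon == NULL)
            continue;                       /* not a header line; ignore */
        *colon = '\0';

        char *value = colon + 1;
        while (*value == ' ' || *value == '\t')
            value++;                        /* skip leading white space */
        printf("header %-12s = %s\n", line, value);
    }
    return 0;
}

Fed the sample article above on standard input, it prints one line per header, such as the From and Path values.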

B-news also introduced control messages. As noted, these were unauthenticated; mischief could and did result. Other than canceling messages, the primary use was for the creation of new newsgroups—allowing them to be created willy-nilly didn't scale.

There was also control message support for mapping the network, which did not work as well as we expected. Briefly, the purpose of the senduuname message was to allow a site to calculate the shortest uucp path to a destination, both to relieve users of the mental effort to remember long paths and also to allow a shorter email path than simply retracing the Usenet path. (This was also a reliability feature; uucp email, especially across multiple hops, was not very reliable.) My code worked (and, after a 100% rewrite by Peter Honeyman, became my first published paper) but it was never properly integrated into mailers and the shorter paths were even less reliable than the long ones.
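
The routing idea itself is easy to sketch. Here is a toy version (my reconstruction for illustration; it is not the original code, and the site map below is invented): treat each site's uucp neighbor list as an adjacency map and run a breadth-first search from the local site.

/* A toy sketch (not the original code) of the senduuname idea: given each
 * site's uucp neighbors, find the shortest bang path from a starting site
 * to a destination by breadth-first search.  The map below is invented. */
#include <stdio.h>
#include <string.h>

#define NSITES 6

static const char *site[NSITES] = {
    "unc", "duke", "research", "alice", "ucbvax", "decvax"
};

/* adj[i][j] != 0 means site i has a uucp link to site j */
static const int adj[NSITES][NSITES] = {
    /* unc      */ { 0, 1, 0, 0, 0, 0 },
    /* duke     */ { 1, 0, 1, 0, 0, 1 },
    /* research */ { 0, 1, 0, 1, 0, 0 },
    /* alice    */ { 0, 0, 1, 0, 1, 0 },
    /* ucbvax   */ { 0, 0, 0, 1, 0, 1 },
    /* decvax   */ { 0, 1, 0, 0, 1, 0 },
};

int main(void)
{
    int from = 0, to = 4;                  /* route from unc to ucbvax */
    int prev[NSITES], queue[NSITES], head = 0, tail = 0;

    memset(prev, -1, sizeof prev);         /* -1 means "not yet reached" */
    prev[from] = from;
    queue[tail++] = from;

    while (head < tail) {                  /* breadth-first search */
        int cur = queue[head++];
        for (int next = 0; next < NSITES; next++)
            if (adj[cur][next] && prev[next] == -1) {
                prev[next] = cur;
                queue[tail++] = next;
            }
    }

    if (prev[to] == -1) {
        printf("no path\n");
        return 1;
    }

    /* walk back from the destination and print the bang route a user
     * at the starting site would hand to uucp mail */
    int path[NSITES], len = 0;
    for (int cur = to; cur != from; cur = prev[cur])
        path[len++] = cur;
    while (len > 0) {
        printf("%s", site[path[--len]]);
        printf(len > 0 ? "!" : "\n");      /* prints duke!decvax!ucbvax */
    }
    return 0;
}

As noted above, the hard part was not the search but getting mailers to actually use the result reliably.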

Finally, there were internal changes. A-news had used a single directory for all messages, but as the number of messages increased, that became a serious performance bottleneck. B-news used a directory per newsgroup, and eventually subdirectories that reflected the hierarchical structure.

The growth of Usenet had negative consequences, too: some sites became less willing to carry the load. Bell Labs Research had been a major forwarding site, but Doug McIlroy, then a department head, realized that the exponent in Usenet's growth rate was, in fact, significant, and that the forwarding load was threatening to overload the site—star networks don't scale. He ordered an end to email relaying. This could have been very, very serious; fortunately, there were a few other sites that had started to pick up the load, most notably decvax at Digital Equipment Corporation's Unix Engineering Group. This effort, spearheaded by Bill Shannon and Armando Stettner, was quite vital. Another crucial relay site was seismo, run by Rick Adams at the Center for Seismic Studies; Rick later went on to found UUNET, which became the first commercial ISP in the United States. At Bell Labs, ihnp4, run by Gary Murakami, became a central site, too. (Amusingly enough, even though I joined the Labs in late 1982, I did not create another hub: as a very junior person, I didn't feel that I could. But it wasn't because management didn't know about Usenet; indeed, on my first day on the job, my center's director (three levels up from me) greeted me with, “Hi, Steve—I've seen your flames on Netnews.” I learned very early that online posts can convey one's reputation…)

More on load issues in the next post.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part VII: The Public Announcement

25 November 2019

Our goal was to announce Usenet at the January, 1980 Usenix meeting. In those days, Usenix met at universities; it was a small, comparatively informal organization, and didn't require hotel meeting rooms and the like. (I don't know just when Usenix started being a formal academic-style conference; I do know that it was no later than 1984, since I was on the program committee that year for what would later be called the Annual Technical Conference.) This meeting was in Boulder; I wasn't there, but Tom Truscott and Jim Ellis were.

Apart from the announcement itself, we of course needed non-experimental code—and my prototype was not going to cut it. Although I no longer remember precisely what deficiencies were in my C version, one likely issue was the ability to configure which neighboring sites would receive which newsgroups. Stephen Daniel, also at Duke CS, wrote the code that became known as “A-news”. One important change was the ability to have multiple hierarchies, rather than just the original “NET” or “NET.*”. (Aside: I said in a previous note that my C version had switched to “NET.*” for distributed groups, rather than the single NET. I'm now no longer sure of when that was introduced, in my C version or in Steve Daniel's version. He certainly supported other hierarchies; I certainly did not.) It was also possible in the production version to configure which groups or hierarchies a site would receive. For sanity's sake, this configuration would have to be in a file, rather than in an array built into the code.

That latter point was not always obvious. Uucp, as distributed, used an array to list the commands remote sites were permitted to execute:

char *Cmds[] = {
	"mail",
	"rmail",
	"lpr",
	"opr",
	"fsend",
	"fget",
	NULL
	};
/*  to remove restrictions from uuxqt
 *  redefine CMDOK 0
 *
 *  to add allowable commands, add to the list under Cmds[]
 */

To permit rnews to execute, a system administrator would have to change the source code (and most people had source code to Unix in those days) and recompile. This was, in hindsight, an obviously incorrect decision, but it arguably was justifiable in those days: what else should you be allowed to do? There were many, many fewer commands. (I should note: I no longer remember for certain what fsend, fget, or opr were. I think they were for sending and receiving files, and for printing to a Honeywell machine at the Bell Labs Murray Hill comp center. Think of the ancient GCOS field in the /etc/passwd file.)

To work around this problem, we supplied a mail-to-rnews program: a sending site could email articles, rather than try to execute rnews directly. A clock-driven daemon would retrieve the email messages and pass them to rnews. And it had to be clock-driven: in those days, there was no way to have email delivered directly to a program or file. (A security feature? No, simply the simplicity that was then the guiding spirit of Unix. But yes, it certainly helped security.) The remote site configuration file in A-news therefore needed to know a command to execute, too.

The formal announcement can be seen here. The HTML is easier on the eyes, but there are a few typos and even some missing text, so you may want to look at the scanned version linked to at the bottom. A few things stand out. First, as I noted in Part III, there was a provision for Duke to recover phone charges from sites it polled. There was clearly faculty support at Duke for the project. For that matter, faculty at UNC knew what I was doing.

A more interesting point is what we thought the wide-area use would be: "The first articles will probably concern bug fixes, trouble reports, and general cries for help." Given how focused on the system aspects we were, what we really meant was something like the eventual newsgroup comp.unix.wizards. There was, then, a very strong culture of mutual assistance among programmers, not just in organizations like Usenix (which was originally, as I noted, the Unix Users' Group), but also in the IBM mainframe world. The Wikipedia article on SHARE explains this well:

A major resource of SHARE from the beginning was the SHARE library. Originally, IBM distributed what software it provided in source form and systems programmers commonly made small local additions or modifications and exchanged them with other users. The SHARE library and the process of distributed development it fostered was one of the major origins of open source software.

Another proposed use was locating interesting source code, but not flooding it to the network. Why not? Because software might be bulky, and phone calls then were expensive. The announcement estimates that nighttime phone rates were about US$.50 for three minutes; that sounds about right, though even within the US rates varied with distance. In that time, at 300 bps—30 bytes per second—you could send at most 5400 bytes; given protocol overhead, we conservatively estimated 3000 bytes, or a kilobyte per minute. To pick an arbitrary point of comparison, the source to uucp is about 120KB; at 1KB/sec, that's two hours, or US$20. Adjusting for inflation, that's over US$60 in today's money—and most people don't want most packages. And there was another issue: Duke only had two autodialers; there simply wasn't the bandwidth to send big files to many places, and trying to do so would block all news transfers to other sites. Instead, the proposal was for someone—Duke?—to be a central repository; software could then be retrieved on demand. This was a model later adopted by UUNET; more on it in the next installment of this series.
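
For the curious, the cost arithmetic above works out like this (a throwaway sketch; the figures are the announcement's estimates, not measurements):

/* Back-of-the-envelope version of the cost estimate above; the numbers
 * are the announcement's estimates, not measurements. */
#include <stdio.h>

int main(void)
{
    double bytes_per_min   = 1000.0;       /* ~3000 usable bytes per 3-minute unit */
    double dollars_per_min = 0.50 / 3.0;   /* estimated nighttime long-distance rate */
    double package_bytes   = 120e3;        /* rough size of the uucp sources */

    double minutes = package_bytes / bytes_per_min;
    printf("transfer time: %.0f minutes (%.1f hours)\n", minutes, minutes / 60.0);
    printf("phone cost:    $%.2f in 1980 dollars, per receiving site\n",
           minutes * dollars_per_min);     /* about 120 minutes and US$20 */
    return 0;
}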

The most interesting thing, though, is what the announcement didn't talk about: any non-technical use. We completely missed social discussions, hobby discussions, political discussions, or anything else like that. To the extent we considered it at all, it was for local use—after all, who would want to discuss such things with someone they'd never met?


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part VI: Authentication and Norms

22 November 2019

We knew that Usenet needed some sort of management system, and we knew that that would require some sort of authentication, for users, sites, and perhaps posts. We didn't add any, though—and why we didn't is an interesting story. (Note: much of this blog post is taken from an older post.)

The obvious solution was something involving public key cryptography, which we (the original developers of the protocol: Tom Truscott, the late Jim Ellis, and myself) knew about: all good geeks at the time had seen Martin Gardner's "Mathematical Games" column in the August 1977 issue of Scientific American (paywall), which explained both the concept of public key cryptography and the RSA algorithm. For that matter, Rivest, Shamir, and Adleman's technical paper had already appeared; we'd seen that, too. In fact, we had code available for trapdoor knapsack encryption: the xsend command for public key encryption and decryption, which we could have built upon, was part of 7th Edition Unix, and that's what Usenet ran on.

What we did not know was how to authenticate a site's public key. Today, we'd use a certificate issued by a certificate authority. Certificates had been invented by then, but we didn't know about them, and of course there were no search engines to come to our aid. (Manual finding aids? Sure—but apart from the question of whether or not any accessible to us would have indexed bachelor's theses, we'd have had to know enough to even look. The RSA paper gave us no hints; it simply spoke of a "public file" or something like a phone book. It did speak of signed messages from a "computer network"—scare quotes in the original!—but we didn't have one of those except for Usenet itself. And a signed message is not a certificate.) Even if we did know, there were no certificate authorities, and we certainly couldn't create one along with creating Usenet.

Going beyond that, we did not know the correct parameters: how long a key to use (the estimates in the early papers were too low), what was secure (the xsend command used an algorithm that was broken a few years later), etc. Maybe some people could have made good guesses. We did not know and knew that we did not know.

The next thing we considered was neighbor authentication: each site could, at least in principle, know and authenticate its neighbors, due to the way the flooding algorithm worked. That idea didn't work, either. For one thing, it was trivial to impersonate a site that appeared to be further away. Every Usenet message contains a Path: line; someone trying to spoof a message would simply have to claim to be a few hops away. (This is how the famous kremvax prank worked.)

It was possible, barely, to have a separate uucp login for different sites, but apart from overhead for managing separate logins, it isn't clear that rnews could have handled it properly.

But there's a more subtle issue. Usenet messages were transmitted via a generic remote execution facility. The Usenet program on a given computer executed the Unix command

uux neighborsite!rnews
where neighborsite is the name of the next-hop computer on which the rnews command would be executed. (Before you ask: yes, the list of allowable remotely requested commands was very small; no, the security was not perfect. But that's not the issue I'm discussing here.) The trouble is that any knowledgeable user on a site could issue the uux command; it wasn't and couldn't easily be restricted to authorized users. Anyone could have generated their own fake control messages, without regard to authentication and sanity built in to the Usenet interface. And yes, we knew that at the time.

Could uux have been secured? This is itself a complex question that I don't want to go into now; please take it on faith and don't try to argue about setgid(), wrapper programs, and the like. It was our judgment then—and my judgment now—that such solutions would not be adopted. The minor configuration change needed to make rnews an acceptable command for remote execution was a sufficiently high hurdle that we provided alternate mechanisms for sites that wouldn't do it.

That left us with no good choices. The infrastructure for a cryptographic solution was lacking. The uux command rendered illusory any attempts at security via the Usenet programs themselves. We chose to do nothing. That is, we did not implement fake security that would give people the illusion of protection but not the reality.

This was the right choice.

But the story is more complex than that. It was the right choice in 1979 but not necessarily right later, for several reasons. The most important is that the online world in 1979 was very different than it is now. For one thing, since only a very few people had access to Usenet—mostly CS students and tech-literate employees of large, sophisticated companies—the norms were to some extent self-enforcing: if someone went too far astray, their school or employer could come down on them. And we did anticipate that some people would misbehave.

As I mentioned in the previous post, our projections of participation and volume were very low. On the one hand, a large network has much more need for management, including ways to deal with people and traffic that violates the norms. On the other, simply as a matter of statistics a large network will have at the least proportionately more malefactors. Furthermore, the increasing democratization of access meant that there were people who were not susceptible to school or employer pressure.

B-news (which I'll get to in a few days) did have control messages. They were necessary, useful—and abused. Spam messages were often countered by cancelbots, but of course cancelbots were not available only to the righteous. And online norms are not always what everyone wants them to be. The community was willing to act technically against the first large-scale spam outbreak, but other issues—a genuine neo-Nazi, posts to the misc.kids newsgroup by a member of NAMBLA, trolls on the soc.motss newsgroup, and more—were dealt with by social pressure. (I should note: the first neo-Nazi appeared on Usenet very early on. And no, I'm not being even slightly hyperbolic when I call him that, but I won't give him more publicity by mentioning his name.)

There are several lessons here. One, of course, is that technical honesty is important. A second, though, is that the balance between security and functionality is not fixed—environments and hence needs change over time. B-news was around for a long time before cancel messages were used or abused on a large scale, and this mass good behavior was not because the insecurity wasn't recognized: when I had a job interview at Bell Labs in 1982, the first thing Dennis Ritchie said to me was "[B-news] is a tool of the devil!" A third lesson is that norms can matter, but that the community as a whole has to decide how to enforce them.

There's an amusing postscript to the public key cryptography issue. In 1979-1981, when the Usenet software was being written, there were no patents on public key cryptography nor had anyone heard about export licenses for cryptographic technology. If we'd been a bit more knowledgeable or a bit smarter, we'd have shipped software with such functionality. The code would have been very widespread before any patents were issued, making enforcement very difficult. On the other hand, Tom, Jim, Steve Daniel (who wrote the first released version of the software) and I might have had some very unpleasant conversations with the FBI. But the world of online cryptography would almost certainly have been very different. It's interesting to speculate on how things would have transpired if cryptography was widely used in the early 1980s.

As I alluded to above, we did anticipate possible trouble. In fact, the original public announcement warned about this:

  1. What about abuse of the network?
    In general, it will be straightforward to detect when abuse has occurred and who did it. The uucp system, like UNIX, is not designed to prevent abuses of overconsumption. Experience will show what uses of the net are in fact abuses, and what should be done about them.

    Certain abuses of the net can be serious indeed. As with ordinary abuses, they can be thought about, looked for, and even programmed against, but only experience will show what matters. Uucp provides some measure of protection. It runs as an ordinary user, and has strict access controls. It is safe to say that it poses no greater threat than that inherent in a call-in line.

  2. Who would be responsible when something bad happens?
    Not us! And we do not intend that any innocent bystander be held liable either. We are looking into this matter. Suggestions are solicited.

Clearly, we were mostly worried about hacking via uucp connections. (It's equally clear to those of us who dealt with both uucp and security that we were quite naive in thinking that uucp was safe. There's a certain irony in the fact that both Jim Ellis and I ended up specializing in security…)

It seems, though, that we were worried about other abuses as well. The announcement mentions overconsumption of resources as a risk; we knew of that from an article we had seen by Dennis Ritchie in the Bell System Technical Journal. Quoting him:

The weakest area is in protecting against crashing, or at least crippling, the operation of the system. Most versions lack checks for overconsumption of certain resources, such as file space, total number of files, and number of processes (which are limited on a per-user basis in more recent versions). Running out of these things does not cause a crash, but will make the system unusable for a period. When resource exhaustion occurs, it is generally evident what happened and who was responsible, so malicious actions are detectable, but the real problem is the accidental program bug.

Note the similarity between our "it will be straightforward…" and Ritchie's conclusion.

The bottom line, though, was that we really did not know what to do, nor even what sorts of problems would actually occur. I personally did worry about security to some extent—I actually caught my first hackers around 1971, when some activity generated a console message and I went and examined the punch cards(!) for the program involved—but it wasn't in any sense my primary focus. That said, when Morris and Thompson's famous paper on passwords appeared, I coded up a quick-and-dirty password guesser and informed some people about how bad their passwords were. (One answer I received: "It's not a problem that my password is 'abscissa'; no one else can spell it." Umm…) We would have received that issue of Communications of the ACM around the time that Usenet was being invented, but I do not recall when we saw it.

Were we worried about trolling and other forms of online misbehavior? From a vantage point of 40 years, it's hard to say. As mentioned earlier, we did anticipate people posting things like used car ads to inappropriate places. Of course, there was no way to anticipate what Usenet would become in just a few short years.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part V: Implementation and User Experience

21 November 2019

To understand some of our implementation choices, it's important to remember two things. First, the computers of that era were slow. The Unix machine at UNC's CS department was slower than most timesharing machines even for 1979—we had a small, slow disk, a slow CPU, and—most critically—not nearly enough RAM. Duke CS had a faster computer—they had an 11/70; we had an 11/45—but since I was doing the first implementation, I had to use what UNC had. (Log in remotely? How? There was no Internet then, neither department was on the ARPANET, and dialing up would have meant paying per-minute telephone charges, probably at daytime rates. Besides, a dial-up connection would have been at 300 bps, but if I stayed local I could do 9600 bps via the local Gandalf port selector.)

The second important point is that we knew we had to experiment to get things right. To quote from the first public announcement of Usenet, "Yes, there are problems. Several amateurs collaborated on this plan. But let's get started now. Once the net is in place, we can start a committee. And they will actually use the net, so they will know what the real problems are." None of us had designed a network protocol before; we knew that we'd have to experiment to get things even approximately right. (To be clear, we were not first-time programmers. We were all experienced system administrators, and while I don't know just how much experience Tom and Jim (and Dennis Rockwell, whose name I've inadvertently omitted in earlier posts) had, I'd been programming for about 14 years by this time, with much of my work at kernel level and with a fair amount of communications software experience.)

My strategy for development, then, was what today we would call rapid prototyping: I implemented the very first version of Netnews software as a Bourne Shell script. It was about 150 lines long, but implemented such features as multiple newsgroups and cross-posting.

Why did I use a shell script? First, simply compiling a program took a long time, longer than I wanted to wait each time I wanted to try something new. Second, a lot of the code was string-handling, and as anyone who has ever programmed in C knows, C is (to be charitable) not the best language for string-handling. Write a string library? I suppose I could have, but that's even more time spent enduring a very slow compilation process. Using a shell script let me try things quickly, and develop the code incrementally. Mind you, the shell script didn't execute that quickly, and it was far too slow for production use—but we knew that and didn't care, since it was never intended as a production program. It was a development prototype, intended for use when creating a suitable file format. Once that was nailed down, I recoded it in C. That code was never released publicly, but it was far more usable.

Unfortunately, the script itself no longer exists. (Neither does the C version.) I looked for it rather hard in the early 1990s but could not find a copy. However, I do remember a few of the implementation details. The list of newsgroups a user subscribed to was an environment variable set in their .profile file:

export NETNEWS="*"
or
export NETNEWS="NET admin social cars tri.*"
Why? It made it simple for the script to do something like
cd $NEWSHOME
groups=`echo $NETNEWS`
That is, it would emit the names of any directories whose names matched the shell pattern in the NETNEWS environment variable.

To find unread articles, the script did (the equivalent of)

newsitems=`find $groups -type f -newer $HOME/.netnews -print`
This would find all articles received since the last time the user had read news. Before exiting, the script would touch $HOME/.netnews to mark it with the current time. (More on this aspect below.)

There were a few more tricks. I didn't want to display cross-posted articles more than once, so the script did

ls -tri $newsitems | sort -n | uniq
to list the "i-node" numbers of each file and delete all but the first copy of duplicates: cross-posted articles appeared as single files linked to from multiple directories. Apart from enabling this simple technique for finding duplicates, it saved disk space, at a time when disk space was expensive. (Those who know Unix shell programming are undoubtedly dying to inform me that the uniq command as shown above would not do quite what I just said. Yup, quite right. How I handled that—and handle it I did—I leave as an exercise for the reader. And to be certain that your solution works, make sure that you stick to the commands available to me in 1979.)

I mentioned above that the script would only update the last-read time on successful exit. That's right: there was no way in this version to read things out of order, skip articles and come back to them later, or even stop reading part-way through the day's news feed without seeing everything you just read again. It seems like a preposterous decision, but it was due to one of my most laughable errors: I predicted that the maximum Usenet volume, ever, would never exceed 1-2 articles per day. Traffic today seems to be over 60 tebibytes per day, with more than 100,000,000 posts per day. As I noted last year, I'm proud to have had the opportunity to co-create something successful enough that my volume prediction could be off by so many orders of magnitude.

There are a couple of other points worth noting. The first line of each message contained the article-ID: the sitename, a period, and a sequence number. This guaranteed uniqueness: each site would number its own articles, with no need for global coordination. (There is the tacit assumption that each site would have a unique name. Broadly speaking, this was true, though the penchant among system administrators for using science fiction names and places meant that there were some collisions, though generally not among the machines that made up Usenet.) We also used the article-ID as the filename to store it. However, this decision had an implicit limitation. Site names were, as I recall, limited to 8 characters; filenames in that era were limited to 14. This meant that the sequence number was limited to five digits, right? No, not quite. The "site.sequence" format is a transfer format; using it as a filename is an implementation decision. Had I seen the need, I could easily have created a per-site directory. Given my traffic volume assumptions, there was obviously no need, especially for a prototype.

There was another, more subtle, reason for simply using the article-ID as the filename and using the existence of the file and its last-modified time as the sole metadata about the articles. The alternative—obviously more powerful and more flexible—would be to have some sort of database. However, having a single database of all news items on the system would have required some sort of locking mechanism to avoid race conditions, and locking wasn't easy on 7th Edition Unix. There was also no mechanism for inter-process communication except via pipes, and pipes are only useful for processes descended from the same parent who would have had to create the pipes before creating the processes. We chose to rely on the file system, which had its own, internal locks to guarantee consistency. There are a number of disadvantages to that approach—but they don't matter if you're only receiving 1-2 articles per day.

The path to the creator served two purposes. First, it indicated which sites should not be sent a copy of newly received articles. Second, by intent it was a valid uucp email address; this permitted email replies to the poster.

I had noted a discrepancy between the date format in the announcement and that in RFC 850. On further reflection, I now believe that the announcement got it right and the RFC is wrong. Why? The format the announcement shows is exactly what was emitted by the date command; in a shell script, that would have been a very easy string to use, while deleting the time zone would have been more work. I do not think I would have gone to any trouble to delete something that was pretty clearly important.

The user interface was designed to resemble that of the 7th Edition mail command. It was simple, and worked decently for low-volume mail environments. We also figured that almost all Usenet readers would already be familiar with it.

To send an article to a neighboring system, we used the remote execution component of uucp.

uux $dest_site!rnews <$article_file_name
That is, on the remote system whose name is in the shell variable dest_site, execute the rnews command after first transferring the specified file to use as its standard input. This required that the receiving site allow execution of rnews, which in turn required that they recompile uucp (configuration files? What configuration files?), which was an obstacle for some; more on that in part VI.
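
Putting that together, the transmit side reduces to a loop over neighboring sites, skipping any site already named in the article's path line. A rough latter-day sketch, not the original code ($neighbors and $article are assumed variables, and the path is taken to be the third line of the article, per the format described in part IV):

path=`sed -n 3p $article`          # the relay path, e.g. alice!duke!unc!smb
for dest_site in $neighbors
do
    case $path in
    *$dest_site*) ;;               # that site has already seen it (naive substring match)
    *) uux $dest_site!rnews <$article ;;
    esac
done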


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part IV: File Format

17 November 2019

When we set out to design the over-the-wire file format, we were certain of one thing: we wouldn't get it perfectly right. That led to our first decision: the very first character of the transmitted file would be the letter "A", for the version. Why not a number on the first line, including perhaps a decimal point? If we ever considered that, I have no recollection of it.

A more interesting question is why we didn't use email-style headers, a style later adopted for HTTP. The answer, I think, is that few, if any, of us had any experience with those protocols at that time. My own personal awareness of them started when I requested and received a copy of the Internet Protocol Transition Workbook a couple of years later—but I was only aware of it because of Usenet. (A few years earlier, I gained a fair amount of knowledge of the ARPANET from the user level, but I concentrated more on learning Multics.)

Instead, we opted for the minimalist style epitomized by 7th Edition Unix. In fact, even if we had known of the Internet (in those days, ARPANET) style, we might have eschewed it anyway. As discussed in a later post on the implementation, the very first version of our code was a shell script. Dealing with entire lines as single units, rather than trying to parse headers that allowed arbitrary case, optional white space, and continuation lines, was certainly simpler!

The next question was what to do about duplicate articles. One obvious necessity is an article ID, since that would allow duplicate detection. In our design, the article ID was the rest of the first line, after the A. (Note: it's been 40 years and I no longer remember exactly what we decided at that meeting. Per the implementation discussion, there was experimentation and change. The details I'm giving here are taken from the final format as documented in RFC 850, but there is no doubt that there were changes during development.)

We also wanted to minimize transfer costs. As I noted in the previous post, article transmission was by expensive, dial-up connections; sending something that wasn't needed would cost real money. Accordingly, articles had to include a list of systems known to have already seen the article. This consisted of a series of hostnames separated by exclamation points, with the last element being the login name of the user who posted it. Thus, an article created by me at UNC Chapel Hill and relayed through Duke and alice, a computer at Bell Labs Research, would contain "alice!duke!unc!smb". If a possible next hop appeared in the path, the duplicate copy would not be sent. (Yes, that meant that it was easy to ensure that some sites would never see some articles. To my recollection, we did not worry about that issue and perhaps didn't even notice it.)

Why did we pick that format, instead of something like commas or blanks as separators? The format we chose was that used by uucp for email relaying; someone at some computer that alice talked to could type

mail alice!duke!unc!smb
and it would be relayed through alice and duke before reaching my department's computer and then me. (That sort of email relaying was to prove problematic; again, more on that later.)

Today, with full connectivity over the Internet, we wouldn't do things the same way. Instead, one party would send the next a list of article IDs; that party would then request the ones it had not yet seen. We did consider something like that, but rejected it. Why? Because we were using infrequent, dial-up connections to relay articles, and the number of loops (and hence duplicate articles received) seemed unlikely to be high.

Consider: in our original scheme, many sites would be polled once per night by Duke. If, during that call, Duke sent them a list of articles, they couldn't request those articles until the next night, and wouldn't receive them until the night after that. That amount of delay was unacceptable. Instead, we accepted the chance of sending unnecessary text. While there certainly would be extra transmissions some of the time, we felt that the amount would not be prohibitive—this was before JPG and before MP3, so articles were entirely text and hence would be relatively small and thus cheap.

Sending a date and an article title was obvious enough that neither merited much discussion. The date and time line used the format generated by the ctime() or asctime() library routines. I do not recall if we normalized the date and time to UTC or just ignored the question; clearly, the former would have been the proper choice. (There is an interesting discrepancy here. A reproduction of the original announcement clearly shows a time zone. Neither the RFC nor the ctime() routine had one. I suspect that the announcement was correct.) The most interesting question, though, was about what came to be called newsgroups.
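
To make the discrepancy concrete (the zone shown here is illustrative): the 7th Edition date command printed the local time zone, while ctime() and asctime() omit it, which is the form that appears in the example below.

date
Fri Nov 19 16:14:55 EST 1982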

We decided, from the beginning, that we needed multiple categories of articles—newsgroups. For local use, there might be one for academic matters ("Doctoral orals start two weeks from tomorrow"), social activities ("Reminder: the spring picnic is Sunday!"), and more. But what about remote sites? The original design had one relayed newsgroup: NET. That is, there would be no distinction between different categories of non-local articles.

This approach was hotly debated. Was it really the case that there would be so little traffic of interest beyond the local machine that no further categorization was needed? (Our estimates of traffic volume were very, very wrong, and this error affected several implementation decisions.) The objection that carried the day: "What if someone wants to sell their car? They want it to reach other computers in the geographical area, but not beyond." We instead decided that anything in a newsgroup whose name began with "NET." would be relayed. This, though, created a problem that is still not resolved: we conflated the notion of interest with the scope of relaying. That is, suppose that instead of duke and unc being directly connected, both sites spoke to alice. Material of regional interest—the two schools were only about 16 km apart—should be seen on both sites, but there would be no reason to send such items as used car ads to a Bell Labs machine in New Jersey. (Aside: years later, when Usenet was already reasonably widespread, someone posted a used car ad to a group with world-wide distribution. The author was rather confused when several of the original designers sent him congratulatory notes…)
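
In code, the rule that came out of that debate amounts to a one-line test; a minimal sketch, not the actual implementation, with $newsgroup as an assumed variable:

case $newsgroup in
NET.*) relay=yes ;;    # network-wide: forward to neighboring sites
*)     relay=no ;;     # everything else stays on the local machine
esac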

There was one more interesting point. We knew from the very beginning that some articles belonged in more than one category, so we supported cross-posting to multiple newsgroups from the start. Cross-posting later came to be seen as impolite, but it was an intentional feature.

Using the example from RFC 850, the final format of a news article looked like this:

Aeagle.642
net.general
cbosgd!mhuxj!mhuxt!eagle!jerry
Fri Nov 19 16:14:55 1982
Usenet Etiquette - Please Read
The body of the article comes here, with no blank line.
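
Given that layout, a receiving shell script can pick the pieces apart by line position alone, with no header parsing. A rough sketch, not the actual rnews ($article is an assumed variable naming the received file):

id=`sed -n 1p $article | sed 's/^A//'`    # strip the version letter: eagle.642
groups=`sed -n 2p $article`               # net.general
path=`sed -n 3p $article`                 # cbosgd!mhuxj!mhuxt!eagle!jerry
posted=`sed -n 4p $article`               # Fri Nov 19 16:14:55 1982
title=`sed -n 5p $article`                # Usenet Etiquette - Please Read
sed 1,5d $article                         # everything after line 5 is the body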

We decided on one last issue at the meeting: the name of our system. We called the technology "Netnews"—Network News—and the particular instantiation we hoped for was "Usenet". Why Usenet? The Wikipedia article (as of the 17 November 2019 version) has it almost right: "The name 'Usenet' emphasizes its creators' hope that the USENIX organization would take an active role in its operation." However, there was a bit more. Until some time in 1979, the organization now known as Usenix was called the Unix User's Group. But Bell Labs' lawyers took exception to this use of their trademark, so a new name was chosen: Usenix. The technical folks, being innocent in the ways of lawyers, were bemused by this. Part of our reason for the name "Usenet" was as a gentle tease about this forced renaming.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part III: Hardware and Economics

15 November 2019

There was a planning meeting for what became Usenet at Duke CS. We knew three things, and three things only: we wanted something that could be used locally for administrative messages, we wanted a networked system, and we would use uucp for intersite communication. This last decision was more or less by default: there were no other possibilities available to us or to most other sites that ran standard Unix. Furthermore, all you needed to run uucp was a single dial-up modem port. (I do not remember who had the initial idea for a networked system, but I think it was Tom Truscott and the late Jim Ellis, both grad students at Duke.)

There was a problem with this last option, though: who would do the dialing? The obstacles were partly economic and partly technical, and the technical one was itself rooted in the regulatory climate of the time: hard-wired modems were quite unusual, and ones that could automatically dial were all but non-existent. (The famous Hayes Smartmodem was still a few years in the future.) The official solution was a leased Bell 801 autodialer and a DEC DN11 peripheral as the interface between the computer and the Bell 801. This was a non-starter for a skunkworks project; it was hard enough to manage one-time purchases like a modem or a DN11, but getting faculty to pay monthly lease costs for the autodialer just wasn't going to happen. Fortunately, Tom and Jim had already solved that problem.

There was one type of modem that was affordable and purchasable: the acoustic coupler. It worked just the way it sounds: you put the handset of a phone into tight-fitting cups, while connecting the electronic side to the computer. So: when the computer sent bits, the acoustic coupler emitted actual sounds via a small speaker and sent them into the handset's microphone. Similarly, a microphone in the acoustic coupler listened to noises corresponding to 0 or 1 bits and sent the appropriate voltage signal to the computer. Since the only connection to the phone network was via sounds, the phone company couldn't object. (Well, AT&T had tried to, several years earlier, but was slapped down.)

This worked great for manual dialing—you picked up the handset, dialed the number, and put the handset into the coupler—but how could a computer dial using one of these? That's where the cleverness came in. The computer was connected to the acoustic coupler via a standard known as RS-232. There were five pins of major interest: ground, transmit, receive, carrier detect (CD), and data terminal ready (DTR). (Aside: the full set of pins was far more complex, and back in the mists of time I had to deal with such arcana regularly. I won't go into details, but I became intimately familiar with things like breakout boxes and null modems. You do not want details on these…) Briefly, when the computer wanted to use the modem—that is, when a program opened the serial port—the computer would assert the DTR signal; when the modem was connected, it would send CD back to the computer. If the far end dropped the connection, the modem would drop CD; this signal was passed back to the program. And if you're a Unix programmer, you now know where the SIGHUP signal came from. Playing clever, hardware-assisted games with the DTR signal was the solution.

Duke implemented it first, but I thought the idea was sufficiently clever that I took it back to Chapel Hill and designed my own variant. The two solutions differed in detail, but since I know mine better and it was cleaner in a number of respects I'll describe mine.

When a landline phone is not in use (that is, when it's "on-hook"), it presents an open circuit to the phone line. (That's not strictly true, since it had to be able to ring, but rings are AC signals, with a capacitor in series with the ringer; it thus appeared as an open circuit to the DC line signal.) To simulate on-hook, we put a normally open relay in series with the phone line. (It seems distinctly possible that the phone company would have regarded this as an improper way of connecting to the phone network, but I wouldn't actually know that, would I?) When the computer wanted to use the modem, it asserted DTR; we wired the DTR line to close the relay and thus take the phone line off-hook. In other words, when the computer opened the device (for us, /dev/ttyz7, since we had a DZ11 terminal adapter and put the modem on port 7), the phone would go off-hook; when the device was closed, it went on-hook.

But how to dial? It turns out that old-fashioned rotary dial phones worked by interrupting the circuit briefly. When you dial, say, a 3, there are 3 momentary on-hook signals. This is called pulse dialing. The pulses are sent at 10 per second, with (in the US) a 2:3 make-break ratio. That is, for each dial pulse, the circuit would be interrupted for .06 seconds and then back on for .04 seconds before the next pulse. It turned out to be feasible to do this via software control of the DTR signal. We couldn't match the exact timing specs—the clock on the Unix systems of the time interrupted at 60 Hz—but we could do 1:2 (four timer ticks on-hook to two off-hook, i.e., six ticks or 100 milliseconds per pulse) and that was good enough. (Some of you may wonder if it was possible to dial calls by tapping the hook switch at the right frequency for the right number of pulses, a technique that could prove useful if there was, say, a dial lock in place. That is, umm, possible.)

That solved the hardware side of the dialing problem: software control of the DTR line could, via this relay, take a phone off-hook and pulse-dial desired numbers. However, we still needed a software interface. I wrote a driver that was compatible with the official DN11 driver, but instead talked to the DZ11 driver's routines that controlled the DTR line. This strategy thus presented the modem and dialer as two separate devices, which is what the application software, e.g., uucp expected. (Again, Duke had a similar solution first. I worked with an excellent electronics tech at UNC Chapel Hill for the hardware side of ours. When, a year or so later, we bought a 1200 bps hardwired modem, he redesigned the funky dialer relay to handle the more complex signaling and timing requirements of this modem.)

Having solved the autodialer problem, we had to confront another problem: who was going to pay for the phone calls? Phone calls back then were remarkably expensive, and (even domestically) depended on distance and time of day. Calls during business hours cost the most; evening calls cost less, and late-night calls cost the least.

The solution agreed upon was simple: Duke had one of the few autodialers, so it would have to make the calls. Sites that wanted to join our network would have to set up an auto-answer modem (not exactly common, but much more common than dialers) and reimburse Duke for the phone calls. The calls would be made once or twice per night, depending on desire (i.e., willingness to pay) and traffic. Propagation of articles would be slow, but since Duke was the hub of this network they'd see all news articles sooner than anyone else.

We also knew that the system would not be strictly hub-and-spoke. In fact, the original network had four nodes (duke; duke34, a PDP-11/34; phs, in the Physiology Department; and unc) with a loop: duke-phs, duke-unc, and phs-unc. We thus needed a protocol that could handle loops, but that's the topic of the next post. In addition, Tom had interned at the Computer Science Research Group at Bell Labs (the organization from which Unix had come originally) and believed that his contacts there would be willing to call Duke to send and receive traffic. This solution worked for a while, but more on that later.

Note carefully that the original scheme involved money changing hands. In other words, management (i.e., faculty members) had to be very aware of this activity. People in the Duke and UNC CS departments' business offices were going to see sudden spikes in phone bills, and Duke was going to have to receive and process payments from the other sites. Usenet was a skunkworks project, but one that had official sanction: we had faculty members who were wise enough to value innovation by grad students.

Next up: the original protocol design.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.