January 2020
The Early History of Usenet, Part X: Retrospective Thoughts (9 January 2020)
The Early History of Usenet, Part XI: Errata (9 January 2020)
Y2038: It's a Threat (19 January 2020)

The Early History of Usenet, Part X: Retrospective Thoughts

9 January 2020

Usenet is 40 years old. Did we get it right, way back when? What could/should we have done differently, with the technology of the time and with what we should have known or could feasibly have learned? And what are the lessons for today?

A few things were obviously right, even in retrospect. For the expected volume of communications and expected connectivity, a flooding algorithm was the only real choice. Arguably, we should have designed a have/want protocol, but that was easy enough to add on later—and was, in the form of NNTP. There were discussions even in the mid- to late-1980s about how to build one, even for dial-up links. For that matter, the original announcement explicitly included a variant form:

Traffic will be reduced further by extending news to support "news on demand." X.c would be submitted to a newsgroup (e.g. "NET.bulk") to which no one subscribes. Any node could then request the article by name, which would generate a sequence of news requests along the path from the requester to the contributing system. Hopefully, only a few requests would locate a copy of x.c. "News on demand" will require a network routing map at each node, but that is desirable anyway.
Similarly, we were almost certainly right to plan on a linked set of star nodes, including, of course, Duke. Very few sites had autodialers, but most had a few dial-in ports.

The lack of cryptographic authentication, and hence of control mechanisms, is a somewhat harder call, but I still think we made the right decision. First, there really wasn’t very much academic cryptographic literature at the time. We knew of DES, we knew of RSA, and we knew of trapdoor knapsacks. We did not know the engineering parameters for either of the latter two, and, as I noted in an earlier post, we didn’t even know to look for a bachelor’s thesis that might or might not have solved the problem. Today, I know enough about cryptography that I could, I think, solve the problem with the tools available in 1979 (though remember that there were no cryptographic hash functions then), but I sure didn’t know any of that back then.

There’s a more subtle problem, though. Cryptography is a tool for enforcing policies, and we didn’t know what the policies should be. In fact, we said that, quite explicitly:

  1. What about abuse of the network?
    In general, it will be straightforward to detect when abuse has occurred and who did it. The uucp system, like UNIX, is not designed to prevent abuses of overconsumption. Experience will show what uses of the net are in fact abuses, and what should be done about them.
  2. Who would be responsible when something bad happens?
    Not us! And we do not intend that any innocent bystander be held liable, either. We are looking into this matter. Suggestions are solicited.
  3. This is a sloppy proposal. Let’s start a committee.
    No thanks! Yes, there are problems. Several amateurs collaborated on this plan. But let’s get started now. Once the net is in place, we can start a committee. And they will actually use the net, so they will know what the real problems are.
This is a crucial point: if you don’t know what you want the policies to be, you can’t design suitable enforcement mechanisms. Similarly, you have to have some idea who is charged with enforcing policies in order to determine who should hold, e.g., cryptographic keys.

Today’s online communities have never satisfactorily answered either part of this. Twitter once described itself as the “free speech wing of the free speech party”; today, it struggles with how to handle things like Trump’s tweets and there are calls to regulate social media. Add to that the international dimension and it’s a horribly difficult problem—and Usenet was by design architecturally decentralized.

Original Usenet never tried to solve the governance problem, even within its very limited domain of discourse. It would be simple, today, to implement a scheme where posters could cancel their own articles. Past that, it’s very hard to decide in whom to vest control. The best Usenet ever had were the Backbone Cabal and a voting scheme for creation of new newsgroups, but the former was dissolved after the Great Renaming because it was perceived to lack popular legitimacy and the latter was very easily abused.
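
To make “simple” concrete, here is a minimal sketch in C, using the libsodium library, of one way self-cancellation could work. The design is my own illustration, not anything Usenet ever specified: the original article would carry the poster’s public key in a header, and a cancel is honored only if its signature verifies against that key. The message ID here is hypothetical.

    #include <sodium.h>
    #include <stdio.h>

    int main(void) {
        if (sodium_init() < 0)
            return 1;

        /* The poster generates a keypair; the public key would travel in a
           header of the original article. */
        unsigned char pk[crypto_sign_PUBLICKEYBYTES];
        unsigned char sk[crypto_sign_SECRETKEYBYTES];
        crypto_sign_keypair(pk, sk);

        /* A cancel message signs the target article's ID (hypothetical here). */
        const unsigned char msgid[] = "<hypothetical.123@example.edu>";
        unsigned char sig[crypto_sign_BYTES];
        crypto_sign_detached(sig, NULL, msgid, sizeof msgid - 1, sk);

        /* A receiving site honors the cancel only if the signature verifies
           against the key carried in the original article. */
        if (crypto_sign_verify_detached(sig, msgid, sizeof msgid - 1, pk) == 0)
            printf("cancel verified: issued by the original poster\n");
        else
            printf("cancel rejected\n");
        return 0;
    }

Note that this only solves the easy part: it binds a cancel to a poster, but it says nothing about who else, if anyone, should have that power.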

Using threshold cryptography to let M out of N chosen “trustees” manage Usenet works technically but not politically, unless the “voters”—and who are they, and how do we ensure one Usenet user, one vote?—agree on how to choose the Usenet trustees and what their powers should be. There isn’t even a worldwide consensus on how governments should be chosen or what powers they should have; adding cryptographic mechanisms to Usenet wouldn’t solve it, either, even for just Usenet.

We did make one huge mistake in our design: we didn’t plan for success. We never asked ourselves, “What if our traffic estimates are far too low?”

There were a number of trivial things we could have done. Newsgroups could always have been hierarchical. We could have had more hierarchies from the start. We wouldn’t have gotten the hierarchy right, but computers, other sciences, humanities, regional, and department would have been obvious choices and not that far from what eventually happened.

A more substantive change would have been a more extensible header format. We didn’t know about RFC 733, the then-current standard for ARPANET email, but we probably could have found it easily enough. But we did know enough to insist on having “A” as the first character of a post, to let us revise the protocol more easily. (Aside: tossing in a version indicator is easy. Ensuring that it’s compatible with the next version is not easy, because you often need to know something of the unknowable syntax and semantics of the future version. B-news did not start all articles with a “B”, because that would have been incompatible with its header format.)
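
To illustrate what a version byte does and doesn’t buy you, here is a small C sketch; this is my own illustration, not the actual A-news code, and the article lines are hypothetical. The leading byte lets a reader recognize formats it knows and detect, but not parse, ones it doesn’t:

    #include <stdio.h>

    /* Dispatch on the leading version byte of an article.  The byte makes a
       future format detectable, but says nothing about how to parse it. */
    static void handle_article(const char *art) {
        switch (art[0]) {
        case 'A':
            printf("A-news article, ID: %s\n", art + 1);
            break;
        default:
            printf("unknown version '%c': skip it, don't mangle it\n", art[0]);
            break;
        }
    }

    int main(void) {
        handle_article("Aeagle.642");  /* hypothetical A-news article-ID line */
        handle_article("Zfuture");     /* some future version: detected, not parsed */
        return 0;
    }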

The biggest success-related issue, though, was the inability to read articles by newsgroup and out of order within a group. Ironically, Twitter suffers from the same problem, even now: you see a single timeline, with no easy way to flag some tweets for later reading and no way to sort different posters into different categories (“tweetgroups”?). Yes, there are lists, but seeing something in a list doesn’t mean you don’t see it again in your main timeline. (Aside: maybe that’s why I spend too much time on Twitter, both on my main account and on my photography account.)

Suppose, in a desire to relive my technical adolescence, I decided to redesign Usenet. What would it look like?

Nope, not gonna go there. Even apart from the question of whether the world needs another social noise network, there’s no way the human attention span scales far enough. The cognitive load of Usenet was far too high even at a time when very few people, relatively speaking, were online. Today, there are literally billions of Internet users. I mean, I could specify lots of obvious properties for Usenet: The Next Generation—distributed, peer-to-peer, cryptographically authenticated, privacy-preserving—but people still couldn’t handle the load and there are still the very messy governance problems like illegal content, Nazis, trolls, organization, and more. The world has moved on, and I have, too, and there is no shortage of ways to communicate. Maybe there is a need for another, but Usenet—a single infrastructure intended to support many different topics—is probably not the right model.

And there’s a more subtle point. Usenet was a batch, store-and-forward network, because that’s what the available technology would support. Today, we have an always-online network with rich functionality. The paradigm for how one interacts with a network would and should be completely different. For example: maybe you can only interact with people who are online at the same time as you are—and maybe that’s a good thing.

Usenet was a creation of its time, but around then, something like it was likely to happen. To quote Robert Heinlein’s The Door into Summer, “you railroad only when it comes time to railroad.” The corollary is that when it is time to railroad, people will do so. Bulletin Board Systems started a bit earlier, though it took the creation of the Hayes SmartModem to make them widespread in the 1980s. And there was CSnet, an official email gateway between the ARPANET and dial-up sites, started in 1981, with some of the same goals. We joked that when professors wanted to do something, they wrote a proposal and received lots of funding, but we, being grad students, just went and did it, without waiting for paperwork and official sanction.

Usenet, though, was different. Bulletin Board Systems were single-site, until the rise of Fidonet a few years later; Usenet was always distributed. CSnet had central administration; Usenet was, by intent, laissez-faire and designed for organic growth at the edges, with no central site that in some way needed money. Despite its flaws, it connected many, many people around the world, for more than 20 years until the rise of today’s social networks. And, though the user base and usage patterns have changed, it’s still around, 40 years later.


This concludes my personal history of Usenet. I haven’t seen any further corrections, but I’ll keep the errata link live in case I get some.


Correction: This post erroneously referred to RFC 722, the result of conflating RFC 733 with RFC 822, its revision.


Here is the table of contents, actual and projected, for this series.

  1. The Early History of Usenet: Prologue
  2. The Technological Setting
  3. Hardware and Economics
  4. File Format
  5. Implementation and User Experience
  6. Authentication and Norms
  7. The Public Announcement
  8. Usenet Growth and B-news
  9. The Great Renaming
  10. Retrospective Thoughts
  11. Errata

The tag URL https://www.cs.columbia.edu/~smb/blog/control/tag_index.html#TH_Usenet_history will always take you to an index of all blog posts on this topic.

The Early History of Usenet, Part XI: Errata

9 January 2020

I managed to conflate RFCs 733 and 822, and wrote 722 in the last post. That’s now been fixed.



Y2038: It's a Threat

19 January 2020

Last month, for the 20th anniversary of Y2K, I was asked about my experiences. (Short answer: there really was a serious potential problem, but disaster was averted by a lot of hard work by a lot of unsung programmers.) I joked that, per this T-shirt I got from a friend, the real problem would be on January 19, 2038, at 03:14:08 GMT.

[Picture of a T-shirt saying that Y2K is harmless but that 2038 is dangerous]

Why might that date be such a problem?

On Unix-derived systems, including Linux and MacOS, time is stored internally as the number of seconds since midnight GMT, January 1, 1970, a time known as “the Epoch”. Back when Unix was created, timestamps were stored in a 32-bit number. As with any fixed-size value, only a limited range of numbers can be stored in 32 bits: numbers from -2,147,483,648 to 2,147,483,647. (Without going into technical details, the first of those 32 bits is used to denote a negative number. The asymmetry in range is to allow for zero.)
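
Here is a minimal C sketch of the rollover; this is my illustration, not code from any real system, and it assumes the usual 2’s-complement representation:

    #include <stdio.h>
    #include <stdint.h>
    #include <time.h>

    int main(void) {
        /* The largest signed 32-bit value: 2,147,483,647 seconds after the
           Epoch, i.e., 03:14:07 GMT, January 19, 2038. */
        int32_t last = INT32_MAX;

        /* Add one second the way 2's-complement hardware does; the unsigned
           detour avoids C's undefined behavior on signed overflow. */
        int32_t next = (int32_t)((uint32_t)last + 1u);

        printf("%d + 1 -> %d\n", last, next);  /* 2147483647 + 1 -> -2147483648 */

        /* Interpreted as a timestamp, the wrapped value lands in 1901. */
        time_t wrapped = next;
        printf("%s", asctime(gmtime(&wrapped)));  /* Fri Dec 13 20:45:52 1901 */
        return 0;
    }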

I immediately got pushback: did I really think that 18 years hence, people would still be using 32-bit systems? Modern computers use 64-bit integers, which can allow for times up to 9,223,372,036,854,775,807 seconds since the Epoch. (What date is that? I didn’t bother to calculate it, but it’s about 292,271,023,045 years, a date that’s well beyond when it is projected that the Sun will run out of fuel. I don’t propose to worry about computer timestamps after that.)
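
For the curious, the arithmetic behind that figure is simple enough to sketch; dividing by the seconds in a 365.25-day year reproduces it exactly:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int64_t max_secs = INT64_MAX;      /* 9,223,372,036,854,775,807 */
        int64_t secs_per_year = 31557600;  /* 365.25 days x 86,400 seconds */
        printf("%lld years\n", (long long)(max_secs / secs_per_year));
        /* prints: 292271023045 years */
        return 0;
    }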

It turns out, though, that just as with Y2K, the problems don’t start when the magic date hits; rather, they start when a computer first encounters dates after the rollover point, and that can be a lot earlier. In fact, I just had such an experience.

A colleague sent me a file from his Windows machine; looking at the contents, I saw this.

$ unzip -l zipfile.zip
Archive:  zipfile.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
  2411339  01-01-2103 00:00   Anatomy…
---------                     -------

Look at that date: it’s in the next century! (No, I don’t know how that happened.) But when I looked at it after extracting on my computer, the date was well in the past:

$ ls -l Anatomy…
-rw-r--r--@ 1 smb staff 2411339 Nov 24 1966 Anatomy…

Huh?

After a quick bit of coding, I found that the on-disk modification time of the extracted file was 4,197,067,200 seconds since the Epoch. That’s larger than the limit! But it’s worse than that. I translated the number to hexadecimal (base 16), which computer programmers use as an easy way to display the binary values that computers use internally. It came to FA2A29C0. (Since base 16 needs six more digits than our customary base 10, we use the letters A–F to represent them.) The first “F”, in binary, is 1111. And the first of those bits is the so-called sign bit, the bit that tells whether or not the number is negative. The value of FA2A29C0, if treated as a signed, 32-bit number, is -97,900,096, or about 3.1 years before the Epoch. Yup, that corresponds exactly to the Nov 24, 1966 date my system displayed. (Why should +4,197,067,200 come out to -97,900,096? As I indicated, that’s moderately technical, but if you want to learn the gory details, the magic search phrase is “2’s complement”.)
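
A few lines of C demonstrate the narrowing; this is a sketch of the sort of quick check I described, not the actual code:

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        int64_t mtime = 4197067200LL;       /* the extracted file's modification
                                               time, in seconds since the Epoch */
        int32_t narrowed = (int32_t)mtime;  /* what a 32-bit variable keeps, on
                                               typical 2's-complement systems */

        printf("%llX\n", (unsigned long long)mtime);  /* FA2A29C0 */
        printf("%d\n", narrowed);  /* -97900096: about 3.1 years before the Epoch */
        return 0;
    }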

So what happened? MacOS does use 64-bit time values, so there shouldn’t have been a problem. But the “ls” command (and the Finder graphical application) do perform some date arithmetic. I suspect that there is old code that uses a 32-bit variable, thus causing the incorrect display.

For fun, I copied the zip file to a Linux system. It got the date right, on both extraction and display:

$ ls -l Anatomy…
-rw-r--r-- 1 smb faculty 2411339 Jan 2 2103 Anatomy…

(Why January 2 instead of January 1? I don’t know for sure; my guess is time zones.)

So: there are clearly some Y2038 bugs in MacOS, today. In other words, we already have a problem. And I’m certain that these aren’t the only ones, and that we’ll be seeing more over the next 18 years.


Update: I should have linked to this thread about a more costly Y2038 incident.