SMBlog -- 21 November 2019

The Early History of Usenet, Part V: Implementation and User Experience

21 November 2019

To understand some of our implementation choices, it’s important to remember two things. First, the computers of that era were slow. The Unix machine at UNC’s CS department was slower than most timesharing machines even for 1979—we had a small, slow disk, a slow CPU, and—most critically—not nearly enough RAM. Duke CS had a faster computer—they had an 11/70; we had an 11/45—but since I was doing the first implementation, I had to use what UNC had. (Log in remotely? How? There was no Internet then, neither department was on the ARPANET, and dialing up would have meant paying per-minute telephone charges, probably at daytime rates. Besides, a dial-up connection would have been at 300 bps, but if I stayed local I could do 9600 bps via the local Gandalf port selector.)

The second important point is that we knew we had to experiment to get things right. To quote from the first public announcement of Usenet, "Yes, there are problems. Several amateurs collaborated on this plan. But let’s get started now. Once the net is in place, we can start a committee. And they will actually use the net, so they will know what the real problems are." None of us had designed a network protocol before; we knew that we’d have to experiment to get things even approximately right. (To be clear, we were not first-time programmers. We were all experienced system administrators, and while I don’t know just how much experience Tom and Jim (and Dennis Rockwell, whose name I’ve inadvertently omitted in earlier posts) had, I’d been programming for about 14 years by this time, with much of my work at kernel level and with a fair amount of communications software experience.)

My strategy for development, then, was what today we would call rapid prototyping: I implemented the very first version of Netnews software as a Bourne Shell script. It was about 150 lines long, but implemented such features as multiple newsgroups and cross-posting.

Why did I use a shell script? First, simply compiling a program took a long time, longer than I wanted to wait each time I wanted to try something new. Second, a lot of the code was string-handling, and as anyone who has ever programmed in C knows, C is (to be charitable) not the best language for string-handling. Write a string library? I suppose I could have, but that’s even more time spent enduring a very slow compilation process. Using a shell script let me try things quickly, and develop the code incrementally. Mind you, the shell script didn’t execute that quickly, and it was far too slow for production use—but we knew that and didn’t care, since it was never intended as a production program. It was a development prototype, intended for use when creating a suitable file format. Once that was nailed down, I recoded it in C. That code was never released publicly, but it was far more usable.

Unfortunately, the script itself no longer exists. (Neither does the C version.) I looked for it rather hard in the early 1990s but could not find a copy. However, I do remember a few of the implementation details. The list of newsgroups a user subscribed to was an environment variable set in their .profile file:

export NETNEWS="*"

or

export NETNEWS="NET admin social cars tri.*"

Why? It made it simple for the script to do something like

cd $NEWSHOME
groups=`echo $NETNEWS`

That is, it would emit the names of any directories whose names matched the shell pattern in the NETNEWS environment variable.
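The expansion is easy to try in a scratch directory. What follows is a minimal sketch, not the lost script: the temporary spool directory and the group names are made up for illustration, and it uses the modern $(...) form of command substitution.

```shell
# Hypothetical spool directory holding a few made-up newsgroup directories.
NEWSHOME=$(mktemp -d)
mkdir "$NEWSHOME/NET" "$NEWSHOME/admin" "$NEWSHOME/tri.general" "$NEWSHOME/tri.forsale"

# A subscription list mixing a literal name and a shell pattern.
NETNEWS="NET tri.*"

# $NETNEWS is deliberately left unquoted so that the shell expands its
# patterns against the directory names in $NEWSHOME.
groups=$(cd "$NEWSHOME" && echo $NETNEWS)
echo "$groups"
```

One caveat: a pattern that matches no directory is passed through as the literal pattern string, so anything consuming the list would have to cope with names that do not exist.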

To find unread articles, the script did (the equivalent of)

newsitems=`find $groups -type f -newer $HOME/.netnews -print`

This would find all articles received since the last time the user had read news. Before exiting, the script would touch $HOME/.netnews to mark it with the current time. (More on this aspect below.)
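The timestamp trick can be reproduced with a marker file and two articles, one older and one newer than the marker. This is only a sketch under stated assumptions: the paths, article names, and dates are invented, and the marker lives in a temporary directory rather than in $HOME.

```shell
spool=$(mktemp -d)            # stands in for $NEWSHOME
marker="$spool/.netnews"      # stands in for $HOME/.netnews
mkdir "$spool/NET"

echo "old article" > "$spool/NET/unc.1"
touch -t 202001010000 "$spool/NET/unc.1"   # arrived before the last session
touch -t 202001020000 "$marker"            # time of the last session
echo "new article" > "$spool/NET/unc.2"    # arrived afterwards (mtime is now)

# Only files modified more recently than the marker are reported.
newsitems=$(cd "$spool" && find NET -type f -newer "$marker" -print)
echo "$newsitems"

# On exit, the marker is touched so these articles are not shown again.
touch "$marker"
remaining=$(cd "$spool" && find NET -type f -newer "$marker" -print)
```

Because the marker is a single timestamp, there is no per-article state: everything older than the marker counts as read and everything newer as unread.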

There were a few more tricks. I didn’t want to display cross-posted articles more than once, so the script did

ls -tri $newsitems | sort -n | uniq

to list the "i-node" numbers of each file and delete all but the first copy of duplicates: cross-posted articles appeared as single files linked to from multiple directories. Apart from enabling this simple technique for finding duplicates, it saved disk space, at a time when disk space was expensive. (Those who know Unix shell programming are undoubtedly dying to inform me that the uniq command as shown above would not do quite what I just said. Yup, quite right. How I handled that—and handle it I did—I leave as an exercise for the reader. And to be certain that your solution works, make sure that you stick to the commands available to me in 1979.)
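For readers who just want to see the i-node idea work, here is one possible approach. It is not the author's solution, and it makes no attempt to stay within the 1979 command set; the directory and article names are invented, and file names containing spaces would break it.

```shell
spool=$(mktemp -d)
mkdir "$spool/NET" "$spool/admin"

echo "cross-posted" > "$spool/NET/unc.1"
ln "$spool/NET/unc.1" "$spool/admin/unc.1"   # second name, same i-node
echo "single" > "$spool/admin/unc.2"

newsitems="NET/unc.1 admin/unc.1 admin/unc.2"

# ls -i prints "inode name" for each file; awk keeps only the first
# name it sees for each i-node number and drops the rest.
unique=$(cd "$spool" && ls -i $newsitems | awk '!seen[$1]++ { print $2 }')
echo "$unique"
```

Three names go in and two come out: one of the two links to the cross-posted article, plus the single-group article.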

I mentioned above that the script would only update the last-read time on successful exit. That’s right: there was no way in this version to read things out of order, skip articles and come back to them later, or even stop reading part-way through the day’s news feed without seeing everything you just read again. It seems like a preposterous decision, but it was due to one of my most laughable errors: I predicted that Usenet volume would never exceed 1-2 articles per day. Traffic today seems to be over 60 tebibytes per day, with more than 100,000,000 posts per day. As I noted last year, I’m proud to have had the opportunity to co-create something successful enough that my volume prediction could be off by so many orders of magnitude.

There are a couple of other points worth noting. The first line of each message contained the article-ID: the sitename, a period, and a sequence number. This guaranteed uniqueness: each site would number its own articles, with no need for global coordination. (There is the tacit assumption that each site would have a unique name. Broadly speaking, this was true; the penchant among system administrators for using science fiction names and places meant that there were some collisions, though generally not among the machines that made up Usenet.) We also used the article-ID as the name of the file in which the article was stored. However, this decision had an implicit limitation. Site names were, as I recall, limited to 8 characters; filenames in that era were limited to 14. This meant that the sequence number was limited to five digits, right? No, not quite. The "site.sequence" format is a transfer format; using it as a filename is an implementation decision. Had I seen the need, I could easily have created a per-site directory. Given my traffic volume assumptions, there was obviously no need, especially for a prototype.
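The numbering scheme and the filename arithmetic can be sketched in a few lines. This is an illustration only: the site name, the counter file, and the next_article_id helper are hypothetical, not details recalled from the original code.

```shell
site="unc"              # hypothetical site name
seqfile=$(mktemp)       # hypothetical per-site counter file
echo 0 > "$seqfile"

# Emit the next article-ID in "site.sequence" form and advance the counter.
next_article_id() {
    n=$(cat "$seqfile")
    n=$((n + 1))
    echo "$n" > "$seqfile"
    echo "$site.$n"
}

id1=$(next_article_id)
id2=$(next_article_id)
echo "$id1 $id2"

# With 14-character filenames, an 8-character site name, and one
# character for the period, this many digits remain for the sequence.
max_digits=$((14 - 8 - 1))
echo "$max_digits"
```

The counter update here is not atomic, so two simultaneous posters could be handed the same number; a sketch like this is only safe with one poster at a time.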

There was another, more subtle, reason for simply using the article-ID as the filename and using the existence of the file and its last-modified time as the sole metadata about the articles. The alternative (obviously more powerful and more flexible) would be to have some sort of database. However, having a single database of all news items on the system would have required some sort of locking mechanism to avoid race conditions, and locking wasn’t easy on 7th Edition Unix. There was also no mechanism for inter-process communication except via pipes, and pipes are only useful for processes descended from the same parent, which would have had to create the pipes before creating the processes. We chose to rely on the file system, which had its own, internal locks to guarantee consistency. There are a number of disadvantages to that approach, but they don’t matter if you’re only receiving 1-2 articles per day.

The path to the creator served two purposes. First, it indicated which sites should not be sent a copy of newly received articles. Second, by intent it was a valid uucp email address; this permitted email replies to the poster.
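The first purpose amounts to a membership test on the "!"-separated path. Here is a minimal sketch of that test; the should_forward helper, the example path, and the neighbor list are all hypothetical and are not taken from the original code.

```shell
# should_forward PATH SITE: succeed if SITE does not already appear in PATH.
should_forward() {
    case "!$1!" in
        *"!$2!"*) return 1 ;;   # SITE is on the path, so it has seen the article
        *)        return 0 ;;
    esac
}

path="duke!unc!smb"             # hypothetical path: sites, then the poster's login
for dest_site in duke phs; do   # hypothetical list of neighboring sites
    if should_forward "$path" "$dest_site"; then
        echo "forward to $dest_site"
    else
        echo "skip $dest_site"
    fi
done
```

Wrapping both the path and the site name in "!" keeps a short site name from matching as a substring of a longer one. A login name identical to a site name would still match, which this sketch ignores.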

I had noted a discrepancy between the date format in the announcement and that in RFC 850. On further reflection, I now believe that the announcement got it right and the RFC is wrong. Why? The format the announcement shows is exactly what was emitted by the date command; in a shell script, that would have been a very easy string to use, while deleting the time zone would have been more work. I do not think I would have gone to any trouble to delete something that was pretty clearly important.

The user interface was designed to resemble that of the 7th Edition mail command. It was simple, and worked decently for low-volume mail environments. We also figured that almost all Usenet readers would already be familiar with it.

To send an article to a neighboring system, we used the remote execution component of uucp.

uux $dest_site!rnews <$article_file_name

That is, on the remote system whose name is in the shell variable dest_site, execute the rnews command after first transferring the specified file to use as its standard input. This required that the receiving site allow execution of rnews, which in turn required that they recompile uucp (configuration files? What configuration files?), which was an obstacle for some; more on that in part VI.

Here is the table of contents, actual and projected, for this series.