Will the Circle Be Unbroken?

18 July 2011

I've been playing around with Google+ recently. It's based on a crucial concept, circles. Circles are very powerful — and that may be their big problem.

Circles are Google's answer to Facebook's friends, but they can do more. (The choice of word has also given rise to an endless debate: what is the verb form equivalent to "to friend"? To circle? To encircle? To circumscribe? (We won't go into the question of whether or not one should "befriend" people on Facebook instead of friending them...)) In their simplest form, circles serve two purposes: access control (who can see your posts?) and following à la Twitter: whose posts do you see by default? This is the first danger: the concept is overloaded. Just because I want to hear what someone else says doesn't mean that I want them to hear what I say. The problem can be avoided by proper assignment of people to different circles, but I'm very skeptical that people will get that right; they don't on Facebook.

The problem is worse, though: circles can be used for many more things. There are already lists of creative ways to use them, but each such circle is still both an access-control list and a following list. Google+ is still a very geeky place, and was geekier still early on, but I saw a lot of confusion from people I know to be ubergeeks. Once you get used to circles, they're great, but of course the current population is asking for still more power, such as Venn-diagram operations on circles. Wonderful — until you get something wrong.

There are many good things here. I especially like that you're asked, explicitly, with whom any new post should be shared. On the other hand, you get no such choice if you post a comment to someone else's thread; indeed, you can't even tell with whom the original poster decided to share it. But I fear that the overloading will lead to very big trouble.

How to Abolish the DNS Hierarchy --- But it's a Bad Idea

2 July 2011

There's been a fair amount of controversy of late about ICANN's decision to dramatically increase the number of top-level domains. With a bit of effort, though — and with little disruption to the infrastructure — we could sidestep the issue entirely. Any string whatsoever could be used as a name, and it would all Just Work. That is, it would Just Work in a narrow technical sense; it would hurt innovation, and it would likely have serious economic failure modes.

The trick is to use a cryptographic hash function to convert a string of bytes into a sequence of hexadecimal digits. For example, if one applies the SHA-1 function to the string

Amazon.com
the result is a46af6931d9dace2200617548fab3274549e308f. Add a dot after every pair of hex digits, tacking on a suffix like .arb (for "arbitrary", since .hash might be seen as having other connotations), and you get
a4.6a.f6.93.1d.9d.ac.e2.20.06.17.54.8f.ab.32.74.54.9e.30.8f.arb
which looks like a domain name, albeit a weird one. It not only looks like one, it is; that string could be added to the DNS today with no changes in code. We could even distribute the servers; at every level, there are 256 easy-to-split subtrees. So what's wrong?
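The mapping is easy to sketch in code. Here's a minimal version in Python; the function name hash_domain is my own, but the logic is just SHA-1 plus dotting, as described above:

```python
import hashlib

def hash_domain(name: str, suffix: str = "arb") -> str:
    # SHA-1 the raw bytes of the string...
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    # ...then add a dot after every pair of hex digits and tack on the suffix
    labels = [digest[i:i + 2] for i in range(0, len(digest), 2)]
    return ".".join(labels) + "." + suffix

print(hash_domain("Amazon.com"))
```

Note that the input is case-sensitive: hashing "amazon.com" instead yields an entirely different name, which is a preview of the canonicalization problem.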

The technical limitation is that every endpoint would have to be upgraded to do the hashing. Yes, that's a problem, but we've been through it before; supporting internationalized domain names required the same thing, and it works.

But — how do endpoints know to do the hashing in this scheme? Something in the URL bar of a web browser? There are lots of things on the net that aren't web browsers; how will they know what to do? You can't necessarily tell from a string whether it should be used literally or via this hashing scheme; "Amazon.com" appears to be the legal name of the corporation.

There's another problem: canonicalization. Similar strings will produce very different hash values. Here's an example:

New York Times 7e145e463809ea5e7c28f2ddf103499f942c9ea3
The New York Times 1950c50c10f288dd6e9190361c968e1b8c4a3775
N.Y. Times e69011929d6d30347ddca11c7955a07df8390984
NY Times 48b6b7d57f0ed2885816f1df96da1ffa86f09dda
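The divergence is easy to demonstrate, and so is the difficulty of patching it with rules. A sketch — the canonicalize function below is a hypothetical rule set of my own devising (lowercase, strip punctuation, drop a leading "the"), not part of the scheme:

```python
import hashlib
import re

def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode("utf-8")).hexdigest()

variants = ["New York Times", "The New York Times", "N.Y. Times", "NY Times"]

# Every variant hashes to a completely different part of the name tree.
digests = {v: sha1_hex(v) for v in variants}
for v, d in digests.items():
    print(f"{v!r}: {d}")

# Hypothetical canonicalization rules: lowercase, drop punctuation,
# strip a leading "the", remove spaces.
def canonicalize(s: str) -> str:
    s = re.sub(r"[^a-z0-9 ]", "", s.lower())
    if s.startswith("the "):
        s = s[4:]
    return s.replace(" ", "")

print(canonicalize("New York Times"))      # newyorktimes
print(canonicalize("The New York Times"))  # newyorktimes -- merged
print(canonicalize("N.Y. Times"))          # nytimes -- still distinct
```

The rules merge some variants but miss others, and any two rule sets that disagree would send users to different registries.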

We could no doubt define some set of rules that would handle many common cases. Equally certain, we'd miss many more. Companies could think of their own rules, but if they missed some we'd be back to cybersquatting and typosquatting. This would be worse, though, because the names are so spread out.

The real issue, though, is economic: who would run the different pieces of .arb? There are currently about 100M names in .com. Let's allow for growth and assume 1,000,000,000 names. To handle canonicalization, assume another factor of 10, for about 10B names. Does that work? To a first approximation, sure; we can delegate at each period in the name, and there are 256 values at each level. That means that going down just two levels, we could have 65,536 different registries, each handling about 150K names. That's easy to do, but a given registry could handle more than one zone. Let's assume that 1.5M names is a good size (which is somewhat challenging, though it's clearly possible since it works today). That means we'd need about 6,600 registries. But they have no way to do marketing; there's no way to target any particular business segment, since names are mapped to more or less random parts of the name tree. If a registry failed, an unpredictable portion of the net would suddenly be unreachable.
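The arithmetic above is easy to check; the inputs (1B names, a 10x canonicalization factor, 1.5M names as a good zone size) are the assumptions stated in the text:

```python
# Back-of-the-envelope check of the registry numbers.
names = 10_000_000_000        # ~1B names, times 10 for canonicalization variants
registries = 256 ** 2         # delegating at the first two dot-separated labels
print(registries)             # 65536 possible two-level zones
print(names // registries)    # 152587 -- about 150K names per zone
print(names // 1_500_000)     # 6666 -- about 6,600 registries at 1.5M names each
```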

Most of us never see registries; when we want to create a new domain, we do business with a registrar. But every registrar would need to do business with every registry! The number of relationships would get ungainly, and again, there's no way to do targeted marketing. The registrars for, say, .museum can target museums while ignoring banks. With this scheme, everyone is doing business with everyone. It's great to have a global market; it's also very expensive.