Useful Links

Recent Posts


Problems with the Burr-Feinstein Bill

8 April 2016

What appears to be a leaked copy of the Burr-Feinstein on encryption back doors. Crypto issues aside—I and my co-authors have written on those before—this bill has many other disturbing features. (Note: I've heard a rumor that this is an old version. If so, I'll update this post as necessary when something is actually introduced.)

One of the more amazing oddities is that the bill's definition of "communications" (page 6, line 10) includes "oral communication", as defined in 18 USC 2510. Now, that section says that "oral communication"

means any oral communication uttered by a person exhibiting an expectation that such communication is not subject to interception under circumstances justifying such expectation, but such term does not include any electronic communication;
Leaving aside the recursion in that definition, it states that oral communications are just that, oral—how could they be encrypted?

"Covered entities"—more below on what that means—have to provide an "intelligible" version (8:5) of information or data that has been "encrypted, enciphered, encoded, modulated, or obfuscated". We'll ignore the nit that "enciphered" is a subset of "encrypted". "Encoded" is more probematic; it might be a form of encryption, but it could also refer to a standard for representing information, e.g., ASCII: the American Standard Code for Information Interchange. It's tempting to think that the encryption meaning is the intended one, but look at the next word: "modulated". Modulation has nothing to do with secrecy. If law enforcement can't cope with a modulation technique, it points to a lack of technical capability, not an attempt by a vendor to conceal information. It also leaves me wondering what "encoded" means.

An "intelligible version" is supposed to be the "original form" (8:14) of the information or data. What does that mean, especially if we're talking about an encoding format that law enforcement doesn't understand? Let's consider, say, the Lytro camera. Lytros use cool technology that lets you do things like change the focus after you've taken the picture. You can certainly get a JPG out of a Lytros image—but which one? Focus and depth of field matter. And there is no "original form" save for the actual three-dimensional objects in the field of view of the camera. It's also worth noting that the JPG format—a way to encode an image—is a lossy algorithm. That is, there is by design no way to go back to the "original". Should JPG be outlawed?

There's also language to vastly expand the amount of metadata that has to be available to law enforcement. The bill speaks of "communication identifying information" (4:19) as something that has to be made available. It sounds like classic metadata, but the definition is now expanded. It generally speaks of "dialing, routing, addressing, signaling, switching, processing, transmitting ... information", (4:21) while the older definition speaks just of "dialing, routing, addressing, or signaling". The new definition includes "public and local source and destination addressing" (5:8), "port numbers" (5:15), and, I believe, MAC addresses (5:17): "addresses or other information that uniquely identifies the equipment used...". The older terms were not defined; the new ones are worse. "Processing"? What does that mean? This section opens the door to unconstitutional overcollection. The WiFi router in your home is almost certainly covered by this bill (it's a "device" used to "facilitate a communication ... of data"), but when some of that information reaches your router there are no third parties involved. That makes it legally unavailable to law enforcement unless they have a wiretap warrant—but this bill requires that the information be available under "any order or warrant issued by a court of competent jurisdiction" (emphasis added). A court order sufficient to obtain metadata is not a warrant.

A "covered entity" (6:18) is more or less anyone: a software vendor, a hardware vendor, a provider of "remote" or "electronic" communication services, and more. At least as important, a "license distributor" (4:10)—the language isn't completely clear, but it seems to refer to app store operators—must ensure that anything distributed via their store (and it includes not just apps but also "services") conforms to these requirements. That's right: even if an app store does not already do vetting, it would be obligated to at least mandate the crypto back door. One wonders what that means for, say, GitHub—even open source software is generally distributed pursuant to a license that is included with the software.

There's more; I could go on. For example, Orin Kerr noted that the bill "doesn't require only reasonable assistance: It's 'assistance as is necessary' to decrypt". The application to home routers is incredibly invasive, since it requires features that such boxes have never had and either remote access by the manufacturer or a serious leak of private information. And all that is on top of the generally bad concept of crypto back doors.

This is a really bad bill.

Update: The official version of the bill has been released. There appear to be very few changes and none that affect anything I've said.

The FBI and the iPhone: Important Unanswered Questions

28 March 2016

As you probably know (if you read this blog), the FBI has gotten into Syed Farook's iPhone. Many people have asked the obvious questions: how did the FBI do it, will they tell Apple, did they find anything useful, etc.? I think there are deeper questions that really get to the full import of the break.

How expensive is the attack?
Security—-and by extension, insecurity—-are not absolutes. Rather, they're only meaningful concepts if they include some notion of the cost of an attack. If an attack is cheap, it can be used frequently; if it's expensive, it will be reserved for high-value targets.

We don't know anything about the cost. Did it require special hardware? Unusual skills?

How long does the attack take to carry out?
This attack took at most a few weeks to launch, probably less: the FBI would have wanted to test the exploit on unimportant phones before trying it on Farook's phone. But there's a big different between an exploit that takes a few seconds and one that takes several days. The former is a risk to, for example, business travelers crossing a border; the latter would likely be used only if there is already good reason to think that valuable information will be retrieved.

How easy is it to launch?
Can the attack be launched remotely, say via a carefully crafted text message? Does it require tethering to a computer? Does it work if the phone has been pair-locked to a particular computer? Do chips have to be unsoldered? Is special equipment necessary?

These questions are interesting because together, they define the risk to everyone else if the hole is not patched. More importantly, the answers set the parameters of the vulnerabilities equities discussion without divulging information that the FBI may consider too sensitive to reveal as yet. A cheap, fast, easy attack is a serious risk; an expensive, slow, difficult attack is a risk to only a few.

Caveats About "Computer Science For All"

1 February 2016

As someone who learned to program at age 14 and who benefited immensely from the opporunities my high school's computer provided, I think that it's a great idea to give more children a similar chance. Programming was more fun than just about anything else I'd ever done, and it quickly displaced math, physics, and law as possible career paths for me. (No, I probably would never have become a lawyer, since math and science were too much fun, but I was interested in law and policy even then.) And yes, quite obviously my career path was shaped by that early opportunity.

Given all that, I'm delighted by the White House's "Computer Science For All" initiative. Even people who don't become programmers—-probably the large majority of students who will take these classes—-will benefit from that sort of thinking. That said, I do have a few concerns.

Teaching the teachers
The White House recognizes that we need far more people qualified to teach computer science. It's a crucial need, but I wonder if $4 billion is nearly enough money. I wonder if we need another level: teaching the teachers who will teach the children's teachers.

The teachers have to really understand programming. I had another spot of luck when I was young: I had a relative who know how to program and who could answer my questions. My career almost died aborning; there were two or three crucial ideas that I just didn't get at first. I don't know if I'd have been able to work past them on my own.

Reteaching the teachers
Computer science is incredibly dynamic, even at the introductory levels. Let's put it like this: the iPhone is less than 10 years old, but it's completely changed the industry. Teaching children to program but ignoring smart phones would be a bad idea, if only because they'll be less interested in the subject matter. But progress isn't stopping with smart phones; not very many years from now, school kids will want—-need—-to learn about programming the next big thing, whatever it will be. Internet of Things? Wearables? Programmable drones? I have no idea, but I'm sure it will happen.

In other words, the teachers are going to need frequent refreshers. The curriculum will also need frequent updates. There is in-service training today, but I suspect there will need to be more. In most subjects, the content of the course doesn't change drastically every five years; in computer science, it does. (Yes, programming is programing. But the details of what you program will change.)

In other words, the the training budget has to be an ongoing commitment. Even if $4 billion is the right number now, more will be needed not very many years in the future.

Buying Equipment
Teaching programming requires computers and software. Computers age and they don't age gracefully; software is even worse. The hardware will need to be replaced every 4-5 years; the software will need to be upgraded every year or two. This, of course, also requires money.

I suspect that there also should be a subsidy for equipment and Internet connectivity for poorer households. You learn programming only be doing, and it takes hours of non-class time for every hour of instruction. Students who don't have easy access to current-enough computers and software won't learn.

In other words, teaching programming to all children will require a notable amount of extra money, on top of today's budgets, every single year. Furthermore, if extra funds are not allocated to poorer districts, much of the money spent there will be wasted and we'll worsen the digital divide.

I am not saying that the White House initiative is a bad idea—-quite the contrary, in fact. I am saying it's just the down payment on a long-term effort. The challenge now is to identify where the continuing funding will come from. It might be reasonable to give priority on the initial outlays to districts and states that have identified and committed to a sustainable funding model—-but again, this has to be done in a way that won't worsen poverty.

Why More Effort Won't Solve the Exceptional Access Problem

3 January 2016

In the debate over government "exceptional access" to encrypted communications, opponents with a technical bent (and that includes me) have said that it won't work: that such a scheme would inevitably lead to security problems. The response—from the policy side, not from techncial folk—has been to assert that perhaps more effort would suffice. FBI Director James Comey has said, "But my reaction to that is: I'm not sure they've really tried." Hillary Clinton wants a "Manhattan-like project, something that would bring the government and the tech communities together". More effort won't solve the problem—but the misunderstanding lies at the heart of why exceptional access is so hard.

The Manhattan Project had to solve one problem. It was a very hard problem, one they didn't even know could be solved, but it was just one problem. Exceptional access is a separate problem for every app, every service, every device. Possibly, a few will get it right. More likely, they'll fail even more abysmally than they've failed at even simple encryption. Study ("Developers have botched encryption in seven out of eight Android apps and 80 percent of iOS apps") after study ("10,327 out of 11,748 applications that use cryptographic APIs—88% overall—make at least one mistake") after study ("root causes are not simply careless developers, but also limitations and issues of the current SSL development paradigm") after study ("We demonstrate that SSL certificate validation is completely broken in many security-critical applications and libraries.") have shown the same thing: programmers don't get the crypto right—and these are primarily studies of apps that use standardized and well-understood protocols and APIs.

Oppenheimer and company had the advantage of effectively unlimited resources. When confronted with two design choices, they could try both. This let them avoid dead ends, such as trying to build a gun-type bomb with plutonium. They could try gaseous diffusion, thermal diffusion, centrifuges, and electromagnetic separation for uranium enrichment. (The latter required more than $1 billion dollars of silver wire—and they got it.) App developers don't have that luxury. Even if one or two do, they don't share their source code with each other. Besides, most developers don't know if they've gotten it right or wrong; the failure mode here is silent insecurity. Most of the time, holes like these are found by people who do a serious penetration study—and these are generally the attackers.

One size doesn't fit all when it comes to cryptography. That's why cryptographic APIs are so complex. We can't solve the exceptional access problem once and for all, and individual efforts won't suffice for particular cases.

Cryptography is Hard

22 December 2015

In the debate about "exceptional access" to encrypted conversations, law enforcement says they need such access to prevent and solve crimes; cryptographers, on the other hand, keep saying it's too complicated to do safely. That claim is sometimes met with skepticism: what's so hard about encryption? After all, you learn someone's key and just start encrypting, right? I wish it were that simple—but it's not. I'm going to try the impossible: I'm going to give an example of how cryptography is really used, and how it can fail. I warn you in advance that this blog post is considerably more technical than most that I write. If at any point you get lost, skip ahead to the Summary.

I'm first going to describe a cryptographic protocol, a stylized set of messages used to set up an encrypted session in the real world. The particular protocol I'm going to describe was devised in 1978 by Roger Needham and Michael Schroeder; it's the oldest protocol in the unclassified literature. I'm going to focus on their Protocol 2.

Before I start, though, I need to describe some concepts and notation. We'll assume that everyone has a "key pair", a public key that is used to encrypt messages and a private key to decrypt them. If I write

I mean "encrypt message M with the public key Ka belonging to A". Only A has the private key necessary to decrypt it.

Cryptographers also make use of numbers known as nonces. A nonce is a random number used one time and one time only. We use nonces for a variety of purposes. One is to make sure that a message is "fresh": it's not an old message that is being replayed. We also use nonces to generate session keys, the actual key used to encrypt some traffic. (Why do we use session keys? There are many reasons; the simplest to understand is that if through some accident today's session key is revealed, you don't want yesterday's or tomorrow's traffic to be at risk.) If a nonce is only used once and a session key is made from a nonce, you know that the session key will only be used once.

Cryptographers also speak of the threat model: what we assume the enemy can and cannot do. Much of the time, it's pretty simple: just assume that any time you want to send a messages to someone, you do it by giving the message to your enemy to deliver. When you receive a message, it's handed to you by the enemy. In other words, you can't trust the network. The enemy can discard your messages, copy them, deliver them multiple times, rearrange them, invent messages, even create messages made up of pieces of older messages that may or may not have been delivered. It sounds preposterous, but there are real-world examples of attackers doing all of those things.

The attackers are pretty good at hacking; they can penetrate some (but not all) of the computers involved. We'll assume that she can't hack the computers belonging to either of the two parties who really want to talk securely, or we wouldn't have to do any more.

Oddly enough, there's one thing we generally assume the enemy can't do: break the encryption algorithm. That is, if I send

to A, even via an enemy-controlled network, I can safely assume that the attacker can't read it. It turns out that we know a lot about encryption algorithms today, and we have a pretty good handle on what can and cannot be done. ("What about Enigma?" I hear you say. Apart from the fact that it's an 80-year-old design and the state of the art has advanced considerably since then, the real problem wasn't that the design was weak. Rather, it was that the Germans often used it incorrectly. For example, when creating a sesion key, some soldiers would type things like XXX. The British caught on to patterns like that... There were other errors, too, in the analog of what today would be called a cryptographic protocol.) We'll also assume that an encrypted message can't be tampered with; if it is, it won't decrypt properly and the recipient will notice. This itself isn't a trivial property to achieve, but we'll ignore that today.

Are you still with me? Good. It's time to begin!

Suppose that computer A wants to talk securely to computer B. (Cryptographers anthropomorphize their computers and say that Alice wants to talk to Bob.) To do that, Alice and Bob have to set up a session key. Alice also wants to be sure she's talking to Bob and not to some enemy whom I'll dub E, Eve, the eavesdropper. In other words, authentication—making sure you're talking to the right party—is a vital part of the process, too. It's not enough to set up a session key, it has to be set up with exactly the right party, and both parties have to know the identity of the other party.

I'm going to assume that Alice knows Bob's public key Kb and Bob knows Alice's key Ka. I won't go into how they know it; let it suffice to say that if there's another party whom they both trust, it's a solved problem. (In the real world, that sort of mutual trust is hard to come by, but that's a post for another day, too.)

Alice starts by sending a message to Bob:

A → B: {Na, A}Kb
In English, this says, "I, Alice, am sending my name and a nonce Na; those are encrypted with Bob's public key Kb, and no one else can read or tamper with that part of the message." After this message is sent and received, both Alice and Bob know that Alice wants to talk to Bob and that she has sent Na. Eve knows who wants to talk to whom, but doesn't know Na.

The next message, from Bob to Alice, is a bit more subtle:

B → A: {Na,Nb}Ka
Bob has returned Alice's nonce to her, generated his own nonce Nb, and encrypted both with Alice's key Ka. Why does Bob send Alice's nonce back to her? It serves two different purposes. First, it assures freshness: this is a response to Alice's message from this session, not from some other run of the protocol. It's not an old message that Eve has saved up to confuse things. At least as important, it shows that the message really came from Bob: Na was sent encrypted to Bob, so no one else could have read it. In other words, Alice now knows that despite Eve's best efforts, her message got through to Bob and Bob got his reply through to Alice. (Recall that if Eve tampered with an encrypted message, it wouldn't decrypt properly, which Alice would have noticed.)

At this point, both Alice and Bob know Na and Nb; Eve knows neither.

In the third and last message, Alice returns Bob's nonce to him:

A → B: {Nb}Kb
This proves that Alice received message 2 correctly; otherwise, she wouldn't know Nb.

At the end of this protocol, Alice and Bob each know both nonces; more important, both know that the other party knows both nonces. They also know who the other party is; only Bob can read messages encrypted with Kb and only Alice can read messages encrypted with Ka. They can then use those nonces to generate a session key. (I won't go into how that's done, but today that isn't that hard.) Problem solved, right? There are two values that they and only they know.

The point here is that encryption, even in a simple scenario, isn't that simple. You really need a session key; setting that up properly takes several messages and things like nonces. This isn't even a realistic scenario—where does Alice's private key come from? Her password is the obvious answer, but she doesn't want to type it constantly. For a more realistic design—that is, one more usable in the real world—see this dialog about the design of the Kerberos system. It's not easy going, but it's probably easier than the rest of this essay.

As complex as those two protocols seem, I've still oversimplified things. For example, I completely ignored the difficulty of actually doing the encryptions, and that matters. The most straightforward way of doing {Na,A}Kb only operates on blocks of exactly 256 bytes when using today's normal key sizes. How do you handle that? It turns out to matter; padding messages inappropriately can also lead to security failures. I (and for that matter Needham and Schroeder) assumed that it was possible to encrypt messages so that they can't be tampered with. That's not easy, either. We know how to do it properly now (though even today not everyone gets it right); back in 1978, not that long after the dawn of academic cryptography, the correct way to do it was not known.

And in the real world? Let's put it like this: the specification for TLS, the protocol used for secure web transactions, is over 100 pages long, and that's without the seven additional RFCs describing extensions and extra features.

All of this complexity introduces danger, and it doesn't take a 100+ page document to encounter it. That three-line protocol I presented? It's got a security flaw; can you find it? Stare at it for a bit. I won't blame you if you don't see it; it took 17 years and automated assistance for the flaw to be discovered...

In 1996, Gavin Lowe published a remarkable paper. He had developed an automated protocol analyzer; when he fed it the Needham-Schroeder protocol, it found a flaw that had gone unnoticed since 1978. Eve's goal here is to impersonate Alice, something she presumably can't do; after all, she can't read messages that Bob encrypts with Ka. She does this by hacking some other party with whom Alice communicates, C (Carol). (Alternatively, maybe Alice doesn't know Eve is an attacker (though perhaps she should) and simply talks to Eve directly.) Eve then uses part of Alice's dialog with Carol to allow her to impersonate Alice when talking with Bob. Ready? Here's where things get complicated. (Again, if you get confused you can skip ahead to the Summary.)

We start with Alice trying to talk to Carol and sending a nonce, encrypted with Carols' public key:

A → C: {Na,A}Kc
Instead of replying immediately to Alice, though, Eve sends a message to Bob while pretending to be Alice:
E(A) → B: {Na,A}Kb
(I use E(A) to mean "Eve, who is pretending to be Alice.")

Bob can't tell the difference; all he knows is that there's a message that claims to be from Alice. He replies to Alice—but remember that Eve has tremendous powers over the network; she can intercept that message and make sure that it never reaches Alice. So:

B → E(A): {Na,Nb}Ka

You'd think that Eve would be stymied here, since only Alice can read that message. But she's cleverer than that. Alice is expecting a nonce pair from Carol; Eve, pretending to be Carol, sends it to Alice as if it were really from Carol:

E(C) → A: {Na,Nb}Ka

At this point, Alice knows Na and Nb, though she thinks that Nb is really Nc. Eve only knows Na, not Nb. But Alice, who knows nothing of the attack, is about to solve that for her by replying:

A → C: {Nb}Kc
In other words, Alice has decrypted the message from Bob and sent it to Carol—in other words, to Eve. Eve now knows Na and Nb. She can now complete the protocol by sending Nb to Bob, again pretending to be Alice:
E(A) → B: {Nb}Kb

I know that this is confusing. Let's take a higher-level look at what just happened. Eve was able to trick Alice into decrypting a message for her. She could do this because Alice thought it came from Carol, whereas in reality it came from Bob. Because of this protocol flaw, the protocol doesn't do what it's supposed to. Eve didn't break the encryption; she still can't read messages encrypted to Alice or Bob. But she was able to trick Alice into doing a decryption for her. Eve is thus able to impersonate Alice when talking to Bob. If Bob (a computer, after all) is really Alice's mail server, this means that Eve can now read all of Alice's email. The failure was a system failure.

There's a simple fix, though I'll present it later. If you want to try to figure out yourself, here's a hint: you have to prevent Alice from being fooled about whom she is talking to.


Cryptographic protocols are hard. The Needham-Schroeder protocol was actually mathematically proven correct, but the proof was flawed. (More accurately, the proof system wasn't powerful enough to detect this sort of attack.) It allowed Eve to impersonate Alice, even though that should have been impossible.

There's complexity all the way down. Did you look at that Kerberos protocol design dialog that I linked to? As it turns out, my very first paper on cryptography identified flaws in the detailed specifications for Kerberos. They fixed the flaws— but a fix in the code to answer one of our criticisms resulted in a different and more serious bug.

We don't use Needham-Schroeder today; there are newer and better protocols. I present this as an illustration of how difficult the problem is. Cryptography, it turns out, is really hard. It's hard even for experts. Trying to add new functionality to allow for exceptional access is far, far harder. Remember: the goal of the original protocol was to set up a secure session between two parties, with no one else able to read the traffic or to impersonate anyone. Our solution was only three messages long, but it was wrong and it took 17 years to find the flaw. Modern, real-world protocols are far more complex; there are very many real-world requirements that have to be met. Why should anyone have any confidence that adding yet more complexity in order to bring a third party into the conversation would be correct? I certainly wouldn't trust it.

The Fix

The fix to the protocol is simple. Consider again that second message:
B → E(A): {Na,Nb}Ka
This is the message that Eve forwarded to Alice, getting her to decrypt it. Bob thought he was talking to Alice; Alice thought she was hearing from Carol. Both, of course, were dealing with Eve. If that message is changed to include Bob's name in the encryption,
B → E(A): {Na,Nb,B}Ka
Alice won't be fooled: she'll see a message that's from Bob when she was expecting one from Carol. She therefore will know that monkey business is going on and won't complete her part of the protocol, thereby foiling the attack.

Why I Wrote Thinking Security

24 November 2015

I have a new book out, Thinking Security: Stopping Next Year's Hackers. There are lots of security books out there today; why did I think another was needed?

Two wellsprings nourished my muse. (The desire for that sort of poetic imagery was not among them.) The first was a deep-rooted dissatisfaction with common security advice. This common "wisdom"—I use the word advisedly—often seemed to be outdated. Yes, it was the distillation of years of conventional wisdom, but that was precisely the problem: the world has changed; the advice hasn't.

Consider, for example, passwords (and that specifically was the other source of my discomfort). We all know what to do: pick strong passwords, don't reuse them, don't write them down, etc. That all seems like very sound advice—but it comes from a 1979 paper by Morris and Thompson. The world was very different then. Many people were still using hard-copy, electromechanical terminals, people had very few logins, and neither defenders nor attackers had much in the way of computational power. None of that is true today. Maybe the advice was still sound, or maybe it wasn't, but very few people seemed to be questioning it. In fact, the requirement was embedded in very static checklists that sites were expected to follow.

Suppose that passwords are in fact terminally insecure. What the alternative? The usual answer is some form of two-factor authentication. Is that secure? Or is two-factor authentication subject to its own problems? If it's secure today, will it remain secure tomorrow? Computer technology is an extremely dynamic field; not only does the technology change, the applications and the threats change as well. Let's put it like this—why should you expect the answers to any of these questions to remain the same?

The only solution, I concluded, was to go back to first principles. What were the fundamental assumptions behind security? It turns out that for passwords, the main reason you need strong passwords is if a site's password database is compromised. In other words, a guessed password is the second failure; if the first could be avoided, the second isn't an issue. But if a site can't protect a password file, can it protect some other sort of authentication database? That doesn't seem likely. What does that mean for the security of other forms of authentication?

Threats also change. 21 years ago, when Bill Cheswick and I wrote Firewalls and Internet Security, no one was sending phishing emails to collect bank account passwords. Of course, there were no online banks then (there was barely a Web), but that's precisely the point. I eventually concluded that threats could be mapped along two axes, how skilled the attacker was and how much your site was being targeted:

Your defenses have to vary. Enterprise-scale firewalls are useful against unskilled joy hackers, they're only a speed bump to intelligence agencies, and targeted attacks are often launched by insiders who are, by definition, on the inside. Special-purpose internal firewalls, though, can be very useful.

All of this and more went into Thinking Security. It's an advanced book, not a collection of checklists. I do give some advice based on today's technologies and threats, but I show what assumptions that advice is based on, and what sorts of changes would lead it to change. I assume you already know what an encryption algorithm is, so I concentrate on what encryption is and isn't good for. The main focus is how to think about the problem. I'm morally certain that right now, someone in Silicon Valley or Tel Aviv or Hyderabad or Beijing or Accra or somewhere is devising something that 10 years from now, we'll find indispensable, but will have as profound an effect on security as today's smartphones have had. (By the way—the iPhone is only about 8 years old, but few people in high-tech can imagine life without it or an Android phone. What's next?) How will we cope?

That's why I wrote this new book. Threats aren't static, so our defenses and our thought processes can't be, either.