Microsoft is Abandoning SHA-1 Hashes for Updates—But Why?

19 February 2019

Microsoft is shipping a patch to eliminate SHA-1 hashes from its update process. There's nothing wrong with eliminating SHA-1—but their reasoning may be very interesting.

SHA-1 is a "cryptographic hash function". That is, it takes an input file of any size and produces a fixed 20-byte (160-bit) output. An essential property of cryptographic hash functions is that in practice (though obviously not in theory), no two different files should have the same hash value.
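As a minimal illustration (using Python's standard hashlib module, nothing specific to Microsoft's update tooling), the digest length is fixed no matter how large the input is:

```python
# SHA-1 maps inputs of any length to a fixed 20-byte (160-bit) digest.
import hashlib

for data in (b"", b"short", b"x" * 10_000_000):
    digest = hashlib.sha1(data).digest()
    print(f"input: {len(data):>8} bytes  digest: {len(digest)} bytes  {digest.hex()[:16]}...")
```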

SHA-1 no longer has that property; we've known that for about 15 years. But definitions matter. SHA-1 is susceptible to a "collision attack": an attacker can simultaneously create two different files that have the same SHA-1 hash. However, given an existing file and hence its hash, it is not possible, as far as anyone knows, to generate a second file with that same hash. That attack, called a "second pre-image attack", is far more serious. (There's a third type of attack, a "pre-image attack", in which the attacker is given only a hash value and must find some input that produces it; I won't go into that here.)
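For a sense of why the distinction matters, compare the generic (brute-force) work factors for an n-bit hash; SHA-1 has n = 160:

$$\text{collision (birthday bound)} \approx 2^{n/2} = 2^{80} \qquad\qquad \text{second pre-image} \approx 2^{n} = 2^{160}$$

Published collision attacks on SHA-1 beat even the generic birthday bound, which is why collisions have been demonstrated in practice while second pre-images remain far out of reach.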

In the ordinary sequence of events, someone at Microsoft prepares an update file. Its hash—its SHA-1 hash, in many cases—is calculated; this value is then digitally signed. Someone who wished to create a fake update would have to either crack the signature algorithm or somehow produce a fake update that had the same hash value as the legitimate update. But that's a second pre-image attack, and SHA-1 is still believed to be secure against those. So: is this patch useless? Not quite—there's still a risk.
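Here's a minimal sketch of that hash-then-sign flow, using the third-party Python "cryptography" package and SHA-256 in place of SHA-1. It illustrates the pattern only; the key and update file are made up, and this is not Microsoft's actual signing pipeline:

```python
# A sketch of the hash-then-sign pattern using the "cryptography" package;
# SHA-256 stands in for SHA-1, and the key and update file are made up here.
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

update = b"contents of the update file"

# Publisher side: the library hashes the update with SHA-256 and signs the digest.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
signature = private_key.sign(update, padding.PKCS1v15(), hashes.SHA256())

# Client side: recompute the hash of what was received and verify the signature.
# Forging a different update that passes this check requires a matching hash,
# i.e., a second pre-image (or a pre-planted collision).
private_key.public_key().verify(signature, update, padding.PKCS1v15(), hashes.SHA256())
print("signature verifies")
```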

Recall that SHA-1 is vulnerable to a collision attack. This means that someone who prepares two updates simultaneously, one benign and one malicious, can construct them to have the same SHA-1 hash; a signature on the benign update then validates the malicious one as well. In other words, the threat model here is a corrupt insider. By eliminating use of SHA-1 for updates, Microsoft is protecting users against misbehavior by one of its own employees.

Now, perhaps this is just housekeeping. Microsoft can get SHA-1 out of its code base, and thus discourage its use. And it's past time to do that; the algorithm is about 25 years old and does have serious weaknesses. But it's also recognition that an insider who turns to the Dark Side can be very dangerous.

Yes, "algorithms" can be biased. Here's why.

25 January 2019

My thoughts on algorithmic bias are in an op-ed at Ars Technica.

Protecting Privacy Differently

9 November 2018

My thesis is simple: the way we protect privacy today is broken and cannot be fixed without a radical change in direction.

My full argument is long; I submitted it to the NTIA's request for comments on privacy. Here's a short summary.

For almost 50 years, privacy protection has been based on the Fair Information Practice Principles (FIPPs). There are several provisions, including purpose specification and transparency, but fundamentally, the underlying principle is notice and consent: users must be informed about collection and consent to it. This is true for both the strong GDPR model and the much weaker US model: ultimately, users have to understand what's being collected and how the data will be used, and agree to it. Unfortunately, the concept no longer works (if indeed it ever did). Arthur Miller (no, not the playwright) put it this way:

A final note on access and dissemination. Excessive reliance should not be placed on what too often is viewed as a universal solvent—the concept of consent. How much attention is the average citizen going to pay to a governmental form requesting consent to record or transmit information? It is extremely unlikely that the full ramifications of the consent will be spelled out in the form; if they were, the document probably would be so complex that the average citizen would find it incomprehensible. Moreover, in many cases the consent will be coerced, not necessarily by threatening a heavy fine or imprisonment, but more subtly by requiring consent as a prerequisite to application for a federal job, contract, or subsidy.

The problem today is worse. Privacy policies are vague and ambiguous; besides, no one reads them. And given all of the embedded content on web pages, no one knows which policies to read.

What should replace notice and consent? It isn't clear. One possibility is use controls: users specify what their information can be used for, rather than who can collect it. But use controls pose their own problems. They may be too complex to use, there are continuity issues, and—at least in the US—there may be legal issues standing in their way.
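To make the idea slightly more concrete, here is a purely hypothetical sketch, in Python, of what a machine-checkable use-control policy might look like; every name and category below is invented for illustration and is not drawn from any real proposal:

```python
# Hypothetical sketch of a user-side "use control" policy.
from dataclasses import dataclass, field

@dataclass
class UsePolicy:
    # The question is not "who may collect this?" but "what may it be used for?"
    allowed_uses: set = field(default_factory=set)

    def permits(self, purpose: str) -> bool:
        return purpose in self.allowed_uses

policy = UsePolicy(allowed_uses={"order_fulfillment", "fraud_prevention"})
print(policy.permits("order_fulfillment"))     # True
print(policy.permits("targeted_advertising"))  # False
```

Even this toy version hints at the problems just mentioned: someone has to define and maintain the vocabulary of permitted uses, and someone has to enforce the check.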

I suspect that what we need is a fundamentally new paradigm. While we're at it, we should also work on a better definition of privacy harms. People in the privacy community take for granted that too much information collection is bad, but it is often hard to explain to others just what the issue is. It often seems to boil down to a "creepiness factor".

These are difficult research questions. Until we have something better, we should rely on use controls; until we can deploy those, we need regulatory changes governing how embedded content is handled. In the US, we should also clarify the FTC's authority to act against privacy violators.

None of this is easy. But our "data shadow" is growing longer every day; we need to act quickly.

A Voting Disaster Foretold

27 October 2018

The 2018 Texas general election is going to be a disaster, and that's independent of who wins or loses. To be more precise, I should say "who appears to win or lose", because we're never really going to know. Despite more than 15 years of warnings, Texas still uses DRE (Direct Recording Electronic) voting machines, where a vote is entered into a computer and there is no independent record of how people actually voted. And now, with early voting having started, people's votes are being changed by the voting machines.

This isn't the first time votes have been miscounted because of DRE machine failures. In 2004, "Carteret County lost 4,438 votes during the early-voting period leading up to Election Day because a computer didn't record them." Ed Felten has often written about cases where the machines' own outputs show inconsistencies. (For some reason, images are not currently showing on those blog posts, so I've linked to a Wayback Machine copy.)

It doesn't help that the problem here appears to be due to a completely avoidable design error by the vendor. Per the Texas Secretary of State's office, bad things can happen if voters operate controls while the page is rendering. That's an excuse, not a reason, and it's a bad one. Behavior like that is completely unacceptable from a human factors perspective. If the system will misbehave from data entry during rendering, the input controls should be disabled or inputs from them should be ignored during that time. Period, end of discussion. There is literally no excuse for not doing this correctly. Programming this correctly is "hard"? Sorry; not an acceptable answer. And judging from how quickly Texas officials "diagnosed" the problem, it appears that they've known about the issue and let it ride. Again, this is completely unacceptable.

I've been warning about buggy voting machine software for more than 10 years:

Ironically, for all that I'm a security expert, my real concern with electronic voting machines is ordinary bugs in the code. These have demonstrably happened. One of the simplest cases to understand is the counter overflow problem: the voting machine used too small a field for the number of votes cast. The machine used binary arithmetic (virtually all modern computers do), so the critical number was 32,767 votes; the analogy is trying to count 10,000 votes if your counter only has 4 decimal digits. In that vein, the interesting election story from 2000 wasn't Florida, it was Bernalillo County, New Mexico; you can see a copy of the Wall Street Journal story about the problem here.
I haven't changed my mind:
Bellovin is "much more worried about computer error — buggy code — than cyberattacks," he says. "There have been inexplicable errors in some voting machines. It's a really hard problem to deal with. It's not like, say, an ATM system, where they print out a log of every transaction and take pictures, and there's a record. In voting you need voter privacy — you can't keep logs — and there's no mechanism for redoing your election if you find a security problem later."
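To make the counter-overflow failure described in the first quotation concrete, here is a small sketch of what happens when ballots are tallied into a signed 16-bit field (the 32,767 limit mentioned above); Python's ctypes is used here only as a convenient way to emulate such a counter:

```python
# Emulate a vote tally kept in a signed 16-bit counter (maximum value 32,767).
import ctypes

counter = ctypes.c_int16(0)
for _ in range(33_000):        # more ballots than the field can represent
    counter.value += 1

print(counter.value)           # prints -32536, not 33,000: the tally wrapped around
```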

Others agree. A recent National Academies report noted long-standing concerns about DRE machines:

The rapid growth in the prominence of DREs brought greater voice to concerns about their use, particularly their vulnerability to software malfunctions and external security risks. And as with the lever machines that preceded them, without a paper record, it is not possible to conduct a convincing audit of the results of an election.
and recommended that
4.11 Elections should be conducted with human-readable paper ballots. These may be marked by hand or by machine (using a ballot-marking device); they may be counted by hand or by machine (using an optical scanner). Recounts and audits should be conducted by human inspection of the human-readable portion of the paper ballots. Voting machines that do not provide the capacity for independent auditing (e.g., machines that do not produce a voter-verifiable paper audit trail) should be removed from service as soon as possible.

4.12 Every effort should be made to use human-readable paper ballots in the 2018 federal election. All local, state, and federal elections should be conducted using human-readable paper ballots by the 2020 presidential election.

This election will undoubtedly end up in court: there's a hotly contested Senate race, and both campaigns are very well-funded. Whatever the outcome, many people will feel that they were disenfranchised—and it didn't have to happen.

The National Academies Report "The Future of Voting"

6 September 2018

The National Academies of Sciences, Engineering, and Medicine has just released a new report on The Future of Voting. The recommendations in that report are clear, unambiguous, far-reaching, and (in my opinion) absolutely correct. I won't try to summarize it—if you can't read the whole thing, at least read the executive summary—but a few of the recommendations are worth highlighting:

Implicit in all of this: voting is a systems problem. The registration systems, the pollbooks, the actual ballot-casting and tallying, the workers, the vendors, and more are all part of the system. This means that all of these areas need to be addressed. I'm glad the committee recognized this.

Also: though it isn't a major part of the report, the committee did briefly address those who suggest that the blockchain should be employed to secure elections. Again, they were unambiguous:

While the notion of using a blockchain as an immutable ballot box may seem promising, blockchain technology does little to solve the fundamental security issues of elections, and indeed, blockchains introduce additional security vulnerabilities.

If you're at all concerned about voting, read this report. My congratulations to the committee on a wonderful job.

German Cryptanalytic Attacks on the British World War II "TYPEX" Machine

24 August 2018

This morning, I saw a link to a fascinating document. Briefly, it's a declassified TICOM document on some German cryptanalytic efforts during World War II. There are a number of interesting things about it, starting with the question of why it took until 2018 to declassify this sort of information. But maybe there's an answer lurking here…

(Aside: I'm on the road and don't have my library handy; I may update this post when I get home.)

TYPEX—originally Type 10, or Type X—was the British high-level cipher device. It was based on the commercial Enigma as modified by the British. The famous German military Enigma was also derived from the commercial model. Although the two parties strengthened it in different ways, there were some fundamental properties—and fundamental weaknesses—that both inherited from the original design. And the Germans had made significant progress against TYPEX—but they couldn't take it to the next level.

The German Army Cryptanalytic Agency, OKH/In 7/VI, did a lot of statistical work on TYPEX. They eventually figured out more or less everything about how it worked, learning only later that the German army had captured three TYPEX units at Dunkirk. All they were missing was the rotors, and in particular how they were wired and where the "notch" was on each. (The notch controlled when the rotor would kick over to the next position.) And if they'd had the rotor details and a short "crib" (known plaintext)?

The approximate number of tests required would be about 6 × 14³ = 16,464. This was not by any means a large number and could certainly be tackled by hand. No fully mechanised method was suggested, but a semi-mechanised scheme using a converted Enigma and a lampboard was suggested. There can be no doubt that it would have worked if the conditions (a) and (b) had ever been fulfilled. Moreover, the step from a semi-mechanised approach to a fully automatic method would not have been a difficult one.
In other words, the Germans never cracked TYPEX because they didn't know anything about the rotors and never managed to "pinch" any. But the British did have the wiring of the Enigma rotors. How?

It turns out that the British never did figure that one out. It was the work of a brilliant Polish mathematician, Marian Rejewski; the Poles eventually gave their results to the French and the British, since they realized that even perfect knowledge of German plans wouldn't help if their army was too weak to exploit the knowledge.

Rejewski was, according to David Kahn, the first person to use mathematics other than statistics and probability in cryptanalysis. In particular, he used group theory and permutation theory to figure out the rotor wiring. This was coupled with a German mistake in how they encrypted the indicators, the starting positions of the rotors. (Space prohibits a full discussion of what that means. I recommend Kahn's Seizing the Enigma and Budiansky's Battle of Wits for more details.)

But what if the Germans had solved TYPEX? What would that have meant? Potentially, it would have been a very big deal.

The first point is that since TYPEX and the German military Enigma had certain similarities, the ability to crack TYPEX (which is generally considered stronger than Enigma) might have alerted the Germans that the British could do the same to them—which was, of course, the case. If that wasn't enough, the British often used TYPEX to communicate ULTRA—the intelligence derived from cryptanalysis of Enigma and some other systems—to field units. (Aside: the British used one-time pads to send ULTRA to army-level commands but used TYPEX for lower-level units.) In other words, had the German army gained the ability to read TYPEX, it might have been extremely serious. And although their early work was on 1940 and earlier TYPEX, "Had they succeeded in reading early traffic it seems reasonable to conjecture that they might have maintained continuity beyond the change on 1/7/40 when the 'red' drums were introduced." It's certainly the case that the British exploited continuity with Enigma; most historians agree that if the Germans had used Enigma at the start of the war as well as they used it at the end, it's doubtful that the British could have cracked it.

There are a couple of other interesting points in the TICOM report. For one thing, at least early in the war British cipher clerks were making the same sorts of mistakes as the German clerks did: "operators were careless about moving the wheels between the end of one message and the start of the next". The British called their insight about similar laziness by the Germans the "Herivel tip". And the British didn't even encipher their indicators; they sent them in the clear. (To be sure, the bad way the Germans encrypted their indicators was what led to the rotor wiring being recovered, thus showing that not even trying can be better than doing something badly!)

So where are we? The Germans knew how TYPEX worked and had devised an attack that was feasible if they had the rotor wiring. But they never captured any rotors and they lacked someone with the brilliance of Marian Rejewski, so they couldn't make any progress. We're also left with a puzzle: why was this so sensitive that it wasn't declassified until more than 70 years after the war? Might the information have been useful to someone else, someone who did know the rotor wiring?

It wouldn't have been the U.S. The U.S. and the British cooperated very closely on ULTRA, though the two parties didn't share everything: "None of our allies was permitted even to see the [SIGABA] machine, let alone have it." Besides, TICOM was a joint project; the US had the same information on TYPEX's weaknesses. However, might the Soviets have benefited? They had plenty of well-placed agents in the U.K. Might they have had the rotor wirings? I don't know—but I wonder if something other than sheer inertia kept that report secret for so many years.