The Crypto Wars Resume

7 October 2019

For decades, the US government has fought against widespread, strong encryption. For about as long, privacy advocates and technologists have fought for widespread, strong encryption, not just to protect privacy but also as a tool to secure our computers and our data. The government has proposed a variety of access mechanisms and mandates to permit it to decrypt (lawfully) obtained content; technologists have asserted that "back doors" are inherently insecure. (James Comey used the phrase "golden key"; the neutral term is "exceptional access".)

I personally have been involved with this issue for more than 25 years, and in a fairly strong sense I have nothing new to say—as I and others explained four years ago, from a technical perspective exceptional access is a thoroughly bad idea: it will create insecurity. Cryptography is a complex, subtle discipline; it's really, really hard to get even the basics right. Adding new, unusual requirements creates a high likelihood that there will be new vulnerabilities.

Despite all that, U.S. Attorney-General William Barr has now issued a new call for Facebook to add exceptional access features to its WhatsApp encrypted communications platform. The evils he cites (terrorism, organized crime and child pornography) are indeed evils; I don't think most people would dispute that. But his focus on Facebook is a significant change in direction and, arguably, an escalation of the battle over cryptography.

There is, broadly speaking, a consensus that the exceptional access problem is easier (note: I did not say easy) for devices, and in particular for phones, than for communications. Many reasons are given in the excellent Carnegie Foundation report on the problem; I'll note one more: because secure communications generally require interaction between the parties, there are many more opportunities to get things wrong. By contrast, when law enforcement presents an encrypted phone, all of the cryptography has already taken place. Encrypting objects still isn't easy—witness these new attacks on encrypted PDF files—but the attack surface is smaller.

Why, then, the escalation? Why is Barr going for everything, rather than seeing if there is a feasible solution for encrypted phones? Does he judge that the political moment is right? Is it because Facebook is politically weak right now? Or is it because law enforcement can read devices now?

What is a Security Mechanism?

12 September 2019

Orin Kerr recently blogged about a 9th Circuit decision that held that scraping a public web site (probably) doesn't violate the Computer Fraud and Abuse Act (CFAA). Quoting the opinion (and I copied the quote from that blog post):

For all these reasons, it appears that the CFAA's prohibition on accessing a computer "without authorization" is violated when a person circumvents a computer's generally applicable rules regarding access permissions, such as username and password requirements, to gain access to a computer. It is likely that when a computer network generally permits public access to its data, a user's accessing that publicly available data will not constitute access without authorization under the CFAA.
On its surface, it makes sense—you can't steal something that's public—but I think the simplicity of the rule is hiding some very deep questions. One, I think, can most easily be expressed as "what is the cost of the 'attack'"? That is, how much effort must someone expend to get the data? Does that matter? Should it?

Let's start with the Court's example: it is hacking (more precisely, a CFAA violation) if someone bypasses a username and password requirement. But what is the role of the username and password? Is it intended as an actual barrier or as a sign saying "Authorized Personnel Only"? Does it matter if the site has trivial password limitations, e.g., 2 digits only?

More concretely, imagine a badly coded website, where you're prompted for a login and password if you visit the home page, but not if you go directly to some internal page. (For the record, it's really easy for a neophyte to implement something this badly.) Is that a suitable barrier or warning? What if someone else links to an internal page (as I've done, above, to a blog post)? Is clicking on that link, and thus never even seeing the password prompt, a CFAA violation? It's hard to see how the answer could be "yes", but if you think that that example is too contrived, what about a misconfigured firewall that inadvertently permits access to the interior of a corporate net—is someone who stumbles on that access liable? That's a very subtle kind of error, and one that's easy to make.
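To make the badly coded site concrete, here is a minimal Python sketch. The handler names, paths, and status codes are all hypothetical and no real web framework is being modeled. The broken version asks for a login only on the home page, so anyone following a deep link never sees the prompt at all.

```python
from typing import Tuple

# Toy site: the paths and page contents are illustrative assumptions.
PAGES = {"/": "home", "/reports/q3": "internal report"}


def broken_handle(path: str, logged_in: bool) -> Tuple[int, str]:
    """Enforces the login only on the home page (this is the bug)."""
    if path == "/" and not logged_in:
        return 401, "login required"
    if path in PAGES:
        return 200, PAGES[path]
    return 404, "not found"


def fixed_handle(path: str, logged_in: bool) -> Tuple[int, str]:
    """Checks the login before serving any page at all."""
    if not logged_in:
        return 401, "login required"
    if path in PAGES:
        return 200, PAGES[path]
    return 404, "not found"
```

With the broken handler, a visitor who arrives at the hypothetical /reports/q3 by clicking someone else's link gets the page and no warning of any kind, which is exactly the situation the legal question turns on.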

There are, of course, other forms of access control. One of the simplest is address-based access control: only certain IP addresses may access a certain resource. It's long been known to be weak, but it's still used quite frequently, especially on intranets. Is this a "generally applicable rule"? Is there a difference between an address rule that says "these three IP addresses may have access" and "anyone but these three may have access"? Mathematically, they're identical, and it's actually not harder to specify the latter than the former; one doesn't have to write 4,294,967,293 separate "allow" rules. Does it matter if a blocked party changes their IP address to evade the blockage? What if their ISP happens to change it, as some consumer ISPs do quite regularly?
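A small Python sketch shows why the two kinds of rule are equally easy to write. The three addresses are made-up examples from a documentation range, and the function names are my own; real firewalls express this in their own rule languages.

```python
import ipaddress

# Three example addresses (hypothetical; taken from the 192.0.2.0/24 documentation range).
LISTED = {ipaddress.ip_address(a) for a in ("192.0.2.1", "192.0.2.2", "192.0.2.3")}

IPV4_SPACE = 2 ** 32  # total number of IPv4 addresses


def allow_only_listed(src: str) -> bool:
    """Rule A: these three addresses may have access."""
    return ipaddress.ip_address(src) in LISTED


def allow_all_but_listed(src: str) -> bool:
    """Rule B: anyone but these three may have access."""
    return ipaddress.ip_address(src) not in LISTED


# Rule B admits every IPv4 address except the three listed ones.
ADMITTED_BY_RULE_B = IPV4_SPACE - len(LISTED)
```

Both rules are a single set-membership test over the same three entries; only the sense of the test differs. Rule B implicitly admits 4,294,967,293 addresses without anyone writing them down.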

I should note that one common use for such restrictions is geoblocking: excluding certain locations from access to content. This may be major league baseball videos (they're blacked out in areas where there is a local TV channel that carries those games), movies for which a site does not have a world-wide license, and even online gambling if it's in violation of local laws (as in the US). If someone uses a VPN to evade such a restriction, is that a CFAA offense? What if they use Tor, not to evade the restriction but because they value their privacy but just happen to gain access?

There have also been systems that relied on, more or less, just a username or equivalent, and not a password. One of the best-known cases is that of Andrew "weev" Auernheimer; he and a colleague noticed that a database of AT&T customers could be accessed just by knowing the ICCID from an iPad's SIM. For that particular situation, it was possible to enumerate the namespace. Was that hacking? In a controversial move, the Justice Department prosecuted; his conviction was eventually overturned on rather legalistic grounds, and the underlying CFAA issue was never squarely addressed.

Does it matter how hard it is to enumerate the namespace? Suppose the account numbers were sequential, in which case given a single number it's trivial to find the others. What if the odds on a random number being valid were 1:1,000,000? 1:1,000,000,000,000? Does it matter? Should it?
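To put rough numbers on "how hard", here is a back-of-the-envelope Python sketch. It assumes each guess is independent and valid with probability 1/N (reading "1:N" that way), so the expected number of guesses before the first hit is N. The guessing rate used below is an arbitrary assumption of mine, not a measurement of any real system.

```python
from fractions import Fraction

SECONDS_PER_DAY = 86_400


def expected_guesses(one_in: int) -> int:
    """Mean number of independent random guesses until the first valid one.

    For a geometric distribution with success probability p = 1/one_in,
    the mean is 1/p.
    """
    p = Fraction(1, one_in)
    return int(1 / p)


def expected_days(one_in: int, guesses_per_second: float) -> float:
    """Expected wall-clock time, in days, at an assumed guessing rate."""
    return expected_guesses(one_in) / guesses_per_second / SECONDS_PER_DAY


# Assumed rate of 100 guesses per second (hypothetical).
easy = expected_days(1_000_000, 100)
hard = expected_days(1_000_000_000_000, 100)
```

At the assumed rate, the first case takes a few hours on average and the second takes centuries. With sequential account numbers, by contrast, one valid number leads to its neighbors in a single step. The legal question is whether any of those differences should matter.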

What all of these scenarios have in common is that they reflect a different degree of effort to gain access to some resource. Sometimes, the effort necessary is known to or knowable by the defender; other times, it may not be. My questions, then, are these:

I don't know the answers to any of these questions, but I think that they're important. Some situations, e.g., intentionally working around a password requirement, are pretty clearly (all other things being equal, which they may not be; see Orin's blog post for that) on the wrong side of the law. An address block where an "access unauthorized" message is displayed may also be clear, which suggests that the real issue of access control is intent and warning. But even there, there are numerous subtleties that are beyond the control of the defender.

Consider a situation where a firewall implements an address-based access control mechanism. Furthermore, the firewall is configured to return an ICMP Administratively Prohibited packet when it sees an unauthorized IP address attempting to connect. How will the requester's software display the error? Will it even know about the prohibition, as opposed to the simple fact that the destination isn't reachable? Does the exact language of the technical specification matter? It says

A Destination Unreachable message that is received MUST be reported to the transport layer. The transport layer SHOULD use the information appropriately
In standards-speak, "SHOULD" is defined:
This word or the adjective "RECOMMENDED" means that there may exist valid reasons in particular circumstances to ignore this item, but the full implications should be understood and the case carefully weighed before choosing a different course.
In other words, perhaps some network implementor did not pass on the code, in which case the application couldn't know.
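The following Python sketch illustrates the point. The code numbers are standard ICMP Destination Unreachable (type 3) codes, but the two "stacks" are hypothetical stand-ins of my own invention, not the behavior of any particular operating system.

```python
# A few standard ICMP type 3 (Destination Unreachable) codes.
ICMP_UNREACHABLE_CODES = {
    0: "network unreachable",
    1: "host unreachable",
    3: "port unreachable",
    9: "network administratively prohibited",
    10: "host administratively prohibited",
    13: "communication administratively prohibited",
}


def detailed_stack(code: int) -> str:
    """Hypothetical stack that passes the specific code up to the application."""
    reason = ICMP_UNREACHABLE_CODES.get(code, "destination unreachable")
    return f"connect failed: {reason}"


def lossy_stack(code: int) -> str:
    """Hypothetical stack that reports only that something was unreachable."""
    return "connect failed: destination unreachable"
```

An application running on the second kind of stack sees the same message for a firewall's explicit prohibition (code 13) as for a host that is simply down (code 1), so it has nothing it could show the user as a warning.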

We seem, then, to be stuck. The court's decision seems to imply the warning aspect as crucial, but sites can't always warn people. And why is a password more of a warning than an explicit communication, as was in fact the case here?

Facebook, Privacy, and Cryptography

1 August 2019

There has long been pressure from governments to provide back doors in encryption systems. Of course, if the endpoints are insecure it doesn't matter much if transmission is encrypted; indeed, a few years ago, I and some colleagues even suggested lawful hacking as an alternative. Crucially, we said that this should be done by taking advantage of existing security holes rather than by creating new ones.

Facebook may have taken part of this to heart. A Forbes columnist has written that Facebook is incorporating content analysis into its mobile client and that

The company even noted that when it detects violations it will need to quietly stream a copy of the formerly encrypted content back to its central servers to analyze further, even if the user objects, acting as true wiretapping service.
(It's not even clear that this claim is accurate, but for this analysis let's assume that it is.)

Now, it's not unreasonable for Facebook to move some analysis to the client; indeed, a few months ago I speculated that they might. But that's a very different issue than using their clients for government access.

As I and others have often noted, security is a systems property. That is, you don't achieve security just by hardening this or encrypting that or putting a firewall in front of some other thing. Rather, security emerges from a system where the individual elements are secure and they're combined properly and there are no gaps and everything is used properly and—well, you get the picture. Let's walk this back: if the Facebook mobile client has wiretapping ability, how might that fail?

First, of course, that code might itself be buggy. To give one example, suppose that the wiretap code tried to set up an encrypted connection back to Facebook. It turns out, though, that certificate-checking in that sort of code is very hard to get right:

We demonstrate that SSL certificate validation is completely broken in many security-critical applications and libraries. Vulnerable software includes Amazon's EC2 Java library and all cloud clients based on it; Amazon's and PayPal's merchant SDKs responsible for transmitting payment details from e-commerce sites to payment gateways; integrated shopping carts such as osCommerce, ZenCart, Ubercart, and PrestaShop; AdMob code used by mobile websites; Chase mobile banking and several other Android apps and libraries; Java Web-services middleware including Apache Axis, Axis 2, Codehaus XFire, and Pusher library for Android and all applications employing this middleware. Any SSL connection from any of these programs is insecure against a man-in-the-middle attack.
The code would work correctly—until someone launched an active attack on the connections.
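As an illustration of how small the difference between right and wrong can look, here is a sketch using Python's standard ssl module. This says nothing about how Facebook or any of the software named in the quote is written; it only contrasts a context that validates certificates with one where validation has been switched off.

```python
import ssl


def strict_context() -> ssl.SSLContext:
    """Default client settings: the certificate chain and hostname are both checked."""
    return ssl.create_default_context()


def broken_context() -> ssl.SSLContext:
    """Validation disabled. Connections still succeed and are still encrypted,
    but the peer could be anyone, including a man in the middle."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be turned off before verify_mode can be CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE  # accept any certificate at all
    return ctx
```

Both contexts behave identically in ordinary testing against the real server, which is why this class of bug tends to survive until someone mounts an active attack.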

Alternatively, someone could try to hack Facebook. Facebook is a very sophisticated corporation and probably has very good internal security controls, but are they proof against a major attack by a foreign intelligence agency? An attack that is aided by pressure on some country's expatriates who now work for Facebook?

Beyond that, of course, there are all of the procedural and diplomatic issues: how would Facebook authenticate requests, what about requests from oppressive governments, etc.?

In other words, although this scheme would (probably) not suffer from the fragility of cryptographic protocols, it would open up other avenues for attack.

As I noted above, we endorsed using existing holes, not creating new ones. Yes, it's more expensive, but that isn't necessarily a bad thing. As Justice Sotomayor noted in her concurrence in United States v. Jones, “limited police resources and community hostility” are major checks on police misbehavior. A cheap, surreptitious means of breaking security is exactly the wrong thing to do.

The claim about Facebook's plans may be wrong. I certainly hope so.


Update: Will Cathcart, the VP in charge of WhatsApp, has categorically denied the allegation:
To be crystal clear, we have not done this, have zero plans to do so, and if we ever did it would be quite obvious and detectable that we had done it. We understand the serious concerns this type of approach would raise which is why we are opposed to it.
I hope no one suggests that other companies try this, either—the reasons why it would be bad if Facebook did it are at least as applicable to anyone else, especially to companies with less engineering talent (and that's most of the world).

Buying Computers Properly

15 June 2019

There's been a bit of a fuss lately about a New York Times article that said that Baltimore city computers were hacked in part by an exploit known as EternalBlue that was stolen from the NSA a couple of years ago. I don't know if the attackers targeted the vulnerability used by EternalBlue, nor if they used the actual purloined code. What's more interesting to me is that Microsoft released a patch for the vulnerability in March 2017, but many city government systems remained unpatched and hence vulnerable. Why? What did Baltimore (and other organizations, especially governments) do wrong? Why were these systems unpatched and hence vulnerable for so long?

(A semantic note: a vulnerability is the actual flaw in some software package. An exploit is the code that takes advantage of it. EternalBlue is NSA's exploit for the CVE-2017-0144 vulnerability.)

Don't get me wrong; as I've written before, patching is hard and risky. I also noted that sometimes, it's important to take the risk. (And yes, in that latter blog post I was writing about EternalBlue.) But this incident shows a more serious failing: EternalBlue and other exploits targeting that vulnerability have been a threat for more than two years. Baltimore's systems have remained unpatched for that long.

The root of the problem is attitudinal: in many places, computers are treated as capital equipment with a fairly long lifespan, and as devices that need operation but not maintenance. These attitudes may date back to the 1950s, when the first was fairly true and the cost of maintenance was hidden in the operational cost. Neither is true today. Computers are consumables that require regular, skilled care. Skipping this care is like not changing the oil in your car: you can get away with it for a little while, but at some point you're in trouble. In fact, and as I explain below, it's worse than dirty engine oil: not only are you at risk for a security incident, you end up in a maintenance trap.

To paraphrase George R.R. Martin's famous line “valar morghulis”, all code must die. In fact, software is rarely healthy even to start with; vendors constantly issue patches for their products. Eventually, though, the patches stop: there's a new version of the product, and vendors have no interest in continuing to support ancient versions. Not only would they rather sell you something new, there's no viable economic model to pay for continued patch development for older versions. The cost of the first few years of patching is, of course, built into the initial price of the software. Ultimately, a software package is succeeded by a newer version. Switching versions is in some sense the ultimate patch, and just as patching can be hard, upgrading can be very hard.

Worse yet, patching and upgrading are expensive. System administrators and application programmers need to test and perhaps modify their code for compatibility with the new version. Extra hardware may be needed, partially as test machines but also because at some point, newer vendor packages (especially operating systems) simply won't run on older computers. If money is tight, it becomes very tempting to postpone patches and upgrades. After all, the reasoning seems to go, if things are working now, why bother changing?

Government agencies are particularly vulnerable here. Year-to-year budgets aren't that predictable; they're at the mercy of political winds, and tax increases are never popular. (Partisan anti-tax rhetoric is another factor, of course.) If there's a funding crunch, software maintenance is an easy thing to defer, especially since it's hard to lay off civil servants and city governments can't easily sell off agencies. (Privatization? That's a political question I won't go into, and ultimately it doesn't matter: whatever the irreducible core of governmental functions, that core will be susceptible to the same dilemma.)

So: software maintenance is deferred. What are the consequences? If the problem is a delayed security patch, your site is vulnerable until the patch is installed. But if you miss a version upgrade, you're in big trouble. For one thing, security patch support tends to end soon thereafter. For another, upgrading to the next version is much harder: there are generally tools to help you migrate to the next version, but tools that reliably upgrade two versions are much, much harder to build, and may not even be available.

The net result is a system that is insecure, unsupported, and more expensive to maintain and upgrade than one that had been patched and upgraded properly all along.

Ultimately, this boils down to money: where will maintenance money come from? The way to solve this problem is to accept, in a tangible form, that a computer is first, a consumable item, and second, one that requires ongoing expenditures. Think of it as a leased automobile: while you have it, you have to insure it, refuel it, and change the oil and tires; when the lease is up, you have to replace it—or do without a car as a means of transportation. In other words, a leased car costs you something to own and operate, and has a relatively short lifespan.

It's the same with computers. An enterprise computer should be regarded as having a lifespan of about four years, and an enterprise's budget should include money for the expense of operating and maintaining each and every computer it owns. Furthermore, budgets should include money for operating system and major application upgrades. They will happen, and doing these upgrades will cost money—but these upgrades are utterly necessary. (No, I'm not saying that a corporation should instantly switch to the next release of WhateverOS as soon as it comes out—I learned “never install .0 of anything” back in 1970—but not all that long after WhateverOS X.0 comes out, release X-1 will drop out of support.)

In a well-run corporation, the CIO would put advice like that into practice. (In a corporation with a well-run IT department, they already know this and do it.) It would be an interesting exercise to try to mandate it for government agencies, either by law or by executive order. I'd be interested in suggestions on how to do it.

A Dangerous, Norm-Destroying Attack

25 March 2019

Kim Zetter has a new story out describing a very serious attack. In fact, the implications are about as bad as possible. The attack has been dubbed ShadowHammer by Kaspersky Lab, which discovered it.

Briefly, some crew of attackers (I suspect an intelligence agency; more on that below) has managed to abuse ASUS' update channel and private signing key to distribute bogus patches. These patches checked each victim's MAC address against a target list; machines on that list (about 600 of them) downloaded the malware payload from a bogus website that masqueraded as belonging to ASUS.

The reason this is so bad is that trust in the update channel is utterly vital. All software is at least potentially buggy, and some of those bugs will be security holes. For this reason, virtually all software is shipped with a built-in update mechanism. Indeed, on consumer versions of Windows 10 patching is automatic, and while this poses some risks, overall it has almost certainly significantly improved the security of the Internet: most penetrations exploit known holes, holes for which patches exist but have not been installed.

Now we have an attack that points out the danger of malicious updates. If this scares people away from patching their systems, it will hurt the entire Internet, possibly in a disastrous way. Did the people who planned this operation take this risk into account?

I once blogged that

In cyberattacks, there are no accepted rules… The world knows, more or less, what is acceptable behavior in the physical world: what constitutes an act of war, what is spying, what you can do about these, etc. Do the same rules apply in cyberspace?
ShadowHammer is norm-destroying—or rather, it would be, if such norms existed.

Ten years ago, the New York Times reported on a plan to hack Saddam Hussein's bank accounts. They refrained because of the possible consequences and side-effects:

“We are deeply concerned about the second- and third-order effects of certain types of computer network operations, as well as about laws of war that require attacks be proportional to the threat,” said one senior officer.

This officer, who like others spoke on the condition of anonymity because of the classified nature of the work, also acknowledged that these concerns had restrained the military from carrying out a number of proposed missions. “In some ways, we are self-deterred today because we really haven’t answered that yet in the world of cyber,” the officer said.

Whoever launched this attack was either not worried about such issues—or felt that the payoff was worth it.

I am convinced that this attack was launched by some country's intelligence service. I say this for three reasons: it abuses a very sensitive channel, it shows very selective targeting, and the targeting is based on information (MAC addresses) that isn't that widely available.

The nature of the channel is the first clue. Code-signing keys are precious commodities. While one would hope that a company the size of ASUS would use a hardware security module to protect its keys, at the very least they would be expected to have strong defenses around them. This isn't the first time that code-signing keys have been abused (Stuxnet did it, too), but it's not a common thing. This alone shows the attacker's sophistication.

The highly selective nature of the attack is the next clue. Only ASUS users were affected, and of the estimated 500,000 computers that downloaded the bogus update, the real damage was done to only 600. An ordinary thief, one who wanted bank account logins and passwords, wouldn't bother with this sort of restriction. Also, limiting the number of machines that had the actual malicious payload minimizes the risk of discovery. Any attacker might worry about discovery, but governments really don't want covert operations tied back to them.

Finally, there's the question of how the party behind this attack (and we don't know who it is, though Kaspersky has tied it to the BARIUM APT, which some have linked to China) obtained the targets' MAC addresses. MAC addresses aren't secret, but they're not trivially available to most parties. They're widely available on-LAN; that might suggest that the attacker already had a toehold in the targets' networks. Under certain circumstances, other LANs within an enterprise can see them, too (DHCP Relay, if you're curious). If any of these machines are laptops that have been used elsewhere, e.g., a hotel or public hotspot, someone who had penetrated that infrastructure could monitor them. They could be on shipping boxes, or in some vendor database, e.g., inside ASUS, which we already know has been compromised. It's even possible to get them externally, if the victims (a) use IPv6, (b) use stateless IP address configuration, (c) don't use the privacy-enhanced version; and (d) visit the attacker's IPv6 website. In any of these scenarios, you'd also have to link particular MAC addresses to particular targets.
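For the curious, the IPv6 case works because classic stateless autoconfiguration builds the low 64 bits of the address from the MAC address in "modified EUI-64" form: the bytes ff:fe are inserted in the middle and one bit of the first byte is flipped. A web server that logs such an address can reverse the process. The sketch below is a minimal illustration in Python; the function name and the example address (from the 2001:db8::/32 documentation prefix) are my own assumptions.

```python
import ipaddress
from typing import Optional


def mac_from_slaac(address: str) -> Optional[str]:
    """Recover the MAC address from a modified EUI-64 IPv6 address, if there is one.

    Returns None when the interface identifier lacks the ff:fe marker,
    as is normally the case for privacy (randomized) addresses. A random
    identifier could contain ff:fe by coincidence, so a non-None result
    is a strong hint rather than proof.
    """
    iid = ipaddress.IPv6Address(address).packed[8:]  # low 64 bits: the interface identifier
    if iid[3:5] != b"\xff\xfe":
        return None
    # Undo the universal/local bit flip on the first byte and drop the ff:fe filler.
    mac = bytes([iid[0] ^ 0x02]) + iid[1:3] + iid[5:8]
    return ":".join(f"{b:02x}" for b in mac)
```

For example, a visitor arriving from 2001:db8::211:22ff:fe33:4455 would be revealing the MAC address 00:11:22:33:44:55. Linking that MAC address to a particular person is, of course, a separate problem.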

Any or all of these are possible. But they all require significant investment and really good intelligence. To me, this plus the other two clues strongly point to some country's intelligence agency.

So: we have a state actor willing to take significant risks with the total security of the Internet, in pursuit of an objective that may or may not be that important. This is, shall we say, bad. The question is what the security community should recommend as a response. The answer is not obvious.

"Don't patch" is a horrid idea. As I noted, that's a sure-fire recipe for disaster. In fact, if the ShadowHammerers' goal was to destroy the Internet, this is a pretty good first step, to be followed by attacks on the patch channels of other major vendors. (Hmm: as I write this, I'm installing patches to my phone and tablet…)

Cautious individuals and sites may wish to defer installing patches; indeed, the newest version of Windows 10 appears to permit a deferral of 35 days. That allows time for bugs to be shaken out of the patch, and for confirmation that the update is indeed a real one. (Zetter noted that some ASUS users did wonder about the ShadowHammer patch.) Sometimes, though, you can't wait. Equifax was apparently hit very soon after the vulnerability was announced.

Nor is waiting for a vendor announcement a panacea. A high-end attacker (that is to say, a major intelligence agency) can piggyback malware on an existing patch, possibly by suborning insiders.

A high-end vendor might have an independent patch verification team. It would anonymously download patches, reverse-engineer them, and see if they did what they're supposed to do. Of course, that's expensive, and small IoT vendors may not be able to afford that. Besides, there are many versions of some patches, e.g., for different language packs.
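One very small piece of such a team's work can be sketched in Python: fetch the same patch from several vantage points and check that every copy is byte-for-byte identical. The function names here are hypothetical, and this is only a first-pass check under my own assumptions. It could flag a channel that serves different bits to different requesters, but it says nothing about what the patch does, and it would not have caught an update that was sent identically to everyone and chose its victims after installation. The reverse-engineering step is still needed for that.

```python
import hashlib
from typing import Iterable


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of one downloaded copy of a patch."""
    return hashlib.sha256(data).hexdigest()


def copies_agree(copies: Iterable[bytes]) -> bool:
    """True only if at least one copy was supplied and all copies hash identically."""
    digests = {sha256_hex(c) for c in copies}
    return len(digests) == 1
```

Legitimate variation, such as different language packs, would also make copies disagree, so each variant would have to be compared only against other downloads of the same variant.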

Ultimately, I suspect that there is no single answer. System penetration via bogus updates was predicted 45 years ago in the classic Karger/Schell report on Multics security. (For those following along at home, it's in Section 3.4.5.1.) Caution and auditing by all concerned seems to be the best technical path forward. But policy makers have a role, too. We desperately need international agreements on military norms for cyberspace. These won't be easy to devise nor to enforce, but ultimately, self-restraint may be the best answer.


Update: Juan Andres Guerrero-Saade points out that Flame also abused the update channel. This is quite correct, and I should have been clearer about that. My blog post on Flame, cited above, was written a few days before that aspect of it was described publicly, and I misremembered the attack as spoofing a code-signing certificate à la Stuxnet. Flame was thus just as damaging to vital norms.

Update 2: Matt Blaze has an excellent New York Times op-ed on the importance of patching, despite this incident.

Facebook and Privacy

11 March 2019

Mark Zuckerberg shocked a lot of people by promising a new focus on privacy for Facebook. There are many skeptics; Zuckerberg himself noted that the company doesn't "currently have a strong reputation for building privacy protective services". And there are issues that his blog post doesn't address; Zeynep Tufekci discusses many of them. While I share many of her concerns, I think there are some other issues, and risks.

The Velocity of Content

Facebook has been criticized for being a channel where bad stuff—anti-vaxxer nonsense, fake news (in the original sense of the phrase…), bigotry, and more—can spread very easily. Tufekci called this out explicitly:

At the moment, critics can (and have) held Facebook accountable for its failure to adequately moderate the content it disseminates—allowing for hate speech, vaccine misinformation, fake news and so on. Once end-to-end encryption is put in place, Facebook can wash its hands of the content. We don't want to end up with all the same problems we now have with viral content online—only with less visibility and nobody to hold responsible for it.
Some critics have called for Facebook to do more to curb such ideas. The company itself has announced it will stop recommending anti-vaccination content. Free speech advocates, though, worry about this a lot. It's not that anti-vaxxer content is valuable (or even coherent…); rather, it's that encouraging such a huge, influential company to censor communications is very dangerous. Besides, it doesn't scale; automated algorithms will make mistakes and can be biased; people not only make mistakes, too, but find the activity extremely stressful. As someone who is pretty much a free speech absolutist myself, I really dislike censorship. That said, as a scientist I prefer not closing my eyes to unpleasant facts. What if Facebook really is different enough that a different paradigm is needed?

Is Facebook that different? I confess that I don't know. That is, it has certain inherent differences, but I don't know if they're great enough in effect to matter, and if so, if the net benefit is more or less than the net harm. Still, it's worth taking a look at what these differences are.

Before Gutenberg, there was essentially no mass communication: everything was one person speaking or writing to a few others. Yes, the powerful (kings, popes, and the like) could order their subordinates to pass on certain messages, and this could have widespread effect. Indeed, this phenomenon was even recognized in the Biblical Book of Esther:

3:12 Then were the king's scribes called on the thirteenth day of the first month, and there was written according to all that Haman had commanded unto the king's lieutenants, and to the governors that were over every province, and to the rulers of every people of every province according to the writing thereof, and to every people after their language; in the name of king Ahasuerus was it written, and sealed with the king's ring.

3:13 And the letters were sent by posts into all the king's provinces, to destroy, to kill, and to cause to perish, all Jews, both young and old, little children and women, in one day, even upon the thirteenth day of the twelfth month, which is the month Adar, and to take the spoil of them for a prey.

3:14 The copy of the writing for a commandment to be given in every province was published unto all people, that they should be ready against that day.

3:15 The posts went out, being hastened by the king's commandment, and the decree was given in Shushan the palace. And the king and Haman sat down to drink; but the city Shushan was perplexed.

By and large, though, this was rare.

Gutenberg's printing press made life a lot easier. People other than potentates could produce and distribute fliers, pamphlets, newspapers, books, and the like. Information became much more democratic, though, as has often been observed, "freedom of the press belongs to those who own printing presses". There was mass communication, but there were still gatekeepers: most people could not in practice reach a large audience without the permission of a comparative few. Radio and television did not change this dynamic.

Enter the Internet. There was suddenly easy, cheap, many-to-many communication. A U.S. court recognized this. All parties to the case (on government-mandated censorship of content accessible to children) stipulated, among other things:

79. Because of the different forms of Internet communication, a user of the Internet may speak or listen interchangeably, blurring the distinction between "speakers" and "listeners" on the Internet. Chat rooms, e-mail, and newsgroups are interactive forms of communication, providing the user with the opportunity both to speak and to listen.

80. It follows that unlike traditional media, the barriers to entry as a speaker on the Internet do not differ significantly from the barriers to entry as a listener. Once one has entered cyberspace, one may engage in the dialogue that occurs there. In the argot of the medium, the receiver can and does become the content provider, and vice-versa.

81. The Internet is therefore a unique and wholly new medium of worldwide human communication.

The judges recognized the implications:

It is no exaggeration to conclude that the Internet has achieved, and continues to achieve, the most participatory marketplace of mass speech that this country—and indeed the world—has yet seen. The plaintiffs in these actions correctly describe the "democratizing" effects of Internet communication: individual citizens of limited means can speak to a worldwide audience on issues of concern to them. Federalists and Anti-Federalists may debate the structure of their government nightly, but these debates occur in newsgroups or chat rooms rather than in pamphlets. Modern-day Luthers still post their theses, but to electronic bulletin boards rather than the door of the Wittenberg Schlosskirche. More mundane (but from a constitutional perspective, equally important) dialogue occurs between aspiring artists, or French cooks, or dog lovers, or fly fishermen.

Indeed, the Government's asserted "failure" of the Internet rests on the implicit premise that too much speech occurs in that medium, and that speech there is too available to the participants. This is exactly the benefit of Internet communication, however. The Government, therefore, implicitly asks this court to limit both the amount of speech on the Internet and the availability of that speech. This argument is profoundly repugnant to First Amendment principles.

But what if this is the problem? What if this new, many-to-many communication is precisely what is causing trouble? More precisely, what if the problem is the velocity of communication, in units of people per day?

High velocity propagation appears to be exacerbated by automation, either explicitly or as a side-effect. YouTube's recommendation algorithm appears to favor extremist content. Facebook has a similar problem:

Contrast this, however, with another question from Ms. Harris, in which she asked Ms. Sandberg how Facebook can “reconcile an incentive to create and increase your user engagement when the content that generates a lot of engagement is often inflammatory and hateful.” That astute question Ms. Sandberg completely sidestepped, which was no surprise: No statistic can paper over the fact that this is a real problem.

Facebook, Twitter and YouTube have business models that thrive on the outrageous, the incendiary and the eye-catching, because such content generates “engagement” and captures our attention, which the platforms then sell to advertisers, paired with extensive data on users that allow advertisers (and propagandists) to “microtarget” us at an individual level.

The velocity, in these cases, appears to be a side-effect of this algorithmic desire for engagement. Sometimes, though, bots appear to be designed to maximize the spread of malicious content. Either way, information spreads far more quickly than it used to, and on a many-to-many basis.

Zuckerberg suggests that Facebook wants to focus on smaller-scale communications:

This is different from broader social networks, where people can accumulate friends or followers until the services feel more public. This is well-suited to many important uses—telling all your friends about something, using your voice on important topics, finding communities of people with similar interests, following creators and media, buying and selling things, organizing fundraisers, growing businesses, or many other things that benefit from having everyone you know in one place. Still, when you see all these experiences together, it feels more like a town square than a more intimate space like a living room.

There is an opportunity to build a platform that focuses on all of the ways people want to interact privately. This sense of privacy and intimacy is not just about technical features—it is designed deeply into the feel of the service overall. In WhatsApp, for example, our team is obsessed with creating an intimate environment in every aspect of the product. Even where we've built features that allow for broader sharing, it's still a less public experience. When the team built groups, they put in a size limit to make sure every interaction felt private. When we shipped stories on WhatsApp, we limited public content because we worried it might erode the feeling of privacy to see lots of public content—even if it didn't actually change who you're sharing with.

What if Facebook evolves that way, and moves more towards small-group communication rather than being a digital town square? What will be the effect? Will smaller-scale many-to-many communications behave this way?

I personally like being able to share my thoughts with the world. I was, after all, one of the creators of Usenet; I still spend far too much time on Twitter. But what if this velocity is bad for the world? I don't know if it is, and I hope it isn't—but what if it is?

One final thought on this… In democracies, restrictions on speech are more likely to pass legal scrutiny if they're content-neutral. For example, a loudspeaker truck advocating some controversial position can be banned under anti-noise regulations, regardless of what it is saying. It is quite possible that a velocity limit would be accepted—and it's not at all clear that this would be desirable. Authoritarian governments are well aware of the power of mass communications:

The use of big-character-posters did not end with the Cultural Revolution. Posters appeared in 1976, during student movements in the mid-1980s, and were central to the Democracy Wall movement in 1978. The most famous poster of this period was Wei Jingsheng's call for democracy as a "fifth modernization." The state responded by eliminating the clause in the Constitution that allowed people the right to write big-character-posters, and the People’s Daily condemned them for their responsibility in the "ten years of turmoil" and as a threat to socialist democracy. Nonetheless the spirit of the big-character-poster remains a part of protest repertoire, whether in the form of the flyers and notes put up by students in Hong Kong's Umbrella Movement or as ephemeral posts on the Chinese internet.

As the court noted, "Federalists and Anti-Federalists may debate the structure of their government nightly, but these debates occur in newsgroups or chat rooms rather than in pamphlets." Is it good if we give up high-velocity, many-to-many communications?

Certainly, there are other channels than Facebook. But it's unique: with 2.32 billion users, it reaches about 30% of the world's population. Any change it makes will have worldwide implications. I wonder if they'll be for the best.

Possible Risks

Zuckerberg spoke of much more encryption, but he also noted the risks of encrypted content: "Encryption is a powerful tool for privacy, but that includes the privacy of people doing bad things. When billions of people use a service to connect, some of them are going to misuse it for truly terrible things like child exploitation, terrorism, and extortion. We have a responsibility to work with law enforcement and to help prevent these wherever we can". What does this imply?

One possibility, of course, is that Facebook might rely more on metadata for analysis: "We are working to improve our ability to identify and stop bad actors across our apps by detecting patterns of activity." But he also spoke of analysis "through other means". What might they be? Doing client-side analysis? About 75% of Facebook users employ mobile devices to access the service; Facebook clients can look at all sorts of things. Content analysis can happen that way, too; though Facebook doesn't use content to target ads, might it use it for censorship, good or bad?

Encryption also annoys many governments. Governments disliking encryption is not new, of course, but the more people use it, the more upset they will get. This will be exacerbated if encrypted messaging is used for mass communications; Tufekci is specifically concerned about that: "Once end-to-end encryption is put in place, Facebook can wash its hands of the content. We don't want to end up with all the same problems we now have with viral content online—only with less visibility and nobody to hold responsible for it." We can expect pressure for back doors to increase—but they'll still be a dangerous idea, for all of the reasons we've outlined. (And of course that interacts with the free speech issue.)

I'm not even convinced that Facebook can actually pull this off. Here's the problem with encryption: who has the keys? Note carefully: you need the key to read the content—but that implies that if the authorized user loses her key, she herself has lost access to her content and messages. The challenge for Facebook, then, is protecting keys against unauthorized parties—Zuckerberg specifically calls out "heavy-handed government intervention in many countries" as a threat—but also making them available to authorized users who have suffered some mishap. Matt Green calls this the mud puddle test: if you drop your device in a mud puddle and forget your password, how do you recover your keys?

Apple has gone to great lengths to lock themselves out of your password. Facebook could adopt a similar strategy—but that could mean that a forgotten password means loss of all encrypted content. Facebook of course has a way to recover from a forgotten password—but will that recover a lost key? Should it? So-called secondary authentication is notoriously weak. Perhaps it's an acceptable tradeoff to regain access to your account but lose access to older content—indeed, Zuckerberg explicitly spoke of the desirability of evanescent content. But even if that's a good tradeoff—Zuckerberg says "you'd have the ability to change the timeframe or turn off auto-deletion for your threads if you wanted"—if someone else (including a government) took control of your account, it would violate another principle Facebook holds dear: "there must never be any doubt about who you are communicating with".

How Facebook handles this dilemma will be very important. Key recovery will make many users very happy, but it will allow the "heavy-handed government intervention" Zuckerberg decries. A user-settable option on key recovery? The usability of any such option is open to serious question; beyond that, most users will go with the default, and will thus inherit the risks of that default.
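The tradeoff can be made concrete with a minimal sketch. Nothing here reflects how Facebook or Apple actually manage keys: the password, the account name, and the EscrowingProvider class are all hypothetical, and PBKDF2 is simply a convenient standard way to turn a password into a key.

```python
import hashlib
import os

def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from the user's password (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

class EscrowingProvider:
    """Hypothetical provider that keeps a copy of each user's key for recovery."""

    def __init__(self):
        self._escrow = {}

    def enroll(self, account: str, key: bytes) -> None:
        self._escrow[account] = key

    def recover(self, account: str) -> bytes:
        # Whoever controls the account (the user, a thief, a government)
        # gets the key back; the password is never consulted.
        return self._escrow[account]

salt = os.urandom(16)                    # not secret; could be stored server-side
key = derive_key("correct horse", salt)  # hypothetical password

# Design A, no escrow: a new device plus the remembered password recovers the key...
assert derive_key("correct horse", salt) == key
# ...but a forgotten password yields a different key, so old content is lost.
assert derive_key("my best guess", salt) != key

# Design B, escrow: recovery works without the password at all.
provider = EscrowingProvider()
provider.enroll("alice", key)
assert provider.recover("alice") == key
```

Design A passes the mud puddle test only if the user remembers the password; Design B always passes it, which is exactly why it also serves anyone who can take over the account.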

Microsoft is Abandoning SHA-1 Hashes for Updates---But Why?

19 February 2019

Microsoft is shipping a patch to eliminate SHA-1 hashes from its update process. There's nothing wrong with eliminating SHA-1—but their reasoning may be very interesting.

SHA-1 is a "cryptographic hash function". That is, it takes an input file of any size and outputs 20 bytes. An essential property of cryptographic hash functions is that in practice (though obviously not in theory), no two files should have the same hash value unless the files are identical.
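As a small illustration of the "any size in, 20 bytes out" property, here is a sketch using Python's standard hashlib module. The sample inputs are made up and have nothing to do with Microsoft's actual update files.

```python
import hashlib

def sha1_digest(data: bytes) -> bytes:
    """Return the raw SHA-1 digest of the input."""
    return hashlib.sha1(data).digest()

# Hypothetical inputs of very different sizes (illustrative only).
samples = [b"", b"a small update", b"x" * 1_000_000]

for data in samples:
    digest = sha1_digest(data)
    # The digest is always 160 bits, i.e. 20 bytes, whatever the input length.
    print(len(data), len(digest), digest.hex())
```

Since there are vastly more possible files than 20-byte values, collisions must exist in theory; the security claim is only that nobody should be able to find one in practice.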

SHA-1 no longer has that property; we've known that for about 15 years. But definitions matter. SHA-1 is susceptible to a "collision attack": an attacker can simultaneously create two files that have the same SHA-1 hash. However, given an existing file and hence its hash, it is not possible, as far as anyone knows, to generate a second file with that same hash. This attack, called a "pre-image attack", is far more serious. (There's a third type of attack, a "second pre-image attack", which I won't go into.)

In the ordinary sequence of events, someone at Microsoft prepares an update file. Its hash—its SHA-1 hash, in many cases—is calculated; this value is then digitally signed. Someone who wished to create a fake update would have to crack either the signature algorithm or, somehow, produce a fake update that had the same hash value as the legitimate update. But that's a pre-image attack, and SHA-1 is still believed to be secure against those. So: is this update useless? Not quite—there's still a risk.

Recall that SHA-1 is vulnerable to a collision attack. This means that if two updates are prepared simultaneously, one good and one evil, there can be a signed, malicious update. In other words, the threat model here is a corrupt insider. By eliminating use of SHA-1 for updates, Microsoft is protecting users against misbehavior by one of its own employees.
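The insider scenario can be sketched with a toy. Finding a real SHA-1 collision takes serious computation, so this sketch swaps in a deliberately weak hash (the first 3 bytes of SHA-1) for which a collision turns up after a few thousand tries. An HMAC stands in for a real digital signature. The key, the file contents, and all function names are hypothetical, and none of this describes Microsoft's actual signing process.

```python
import hashlib
import hmac

SIGNING_KEY = b"hypothetical-signing-key"  # stand-in for the vendor's private key

def toy_hash(data: bytes) -> bytes:
    """Deliberately weak 24-bit hash: the first 3 bytes of SHA-1."""
    return hashlib.sha1(data).digest()[:3]

def sign(data: bytes) -> bytes:
    """Sign the hash of the file, not the file itself (HMAC as a stand-in)."""
    return hmac.new(SIGNING_KEY, toy_hash(data), hashlib.sha256).digest()

def verify(data: bytes, signature: bytes) -> bool:
    """Accept any file whose hash matches the one that was signed."""
    return hmac.compare_digest(sign(data), signature)

def find_colliding_pair(good: bytes, evil: bytes, table_size: int = 1 << 13):
    """Prepare both files at once, varying harmless padding until the hashes match."""
    good_variants = {}
    for i in range(table_size):
        variant = good + b" pad=%d" % i
        good_variants[toy_hash(variant)] = variant
    for j in range(1 << 20):
        variant = evil + b" pad=%d" % j
        match = good_variants.get(toy_hash(variant))
        if match is not None:
            return match, variant
    raise RuntimeError("no collision found; enlarge the search")

good, evil = find_colliding_pair(b"legitimate update", b"malicious update")
signature = sign(good)          # the insider gets the good file signed
print(verify(evil, signature))  # the same signature also covers the evil file
```

The attacker here never has to match a hash that someone else produced. Both files are under the insider's control from the start, which is what makes a collision attack sufficient.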

Now, perhaps this is just housekeeping. Microsoft can get SHA-1 out of its code base, and thus discourage its use. And it's past time to do that; the algorithm is about 25 years old and does have serious weaknesses. But it's also recognition that an insider who turns to the Dark Side can be very dangerous.

Yes, "algorithms" can be biased. Here's why.

25 January 2019

My thoughts on algorithmic bias are in an op-ed at Ars Technica.

Tags: ML

Protecting Privacy Differently

9 November 2018

My thesis is simple: the way we protect privacy today is broken and cannot be fixed without a radical change in direction.

My full argument is long; I submitted it to the NTIA's request for comments on privacy. Here's a short summary.

For almost 50 years, privacy protection has been based on the Fair Information Practice Principles (FIPPs). There are several provisions, including purpose specification and transparency, but fundamentally, the underlying principle is notice and consent: users must be informed about collection and consent to it. This is true for both the strong GDPR model and the much weaker US model: ultimately, users have to understand what's being collected and how the data will be used, and agree to it. Unfortunately, the concept no longer works (if indeed it ever did). Arthur Miller (no, not the playwright) put it this way:

A final note on access and dissemination. Excessive reliance should not be placed on what too often is viewed as a universal solvent—the concept of consent. How much attention is the average citizen going to pay to a governmental form requesting consent to record or transmit information? It is extremely unlikely that the full ramifications of the consent will be spelled out in the form; if they were, the document probably would be so complex that the average citizen would find it incomprehensible. Moreover, in many cases the consent will be coerced, not necessarily by threatening a heavy fine or imprisonment, but more subtly by requiring consent as a prerequisite to application for a federal job, contract, or subsidy.

The problem today is worse. Privacy policies are vague and ambiguous; besides, no one reads them. And given all of the embedded content on web pages, no one knows which policies to read.

What should replace notice and consent? It isn't clear. One possibility is use controls: users specify what their information can be used for, rather than who can collect it. But use controls pose their own problems. They may be too complex to use, there are continuity issues, and—at least in the US—there may be legal issues standing in their way.

I suspect that what we need is a fundamentally new paradigm. While we're at it, we should also work on a better definition of privacy harms. People in the privacy community take for granted that too much information collection is bad, but it is often hard to explain to others just what the issue is. It often seems to boil down to "creepiness factor".

These are difficult research questions. Until we have something better, we should use use controls; until we can deploy those, we need regulatory changes about how embedded content is handled. In the US, we should also clarify the FTC's authority to act against privacy violators.

None of this is easy. But our "data shadow" is growing longer every day; we need to act quickly.

A Voting Disaster Foretold

27 October 2018

The 2018 Texas general election is going to be a disaster, and that's independent of who wins or loses. To be more precise, I should say "who appears to win or lose", because we're never really going to know. Despite more than 15 years of warnings, Texas still uses DRE (Direct Recording Electronic) voting machines, where a vote is entered into a computer and there is no independent record of how people actually voted. And now, with early voting having started, people's votes are being changed by the voting machines.

This isn't the first time votes have been miscounted because of DRE machine failures. In 2004, "Carteret County lost 4,438 votes during the early-voting period leading up to Election Day because a computer didn't record them." Ed Felten has often written about how the machines' own outputs show inconsistencies. (For some reason, images are not currently showing on those blog posts, so I've linked to a Wayback Machine copy.)

It doesn't help that the problem here appears to be due to a completely avoidable design error by the vendor. Per the Texas Secretary of State's office, bad things can happen if voters operate controls while the page is rendering. That's an excuse, not a reason, and it's a bad one. Behavior like that is completely unacceptable from a human factors perspective. If the system will misbehave from data entry during rendering, the input controls should be disabled or inputs from them should be ignored during that time—period, end of discussion. There is literally no excuse for not doing this correctly. Programming this correctly is "hard"? Sorry; not an acceptable answer. And judging from how quickly Texas officials "diagnosed" the problem, it appears that they've known about the issue and let it ride. Again, this is completely unacceptable.
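One way to read "inputs from them should be ignored during that time" is a simple guard flag. The sketch below is purely hypothetical: the BallotScreen class, its method names, and the contest and candidate names are invented for illustration and are not based on the vendor's software.

```python
class BallotScreen:
    """Hypothetical ballot page that ignores input while it is still rendering."""

    def __init__(self):
        self.rendering = False
        self.selections = {}

    def begin_render(self):
        self.rendering = True   # page is being drawn; controls are not usable yet

    def end_render(self):
        self.rendering = False  # page is fully drawn; accept input again

    def handle_input(self, contest: str, choice: str) -> bool:
        """Record a selection, or drop the event if the page is mid-render."""
        if self.rendering:
            return False        # ignored: never applied to a half-drawn page
        self.selections[contest] = choice
        return True

screen = BallotScreen()
screen.begin_render()
screen.handle_input("Senate", "Candidate A")  # dropped: page not ready
screen.end_render()
screen.handle_input("Senate", "Candidate B")  # recorded
print(screen.selections)
```

The point of the sketch is how little is needed: one flag, checked in one place, keeps a premature touch from changing a vote.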

I've been warning about buggy voting machine software for more than 10 years:

Ironically, for all that I'm a security expert, my real concern with electronic voting machines is ordinary bugs in the code. These have demonstrably happened. One of the simplest cases to understand is the counter overflow problem: the voting machine used too small a field for the number of votes cast. The machine used binary arithmetic (virtually all modern computers do), so the critical number was 32,767 votes; the analogy is trying to count 10,000 votes if your counter only has 4 decimal digits. In that vein, the interesting election story from 2000 wasn't Florida, it was Bernalillo County, New Mexico; you can see a copy of the Wall Street Journal story about the problem here.

I haven't changed my mind:

Bellovin is "much more worried about computer error — buggy code — than cyberattacks," he says. "There have been inexplicable errors in some voting machines. It's a really hard problem to deal with. It's not like, say, an ATM system, where they print out a log of every transaction and take pictures, and there's a record. In voting you need voter privacy — you can't keep logs — and there's no mechanism for redoing your election if you find a security problem later."
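The counter overflow problem quoted above is easy to reproduce. This is a minimal sketch assuming the counter was a signed 16-bit two's-complement field, which is what a critical value of 32,767 suggests. The quoted text does not say what the real machine did once it passed that value, so the wraparound shown here is an assumption.

```python
def increment_int16(counter: int) -> int:
    """Add one vote to a counter stored as a signed 16-bit two's-complement value."""
    raw = (counter + 1) & 0xFFFF                    # keep only the low 16 bits
    return raw - 0x10000 if raw >= 0x8000 else raw  # reinterpret as signed

def increment_4_digits(counter: int) -> int:
    """The decimal analogy: a counter with only four decimal digits."""
    return (counter + 1) % 10_000

print(increment_int16(32_766))    # 32767, the last value that fits
print(increment_int16(32_767))    # -32768, the next vote wraps around
print(increment_4_digits(9_999))  # 0, the 10,000th vote resets the count
```

Nothing crashes and no error is raised in either case, which is why this kind of bug can sit unnoticed until an election is large enough to trigger it.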

Others agree. A recent National Academies report noted long-standing concerns about DRE machines:

The rapid growth in the prominence of DREs brought greater voice to concerns about their use, particularly their vulnerability to software malfunctions and external security risks. And as with the lever machines that preceded them, without a paper record, it is not possible to conduct a convincing audit of the results of an election.

and recommended that

4.11 Elections should be conducted with human-readable paper ballots. These may be marked by hand or by machine (using a ballot-marking device); they may be counted by hand or by machine (using an optical scanner). Recounts and audits should be conducted by human inspection of the human-readable portion of the paper ballots. Voting machines that do not provide the capacity for independent auditing (e.g., machines that do not produce a voter-verifiable paper audit trail) should be removed from service as soon as possible.

4.12 Every effort should be made to use human-readable paper ballots in the 2018 federal election. All local, state, and federal elections should be conducted using human-readable paper ballots by the 2020 presidential election.

This election will undoubtedly end up in court: there's a hotly contested Senate race, and both campaigns are very well-funded. Whatever the outcome, many people will feel that they were disenfranchised—and it didn't have to happen.