8 August 2018
I keep hearing stories of people using "foldering" for covert communications. Foldering is the process of composing a message for another party, but instead of sending it as an email, you leave it in the Drafts folder. The other party then logs in to the same email account and reads the message; they can then reply via the same technique. Foldering has been used for a long time, most famously by then-CIA director David Petraeus and his biographer/lover Paula Broadwell. Why is foldering used? What is it good for, and what are its weaknesses? There's a one-word answer to its strength—metadata—but its utility (to the extent that it had any) is largely that of a bygone era.
Before I start, I need to define a few technical terms. In the email world, there are "MUAs"—Mail User Agents—and "MTAs"—Mail Transfer Agents. They're different.
An MUA is what you use to compose and read email. It could be a dedicated mail program—the Mail app on iPhones and MacOS, Outlook on Windows, etc. An MUA needs to configured with the domain names of the user's outbound and inbound email servers. MUAs live on user machines, like laptops and phones; MTAs are servers, and are run by corporations, ISPs, and mail providers like Google. And there's a third piece, an inbound mail server. A receiving MTA hands off the mail to the inbound mail server; the MUA talks to it and pulls down email from it.
Webmail systems are a bit funny. Technically, they're remote MUAs that you talk to via a web browser. But they still talk to MTAs and inbound mail servers, though you don't see this. The MUA and MTA might be on the same computer for a small operation (perhaps running the open source squirrelmail package); for something the size of Gmail or Hotmail, the webmail servers are on separate machines from the MTAs. However, foldering doesn't involve an MTA. Rather, it involves composing messages and leaving them in some folder. The folders are all stored on disk—as it turns out, on disk managed by the inbound mail server, even though you're composing mail. (Why? Because only inbound mail servers and MUAs know about folders; MTAs don't. The MUA could have a draft mail folder (it probably does), but by sending it to the inbound mail server, you can start composing email on one device and continue from another.)
Webmail systems are, as I said, MUAs. For technical reasons, they generally don't have any permanent folder storage of their own; they just talk to the inbound mail server.
So: foldering via a webmail system involves a web server and an inbound mail server. It does not involve an MTA—and that's important.
If you're trying to engage in covert communications, you're not going to use your own mail systems—it's too obvious what's going on. Accordingly, you'll probably use a free commercial email service such as Google's Gmail or Microsoft's Outlook. The party with whom you're communicating will do the same. Let's follow the path of a typical email from a Gmail user (per the usual conventions in cryptography, we'll call her Alice) to an Outlook user named Bob.
The sender logs in to Gmail, probably via a web browser though possibly via an MUA app. Even back in the mists of time, the login connection was encrypted. However, until 2010, the actual session wasn't encrypted by default, though users were able to turn on encryption since at least 2008. Let's assume that our hypothetical conspirators or lovers were security-conscious, and thus turned on encryption for this link. That meant that no eavesdropper could see what was going on, and in particular could not see who logged in to Gmail or to whom a particular email was being sent. After Alice clicks "Send", though, the webmail MUA hands the message off to the MTA—and that's where the security breaks down. Back then, the MTA-to-MTA traffic was not encrypted; thus, someone—an intelligence agency?—monitoring the Internet backbone would see the emails. Bingo: our conspirators are burned. And even if we're talking about simple legal processes, the sender and recipient of such email messages are (probably) legally metadata and hence are readily available to law enforcement.
Suppose, though, that Alice and Bob used foldering. There are no MTAs involved, hence no sender/receiver metadata, and no unencrypted content flowing anywhere. They're safe—or so they thought…
When Alice logs into Gmail, her IP address is recorded. It, too, is metadata. An eavesdropper doesn't know that it's Alice, but her IP address is visible. More importantly, it's logged by Gmail: user Alice logged in from 203.0.113.42. Oddly enough, "Alice"—it's really Bob, of course—logged in from 198.51.100.17 as well, and those two IP addresses aren't physically located anywhere near each other. That discrepancy might even be logged. Regardless, it's in Gmail's log files, and if Alice or Bob are under suspicion, a simple subpoena for the log files (or a simple hack of the mail server) will show what's going on: these two IP addresses are showing a decidedly odd login pattern, and one of them belongs to a party under suspicion.
So where are we, circa 2010? Suppose neither Alice nor Bob were suspected of anything and they sent email. An intelligence agency monitoring assorted Internet links would see email between the two of them; if one was being targeted, it would be able to pick off the contents of the messages. If they used foldering, though, they would be much safer: there wouldn't be any incriminating unencrypted traffic. The spooks would see traffic from Alice's and Bob's IP addresses to Gmail or Outlook, but that's not suspicious. The login names and the sessions themselves are protected.
Suppose, though, that Alice and/or Bob were under suspicion by law enforcement. A subpoena would get the login IP addresses; the discrepancy would stick out like a sore thumb, and the investigation would proceed apace.
In other words, in 2010 foldering would protect against Internet eavesdropping but not against law enforcement.
The world is very different today. Following the Snowden revelations, many email providers turned on encryption for MTA-to-MTA traffic. As a consequence, our hypothetical intelligence agency can't see that email is flowing between Alice and Bob; it's all protected. If they're being investigated, of course, a subpoena will show the email—but the same sort of subpoena would also show the login IP addresses.
Where does that leave us? Today, an attacker with access to log files, either via subpoena or by hacking a mail server, can see the communication metadata whether Alice and Bob are using foldering or simply sending email. An eavesdropper can't see the communications in either case. This is in contrast to 2010, when an eavesdropper could learn a lot from email but couldn't from a foldering channel.
Conclusion: if Alice and Bob and their mail services take normal 2018 precautions, foldering adds very little security.
7 August 2018
There have been many news stories of late about potential attacks on the American electoral system. Which attacks are actually serious? As always, the answer depends on economics.
There are two assertions I'll make up front. First, the attacker—any attacker—is resource-limited. They may have vast resources, and in particular they may have more resources than the defenders—but they're still limited. Why? They'll throw enough resources at the problem to solve it, i.e., to hack the election, and use anything left over for the next problem, e.g., hacking the Brexit II referendum… There's always another target.
Second, elections are a system. That is, there are multiple interacting pieces. The attacker can go after any of them; the defender has to protect them all. And protecting just one piece very well won't help; after all, "you don't go through strong security, you go around it." But again, the attacker has limited resources. Their strategy, then, is to find the greatest leverage, the point to attack that costs the defenders the most to protect.
There are many pieces to a voting system; I'll concentrate on the major ones: the voting machines, the registration system, electronic poll books, and vote-tallying software. Also note that many of these pieces can be attacked indirectly, via a supply chain attack on the vendors.
There's another point to consider: what are the attacker's goals? Some will want to change vote totals; others will be content with causing enough obvious errors that no one believes the results—and that can result in chaos.
The actual voting machines get lots of attention. That's partly a hangover from the 2000 Bush–Gore election, where myriad technological problems in Florida's voting system (e.g., the butterfly ballot in Palm Beach County and the hanging chads on the punch card voting machines) arguably cost Gore the state and hence the presidential election.
And purely computerized (DRE—Direct Recording Electronic) voting machines are indeed problematic. They make mistakes. If there's ever a real problem, there's nothing to recount. It's crystal-clear to virtually every computer scientist who has studied the issue that DRE machines are a bad idea. But: if you want to change the results of a nation-wide election or set of elections in the U.S., going after DRE machines is probably the wrong idea. Why not? Because it's too expensive.
There are many different election administrations in the U.S.: about 10,000 of them. Yes, sometimes an entire state uses the same type of machine—but each county administers its own machines. Storing the voting machines? Software updates? Done by the county. Progamming the ballot? Done by the county. And if you want to attack them? Yup—you have to go to that county. And voting machines are rarely, if ever, connected to the Internet, which means that you pretty much need physical presence to do anything nasty.
Now, to be sure, if you are at the polling place you may be able to do really nasty things to some voting machines. But it's not an attack that scales well for the attacker. It may be a good way to attack a local election, but nothing larger. A single Congressional race? Maybe, but let's do a back-of-the-envelope calculation. The population of the U.S. is about 325,000,000. That means that each election area has about 32,500 people. (Yes, I know it's very non-uniform. This is a back-of-the-envelope calculation.) There are 435 representatives, so each one has about 747,000 constituents, or about 75 election districts. (Again: back of the envelope.) So: you'd need a physical presence in seven different counties, and maybe many precincts in each county to tamper with the machines there. As I said, it's not an attack that scales very well. We need to fix our voting machines—after all, think of Florida in 2000—but for an attacker who wants to change the result of a national election, it's not the best approach.
There's one big exception: a supply chain attack might be very feasible for a nation-state attacker. There are not many vendors of voting equipment; inserting malware in just a few places could work very well. But there's a silver lining in that cloud: because there are many fewer places to defend than 50 states or 10,000 districts, defense is much less expensive and hence more possible—if we take the problem seriously.
And don't forget the chaos issue. If, say, every voting machine in a populus county of a battleground state showed a preposterous result—perhaps a 100% margin for some candidate, or 100 times as many votes cast as there are registered voters in the area—no one will be believe that that result is valid. What then? Rerun the voting in just that county? Here's what the Constitution says:
The Congress may determine the Time of chusing the Electors, and the Day on which they shall give their Votes;
which Day shall be the same throughout the United States.
The voter registration systems are a more promising target for an attacker. While these are, again, locally run, there is often a statewide portal to them. In fact, 38 states have or are about to have online voter registration.
In 2016, Russia allegedly attacked registration systems in a number of states. Partly, they wanted to steal voter information, but an attacker could easily delete or modify voter records, thus effectively disenfranchising people. Provisional ballots? Sure, if your polling place has enough of them, and if you and the poll workers know what to do. I've been a poll worker. Let's just say that handling exceptional cases isn't the most efficient process. And consider the public reaction if many likely supporters (based on demographics) of a given candidate are the ones who are disproportionately deleted. (Could the attackers register phony voters? Sure, but to what end? In-person voter fraud is exceedingly rare; how many times can Boris and Natasha show up to vote? Again, that doesn't scale. That's also why requiring an ID to vote is solving a non-problem.)
There's another point. Voting software is specialized; it's attack surface should be low. It's possible to get that wrong, as in some now-decertified Virginia voting machines, and there's always the underlying operating system; still, if the machines aren't networked, during voting the only exposure should be via the voting interface.
A lot of registration software, though, is a more-or-less standard web platform, and is therefore subject to all of the risks of any other web service. SQL injection, in particular, is a very real risk. So an attack on the registration system is not only more scalable, it's easier.
Before the election, voter rolls are copied to what are known as poll books. Sometimes, these are paper books; other places use electronic ones. The electronic ones are networked to each other; however, they are generally not connected to the Internet. If that networking is set up incorrectly, there can be risks; generally, though, they're networked on a LAN. That means that you have to be at the polling place to exploit them. In other words, there's some risk, but it's not much greater than the voting machines.
There's one more critical piece: the vote-tallying software. Tallies from each precinct are transmitted to the county's election board; there may be links to the state, to news media, etc. In other words, this software is networked and hence very subject to attack. However: this is used for the election night count; different procedures can be and often are used for the official canvas. And even without attacks, many things can go wrong:
In Iowa, a hard-to-read fax from Scott County caused election officials initially to give Vice President Gore an extra 2,006 votes. In Outagamie County, Wis., a typo in a tally sheet threw Mr. Bush hundreds of votes he hadn't won.But: the ability to do a more accurate count the second time around depends on there being something different to count: paper ballots. That's what saved the day in 2000 in Bernalillo County, New Mexico. The problem: ``The paper tallies, resembling grocery-store receipts, seemed to show that many more ballots had been cast overall than were cast in individual races. For example, tallies later that night would show that, of about 38,000 early ballots cast, only 25,000 were cast for Mr. Gore or Mr. Bush.'' And the cause? Programming the vote-counting system:
As they worked, Mr. Lucero's computer screen repeatedly displayed a command window offering a pull-down menu. From the menu, the two men should have clicked on "straight party." Either they didn't make the crucial click, or they did and the software failed to work. As a result, the Accu-Vote machines counted a straight-party vote as one ballot cast, but didn't distribute any votes to each of the individual party candidates.Crucially, though, once they fixed the programming they could retally those paper ballots. (By the way, programming the tallying computer can itself be complex. Bernalillo County, which had a population of 557,000 then, required 114 different ballots.)
To illustrate: If a voter filled in the oval for straight-party Democrat, the scanner would record one ballot cast but wouldn't allocate votes to Mr. Gore and other Democratic candidates.
There's a related issue: the systems that distribute votes to the world. Alaska already suffered such an attack; it could happen elsewhere, too. And it doesn't have to be via hacking; a denial of service attack could also do the job of causing chaos.
The best way to check the ballot-counting software is risk-limiting audits. A risk-limiting audit checks a random subset of the ballots cast. The closer the apparent margin, the more ballots are checked by hand. "Risk-limiting audits guarantee that if the vote tabulation system found the wrong winner, there is a large chance of a full hand count to correct the results." And it doesn't matter whether the wrong count was due to buggy software or an attack. In other words, if there is a paper trail, and if it's actually looked at, via either a full hand-count or a risk-limiting audit, the tallying software isn't a good target for an attacker. One caveat: how much chaos might there be if the official count or the recount deliver results significantly different than the election night fast count?
There's one more point: much of the election machinery, other than the voting machines themselves, are an ordinary IT installation, and hence are subject to all of the security ills that any other IT organization can be subject to. This specifically includes things like insider attacks and ransomware—and some attackers have been targeting local governments:
Attempted ransomware attacks against local governments in the United States have become unnervingly common. A 2016 survey of chief information officers for jurisdictions across the country found that obtaining ransom was the most common purpose of cyberattacks on a city or county government, accounting for nearly one-third of all attacks.The threat of attacks has induced at least one jurisdiction to suspend online return of absentee ballots. They're wise to be cautious—and probably should have been that cautious to start.
Again, elections are complex. I've only covered the major pieces here; there are many more ways things can go wrong. But of this sample, it's pretty clear that the attackers' best target is the registration system. (Funny, the Russians seemed to know that, too.) Actual voting machines are not a great target, but the importance of risk-limiting audits (even if the only problem is a close race) means that replacing DRE voting machines with something that provides a paper trail is quite important. The vote-counting software is even less interesting if proper audits are done, though don't discount the utility to some parties of chaos and mistrust.
Update: No sooner did I write about how impossible results could lead to chaos than this story appeared about DRE machines in Georgia: "[i]n Habersham County's Mud Creek precinct, … 276 registered voters managed to cast 670 ballots". There were other problems, too. I suspect bugs rather than malice—but we don't really know yet.
19 July 2018
Sometimes, a government agency will post a PDF that doesn't contain searchable text. Most often, it's a scan of a printout. Why? Don't the NSA, the Department of Justice, etc., know how to convert Word (or whatever) directly to PDF? It turns out that they know more than some of their critics do. The reason? With a piece of paper, you know much more about what you're actually disclosing.
It may be possible to strip all of the metadata safely. The NSA, in fact, has a guide on how to do it. (N.B. You'll get a certificate error: many US government agencies have certificates from a US government-specific certificate authority, and outside browsers do not trust it by default. If you do not want to click through the warning messages (if you even can), I've created a mirror of it. And that's legal: by law, US government-created documents are in the public domain.) But the complexity is worrisome—and the list of things that "Sanitize Document" can delete (page 10) is quite amazing. (Sanitizing Word is harder.)
So why is this an issue? Well, people still get it wrong. And it's not a new problem; Bruce Schneier wrote about it years ago and said it was barely newsworthy then. Even, yes, Federal prosecutors can get it wrong.
Printing things onto paper and scanning it is ugly and not as functional, but it does prevent this sort of error.
And there are two more subtle points. First, sensitive networks are often air-gapped from the Internet. Air-gapping—having no physical connection whatsoever to the outside world—is a strong defense, though far from perfect. Getting a PDF file from an air-gapped network to the Internet can be done, but it's painstaking and—if done incorrectly—can expose the sensitive network to attack from the outside. Again, we know how to do this—follow NSA procedures on the sensitive network, burn a CD-R (not a CD-RW) with just the PDF, and carry that to an outside machine—but there's still the chance for human error. And there's one more threat…
What is really in a PDF, and how do you know? Is it just what you see on the screen? Even apart from malice or stupidity, e.g., setting the font color to white, there's a hidden danger: what did the PDF creation or redaction program actually write out? Remember that PDFs are containers; there can be nominally empty sections of the file. What fills those bytes? How do you know, and what is your assurance?
Many years ago, while I was at AT&T, I was working on an important internal project. Someone sent out a Word document with some very sensitive details. Unlike everyone else on the project, I was running an open source OS instead of Windows, so I couldn't just fire up Word. Instead, I used an open source tool to view the file—and I saw something different. The person who created the file had two documents open in Word, and what was nominally empty space was filled with whatever garbage was lying around RAM at the time: in this case the body of an unrelated letter he was sending to someone outside the company. The tool I used to view the file wasn't perfect, so it printed the wrong part of the Word document. The odds are high, of course, that the recipient of that letter received some of our project plans, but if that person did the usual—run Windows and Word—it would never appear, and our corporate secrets would be safe.
The NSA and the Department of Justice, of course, have serious adversaries, ones who won't take a file at face value. Unless you have a lot of confidence in the PDF redaction program, you're much better off scanning a printed version. Sure, there are still some risks, e.g., steganography based on kerning or the like, but they're much less than with a PDF.
So: DoJ has its reasons for sending out these difficult-to-use PDFs. You may not like it—I don't like it—but they're doing it out of caution, not ignorance or stupidity.
14 May 2018
Purists have long objected to HTML email on aesthetic grounds. On functional grounds, it tempts too many sites to put essential content in embedded (or worse yet, remote) images, thus making the messages not findable via search. For these reasons, among others, Matt Blaze remarked that "I've long thought HTML email is the work of the devil". But there are inherent security problems, too (and that, of course, is some of what Matt was referring to). Why?
Although there are no perfect measures for how secure a system is, one commonly used metric is the "attack surface". While handling simple text email is not easy—have you ever read the complete specs for header lines?—it's a relatively well-understood problem. Web pages, however, are very complex. Worse yet, they can contain references to malicious content, sometimes disguised as ads. They thus have a very large attack surface.
Browsers, of course, have to cope with this, but there are two important defenses. First, most browsers check lists of known-bad web sites and won't go there without warning you. Second, and most critically, you have a choice—you can only be attacked by a site if you happen to visit it.
With email, you don't have that choice—the bad stuff comes to you. If your mailer is vulnerable—again, rendering HTML has a large attack surface—simply receiving a malicious email puts you at risk.
4 May 2018
I've been thinking about Facebook's new dating app. I suspect that it has the potential to be very good—or very, very bad.
Facebook is a big data company: they make their money because they can very precisely figure out what users will respond to. What if they applied that to online dating? Maybe it will look more like other dating apps, but remember how much Facebook knows about people. In particular, at this point it has many years of data not just on individuals, but on which of its users have partnered with which others, and (to some extent) on how long these partnerships last. That is, rather than code an algorithm that effectively says, "you two match on the following N points on your questions and answers", Facebook can run a machine learning algorithm that says "you two cluster with these other pairs who went on to serious relationships." (Three times already when typing this, my fingers typed "dataing" instead of "dating". Damn, make that four!)
So what's wrong? Isn't that a goal of a dating app? Well, maybe. The thing about optimization is that you have to be very careful what you ask for—because you may get exactly that, rather than what you actually wanted. What will Facebook's metric for success be? A couple that seriously pairs off, e.g., moves in together and/or marries, fairly soon? A couple that starts more slowly but the relationship lasts longer? A bimodal distributon of quick flameouts and long-term relationships? (Facebook says they're not trying for hookups, so I guess they don't need to buy data from Uber.)
There are, of course, all of the usual issues of preexisting human biases being amplified by ML algorithms, to say nothing of the many privacy issues here. I think, though, that the metric here is less obvious and more important. What is Facebook trying to maximize? And how will they profit from the answers?
2 May 2018
There have been a number of mentions of an attack that Eran Tromer found against Ray Ozzie's CLEAR protocol, including in Steven Levy's Wired article and on my blog. However, there haven't been any clear descriptions of it.
Eran has kindly given me his description of it, with permission to
publish it on my blog. The text below is his.
A fundamental issue with CLEAR approach is that it effectively tells law enforcement officers to trust phones handed to them by criminals, and give such phones whatever unlock keys they request. This provides a powerful avenue of attack for an adversary who uses phones as a Trojan horse.
For example, the following "man-in-the-middle" attack can let a criminal unlock a victim's phone that reached their possession, if that phone is CLEAR-compliant. The criminal would turn on the victim phone, perform the requisite gesture to display the "device unlock request" QR code, and copy this code. They would then program a new "relay" phone to impersonate the victim phone: when the relay phone is turned on, it shows the victim's QR code instead of its own. (This behavior is not CLEAR-compliant, but that's not much of a barrier: the criminal can just buy a non-compliant phone or cobble one from readily-available components). The criminal would plant the relay phone in some place where law enforcement is likely to take keen interet in it, such as a staged crime scene or near a foreign embassy. Law enforcement would diligently collect the phone and, under the CLEAR procedure, turn it on to retrieve the "device unlock request" QR code (which, unbeknownst to them, is actually the victim's code). Law enforcement would then obtain a corresponding search warrant, retrieve the unlock code from the vendor, and helpfully present it to the relay phone — which will promptly relay the code to the criminal, who can then enter the same code into the victim's phone. The victim's code, upon receiving this code, will spill all its secrets to the criminal. The relay phone can even present law enforcement with a fake view of its own contents, so that no anomaly is apparent.
The good news is that this attack requires the criminal to go through the motions anew for for every victim phone, so it cannot easily unlock phones en masse. Still, this would provide little consolation to, say, a victim whose company secrets or cryptocurrency assets have been stolen by a targeted attack.
It it plausible that such man-in-the-middle attacks can be mitigated by modern cryptographic authentication protocols coupled with physical measures such as tamper-resistant hardware or communication latency measurements. But this is a difficult challenge that requires careful design and review, and would introduce extra assumptions, costs and fragility into the system. Blocking communication (e.g., using Faraday cages) is also a possible measure, though notoriously difficult, unwieldy and expensive.
Another problem is that CLEAR phones must resist "jailbreaking", i.e., must not let phone owners modify the operating system or firmware on their own phones. This is because CLEAR critically relies on users not being able to tamper with their phones' unlocking functionality, and this functionality would surely be implemented in software, as part of the operating system or firmware, due to its sheer complexity (e.g., it includes the "device unlock request" screen, QR code recognition, crytographic verification of unlock codes, and transmission of data dumps). In practice, it is well-nigh impossible to prevent jailbreaking in complex consumer devices, and even for state-of-the-art locked-down platforms such as Apple's iOS, jailbreak methods are typically discovered and widely circulated soon after every operating system update. Note that jailbreaking also exacerbates the aforementioned man-in-the-middle attack: to create the relay phone, the criminal may pick any burner phone from a nearby store, and even if such phones are CLEAR-compliant by decree, jailbreaking them would allow them to be reprogrammed as a relay.
Additional risks stem from having an attacker-controlled electronics operating within law enforcement premises. A phone can eavesdrop on investigators's conversations, or even steal private cryptographic keys from investigator's computers. For examples of the how the latter may be done using a plain smartphone or hardware hidden that can fit in a customized phone, see http://cs.tau.ac.il/~tromer/acoustic, http://www.cs.tau.ac.il/~tromer/radioexp, and http://www.cs.tau.ac.il/~tromer/mobilesc. While prudent forensics procedures can mitigate this risk, these too would introduce new costs and complexity.
These are powerful avenues of attack, because phones are flexible devices with the capability to display arbitrary information, communicate wirelessly with adversaries, and spy on their environment. In a critical forensic investigation, you would never want to turn on a phone and run whatever nefarious or self-destructing software may be programmed in it. Moreover, the last thing you'd do is let a phone found on the street issue requests to a highly sensitive system that dispenses unlock codes (even if these requests are issued indirectly, through a well-meaning but hapless law enforcement officer who's just following procedure).
Indeed, in computer forensics, a basic precaution against such attacks is to never turn on the computer in an uncontrolled fashion; rather, you would extract its storage data and analyze it on a different, trustworthy computer. But the CLEAR scheme relies on keeping the phone intact, and even turning it on and trusting it to communicate as intended during the recovery procedure. Telling the guards of Troy to bring any suspicious wooden horse into the city walls, and to grant it an audience with the king, may not be the best policy solution to "Going Greek" debate.