The Battle of Midway

6 June 2014

This is the 72nd anniversary of the Battle of Midway, June 4-7, 1942. The American victory is often attributed to cryptanalysis: U.S. intelligence had broken the main Japanese naval code. It's important to realize, though, that it wasn't just cryptanalysis. In The Codebreakers, Kahn wrote:

Unknown to Yamamoto, his order was also picked up by the Allied listening posts that ringed the Pacific. Its extreme length indicated its importance, and Fabian's unit ... may have first suggested that it might be an operations order.
A US Navy history of the battle says:
At the same time, Japanese Navy communication activity in the vicinity of Japan dramatically increased...
Yes, cryptanalysis was the most important piece of intelligence available. However, the cryptanalysts knew what to work on because of metadata: the unusual length of one message and the sharp rise in communication activity.

Machine Learning and the Fourth Amendment

4 June 2014

A recent paper by myself and some colleagues (Prof. Renée Hutchins of the University of Maryland Law School, Prof. Tony Jebara of the Columbia University Computer Science Department, and Sebastian Zimmeck, a Columbia CS PhD student who is also an attorney) shows how to use computer science, and in particular a field called machine learning, to answer two very specific questions in Fourth Amendment law: is there some scientific basis for accepting the mosaic theory; and if so, at what point is it reached? The paper has drawn some press coverage; it's also drawn some criticism from Prof. Orin Kerr of the George Washington University Law School. Not surprisingly, we stand by our conclusions. (I should note that we also think that the article about it is a fair representation of what we said.)

What is the mosaic theory? Basically, it's the concept that while a single observation, say of someone's location, might not be a search under Fourth Amendment law, a whole series of observations collectively might be one. Prof. Kerr explained it this way:

Under the mosaic theory, searches can be analyzed as a collective sequence of steps rather than as individual steps. Identifying Fourth Amendment searches requires analyzing police actions over time as a collective "mosaic" of surveillance; the mosaic can count as a collective Fourth Amendment search even though the individual steps taken in isolation do not.
The question is why this is significant. For that, we turn to the D.C. Circuit's opinion in United States v. Maynard:
The sequence of a person's movements can reveal still more; a single trip to a gynecologist's office tells little about a woman, but that trip followed a few weeks later by a visit to a baby supply store tells a different story. A person who knows all of another's travels can deduce whether he is a weekly church goer, a heavy drinker, a regular at the gym, an unfaithful husband, an outpatient receiving medical treatment, an associate of particular individuals or political groups—and not just one such fact about a person, but all such facts.
The privacy invasion, then, comes not from the individual observations themselves, but from what can be inferred from them:
The whole of one's movements over the course of a month is not constructively exposed to the public because, like a rap sheet, that whole reveals far more than the individual movements it comprises.
In other words, the mosaic theory asserts that the issue is not just the series of observations but also what else they imply:
the whole of one's movements is not exposed constructively even though each individual movement is exposed, because that whole reveals more—sometimes a great deal more—than does the sum of its parts.
This is the definition of the mosaic theory that we're using.

Do mosaics exist, as a scientific concept if not necessarily a legal one? Our review of the academic literature shows that the answer is "yes". Experimenters have found that by using machine learning, they could predict things like ethnicity and partnered status just from location data. Whether or not machine learning can predict "trips to the psychiatrist, the plastic surgeon, the abortion clinic, the AIDS treatment center, the strip club, the criminal defense attorney, the by-the-hour motel, the union meeting, the mosque, synagogue or church, the gay bar and on and on" (to quote Justice Sotomayor's concurrence in United States v. Jones) is still an open question, but the basic idea is sound. That is our first major conclusion: conceptually at least, automated tools can make predictions that go well beyond the directly observed data, thus validating the underpinning of the theory.

The second question we addressed was posed by Prof. Kerr himself:

For example, what is the standard for the mosaic? How should courts aggregate conduct to know when a sufficient mosaic has been created?
Again, the precise questions are not answered in the technical literature; however, there is sufficient data to let us draw a tentative line at about one week. One source for our conclusion is the paper "Limits of Predictability in Human Mobility", by Song, Qu, Blumm, and Barabási. They found that, not surprisingly, most people's weekday schedules are predictable; more surprisingly, they wrote:
we did not find significant changes in user regularity over the weekends compared with their weekday mobility, which suggested that regularity is not imposed by the work schedule, but potentially is intrinsic to human activities.
In summary, the combination of the empirically determined user entropy and Fano's inequality indicates that there is a potential 93% average predictability in user mobility, an exceptionally high value rooted in the inherent regularity of human behavior. Yet it is not the 93% predictability that we find the most surprising. Rather, it is the lack of variability in predictability across the population.
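To make the "entropy plus Fano's inequality" step concrete, here is a minimal sketch of how an upper bound on predictability can be computed from an entropy estimate. It assumes the commonly cited form of the bound, S = H(p) + (1 - p) log2(N - 1), where S is the entropy of a person's location sequence in bits, N is the number of distinct locations visited, and H is the binary entropy function. The function names and the numbers in the demo are illustrative assumptions, not values or code from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a coin that lands heads with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def fano_entropy(p, n_locations):
    """Entropy implied by predictability p over n_locations (assumed Fano form)."""
    return binary_entropy(p) + (1 - p) * math.log2(n_locations - 1)


def max_predictability(entropy_bits, n_locations, tol=1e-10):
    """Solve fano_entropy(p, N) = S for p by bisection on [1/N, 1].

    On that interval the left-hand side falls from log2(N) to 0,
    so there is exactly one solution for any 0 <= S <= log2(N).
    """
    if n_locations < 2:
        raise ValueError("need at least two distinct locations")
    if not 0.0 <= entropy_bits <= math.log2(n_locations):
        raise ValueError("entropy must lie between 0 and log2(N)")
    lo, hi = 1.0 / n_locations, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fano_entropy(mid, n_locations) > entropy_bits:
            lo = mid  # too much entropy at mid: predictability must be higher
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    print(max_predictability(entropy_bits=1.0, n_locations=50))
```

The lower the measured entropy for a given number of locations, the closer the bound gets to 1; that is the sense in which regular behavior translates into high potential predictability.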
Sadilek and Krumm's "Far Out: Predicting Long-Term Human Mobility" said it even more clearly:
[The location of a person] is likely to be a good predictor of [the person's] location exactly one week from now.
In other words, where you are today, whether today is a work day or not, is an excellent predictor of where you'll be a week hence. If the issue is what can be learned beyond the direct observations, a week's worth of data may suffice. This is not firm (again, we need to do more precise experiments aimed at legally interesting questions), but it provides some rational basis for setting a limit. The Massachusetts Supreme Judicial Court held that two weeks of monitoring "was more than sufficient to intrude upon the defendant's expectation of privacy safeguarded by art. 14", but it explicitly declined to say what the limit should be. One week may not always be the answer, but there are some grounds, based on both scientific evidence and intuitive reasoning, for adopting our limit for now, pending insights from further research.
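The "same place one week later" idea can be illustrated with a very simple baseline: predict that a person is wherever they were exactly 168 hours earlier, and count how often that is right. The sketch below is not the method used in either paper; the hourly location labels, the synthetic schedule, and the function names are all invented for illustration.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_baseline_accuracy(locations):
    """Fraction of hours whose location matches the one exactly a week earlier.

    `locations` is a list of hourly location labels, oldest first.
    """
    pairs = list(zip(locations[HOURS_PER_WEEK:], locations[:-HOURS_PER_WEEK]))
    if not pairs:
        raise ValueError("need more than one week of hourly observations")
    hits = sum(1 for now, week_ago in pairs if now == week_ago)
    return hits / len(pairs)


def synthetic_week():
    """One hypothetical week of hourly labels (made-up schedule)."""
    week = []
    for day in range(7):  # day 0 is Monday
        for hour in range(24):
            if day < 5 and 9 <= hour < 17:
                week.append("office")
            elif day == 5 and 10 <= hour < 12:
                week.append("gym")
            elif day == 6 and 9 <= hour < 11:
                week.append("church")
            else:
                week.append("home")
    return week


if __name__ == "__main__":
    trace = synthetic_week() * 3  # three identical weeks
    trace[200] = "airport"        # one irregular hour in week two
    print(weekly_baseline_accuracy(trace))
```

Note that the first week of data is needed before the baseline can say anything at all: every prediction it makes rests on a full prior week of observations.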

Have we settled the question of the mosaic theory? No, absolutely not. Prof. Kerr posed several other difficult questions, not the least of which is whether or not courts should adopt the mosaic theory as a legal matter. That said, we think we have answered two of the thornier ones: whether or not mosaics in fact exist, and if so when they occur.