Orin Kerr recently blogged about a 9th Circuit decision that held that scraping a public web site (probably) doesn’t violate the Computer Fraud and Abuse Act (CFAA). Quoting the opinion (and I copied the quote from that blog post):
For all these reasons, it appears that the CFAA’s prohibition on accessing a computer "without authorization" is violated when a person circumvents a computer’s generally applicable rules regarding access permissions, such as username and password requirements, to gain access to a computer. It is likely that when a computer network generally permits public access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA.
On its surface, it makes sense (you can’t steal something that’s public), but I think the simplicity of the rule is hiding some very deep questions. One, I think, can most easily be expressed as "what is the cost of the ‘attack’"? That is, how much effort must someone expend to get the data? Does that matter? Should it?
Let’s start with the Court’s example: it is hacking (more precisely, a CFAA violation) if someone bypasses a username and password requirement. But what is the role of the username and password? Is it intended as an actual barrier or as a sign saying "Authorized Personnel Only"? Does it matter if the site has trivial password limitations, e.g., 2 digits only?
More concretely, imagine a badly coded website, where you’re prompted for a login and password if you visit the home page, but not if you go directly to some internal page. (For the record, it’s really easy for a neophyte to implement something this badly.) Is that a suitable barrier or warning? What if someone else links to an internal page (as I’ve done, above, to a blog post)? Is clicking on that link, and thus never even seeing the password prompt, a CFAA violation? It’s hard to see how the answer could be "yes", but if you think that that example is too contrived, what about a misconfigured firewall that inadvertently permits access to the interior of a corporate net—is someone who stumbles on that access liable? That’s a very subtle kind of error, and one that’s easy to make.
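To show how little it takes to build that first mistake, here is a minimal sketch of such a site's request handling. Everything in it (the page paths, the `handle_request` function, the status codes chosen) is hypothetical and invented for illustration; it is not drawn from any real site or framework.

```python
# Hypothetical toy site: the paths and page contents are made up for illustration.
PAGES = {
    "/": "Welcome back!",
    "/reports/q3": "Internal Q3 numbers",
}


def handle_request(path, logged_in):
    """Return (status, body) for a request.

    The bug: the login check is attached to the home page only,
    so every other page is served to anyone who knows its URL.
    """
    if path == "/" and not logged_in:
        return 401, "Please log in"
    if path in PAGES:
        return 200, PAGES[path]
    return 404, "Not found"
```

A visitor who follows a link straight to `/reports/q3` gets the page with status 200 and never sees a prompt, which is exactly the situation in which it's hard to say what warning they ignored.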
There are, of course, other forms of access control. One of the simplest is address-based access control: only certain IP addresses may access a certain resource. It’s long been known to be weak, but it’s still used quite frequently, especially on intranets. Is this a "generally applicable rule"? Is there a difference between an address rule that says "these three IP addresses may have access" and one that says "anyone but these three may have access"? Mathematically, they’re identical in form, and the latter is no harder to specify than the former; one doesn’t have to write 4,294,967,293 separate "allow" rules. Does it matter if a blocked party changes their IP address to evade the blockage? What if their ISP happens to change it, as some consumer ISPs do quite regularly?
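A minimal sketch of the two kinds of address rule, using Python's standard `ipaddress` module. The three addresses are documentation-range placeholders I picked for the example, and the function names are mine; the point is only that both rules are defined by the same three-element set.

```python
import ipaddress

# Assumption: three placeholder addresses from the 192.0.2.0/24 documentation range.
NAMED = {ipaddress.ip_address(a) for a in ("192.0.2.1", "192.0.2.2", "192.0.2.3")}


def allow_only_named(addr):
    """Rule 1: these three IP addresses may have access."""
    return ipaddress.ip_address(addr) in NAMED


def allow_all_but_named(addr):
    """Rule 2: anyone but these three may have access."""
    return ipaddress.ip_address(addr) not in NAMED


# Rule 2 never enumerates the addresses it admits; the count is just arithmetic
# over the 32-bit IPv4 address space.
IPV4_SPACE = 2 ** 32
permitted_by_rule_2 = IPV4_SPACE - len(NAMED)  # 4,294,967,293
```

Each rule is a one-line membership test against the same set, which is why the size of the admitted population says little about how deliberate the restriction was.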
I should note that one common use for such restrictions is geoblocking: excluding certain locations from access to content. This may be Major League Baseball videos (they’re blacked out in areas where there is a local TV channel that carries those games), movies for which a site does not have a world-wide license, and even online gambling if it’s in violation of local laws (as in the US). If someone uses a VPN to evade such a restriction, is that a CFAA offense? What if they use Tor, not to evade the restriction but because they value their privacy, and just happen to gain access as a side effect?
There have also been systems that relied on, more or less, just a username or equivalent, and not a password. One of the best-known cases is that of Andrew "weev" Auernheimer; he and a colleague noticed that a database of AT&T customers could be accessed just by knowing the ICCID from an iPad’s SIM. For that particular situation, it was possible to enumerate the namespace. Was that hacking? In a controversial move, the Justice Department prosecuted; his conviction was eventually overturned on rather legalistic grounds, and the underlying CFAA issue was never squarely addressed.
Does it matter how hard it is to enumerate the namespace? Suppose the account numbers were sequential, in which case given a single number it’s trivial to find the others. What if the odds on a random number being valid were 1:1,000,000? 1:1,000,000,000,000? Does it matter? Should it?
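To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes "1:N" means each random guess is valid with probability 1/N, independently of the others, so the expected number of guesses before the first hit is N (the mean of a geometric distribution). The guess rate in the usage note is an arbitrary figure I picked, not a measurement of any real system.

```python
def expected_guesses(one_in):
    """Expected random guesses until the first valid identifier.

    Assumes each guess is independently valid with probability 1/one_in.
    """
    return float(one_in)


def expected_seconds(one_in, guesses_per_second):
    """Expected time to the first hit at a given (assumed) guess rate."""
    return expected_guesses(one_in) / guesses_per_second


SECONDS_PER_YEAR = 365.25 * 24 * 3600
```

At an assumed 100 guesses per second, `expected_seconds(1_000_000, 100)` is 10,000 seconds, a bit under three hours, while `expected_seconds(1_000_000_000_000, 100) / SECONDS_PER_YEAR` comes out at a few hundred years. The arithmetic is trivial; whether the law should care about the difference is the open question.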
What all of these scenarios have in common is that they reflect a different degree of effort to gain access to some resource. Sometimes, the effort necessary is known to or knowable by the defender; other times, it may not be. My questions, then, are these:
- Does effort matter?
- Should it?
- How do we define effort? Does allowable effort change over time, as technology improves?
I don’t know the answers to any of these questions, but I think that they’re important. Some situations, e.g., intentionally working around a password requirement, are pretty clearly (all other things being equal, which they may not be; see Orin’s blog post for that) on the wrong side of the law. An address block where an "access unauthorized" message is displayed may also be clear, which suggests that the real issue of access control is intent and warning. But even there, there are numerous subtleties that are beyond the control of the defender.
Consider a situation where a firewall implements an address-based access control mechanism. Furthermore, the firewall is configured to return an ICMP Administratively Prohibited packet when it sees an unauthorized IP address attempting to connect. How will the requester’s software display the error? Will it even know about the prohibition, as opposed to the simple fact that the destination isn’t reachable? Does the exact language of the technical specification matter? It says
A Destination Unreachable message that is received MUST be reported to the transport layer. The transport layer SHOULD use the information appropriately.
In standards-speak, "SHOULD" is defined:
This word or the adjective "RECOMMENDED" means that there may exist valid reasons in particular circumstances to ignore this item, but the full implications should be understood and the case carefully weighed before choosing a different course.
In other words, perhaps some network implementor did not pass on the code, in which case the application couldn’t know.
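The gap between what the firewall sends and what the user sees can be sketched as follows. The ICMP Destination Unreachable code numbers are the standard ones (type 3; codes 9, 10, and 13 are the "administratively prohibited" variants), but the two "stacks" are hypothetical stand-ins I made up to contrast an implementation that passes the code up with one that doesn't. Neither models any particular operating system.

```python
ICMP_DEST_UNREACHABLE = 3  # ICMP message type

# A subset of the Destination Unreachable codes.
CODE_NAMES = {
    0: "network unreachable",
    1: "host unreachable",
    3: "port unreachable",
    9: "network administratively prohibited",
    10: "host administratively prohibited",
    13: "communication administratively prohibited",
}
ADMIN_PROHIBITED = {9, 10, 13}


def detailed_stack(code):
    """Hypothetical stack that passes the ICMP code through to the application."""
    if code in ADMIN_PROHIBITED:
        return "access prohibited by the site's administrator"
    return CODE_NAMES.get(code, "destination unreachable")


def lossy_stack(code):
    """Hypothetical stack that reports every code as the same generic error."""
    return "destination unreachable"
```

With the first stack, code 13 reaches the user as a warning; with the second, it is indistinguishable from a dead host (code 1), and the firewall operator has no way of knowing which kind of stack is on the other end.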
We seem, then, to be stuck. The court’s decision seems to treat the warning aspect as crucial, but sites can’t always warn people. And why is a password more of a warning than an explicit communication, as was in fact the case here?