Is Web Scraping Legal? An Update

Chris
Jan 14, 2020

Updated: Jan 15, 2020

In April 2018, we wrote about a dispute between LinkedIn and data analytics company HiQ Labs, and about what it could mean for the widespread practice of “scraping” information from third-party websites. In September 2019, the Court of Appeals for the Ninth Circuit weighed in. Here’s where things stand:

Does the CFAA prohibit scraping?

One of the central questions raised in LinkedIn’s appeal to the 9th Circuit was whether the Computer Fraud and Abuse Act (CFAA) provides a remedy against data scraping of the sort conducted by HiQ. The CFAA imposes civil and criminal penalties for “access[ing]” a computer system “without authorization.” 18 U.S.C. § 1030(a)(2)(C). After analyzing both the wording and the legislative history of the statute, as well as analogous language in the Stored Communications Act, the court concluded that the CFAA does not apply “when a computer network generally permits public access to its data.” HiQ Labs, Inc. v. LinkedIn Corp., 938 F.3d 985, 1004 (9th Cir. 2019).

The opinion explicitly left the door open to other causes of action, such as “state law trespass to chattels claims … copyright infringement, misappropriation, unjust enrichment, conversion, breach of contract, or breach of privacy.” In our previous article, we discussed some of those theories and their shortcomings. The advantage of a CFAA claim was that it punished unauthorized access, regardless of whether the scraper copied any proprietary material or caused any damage to the website. Now, the 9th Circuit appears to have definitively blunted that sword, at least with respect to data that is truly “publicly accessible.”

What if you need to log in?

Interestingly, the 9th Circuit distinguished, rather than overturned, its seemingly contradictory precedent. In its prior opinion in Facebook, Inc. v. Power Ventures, Inc., the court had affirmed liability under the CFAA against Power Ventures, a social networking aggregator that was harvesting data from its users’ Facebook profiles. 844 F.3d 1058 (9th Cir. 2016). In both the Power Ventures and HiQ cases, the platforms had sent cease-and-desist letters demanding that the scraper knock it off. In HiQ, unlike in Power Ventures, the court found that such a letter was insufficient to render subsequent access “without authorization” for CFAA purposes. Yet the court insisted there was nothing inconsistent about the two rulings, since the data that Power Ventures collected “was protected by Facebook’s username and password authentication system.”

It’s a curious distinction. After all, it was Facebook’s users, not Facebook itself, that gave Power Ventures access to the data in the first place. It’s not clear why Facebook has the right to “revoke” authorization explicitly granted by its users, whereas LinkedIn has no such right with respect to data access implicitly authorized by virtue of the data’s public availability. Given that access to data can be restricted to varying degrees and in a variety of ways, it remains to be seen whether the bright line that the court attempts to draw between “public” and “restricted” websites is sustainable. For now, the HiQ decision seems to imply that the legality of scraping under the CFAA may hinge on whether access to the scraped data requires logging into the website.

What about copyright?

LinkedIn did not raise, and the 9th Circuit did not discuss, any copyright claim with respect to HiQ’s use of the platform’s content. As previously discussed, copyright law can be an unwieldy tool for protecting website data, both because the data per se may not be subject to copyright and because enforcing a copyright in court generally requires registration with the Copyright Office, a problematic exercise when it comes to dynamic website content.

However, another recent case from within the 9th Circuit’s jurisdiction illustrates how copyright can still be highly relevant in scraping cases. In 2017, Ticketmaster brought suit in the Central District of California against various parties that it accused of accessing its website using bots specifically designed to circumvent the website’s technical barriers. In ruling on a motion to dismiss, the judge found that Ticketmaster had stated a plausible claim of copyright infringement, not with respect to any scraped data but with respect to the website code with which the bots allegedly interacted.

Unlike purely factual information, such as a LinkedIn member’s place of employment, computer software has long been recognized as creative expression subject to copyright. While Ticketmaster did not claim to have any direct evidence that its software had been copied without authorization, the court concluded that such copying could reasonably be inferred, because the bots at issue could not otherwise have been programmed in the manner alleged.

The Ticketmaster decision seems consistent with the lessons of HiQ to the extent that it suggests that if technical barriers must be surmounted to obtain access to a website’s data, scraping is more likely to entail some legal risk.

If you suspect that your company’s content or data is being used unlawfully or have concerns that your bots may be exposing you to liability, schedule a free consultation with Knowmad Law or contact us at info@knowmad.law or 831-275-1401 for further information.