Is Web Scraping Legal?

Chris
Apr 19, 2018

Updated: Jan 15, 2020

The Ninth Circuit Court of Appeals is currently hearing oral argument in HiQ v. LinkedIn, a case that could have far-reaching implications for Internet businesses. The case is about scraping.

“Scraping” broadly refers to collecting electronic data from a third party without their permission. Typically, we’re talking about software used to automatically harvest and sort large amounts of data from public online sources like websites. However, from a legal perspective, the way in which the data is collected doesn’t usually make much difference. Whether you use automated scraping software or employ a team of human scrapers, the result is the same: Company A visits Company B’s website and copies the information it finds there for its own purposes.

Web scraping is widespread on today’s Internet. Companies do it for countless reasons. They might be aggregating the data and repackaging it as part of their own product or services. Other businesses may use scraping to study the market or their competitors for their own internal research.

hiQ Labs, the appellee in the case currently before the Ninth Circuit, is a data science company that develops tools to help corporate HR departments keep tabs on their workforces. Included in the information hiQ provides to its clients is data scraped from their employees' public LinkedIn profiles. On May 23, 2017, LinkedIn sent hiQ a cease and desist letter demanding that it stop the practice. Two weeks later, hiQ filed a lawsuit in the Northern District of California, asking the court for a declaratory judgment that scraping LinkedIn’s data was lawful.

On August 14, 2017, the judge granted hiQ’s motion for a preliminary injunction preventing LinkedIn from blocking hiQ’s access to its site while the case was pending, a decision that LinkedIn then appealed to the Ninth Circuit.

In the meantime, scraping has taken on a new political dimension. Mark Zuckerberg’s awkward two-day testimony before Congress last week was necessitated largely by the accusation that Facebook has failed to protect its users’ data from collection by predatory third parties such as Cambridge Analytica.

So what exactly are the legal issues with scraping? For businesses that rely on collecting data from third-party sources, what are the potential sources of liability and how can they minimize them?

Claims in scraping cases can take a number of forms, including copyright infringement, breach of contract, violation of the Computer Fraud and Abuse Act, and even tort theories like trespass to chattels.

Copyright

The most obvious issue with scraping might seem to be copyright. After all, copyright law gives authors the exclusive right to copy, distribute, and display their works (including, for example, websites). Scraping, at a minimum, usually involves copying some kind of web content. Where a party harvesting the content then turns around and includes it as part of their own website or other product, the copyright implications loom even larger.

The problem with using copyright law to curtail collection of data is that data itself is not subject to copyright. Copyright protects creative expression, not facts or information or ideas. Therefore, scraping that involves copying the way a website arranges or presents information could theoretically give rise to a viable copyright claim, but copying the information itself is unlikely to constitute infringement under U.S. law. Even in cases where some arguably copyrightable elements are involved, the scraping could be subject to a fair use defense as long as the copying of those elements was simply incidental to collecting the underlying data.

An additional complication arises with respect to user-generated content, such as the public LinkedIn profiles at issue in the hiQ case, since the website may not even own the copyright in the content being scraped.

Breach of Contract

Given the gaps that intellectual property law leaves with respect to pure data, many website owners turn to contract law to protect what they see as their proprietary content. Most websites, apps, and other online services are subject to some sort of Terms of Use or Terms of Service (“TOU”/“TOS”). These are just names for a type of contract, one between the website owner and anyone who manifests their theoretical agreement by viewing the site or using the service.

For years, it has been standard for many TOUs to include a term aimed at preventing scraping. The lawyers who draft these TOUs have attempted to approach the issue from a number of angles. The TOU might prohibit automated software programs from browsing the site, restrict the ways in which a site’s data is used, or even require all users to agree not to launch a competing service.

Of course, no one actually reads these “browsewrap” contracts, and the agreement they supposedly reflect is largely a legal fiction. TOUs are generally enforceable up to a point, but courts occasionally view them with suspicion, especially where they contain unusual or onerous terms or where the agreement is not sufficiently brought to users’ attention (see In re Zappos.com, Inc., Customer Data Sec. Breach Litig., 893 F. Supp. 2d 1058 (D. Nev. 2012)). Reliance on TOUs is even more dicey with respect to automated scraping software. Even if a bot is capable of understanding the terms of an agreement, it probably lacks the legal capacity to enter into a binding contract.

Even if an anti-scraping term in a TOU is enforceable, the damages are usually going to be difficult to assess. Therefore, website owners have increasingly looked further afield for legal theories to battle scraping.

The Computer Fraud and Abuse Act

The federal Computer Fraud and Abuse Act (CFAA) prohibits obtaining information from a computer system without authorization. [Check out last month’s article on the CFAA] Many states have similar laws originally designed to combat “hacking.” Violation of the CFAA carries both criminal and civil penalties.

The CFAA has become the central law involved in scraping disputes. The theory is that, when access to a website violates the website’s TOU, that access is “without authorization.” In the 2016 decision Facebook, Inc. v. Power Ventures, Inc., the Ninth Circuit ruled in Facebook’s favor on a CFAA claim, finding that the scraping of Facebook data by social media aggregator Power Ventures was clearly prohibited under Facebook’s TOU and therefore a potential CFAA violation (844 F.3d 1058 (9th Cir. 2016)). It is this ruling that LinkedIn cited in its cease and desist letter to hiQ Labs, and it is the precedent that the Ninth Circuit is currently weighing.

The circumstances in the Facebook and LinkedIn cases are distinguishable in some ways. For example, Power Ventures scraped data from private Facebook profiles (with permission from the users), whereas hiQ’s scraping was limited to public profiles. Whether the Court of Appeals will be swayed by those differences or will instead bolster or overturn its prior decision remains to be seen.

Trespass to Chattels

Before Facebook v. Power Ventures brought the CFAA to the forefront of the anti-scraping toolbox, website owners often fell back on state personal-property statutes and common-law tort claims such as “trespass to chattels.”

Before the Internet, trespass to chattels was a musty and seldom-invoked tort theory involving acts like riding your neighbor's horse without their permission. However, over the last few decades, trespass to chattels has found a renewed vigor protecting plaintiffs’ electronic resources when no other cause of action seems to fit.

In 2000, the claim was successfully applied to web scraping in eBay, Inc. v. Bidder's Edge, Inc., 100 F. Supp. 2d 1058, 1069–70 (N.D. Cal. 2000). Since Bidder’s Edge, though, courts have treated the theory with skepticism unless the plaintiff can identify some tangible damage connected to the defendant’s “trespass” (see Intel Corp. v. Hamidi, 71 P.3d 296 (Cal. 2003)).

The upshot for companies trying to prevent scraping of their data (and, conversely, for companies concerned about liability in connection with their own scraping activities) is that scraping that involves copying a website’s expressive content may be actionable as copyright infringement. The law with respect to collection of pure data, on the other hand, remains murkier. If the activity is expressly prohibited by a valid TOU, it could constitute breach of contract or even a CFAA violation, but the issue is far from settled.

If you suspect that your company’s content or data is being used unlawfully or have concerns that your bots may be exposing you to liability, schedule a free consultation with Knowmad Law or contact us at info@knowmad.law or 831-275-1401 for further information.