Web Scraping Is Vital


Seriously. Even if you have NO idea what it is. Keep reading:


Web scraping means collecting data from websites automatically, by writing programs, sometimes very simple ones, that do the whole job (I have written several tutorials myself on how to do it). It is what Google and every other search engine do every second, because they could not provide any service without it. In general, it is hard to overestimate the importance of this activity for students, researchers, businesses, watchdogs and “fact checkers” of any kind.
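To give an idea of how simple such a program can be, here is a minimal sketch in Python, using only the standard library. It is not taken from any of the tutorials mentioned above: the names `LinkCollector`, `extract_links` and `SAMPLE_PAGE` are made up for illustration, and the sample page is a hard-coded, hypothetical snippet standing in for a page downloaded from a real website.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in an HTML string, in page order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page content, standing in for a downloaded page.
SAMPLE_PAGE = """
<html><body>
  <a href="/reports/2020.html">2020 report</a>
  <a href="/reports/2021.html">2021 report</a>
  <p>No link here</p>
</body></html>
"""

print(extract_links(SAMPLE_PAGE))
# prints ['/reports/2020.html', '/reports/2021.html']
```

A real scraper would fetch the page first, for example with `urllib.request.urlopen`, and then hand the downloaded text to the same function.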

Journalists, for example, have used scrapers to collect data that rooted out extremist cops, tracked lobbyists, and uncovered an underground market for adopted children.

Web scraping is essential for democracy, and human rights in general. This is why The Markup filed an amicus brief in a case before the U.S. Supreme Court this week that threatens to make scraping illegal.

The case itself is about other questions, not directly about scraping: someone was prosecuted under the American Computer Fraud and Abuse Act (CFAA), which prohibits unauthorized access to a computer network, for example by hacking. Since the accused was allowed to access the database for work, the question is whether the court will broadly define his troubling activities as “exceeding authorized access” to extract data, which is what would make them a crime under the CFAA.

The problem is that “such a definition could also affect journalists. Or, as Justice Neil Gorsuch put it… lead in the direction of perhaps making a federal criminal of us all”.

If that line of thought enters jurisprudence, websites that don’t want to be scraped (even when scraping them would be in the public interest, and has been perfectly legal until today) could then just “change the fine print on their terms of service to label the aggregation of that information unauthorized.”

And the U.S. Supreme Court, depending on how it rules, could decide that violating those terms of service is a crime under the CFAA. Even when it is normal news gathering by journalists.

What sort of work is at risk?

To get an idea of how bad this could be, read the Markup article. It explains, quoting again, how web scraping has been successfully used, among other things, to “collect data that rooted out extremist cops, tracked lobbyists, and uncovered an underground market for adopted children”.

