If Open Data are so good, why aren't all Public Data already open?

(this page is part of my Open Data, Open Society report. Please follow that link to reach the introduction and Table of Contents, but don’t forget to check the notes to readers!)

Much of the current civic activity around Open Data still happens in the conditions described in a blog post from Mash the State: “the great independent civic websites using public data are mostly having to scrape and steal it. Very few councils will even acknowledge them, let alone co-operate with them.”

Sometimes this happens because of issues which are much more general than PSI availability, from limits to freedom of speech to lack of affordable Internet connections and other physical infrastructures. Very often, however, at least in the EU, PSI data aren’t available for a combination of much less serious reasons. The Danish addresses study, for example, also indicates that in the Central Business Register (CVR) and the utilities sector, usage of the official addresses is still limited due to technical, traditional and legislative barriers. Here’s a summary of the most common reasons why PSI data aren’t open yet:

  • Pure and simple lack of real awareness of the importance and benefits of Open Data is still the norm in many government organizations (even though they should digitize all their procedures and documents anyway, if they haven’t done so yet, for their own good or to comply with some local e-Government directive). Alongside ignorance, the lack of explicit guidelines on data reuse from upper levels and the fear of losing control are powerful motivators to do nothing, and so to keep data locked.
  • Legal barriers, or (worse still) serious confusion about the legal status of data. This happens when data come under restrictive or unclear terms of use, or simply without any terms of use at all, which is even worse. Under current legislation and international treaties, the default status of any creative work, including PSI data, is “All rights reserved” for many decades, so no reuse is possible without explicit authorization. But when datasets were produced by assembling data from many different public and private bodies without a clear single policy (not an unusual case), even figuring out who is entitled to authorize reuse can become a costly legal procedure.
  • Fear of embarrassment deriving from publishing low-quality material: “we can’t publish this data, because there are errors in it” (Zijlstra, Business case for PSI). Torkington reports the same issue from New Zealand: “serious problems exist in some datasets. Sometimes corners were cut in gathering the data, or there’s a poor chain of provenance for the data so it’s impossible to figure out what’s trustworthy and what’s not.”
  • Last but not least, money. We explained that raw data are like soil: a generic foundation upon which wealth is created in many different ways which are basically impossible to predict. The “dark side” of this power is that the administrations that first see the extra money generated by Open Data are almost never the same ones that created the data and should have opened them in the first place. This makes it quite difficult for a public body with no other sources of external funding and no policy imposed from the top to see anything beyond its own real or perceived short-term benefits from selling data, regardless of whether much more public money will be spent or not gained in the big picture. Even when data are already available at no charge to the public, really opening them (that is, deciding the proper license, getting approval for it and reformatting everything for online publication in the right formats) is an extra expense that is very often perceived as not easily affordable or justifiable.