Why data are important

(This page is part of my Open Data, Open Society report. Please follow that link to reach the introduction and Table of Contents, but don’t forget to check the notes to readers!)

First of all, what are data? Borrowing from, and rearranging, a definition attempted by Peter Murray-Rust as summarized on the Digital Curation Blog, by data we mean single pieces of information of any kind (pictures, numbers, textual definitions, maps, audio…) that:

  • are direct descriptions of facts (e.g. the path followed by a river, as drawable on a map, average temperatures in some city, tax brackets in some country…) or are closely related to facts, and as such are not copyrightable
  • are reproducible without ambiguity when the method used to generate them is known in full detail. An aerial photograph is data because two identical cameras shooting from the same point at the same moment with the same settings would produce (for all practical purposes) the same picture, whereas oral descriptions of the same scene by two individuals can be very different. This does not mean that everybody produces the same data: in the real world different people may, and very often will, produce different data to describe the same phenomena, because they use different methods and starting hypotheses to generate them
  • are parts, or can be immediately used as parts, of larger information or knowledge structures
  • have (almost always) much more meaning and value when they are linked to one another and complemented by metadata. Metadata are simply data about other data, rather than about some facts. The day on which a collection of digital pictures was taken would be a piece of metadata common to all those pictures
  • can, due to all the characteristics above, be expressed and stored in digital formats, even when they weren’t originally generated in that form, and once digital can be processed by computers directly in those formats, to build other data, find metadata and make decisions.

An Economist report on data in February 2010 calls our age “the age of Big Data”, because every year individuals, businesses and Public Administrations create (and rely on) amounts of digital data that are orders of magnitude bigger than a few years ago. Data are digital when, whatever their nature, they can be encoded as series of binary digits, that is, bits representing ones and zeroes. Such series can be stored in any kind of bit container, from computer hard disks to DVDs, floppy disks, solid-state drives, memory cards and so on, and can be transmitted in the same format, that is as sequences of bits, across all kinds of telecommunication networks.
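To make the idea of “series of bits” concrete, here is a minimal Python sketch. It is only an illustration, not part of the Economist report or of any tool mentioned here: the function names `to_bits` and `from_bits` are made up for this example, and UTF-8 is just one possible way to turn text into bytes.

```python
def to_bits(data: bytes) -> str:
    """Return the bytes as a string of '0' and '1' characters, 8 per byte."""
    return "".join(f"{byte:08b}" for byte in data)


def from_bits(bits: str) -> bytes:
    """Rebuild the original bytes from a string of '0' and '1' characters."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


# A short piece of text, encoded as bytes (UTF-8 is an assumption made here).
text = "river"
bits = to_bits(text.encode("utf-8"))

# The same sequence of bits can be stored or transmitted anywhere,
# and decoded back without any loss.
restored = from_bits(bits).decode("utf-8")
print(bits)
print(restored == text)
```

The same round trip works for pictures, maps or audio: once they are bytes, the container and the network no longer care what the data describe.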

Digital technologies have made it extremely easy and cheap to generate, store and (when there’s a will to do it) publish data. Quick and effective exploitation of digital data becomes more important every year for organizations at every level, for purposes that range from cost savings and transparent reporting to decision making. This is true partly because organizations must make decisions anyway, and today those decisions are based on data that are digital, and partly because there are so many digital data that it’s easier than ever to overlook, forget or misrepresent something. The same applies to individual citizens whenever they must make important, well informed decisions, be it in the voting booth or in their work.

The same Economist report sums up the importance of data by saying that they have become “an economic raw input almost on par with capital and labour”. The Digital Britain Final Report recognizes data as “an innovation currency… the lifeblood of the knowledge economy”. If all this is true, and it’s hard to deny that it is, giving data away is like giving stimulus money, or at least sharing great lobbying power, but at a much smaller cost for taxpayers. Starting from these facts, this report looks at how much the value of data increases when they circulate and can be reused without restrictions.

How much are Public Sector Information (PSI) data worth? It is hard, if not impossible, for reasons that will be explained later, to give answers that are really complete, accurate and reliable. That said, here are a few numbers. According to a MEPSIR study conducted by the European Commission in 2006, the overall market size for PSI in the EU Member States and Norway was estimated at EUR 27 billion. A previous study (PIRA) had found in 2000 an ‘investment value’ (public sector investments in the acquisition of PSI) of EUR 9.5 billion and an ‘economic value’ (part of national income attributable to industries and activities built on the exploitation of PSI) of EUR 68 billion. Dr Rufus Pollock of Cambridge University, lead author of a UK report on the economic value of open data, has calculated that current plans to set UK government data free will create an estimated GBP 6 billion in additional value for the UK.

In Germany alone, the market for geo-information increased from EUR 1 billion in 2000 (mainly from utility and engineering companies doing planning and maintenance systems) to EUR 1.6 billion in 2006, with more than half the demand driven by a navigation market based on “free” private data. At about the same time, in 2007, the German government’s revenue from PSI was, however, only EUR 164,000. In Denmark, open publication of the official Danish addresses database had direct financial benefits of around EUR 62 million (~DKK 471 million) in the period 2005-2009, with total costs until 2009 of around EUR 2 million. For 2010, social benefits from the agreement are estimated at about EUR 14 million (around 70% in the private sector), while costs will total about EUR 0.2 million.
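The figures just quoted can be put side by side with some back-of-the-envelope arithmetic. The Python sketch below is only a rough illustration built on the numbers in the paragraph above: the helper names `benefit_cost_ratio` and `growth_percent` are invented for this example, and a plain benefit/cost ratio is an assumption made here, not the method used by the studies cited.

```python
def benefit_cost_ratio(benefits: float, costs: float) -> float:
    """Return how many euros of benefit correspond to each euro of cost."""
    return benefits / costs


def growth_percent(start: float, end: float) -> float:
    """Return the percentage increase from start to end."""
    return (end - start) * 100 / start


# All amounts are in thousands of EUR, to keep the arithmetic on whole numbers.
# Danish addresses database, 2005-2009: about EUR 62 million in benefits,
# about EUR 2 million in costs.
denmark_2005_2009 = benefit_cost_ratio(62_000, 2_000)

# Danish addresses database, 2010 estimate: about EUR 14 million in benefits,
# about EUR 0.2 million in costs.
denmark_2010 = benefit_cost_ratio(14_000, 200)

# German geo-information market: EUR 1 billion (2000) to EUR 1.6 billion (2006).
germany_growth = growth_percent(1_000_000, 1_600_000)

print(f"Denmark 2005-2009: {denmark_2005_2009:.0f} to 1")
print(f"Denmark 2010 (estimate): {denmark_2010:.0f} to 1")
print(f"German geo-information market growth 2000-2006: {germany_growth:.0f}%")
```

Since the underlying amounts are themselves rounded estimates, the ratios should be read as orders of magnitude rather than precise results.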

Antoinette Graves, of the UK Office of Fair Trading (OFT), noted in her 2009 presentation “The Price of Everything but the Value of Nothing” that:

  • “PSI is valuable and vitally important for businesses… a lot of products just could not be made, or could not be made in the form that they were, without access to and reuse of public sector information. When problems arose it was often due to public sector bodies that were doing something themselves that gave them an incentive to restrict access to the upstream level”.
  • “calculations indicated that the net value of public sector information in the United Kingdom is about 590 million GBP per year”