Real impact of data manipulation or misunderstanding

(This page is part of my 2011 report on “Open Data: Emerging trends, issues and best practices”. Please follow that link to reach the Introduction and Table of Contents, but don’t forget to also check the notes for readers of the initial report of the same project, “Open Data, Open Society”.)

The fix for the risk of data manipulation is not only to open government data and procedures, but also to simplify the procedures as much as possible (which eventually also greatly reduces costs). An abundance of opportunities to secretly play with data, and with how they are managed, is a symptom of excessive, or peak, complexity: again, problems and risks with Open Data are a symptom of a [pre-existing] problem that lies somewhere else.

Regardless of the real probability that data are altered before they are published, the major problem happens afterwards. We already mentioned in the first report that, while correct interpretation of public data by the majority of average citizens is absolutely critical, the current situation, even in countries with (theoretically) high literacy and Internet access rates, is one in which most people still lack the skills needed for such analyses. Therefore, there surely is room both for intentional manipulation of PSI and for misunderstanding it. Since the publication of the first report, we’ve encountered several examples of this danger, which are reported in the rest of this page.

Before describing those cases, and in spite of them, it is necessary to point out one thing. While the impact of the Open Data activity of 2010 on the general public (in terms of raising interest and enhancing participation) has been, in many cases and as of today, still minimal, it is also true that there has been no big increase in demagogy, more or less manipulated scandals or confrontational discussion caused by Open Data. There was certainly some of this in Cablegate, but that’s not really relevant because, as we’ve already explained, what Wikileaks did is intrinsically different from Open Data. So far, negative or at least controversial reactions caused by manipulation and misunderstanding of Open Data haven’t happened on such a scale as to justify not opening PSI.

This said, let’s look at some recent examples of misunderstanding and/or manipulation based on (sometimes open) public digital data.

Nicolas Kayser-Bril mentioned a digital map of all the religious places in Russia that [also] shows “mosques that are no longer in use, so as to convey the idea that Muslims were invading Russia.”

In September 2010 the Italian National Institute of Geophysics and Volcanology officially declared that it was evaluating whether to stop publishing Italy’s seismic data online, as it had been doing for years. The reason was that, following the March 2009 earthquake in Italy, the data were being used to “come to conclusions without any basis at all”, both by the press, to sell more, and by local politicians trying to hide the lack of preventive measures, such as enforcing anti-seismic construction codes.

Still in Italy, Daniele Belleri runs a Milan crime mapping blog called “Il giro della Nera”, and makes a big effort to explain to his readers the limits of the maps he publishes, and the potential for misunderstanding if they are used without preparation or with the wrong expectations. What follows is a synthesis of Belleri’s explanation, also covered on other websites, which is applicable to any map-based PSI analysis and presentation, not just to crime mapping:

In general, a map is just a map, not reality. It doesn't always and necessarily provide scientific evidence. Crime maps, for example, are NOT safety maps, as most citizens would, more or less consciously, like them to be: a tool that tells them where to buy their house according to the level of criminality in the district.
When used in that way, crime maps can have two harmful effects on unprepared users. The first, obvious one is the false impression that certain areas are only criminal spaces, exclusively inhabited by criminals. The other is to encourage a purely egoistic vision of the city, where the need for safety becomes paranoia and intolerance, and all that matters is to be inside some gated community. This doesn't lower crime levels at all: the only result is to increase urban segregation.

To make things worse, crime data that are not analyzed and explained properly don’t just help to strengthen egoistic attitudes and lock the urban areas that are actually the most plagued by crime into their current difficult state indefinitely. Sometimes, they may even perpetuate beliefs that are, at least in part, simply false. Of course, when such beliefs, not grounded in facts, already exist, open crime data can help by finding and proving the gaps between the perception of criminality and reality. Belleri, for example, notes that residents of Milan consider the outskirts of their city more dangerous than downtown Milan, while Londoners think the opposite about London… but in both cities the truth emerging from the data is exactly the opposite (at least for certain categories of crime) of what their residents believe.