(This page is part of my 2011 report on “Open Data: Emerging trends, issues and best practices”. Please follow that link to reach the Introduction and Table of Contents, but don’t forget to also check the notes for readers of the initial report of the same project, “Open Data, Open Society”.)
Proper licensing of public data is essential. The more Open Data activities continue, the clearer this rule becomes. What distinguishes Open Data from “mere” transparency is reuse. Paraphrasing Eaves, until a government gets the licensing issue right, Open Data cannot bring all its possible benefits to that country. If there are no guarantees that public data can be used without restriction, very little happens in practice, and what does happen may go against the public interest.
Public Engines Inc, a US company that is paid by local police departments to collect, process and analyze official crime data, also publishes anonymized summaries of those data online, under a proprietary license. When in 2010 another company, Report See Inc, scraped those data from the Public Engines website to reuse them, Public Engines sued.
Reporting this, D. Eaves rightly points out that both companies are right: one is trying to protect its investment, the other is simply trying to reuse what IS public data, by getting it from the ONLY place where it is available. This is what happens when public officials leave the ownership of public data to the third parties hired to collect them. Please note that, in practice, it makes very little difference whether those third parties are private, for-profit corporations or even other Public Administrations. Unless there are national laws already in place that define in advance the license of all present and future public data, no matter how they were generated and by whom, those data can be lost to society at any moment: their legal status will be either officially closed and locked, or uncertain enough to prevent most or all reuses. In February 2011, the news came that Public Engines, even though it was not the original copyright holder, had been able to put together enough legal claims to convince Report See to give up.
Disputes like this should not happen, and would not happen if all contracts for the collection and management of Public Sector Information (PSI) clearly specified that all the resulting data either go directly into the public domain (after being anonymized if necessary, of course) or remain the exclusive property of the government. Even ignoring data openness, this is essential for at least three other reasons. The first is to protect a public administration from having to pay twice for those data, if it needs them again in the future for some other internal activity not explicitly mentioned in the initial contract. The second reason is to avoid spending more than is absolutely necessary to respond to public records requests, that is, to comply with Freedom of Information laws.
The final reason is to guarantee quality assurance and detection of abuses at the smallest cost, that is, by sharing that work with all the citizens who use the public services based on those data. A real-world example of this point comes from the “Where’s My Villo?” service in Brussels. Villo! is a city-wide bike-sharing scheme started in May 2009 through a partnership with a private company: JCDecaux finances the infrastructure and operates it, in exchange for advertising space on the bikes themselves and on billboards at the bike-sharing stations. The availability of bikes and parking spaces at each station is published online, in real time, on the official Villo! website.
When the quality of service decreased, some citizens started “Where’s My Villo?”, another website that reuses those data to measure where and how often there aren’t enough available bikes and parking spaces. It did so in a way that made it impossible for JCDecaux to deny the problems, and pushed the company to fix them. Both this happy ending and the fact that it came at almost no cost to the city, because citizens could monitor the service by themselves, were possible only because the data from the official website were legally and automatically reusable.