Keep past and future separate
(This page is part of my 2011 report on “Open Data: Emerging trends, issues and best practices”. Please follow that link to reach the Introduction and Table of Contents, but don’t forget to also check the notes for readers of the initial report of the same project, “Open Data, Open Society”.)
For the same reason that it is important to always distinguish between the political and economic advantages (or disadvantages) of Open Data, it is necessary to keep decisions about future data (those that will arrive in the future, due to new contracts, public services and so on) separate from decisions about data that already exist. At the end of 2010, T. Steinberg wrote that the idea that Government should publish everything non-private it can now is “rather dangerous”, and that it would be much better to release nothing until someone actually asks for it, and at that point to do it right, that is, with an open license and so on. The first reason for Steinberg’s concern is that asking for everything as soon as possible would “stress the system too much, by spreading thin the finite amount of good will, money and political capital”. The second is that many existing old data and data archival systems are, in practice, so uninteresting that it wouldn’t make sense to spend resources on opening them.
Even if these concerns were always true, it is important to realize that they apply (especially the second) to already existing data, not to future data. The two classes of data have, or can have, very different constraints. Existing data may still exist only on paper, and/or be locked by closed or unclear licenses, or may no longer be relevant for future decisions.
Opening future data, instead, is almost always more important, useful, urgent, easier and cheaper than digitizing, or even only reformatting, material that in many cases is already too old to make an immediate, concrete difference. While this argument is probably not always true when we look at Open Data for transparency, it probably is when it comes to economic development.
Therefore, features and guidelines that should be present in all future data generation and management processes include:
standardization: the fewer formats (all open, obviously) are used for data of the same type, the easier it is to merge and correlate them. The formats that have to be standardized are not only those at the pure software level. It is even more important, for example, to adopt by law standard identifiers for government suppliers, standard names and machine-readable identifiers for budget items, and so on
preparation for future digitization: new digital systems should be explicitly designed from the beginning so that, when non-digital records are digitized, it will be possible to add them to the databases without modifications or losses.
The first two features have obvious technical advantages regardless of data openness. The last two, being critical, are discussed separately in the next paragraph.
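As a rough illustration of the standardization and future-digitization points, here is a minimal sketch in Python. Everything in it is hypothetical and invented for the example: the `SpendingRecord` layout, the `SUP-…` supplier identifiers, the `B-…` budget item codes, and the `origin` and `archive_ref` fields. It only shows that, once every office uses the same identifiers, records from different sources can be correlated with a trivial lookup, and that records digitized later from paper can be added without changing the structure.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SpendingRecord:
    # All field names and code formats below are assumptions made for this sketch.
    supplier_id: str   # hypothetical standard supplier identifier
    budget_item: str   # hypothetical machine-readable budget item code
    amount_cents: int
    record_date: date
    # Fields foreseen from day one, so that digitized paper records fit as they are.
    origin: str = "digital"            # "digital" or "digitized-paper"
    archive_ref: Optional[str] = None  # pointer to the original paper document


def total_by_supplier(records):
    """Sum the amounts per supplier, whatever source each record came from."""
    totals = {}
    for record in records:
        totals[record.supplier_id] = totals.get(record.supplier_id, 0) + record.amount_cents
    return totals


# Two offices publishing with the same identifiers (invented sample data).
office_a = [SpendingRecord("SUP-0001", "B-2011-ROADS", 150_000, date(2011, 3, 1))]
office_b = [
    SpendingRecord("SUP-0001", "B-2011-SCHOOLS", 50_000, date(2011, 4, 2)),
    SpendingRecord("SUP-0002", "B-2011-SCHOOLS", 70_000, date(2011, 4, 9)),
]

# An old paper record, digitized later: same structure, no schema change needed.
digitized = [
    SpendingRecord(
        "SUP-0002", "B-1998-ROADS", 30_000, date(1998, 6, 5),
        origin="digitized-paper", archive_ref="box-12/folio-7",
    )
]

totals = total_by_supplier(office_a + office_b + digitized)
print(totals)
```

Without shared supplier identifiers, the same merge would first require guessing which differently spelled names refer to the same company, which is exactly the kind of cost that standardization by law is meant to avoid.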