What are protocols and formats anyways?

(This page is part of the Family Guide to Digital Freedom, 2007 edition. Please read that introduction to learn more about the Guide, especially if you mean to comment on this page. Thanks.)

We have already mentioned protocols and file formats several times, saying how important it is that they are open. Now let’s look at them in a bit more detail, in order to understand what these things are and how, and above all why, they can be made open (or closed, for that matter).

What are computer and communication protocols?

A protocol is a set of rules defining which messages two entities can exchange to accomplish a task. The two entities can be either humans or computers. In the human case, for example, all the things to do or not to do on the first (or on the third) date constitute a protocol, even if it is not an immutable one.

To understand even better just how important protocols are, try comparing operating systems and computers to people’s brains, and protocols or file formats to languages. Imagine that each human being were skillful enough to build his or her very own, one-of-a-kind computer, running unique software.

As long as all these computers used the same communication protocol, you could still have an Internet, just as millions of human beings with largely different brains can communicate about many things without any ambiguity, as long as they use the same language and, of course, don’t lie. Protocols also define how the messages are formatted, i.e. which symbols, and how many of them, a message may contain, and how to handle errors or interruptions in the communication.
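
To make this concrete, here is a minimal sketch of a toy protocol written in Python. Everything in it is invented for illustration: the verbs HELLO, ECHO and BYE, the error replies and the function name reply_to are assumptions, not part of any real protocol. It only shows the three ingredients mentioned above: which messages are allowed, how they are formatted, and what happens when something goes wrong.

```python
# A toy protocol, invented for illustration only.
# Rule 1: every message is one line of text ending in "\n".
# Rule 2: a message is a verb, optionally followed by a space and an argument.
# Rule 3: the only verbs are HELLO, ECHO and BYE; anything else is an error.

def reply_to(message: str) -> str:
    """Return the reply that the toy protocol prescribes for one message."""
    if not message.endswith("\n"):
        # The end-of-message marker is missing: the communication was interrupted.
        return "ERROR incomplete message\n"
    verb, _, argument = message.rstrip("\n").partition(" ")
    if verb == "HELLO":
        return "HELLO\n"
    if verb == "ECHO":
        return "ECHO " + argument + "\n"
    if verb == "BYE":
        return "BYE\n"
    return "ERROR unknown verb\n"


if __name__ == "__main__":
    for message in ["HELLO\n", "ECHO good morning\n", "DANCE\n", "ECHO cut off"]:
        print(repr(message), "->", repr(reply_to(message)))
```

Any two programs that follow these three rules can talk to each other, no matter who wrote them or which computer they run on. That is only possible because the rules are known to both sides.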

What are file formats?

Any document stored as a sequence of digital bits inside a computer is a file. A file format is the set of rules that specify what each bit means, depending on its position and order within the file itself. A certain sequence of bits right at the beginning of the file, for example, may mean “this is a text document created with, and editable only with, the program called XYZ 2007”. Theoretically, the same sequence of bits later on may mean an altogether different thing, like the letter W, or “Underline the following word”.
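
The idea that position decides meaning can be sketched in a few lines of Python. The format below is made up for this page and is not a real one: the four-byte signature WXYZ, the choice of the byte 0x5F as the “underline” command and the function name describe are all assumptions.

```python
# A made-up file format, for illustration only (not a real format).
# Bytes 0-3: the signature b"WXYZ", meaning "text document created with XYZ 2007".
# Every later byte: 0x5F means "underline the following word";
#                   any other byte is an ordinary character.

SIGNATURE = b"WXYZ"
UNDERLINE = 0x5F


def describe(data: bytes) -> list:
    """Translate the bytes of a file into the meanings the format gives them."""
    if data[:4] != SIGNATURE:
        raise ValueError("not an XYZ 2007 document")
    meanings = ["text document created with XYZ 2007"]
    for byte in data[4:]:
        if byte == UNDERLINE:
            meanings.append("underline the following word")
        else:
            meanings.append("the character " + chr(byte))
    return meanings


if __name__ == "__main__":
    # The byte for "W" appears twice: first as part of the signature,
    # then, later in the file, as a plain letter.
    for meaning in describe(b"WXYZ_W"):
        print(meaning)
```

Without the rules written in the comments at the top, the bytes WXYZ_W are just six numbers. Whoever knows the rules can write a program that reads the file; whoever does not can only guess.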

The critical role of standards

Computer formats and communication protocols have real value only when they are formally recognized as official standards. As far as we are concerned, a standard is any set of rules which describes in full detail how to accomplish some generic task, and which has been accepted by the computers, individuals or companies usually performing that task. A standard can be proprietary or non-proprietary, open or closed.

In this context, closed means that not everybody is allowed to know what the rules are, or to use the standard at all. You might have to pay a fee, commit to respecting some terms of use, or simply be told that you have no business looking into the standard.

Non-proprietary means that the rules do not belong to any single individual or company, but to some (generally non-profit, more or less open) community which has been acknowledged as competent and as the final authority on anything concerning the standard. A non-proprietary standard is something that only the whole community maintaining it can change. A proprietary standard belongs to one (maybe for-profit) company. Even if it is entirely published, that company can change it at will, whenever it feels like it, and without being forced in any way to inform anybody else of which changes were made. End users don’t notice this fact because they continue to do the same things in the same ways with their computers. Of course, this is true only as long as the company which owns the proprietary standard continues to release its software and as long as the end users can afford it.

For proprietary software producers, closed standards for file formats are an excellent way to force their customers to keep buying only their products. Once a personal diary, a contract or a business report has been saved inside a computer in a format which can only be read by one software program, never mind copyright: that document effectively belongs to the developer or company who developed that program. If the original program cannot be used anymore, because it became too expensive or for any other reason, there is no way to retrieve the document, unless one is a very competent programmer with a lot of spare time.

Only through closed standards can software producers ask for, and justify, higher prices at every release: their position is that they only have an incentive to innovate (for the common good, of course) if they know that they can get such prices for new versions forced on end users every few years. If every bright idea had to pass through the slow procedures of some committee, they say, and eventually be made public so that everybody could make a profit out of it, it would be the death of innovation. The truth, instead, is that society can progress only if most computer file formats remain stable and completely open, without unnecessary duplicates. Humankind went in just a few centuries from runes on stone to wireless instant messaging, passing through printing presses, typewriters and fax machines. This happened exactly because the alphabets remained almost unchanged in that whole period, preserving knowledge: if every generation had had to stop to rewrite every written document in another alphabet, nobody would have gotten anything done.