What in the world is this "source code" anyway?

(This page is part of the Family Guide to Digital Freedom, 2007 edition. Please read the Guide's introduction to learn more about it, especially if you intend to comment on this page. Thanks.)

The source code of a software program is the complete description, in a (theoretically) human-readable programming language, of what that program must do. For 95% of human beings, this sounds like one of the most boring and useless things to know about or look at. However, even though almost nobody needs to look at source code, the way it is managed is still crucial for our lives and civil rights, not to mention, sometimes, national security.

Source versus machine code

Internally, computers neither need nor understand source code. The only instructions that a microprocessor, the central electronic circuit of every computer, can directly execute are machine-code instructions. These are special sequences of bits (1s and 0s), each of which corresponds to one specific operation that the processor can perform at very high speed: things like adding, multiplying, copying data from one location to another and so on. Any software program, no matter how complex, is ultimately run as a sequence of machine-code instructions.

It is possible to write programs directly in machine code, but it is a very boring, complicated and error-prone activity. Since only very elementary instructions are available, a lot of them are needed for even the simplest task and, even for a competent programmer, the resulting code is very hard to read and understand, especially when it is necessary to update or modify programs written by somebody else. For all these reasons, machine code is written by hand only in special situations where it is absolutely necessary to maximize the performance of the computer by giving it the smallest possible number of low-level instructions.
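To give an idea of what machine code looks like, here is a tiny hand-written illustration, assuming an ordinary PC processor (the x86-64 family). The three instructions compute 2 + 3 and hand back the result; in a high-level language the same thing would be a single line such as `return 2 + 3`. The short Python program below only prints the bytes as the 1s and 0s the processor actually receives, next to the "assembly" notation programmers use to make such codes readable. The names `PROGRAM` and `as_bits` are invented for this example.

```python
# Illustrative only: x86-64 machine code for a tiny function that computes
# 2 + 3 and returns the result in the EAX register. Each entry pairs the raw
# bytes with the assembly mnemonic a human would write for them.
PROGRAM = [
    (bytes([0xB8, 0x02, 0x00, 0x00, 0x00]), "mov eax, 2   ; put 2 in register EAX"),
    (bytes([0x83, 0xC0, 0x03]), "add eax, 3   ; add 3 to EAX"),
    (bytes([0xC3]), "ret          ; return to the caller"),
]


def as_bits(code: bytes) -> str:
    """Render each byte as 8 binary digits, the form the processor sees."""
    return " ".join(f"{b:08b}" for b in code)


if __name__ == "__main__":
    for code, meaning in PROGRAM:
        # Left column: what the processor reads. Right column: what it means.
        print(f"{as_bits(code):<50} {meaning}")
```

Even this trivial calculation needs nine bytes of carefully chosen bit patterns, and the same bytes would mean something completely different, or nothing at all, to a different type of processor.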

How source code is used

Today it is possible to describe the behavior of a software program in a wide variety of computer languages, at a much higher level than machine code allows. These descriptions, that is, the source code of the actual programs, are then translated into machine code by specialized programs called compilers. The process is not entirely automatic: the compiler needs specific instructions from the programmer (for example, which processor the result must run on, or how much to optimize it) to perform the conversion in the best way.

The development of compilers, and with it the possibility of writing and modifying only source code in high-level computer languages, has been a huge step forward for software engineering and for society as a whole.

Every programming language has high-level operators which allow one to express the basic steps of any generic procedure: some of these operators mean things like “if this is true, do X, otherwise do Y” or “write these data to a file named Z”, while others can describe, with one or two keywords, very long sequences of repetitive instructions or very complex mathematical operations.
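As a rough sketch of what such operators look like in practice, here is a short example in Python, one of many high-level languages. The function name `describe_total`, the file name and the "budget" threshold of 100 are all hypothetical, chosen only for illustration. Each commented line stands for what would take many machine-code instructions.

```python
from pathlib import Path


def describe_total(values: list[float], report_file: str) -> str:
    # One built-in keyword replaces a long, repetitive sequence of additions.
    total = sum(values)

    # "If this is true, do X, otherwise do Y"
    if total > 100:
        verdict = "over budget"
    else:
        verdict = "within budget"

    # "Write these data to a file named Z"
    Path(report_file).write_text(f"total={total} ({verdict})\n")
    return verdict


if __name__ == "__main__":
    print(describe_total([40, 50, 30], "report.txt"))
```

Python is normally run by an interpreter rather than compiled ahead of time into machine code, but the point is the same: each of these readable lines ends up as many low-level processor instructions.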

Thanks to all these features, writing programs in a high-level programming language takes much, much less time than doing the same thing in machine code. The same applies to correcting errors, adding new capabilities to a program or merging two programs into a new one. Machine code is also very dependent on the physical structure of the processor which executes it: as a general rule, machine code written for one type of processor cannot run on a different type without heavy changes or a complete rewrite.

This doesn’t happen with source code, because it is written at a higher level of abstraction. Besides making programmers more productive when a program is first created, this brings another huge practical advantage: portability, that is, the ability to generate from the same source code different versions of machine code, each optimized for one specific processor or operating system. To create a different version, the programmer only needs to give different instructions to the compiler: there is no need to rewrite the whole program from scratch.

Why it is necessary to know what source code is

Besides the huge improvements to design and maintenance, source code is also essential to figure out how a program really works. For the same reason, studying the source code of real, widely used software and, if possible, improving it, is by far the best way to learn programming well enough to make a living (or anything else really meaningful) out of it.

Together with a suitable compiler, source code is also all that is needed to generate a working copy of a software program equivalent to the official one distributed by the original author. For this reason, controlling by law how source code can be distributed, and whether, how and by whom it can be modified, is a very powerful economic weapon for anyone who wants either to stimulate or to prevent competition in the software industry. Last but not least, access to source code, which makes it possible to check for security problems without relying on intermediaries, is vital for any software used in military equipment or, for example, to protect state secrets.