What is Literate Programming?

A traditional computer program consists of a text file containing program code. Scattered in amongst the program code are comments which describe the various parts of the code.

In literate programming the emphasis is reversed. Instead of writing code containing documentation, the literate programmer writes documentation containing code. No longer does the English commentary injected into a program have to be hidden in comment delimiters at the top of the file, or under procedure headings, or at the end of lines. Instead, it is wrenched into the daylight and made the main focus. The ``program'' then becomes primarily a document directed at humans, with the code being herded between ``code delimiters'' from where it can be extracted and shuffled out sideways to the language system by literate programming tools.

The effect of this simple shift of emphasis can be so profound as to change one's whole approach to programming. Under the literate programming paradigm, the central activity of programming becomes that of conveying meaning to other intelligent beings rather than merely convincing the computer to behave in a particular way. It is the difference between performing and exposing a magic trick.

In order to program in a literate style, particular tools are required. The traditional approach (used in the FunnelWeb system) is to have some sort of text-file-in/text-file-out utility that reads a literate program (a program commentary peppered with scraps of program text) and writes out two files: one containing all the program code, and one containing typesetter commands representing the entire input document, documentation, code, and all (Figure 1).
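To make the text-file-in/text-file-out structure concrete, here is a minimal Python sketch of such a utility. It assumes a hypothetical input format, not FunnelWeb's actual syntax, in which each scrap of program text is fenced by a line containing only @code and a line containing only @end; the function name process is likewise illustrative. The whole document goes to the typesetter output, with each scrap wrapped in a LaTeX verbatim environment, while only the scraps go to the program output.

```python
# A toy literate-programming processor. The input format is hypothetical
# (not FunnelWeb's): program scraps are fenced by a line "@code" and a
# line "@end"; everything else is documentation.

def process(literate_text: str) -> tuple[str, str]:
    """Split one literate document into (program code, typesetter input)."""
    code_lines = []   # only the scraps: destined for the compiler
    doc_lines = []    # the whole document: destined for the typesetter (LaTeX)
    in_code = False
    for line in literate_text.splitlines():
        if line.strip() == "@code":
            in_code = True
            doc_lines.append(r"\begin{verbatim}")
        elif line.strip() == "@end":
            in_code = False
            doc_lines.append(r"\end{verbatim}")
        else:
            doc_lines.append(line)        # the typeset document keeps every line
            if in_code:
                code_lines.append(line)   # the program file keeps only scraps
    return "\n".join(code_lines) + "\n", "\n".join(doc_lines) + "\n"


if __name__ == "__main__":
    source = "\n".join([
        "The program greets the reader.",
        "@code",
        'print("hello")',
        "@end",
        "That is all.",
    ])
    program, document = process(source)
    # A real tool would write these to two files (e.g. hello.py and hello.tex).
    print(program, end="")
    print(document, end="")
```

The single pass over the input is the point of the design: because one program sees both the commentary and the code, it can produce both outputs consistently from the same source.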

\begin{figure}
\begin{verbatim}
+-----------------------------------------...
...ditional architecture of literate programming tools.}
\smallskip
\end{figure}

Given the coming age of hypertext systems, this is probably not the best approach. However, it does mesh beautifully with current text files and command line interfaces, the expectation of linear presentations in the documents we read, and the particular requirements of current programming languages and typesetting systems. It is certainly not a bad approach.

With this structure in place, a literate programming system can provide far more than just a reversal of the priority of comments and code. In its full-blown form, a good literate programming facility can provide total support for the essential thrust of literate programming, which is that computer programs should be written more for the human reader than for the compiler. In particular, a literate programming system can provide:

Re-ordering of code: Programming languages often force the programmer to give the various parts of a computer program in a particular order. For example, the Pascal programming language [BSI82] imposes the ordering: constants, types, variables, procedures, code. Pascal also requires that procedures appear in an order consistent with the partial ordering imposed by the static call graph (but forward declarations allow this to be bypassed). In contrast, the literate style requires that the programmer be free to present the computer program in any order whatsoever. The facility to do this is implemented in literate programming tools by providing text macros that can be defined and used in any order.

Typeset code and documentation: Traditionally program listings are dull affairs consisting of pages of fan-form paper imprinted with meandering coastlines of structured text in a boring font. In contrast, literate programming systems are capable of producing documentation that is superior in two ways. First, because most of the documentation text is fed straight to the typesetter, the programmer can make use of all the power of the underlying typesetter, resulting in documentation that has the same presentation as an ordinary typeset document. Second, because the literate programming utility sees all the code, it can use its knowledge of the programming language and the features of the typesetting language to typeset the program code as if it were appearing in a technical journal. It is the difference between:

    while sloth<walrus loop
       sloth:=sloth+1;
    end loop

and

while sloth < walrus loop
    sloth ← sloth + 1;
end loop

Unfortunately, while FunnelWeb provides full typesetting of the documentation, it typesets all of its code in the style of the first of these two examples. Typesetting in the style of the second requires knowledge of the programming language, and the current version of FunnelWeb is programming-language independent. A later version of FunnelWeb may be modified to read a file describing the target programming language and to use that information to typeset the code properly.

Cross referencing: Because the literate tool sees all the code and documentation, it is able to generate extensive cross-referencing information in the typeset documentation. This makes the printed program document easier to navigate and partially compensates for the lack of an automatic searching facility when reading printed documentation.
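The re-ordering and cross-referencing facilities both follow from the tool reading every macro definition before it writes anything. The Python sketch below illustrates this under assumed conventions that are not FunnelWeb's: a macro definition starts with a line of the form <<name>>= and ends with a line containing only @, and a line consisting of <<name>> (optionally indented) calls that macro. The function names parse, expand and cross_reference are hypothetical. Because parse collects all definitions first, the main program can call ``compute the answer'' before that macro is defined; expand splices the bodies together, keeping the indentation of each call, and cross_reference records which macros call each macro.

```python
import re

# Hypothetical macro conventions (assumed for illustration, not FunnelWeb's):
#   <<name>>=   starts the definition of macro "name"
#   @           ends the current definition
#   <<name>>    on a line of its own (optionally indented) calls macro "name"
DEF = re.compile(r"^<<(.+)>>=\s*$")
USE = re.compile(r"^(\s*)<<(.+)>>\s*$")

# The main program calls "compute the answer" before that macro is defined.
SOURCE = """\
<<main program>>=
begin
   <<compute the answer>>
   writeln(answer)
end.
@

<<compute the answer>>=
answer := 6 * 7;
@
"""


def parse(text):
    """Collect every macro body, in whatever order the definitions appear."""
    macros = {}
    name = None
    for line in text.splitlines():
        header = DEF.match(line)
        if header:
            name = header.group(1)
            macros.setdefault(name, [])
        elif line.strip() == "@":
            name = None
        elif name is not None:
            macros[name].append(line)
    return macros


def expand(macros, name, indent="", active=()):
    """Recursively replace calls with bodies, keeping each call's indentation."""
    if name in active:
        raise ValueError(f"macro calls itself: {name}")
    out = []
    for line in macros[name]:  # KeyError here means the macro was never defined
        call = USE.match(line)
        if call:
            out.extend(expand(macros, call.group(2),
                              indent + call.group(1), active + (name,)))
        else:
            out.append(indent + line)
    return out


def cross_reference(macros):
    """Map each macro name to the sorted names of the macros that call it."""
    callers = {name: set() for name in macros}
    for caller, body in macros.items():
        for line in body:
            call = USE.match(line)
            if call:
                callers.setdefault(call.group(2), set()).add(caller)
    return {name: sorted(found) for name, found in callers.items()}


if __name__ == "__main__":
    macros = parse(SOURCE)
    print("\n".join(expand(macros, "main program")))
    for name, used_in in cross_reference(macros).items():
        print(f"<<{name}>> is used in: {', '.join(used_in) or '(no callers)'}")
```

Swapping the two definitions in SOURCE leaves the expanded program unchanged, which is exactly the freedom of presentation order described above; the same dictionary of definitions also supplies the cross-reference index at no extra cost.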

In the end, the details don't matter. The most significant benefit that literate programming offers is its capacity to transform the state of mind of the programmer. It is now legend that the act of explaining something can transform one's understanding of it. This is one of the justifications behind the powerful combination of research and teaching in universities [Rosovsky90]. Similarly, by constantly explaining the unfolding program code in English to an imaginary reader, the programmer transforms his perception of the code, laying it open, prone, to the critical eye.

The result of this exposure is a higher quality of programming. When exposed to the harsh light of the literate eye, bugs crawl out, special cases vanish, and sloppy code evaporates. As a rule, literate programs take longer to write than ordinary programs, but the total development time is the same or less because the time taken to write and document the program carefully is compensated for by reduced debugging and maintenance time. Thus literate programming does not merely assist in the preparation of documentation, but also makes significant contributions to the process of programming itself. In practice this has turned out to be a contribution far more important than the mere capacity to produce typeset documentation.

For more information on literate programming, the reader is directed to Knuth's early founding work [Knuth83] and [Knuth84]. For more recent information, refer to [Smith91], which provides a comprehensive bibliography up to 1990.