Donald Knuth coined the term "literate programming" to describe an approach in which programmers develop programs "in the order demanded by the logic and flow of their thoughts" [1]. Rather than writing source code annotated with comments, the programmer writes a prose description of the program's structure and semantics, with the code embedded within it as chunks.

Tools can then produce reader-friendly documentation woven from the source, as well as a tangled form that can be compiled or executed. Knuth's original tool was called WEB, and its C-based successor is CWEB [2]; language-agnostic tools such as noweb have since been developed [3].

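The following code fragment, taken from the Wikipedia article on literate programming [1], demonstrates how WEB-style systems work; the chunk syntax shown is that of noweb [3]. The text '<<Scan file>>=' defines a macro that is associated with the code that follows it, up to the '@' that ends the code chunk.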

    <<Scan file>>=
    while (1) {
        <<Fill buffer if it is empty; break at end of file>>
        c = *ptr++;
        if ( c > ' ' && c < 0177 ) {
            /* visible ASCII codes */
            if ( !in_word) {
                word_count++;
                in_word = 1;
            }
            continue;
        }
        if ( c == '\n' ) line_count++;
        else if ( c != ' ' && c != '\t') continue;
        in_word = 0;
        /* c is newline, space, or tab */
    }
    @

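The macro '<<Scan file>>' can then be referenced from any other code chunk; when the program is tangled, each reference is replaced by the code the macro defines.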
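To make the tangling step concrete, here is a minimal C sketch of how a tangler might expand '<<name>>' references recursively. It is not how WEB or noweb are actually implemented: the in-memory chunk table, its contents (including the hypothetical `refill()` and `limit`), and the function names `find_chunk` and `tangle` are assumptions made for illustration. The caller must pass an output buffer that starts as an empty string.

```c
#include <string.h>

/* A hypothetical in-memory web: each chunk has a name and a body.
   References to other chunks use the noweb-style <<name>> syntax. */
struct chunk {
    const char *name;
    const char *body;
};

static const struct chunk web[] = {
    { "Scan file",
      "while (1) {\n<<Fill buffer if it is empty; break at end of file>>"
      "    c = *ptr++;\n}\n" },
    { "Fill buffer if it is empty; break at end of file",
      "    if (ptr >= limit && !refill()) break;\n" },
};

/* Look up a chunk body by name (name is not NUL-terminated, hence len). */
static const char *find_chunk(const char *name, size_t len)
{
    for (size_t i = 0; i < sizeof web / sizeof web[0]; i++)
        if (strlen(web[i].name) == len && strncmp(web[i].name, name, len) == 0)
            return web[i].body;
    return NULL;
}

/* Append the fully expanded chunk `name` to the string in `out`.
   Returns 0 on success, -1 for an unknown chunk, a buffer that is too
   small, or references nested deeper than 16 levels (e.g. a cycle). */
int tangle(const char *name, size_t name_len, char *out, size_t cap, int depth)
{
    const char *body = find_chunk(name, name_len);
    if (body == NULL || depth > 16)
        return -1;

    const char *p = body;
    for (;;) {
        const char *ref = strstr(p, "<<");
        const char *ref_end = ref ? strstr(ref + 2, ">>") : NULL;
        size_t plain = ref_end ? (size_t)(ref - p) : strlen(p);
        size_t used = strlen(out);

        if (used + plain >= cap)
            return -1;                    /* output buffer too small */
        memcpy(out + used, p, plain);     /* literal code before the reference */
        out[used + plain] = '\0';

        if (ref_end == NULL)
            return 0;                     /* no more references in this chunk */
        if (tangle(ref + 2, (size_t)(ref_end - ref - 2), out, cap, depth + 1) != 0)
            return -1;
        p = ref_end + 2;                  /* continue after ">>" */
    }
}
```

Because a reference is substituted wherever it appears, the text of one chunk can end up anywhere in the tangled output, which is why interactions between chunks can be hard to see without reading that output. The sketch also misreads any literal "<<" in the code (a left shift, say); real tools provide an escape for this.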

One problem with this approach is that readers may believe they fully understand the code they are reading while failing to notice a specific interaction between code chunks. To be sure they correctly understand the interactions within the system, readers may need to consult the tangled code.

A related problem is that there is no limitation on how macros are used, so code can be intermixed in arbitrary ways. Software developed this way may become increasingly hard to maintain as other programmers are forced to edit the source files.

This document describes Quasi, a tool for quasi-literate programming. It was developed in the spirit of Knuth's literate programming but, by being deliberately far less powerful, it simplifies the process from the perspective of a maintenance programmer.

Background

Although Quasi was inspired by literate programming, it is derived from an earlier tool called "extract", which is used to extract SQL definitions from web-application requirements documents for use in database initialisation scripts. That tool scans text files and extracts the text of pre-formatted sections that match a user-supplied pattern.

For example, suppose 'source_file.txt' contains the following pre-formatted block:

    ~users_table~
    CREATE TABLE users
    (
    USER        INT(11)  NOT NULL AUTO_INCREMENT,
    PRIMARY KEY (USER)
    );
    ~

The following command would then extract the SQL and append it to 'output_file.sql':

    extract -p "users_table" source_file.txt >> output_file.sql

This allowed SQL table definitions and stored procedures to be developed and documented directly within a requirements document, and it motivated the development of a similar tool for extracting source code from documentation.


Concept

Like "extract", "quasi" extracts sections of pre-formatted text from documentation and appends them to target text files. Unlike "extract", however, it does not match a supplied pattern; instead, the identifier of each pre-formatted section is used as the path of the target file, relative to a user-supplied base directory.
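The extraction described above can be sketched as a line classifier plus a small driver loop. This is a hypothetical C sketch, not Quasi's actual source: it assumes that '~identifier~' and the closing '~' each appear alone at the start of a line, that lines are shorter than 1024 characters, and that target directories already exist. It only appends; file-name sanitising and the '!' truncation prefix are deliberately left out, so an identifier is used verbatim as a relative path.

```c
#include <stdio.h>
#include <string.h>

/* What a single line of the documentation turned out to be. */
enum quasi_kind { QUASI_PROSE, QUASI_OPEN, QUASI_CODE, QUASI_CLOSE };

struct quasi_state {
    int in_block;      /* non-zero between "~identifier~" and "~" */
    char ident[256];   /* identifier of the current block, as written */
};

/* Classify one line (without its line terminator), updating `st`.
   Assumed syntax: "~identifier~" opens a block, a lone "~" closes it. */
enum quasi_kind quasi_classify(struct quasi_state *st, const char *line)
{
    size_t len = strlen(line);

    if (!st->in_block) {
        if (len > 2 && line[0] == '~' && line[len - 1] == '~') {
            /* Copy the text between the two tildes. */
            snprintf(st->ident, sizeof st->ident, "%.*s", (int)(len - 2), line + 1);
            st->in_block = 1;
            return QUASI_OPEN;
        }
        return QUASI_PROSE;
    }
    if (strcmp(line, "~") == 0) {
        st->in_block = 0;
        return QUASI_CLOSE;
    }
    return QUASI_CODE;
}

/* Append every code line in `in` to <base>/<identifier>. Returns 0 on
   success, -1 if a target cannot be opened or a block is unterminated. */
int quasi_run(FILE *in, const char *base)
{
    struct quasi_state st = { 0 };
    char line[1024], path[512];
    FILE *out = NULL;

    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* drop the line terminator */
        switch (quasi_classify(&st, line)) {
        case QUASI_OPEN:
            snprintf(path, sizeof path, "%s/%s", base, st.ident);
            out = fopen(path, "a");           /* append, never truncate */
            if (out == NULL)
                return -1;
            break;
        case QUASI_CODE:
            fprintf(out, "%s\n", line);
            break;
        case QUASI_CLOSE:
            fclose(out);
            out = NULL;
            break;
        case QUASI_PROSE:
            break;                            /* documentation text is skipped */
        }
    }
    if (out != NULL) {                        /* input ended inside a block */
        fclose(out);
        return -1;
    }
    return 0;
}
```

Separating classification from file handling keeps the parsing rules testable without touching the file system; a real implementation would add the checks and the '!' handling at the point where the target is opened.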

For example, suppose 'source/mtx/quasi.mtx' contains the following block of source code:

    ~c/quasi.c~
    int main( int argc, char** argv )
    {
        return 0;
    }
    ~

The following command would then extract the code and append it to the file 'source/c/quasi.c':

    quasi source source/mtx/quasi.mtx

The tool sanitises file names, ensuring that parent-directory components ('..') are not accepted and therefore that output files remain under the specified base directory. If the specified base directory already exists, the tool exits with an error unless the '-f' flag is passed as the first argument:

    quasi -f source source/mtx/quasi.mtx
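Both safety checks can be sketched in a few lines of POSIX C. The function names `quasi_path_is_safe`, `quasi_force_flag` and `quasi_base_ok` are hypothetical. The sketch keeps output under the base directory by rejecting any identifier with a '..' component; rejecting absolute paths as well is an extra assumption not stated above. Any `stat()` failure is treated as "the directory does not exist", which a real tool would probably handle more carefully.

```c
#include <string.h>
#include <sys/stat.h>

/* Return 1 if the relative path `rel` may be joined onto the base directory.
   Rejects empty paths, any '/'-separated component that is exactly "..",
   and (an extra assumption) absolute paths. */
int quasi_path_is_safe(const char *rel)
{
    if (rel[0] == '\0' || rel[0] == '/')
        return 0;

    for (const char *p = rel; *p != '\0'; ) {
        size_t n = strcspn(p, "/");           /* length of this component */
        if (n == 2 && p[0] == '.' && p[1] == '.')
            return 0;                         /* parent-directory component */
        p += n;
        if (*p == '/')
            p++;                              /* skip the separator */
    }
    return 1;
}

/* Non-zero when "-f" is the first argument, as in "quasi -f source ...". */
int quasi_force_flag(int argc, char **argv)
{
    return argc > 1 && strcmp(argv[1], "-f") == 0;
}

/* Return 1 if output may be written under `base`: either it does not
   exist yet, or the caller passed -f. */
int quasi_base_ok(const char *base, int force)
{
    struct stat sb;

    if (force)
        return 1;
    return stat(base, &sb) != 0;
}
```

Checking whole path components, rather than searching for the substring "..", means a harmless name such as 'c/..hidden' is still accepted.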

If the identifier of a pre-formatted section is prefixed with an exclamation mark, the target file is truncated when it is opened. When a file is truncated in this manner, it is advisable for the code fragment to be a comment warning that the file is generated:

    ~!c/quasi.c~
    /*   !!!   Warning this file is auto-generated   !!!   */
    ~
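The '!' prefix maps naturally onto the standard C `fopen()` modes: "w" truncates an existing file and "a" appends to it. A minimal sketch follows; the helper names `quasi_open_mode` and `quasi_open_target` are hypothetical, and the identifier is assumed to have already been checked for '..' components.

```c
#include <stdio.h>

/* Split an identifier such as "!c/quasi.c" into the relative path to open
   and the fopen() mode: "w" (truncate) for a '!' prefix, otherwise "a". */
const char *quasi_open_mode(const char *ident, const char **rel)
{
    if (ident[0] == '!') {
        *rel = ident + 1;
        return "w";
    }
    *rel = ident;
    return "a";
}

/* Open <base>/<path> with the mode implied by the identifier.
   Returns NULL if the path is too long or fopen() fails. */
FILE *quasi_open_target(const char *base, const char *ident)
{
    const char *rel;
    const char *mode = quasi_open_mode(ident, &rel);
    char path[512];
    int n = snprintf(path, sizeof path, "%s/%s", base, rel);

    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;
    return fopen(path, mode);
}
```

Under this sketch, a '!' block discards anything that earlier blocks wrote to the same file, so a truncating block would normally be the first block for its file.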

Quasi is designed to process text files written in the MaxText format [4]. If a code fragment is not appropriate for the output documentation, it can be commented out using the standard MaxText comment character; MaxText then ignores it, but Quasi still processes it. This is useful for hiding code comments or '#include' directives:

    !
    Include various standard includes.

    ~!c/quasi.c~
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    ~
    !

The key difference between literate programming tools and Quasi is that Quasi forces the programmer to construct each target source file linearly, from top to bottom, although separate files may still be constructed in parallel. An additional expected benefit of this approach is that it should encourage programmers to modularise their software, as there is very little overhead in creating new files.

Bibliography

[1] Wikipedia: Literate programming. http://en.wikipedia.org/wiki/Literate_programming
[2] The CWEB System of Structured Documentation. http://www-cs-faculty.stanford.edu/~uno/cweb.html
[3] Noweb: A Simple, Extensible Tool for Literate Programming. http://www.cs.tufts.edu/~nr/noweb/
[4] MaxText (to be released publicly soon).