Sky-0.0.3: Printing data

This is part of a series of posts about writing a simple interpreter for a small Lisp-like language. Please see here for an overview of the series.

In the last two entries we defined a basic data model with a handful of types. Today we’ll add some functionality with real, externally observable effects: printing. Specifically, we’ll write functions that print Sky data to a C stream using representations that can be read back in to construct equivalent objects [1].

You can see the code as of this post here, and a comparison against last time here.

At the highest level, we have a function print that dispatches to an appropriate function based on the type of value being printed, plus println, which calls print and then prints a newline.

void print(FILE *stream, value_t value)
{
    enum type_tag tag = get_type_tag(value);

    switch(tag) {
    case TAG_INT:    print_integer(stream, value); return;
    case TAG_CHAR:   print_character(stream, value); return;
    case TAG_STRING: print_string(stream, value); return;
    case TAG_SYMBOL: print_symbol(stream, value); return;
    case TAG_LIST:   print_list(stream, value); return;
    default:         abort();
    }
}

void println(FILE *stream, value_t value)
{
    print(stream, value);
    putc('\n', stream);
}

The function to print an integer is the least interesting of them, especially since we’re not going to support printing in different bases.

static void print_integer(FILE *stream, value_t value)
{
    intptr_t i = integer_data(value);
    fprintf(stream, "%" PRIdPTR, i);
}

It may be worth noting that PRIdPTR is a preprocessor macro, defined in inttypes.h along with several others for the same purpose, that expands to whatever the correct format specifier is for intptr_t on that platform. On my x86-64 machine running Fedora, it expands to "ld", making (via the automatic concatenation of adjacent string literals) "%" PRIdPTR equivalent to "%ld" (the format specifier for a long).
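
To make the literal concatenation concrete, here is a minimal standalone sketch. The helper name format_intptr is hypothetical and not part of Sky; it writes into a buffer rather than a stream so the result is easy to inspect.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper (not part of Sky): formats an intptr_t into buf
   and returns whatever snprintf returns. */
int format_intptr(char *buf, size_t size, intptr_t i)
{
    /* "%" and PRIdPTR are adjacent string literals, so the compiler
       joins them into one format string (for example "%ld"). */
    return snprintf(buf, size, "%" PRIdPTR, i);
}
```

Calling format_intptr(buf, sizeof buf, -42) leaves the text -42 in buf, whichever specifier PRIdPTR expands to on the platform.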

The function to print a character is more involved because it has to handle a few different scenarios. Borrowing the syntax from Common Lisp, all character literals start with #\, followed by:

  • If it’s a non-whitespace character with a graphical representation, the character itself. For example, the character “a” is printed as #\a, and “!” is printed as #\!.
  • If it’s a whitespace character, a symbolic representation based on the character’s name. For example, the space character is printed as #\space, and newline as #\newline.
  • Otherwise, the character’s numeric value in hexadecimal, prefixed by “x”. For example, the character code of ASCII DEL is 7F in hexadecimal (127 in decimal), so it’s printed as #\x7F.

Here it is:

static void print_character(FILE *stream, value_t value)
{
    int c = character_data(value);

    fputs("#\\", stream);

    if (isgraph(c)) {
        putc(c, stream);
        return;
    }

    switch(c) {
    case '\b': fputs("backspace", stream); return;
    case '\t': fputs("tab", stream); return;
    case '\n': fputs("newline", stream); return;
    case '\v': fputs("vtab", stream); return;
    case '\f': fputs("formfeed", stream); return;
    case '\r': fputs("return", stream); return;
    case ' ':  fputs("space", stream); return;
    default:   fprintf(stream, "x%02X", c); return;
    }
}
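
For experimenting with the three cases outside of Sky, a table-driven variant could look something like the sketch below. It is not the code from the repository: format_character is a hypothetical name, it takes a plain int rather than a value_t, it writes to a buffer instead of a stream, and it swaps the switch for a lookup table.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical standalone variant of print_character: plain int in,
   text written to buf, snprintf's result returned. */
int format_character(char *buf, size_t size, int c)
{
    /* Named characters; the names match the switch shown above. */
    static const struct { int code; const char *name; } names[] = {
        {'\b', "backspace"}, {'\t', "tab"},      {'\n', "newline"},
        {'\v', "vtab"},      {'\f', "formfeed"}, {'\r', "return"},
        {' ',  "space"},
    };

    /* isgraph is only defined for values representable as unsigned char
       (or EOF), so this sketch checks the range first. */
    if (c >= 0 && c <= UCHAR_MAX && isgraph(c))
        return snprintf(buf, size, "#\\%c", c);

    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++) {
        if (names[i].code == c)
            return snprintf(buf, size, "#\\%s", names[i].name);
    }

    /* Everything else falls back to the hexadecimal form. */
    return snprintf(buf, size, "#\\x%02X", (unsigned)c);
}
```

In the default "C" locale, passing 'a' produces #\a, passing ' ' produces #\space, and passing 127 produces #\x7F.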

The functions to print strings and symbols both call out to print_string_1 and pass a bool argument, symbol, to identify whether the value should be printed as a symbol. The only difference here is that strings are printed within double-quotes while symbols aren’t [2].

As in print_character, we have a switch statement to control how certain characters are represented. This time we print a space as, well, a space, and other whitespace characters in their standard backslash-escaped string syntax. Because we want what we print to be valid syntax (so it can be read back in), we also backslash-escape double-quotes to keep them from terminating the string prematurely, and backslash characters to keep them from being interpreted as escapes. Characters without printable representations are again represented in hex, like \x7F.

static void print_string_1(FILE *stream, value_t value, bool symbol)
{
    ptrdiff_t len = string_length(value);

    if (!symbol) putc('"', stream);

    for (ptrdiff_t i = 0; i < len; i++) {
        int c = string_ref(value, i);

        switch(c) {
        case '\b': fputs("\\b", stream); break;
        case '\t': fputs("\\t", stream); break;
        case '\n': fputs("\\n", stream); break;
        case '\v': fputs("\\v", stream); break;
        case '\f': fputs("\\f", stream); break;
        case '\r': fputs("\\r", stream); break;
        case '"':  fputs("\\\"", stream); break;
        case '\\': fputs("\\\\", stream); break;
        default:
            if (isprint(c))
                putc(c, stream);
            else
                fprintf(stream, "\\x%02X", c);
            break;
        }
    }

    if (!symbol) putc('"', stream);
}

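Then we have the actual print_string and print_symbol functions, which both simply call print_string_1 with appropriate arguments. The only extra step is in print_symbol, which first fetches the symbol’s name (itself a string) and prints that.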

static void print_string(FILE *stream, value_t value)
{
    print_string_1(stream, value, false);
}

static void print_symbol(FILE *stream, value_t value)
{
    value_t name = symbol_name(value);
    print_string_1(stream, name, true);
}

Last but not least we have print_list. In Lisps, lists are printed between parentheses, with a space separating adjacent elements. A list containing the strings “cool” and “list” will therefore look like ("cool" "list"). Besides printing those delimiters, print_list just loops through the list and invokes print on each element.

static void print_list(FILE *stream, value_t value)
{
    putc('(', stream);

    while (value != NIL) {
        value_t fst = first(value);
        print(stream, fst);
        value = rest(value);
        if (value != NIL)
            putc(' ', stream);
    }

    putc(')', stream);
}
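
The separator logic is easy to get subtly wrong (a trailing space before the closing parenthesis is the usual slip), so here is the same loop as a self-contained sketch. The struct node type and print_int_list are hypothetical: they use a minimal list of ints with NULL as the empty list, where Sky uses value_t, first, rest, and NIL.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical minimal list node holding ints; NULL plays the role
   that NIL plays in Sky. */
struct node {
    int first;
    const struct node *rest;
};

void print_int_list(FILE *stream, const struct node *list)
{
    putc('(', stream);

    while (list != NULL) {
        fprintf(stream, "%d", list->first);
        list = list->rest;
        /* Only print a separator when another element follows. */
        if (list != NULL)
            putc(' ', stream);
    }

    putc(')', stream);
}
```

A three-element list holding 1, 2, and 3 prints as (1 2 3), and the empty list prints as ().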

Error handling and the lack thereof

C I/O functions like putc, fputs, and fprintf all report errors via their return values: putc and fputs return EOF on failure, and fprintf returns a negative number. We’re ignoring that for now, but a production language would certainly need to pay attention and respond appropriately.
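
As a rough idea of what paying attention might involve, here is a minimal sketch of an error-checking println for a plain C string. It is not how Sky does things today; checked_println is a hypothetical name, and flushing on every call is an assumption made only so that buffered write errors show up immediately.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical error-checking variant of println for a plain C string.
   Returns true on success and false if any write fails. */
bool checked_println(FILE *stream, const char *text)
{
    if (fputs(text, stream) == EOF)   /* fputs returns EOF on failure */
        return false;
    if (putc('\n', stream) == EOF)    /* putc returns EOF on failure */
        return false;

    /* With a buffered stream the failure may only surface on flush. */
    return fflush(stream) != EOF;
}
```

A caller can then decide what to do with a false result, for instance by signalling an error in the interpreter instead of silently dropping output.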

Convenient printing in GDB

If you use these printing functions in GDB, you will quickly get tired of typing call println(stdout, value). Luckily, GDB supports user-defined commands, so we can make things more convenient.

I added a file .gdbinit in Sky’s src directory which defines two commands:

define pp
  set $tmp = $arg0
  call println(stdout, $tmp)
end

define pr
  pp $
end

The command pp prints its argument, and pr calls pp on the most recent value in GDB’s value history (which is what $ refers to). Here’s a quick demo:

~/code/sky/src/> gdb sky
(gdb) source .gdbinit
(gdb) br main
(gdb) r
(gdb) set $s1 = make_string("cool", 4)
(gdb) set $s2 = make_string("list", 4)
(gdb) set $list = cons($s1, cons($s2, NIL))
(gdb) pp $list
("cool" "list")
(gdb) print make_symbol(make_string("sky", 3))
$1 = 6317808
(gdb) pr
sky

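If you add the src directory to GDB’s auto-loading safe path (for example, with an add-auto-load-safe-path line in your ~/.gdbinit), it will source .gdbinit automatically when started from that directory (meaning you won’t need to type source .gdbinit as I did in the first GDB command above).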

Next time

That’s all for now. Next time we’ll add the functionality to read Sky data.


  1. This is also called serialization to differentiate it from writing arbitrary bytes. Different languages make this distinction in different ways. In Common Lisp terms, we’re writing prin1 as opposed to write or princ. In Python terms, we’re writing an equivalent to print("%r" % obj) as opposed to print("%s" % obj). ↩︎

  2. Note that there’s a problem with this. A symbol can have any string as its name, and in this implementation symbols with certain names can’t be read back in correctly. For instance, a symbol with the name "foo bar" would be printed as foo bar, but that would be read back in as two symbols (foo and bar). We’ll come back to this later. ↩︎