This is part of a series of posts about writing a simple interpreter for a small Lisp-like language. Please see here for an overview of the series.
In the last two entries we defined a basic data model with a handful of types. Today we’ll add some functionality with real, externally-observable effects: printing. Specifically, we’ll write functions that print Sky data to a C stream using representations that can be read back in to construct equivalent objects[1].
You can see the code as of this post here, and a comparison against last time here.
At the highest level, we have a function print that dispatches to an appropriate function based on the type of value being printed, plus println, which calls print and then prints a newline.
    void print(FILE *stream, value_t value)
    {
        enum type_tag tag = get_type_tag(value);
        switch (tag) {
        case TAG_INT:    print_integer(stream, value);   return;
        case TAG_CHAR:   print_character(stream, value); return;
        case TAG_STRING: print_string(stream, value);    return;
        case TAG_SYMBOL: print_symbol(stream, value);    return;
        case TAG_LIST:   print_list(stream, value);      return;
        default:         abort();
        }
    }

    void println(FILE *stream, value_t value)
    {
        print(stream, value);
        putc('\n', stream);
    }
The function to print an integer is the least interesting of them, especially since we’re not going to support printing in different bases.
    static void print_integer(FILE *stream, value_t value)
    {
        intptr_t i = integer_data(value);
        fprintf(stream, "%" PRIdPTR, i);
    }
It may be worth noting that PRIdPTR is a preprocessor macro, defined in inttypes.h along with several others for the same purpose, that expands to whatever the correct format specifier is for intptr_t on that platform. On my x86-64 machine running Fedora, it expands to "ld", making "%" PRIdPTR (via the automatic concatenation of adjacent string literals) equivalent to "%ld" (the format specifier for a long).
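To see the macro and the literal concatenation at work in isolation, here is a minimal sketch. The helper name format_intptr is hypothetical and not part of Sky; it formats into a buffer rather than a stream so the result is easy to inspect.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not part of Sky): formats an intptr_t into buf and
   returns whatever snprintf returns. */
static int format_intptr(char *buf, size_t size, intptr_t i)
{
    /* The adjacent literals "%" and PRIdPTR are joined at compile time
       into a single format string such as "%ld". */
    return snprintf(buf, size, "%" PRIdPTR, i);
}
```

Calling format_intptr(buf, sizeof buf, -42) with a large enough buffer should leave the text -42 in buf, whatever the width of intptr_t is on the platform.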
The function to print a character is more involved because it has to handle a few different scenarios. Borrowing the syntax from Common Lisp, all character literals start with #\, followed by:
- If it’s a non-whitespace character with a graphical representation, the character itself. For example, the character “a” is printed as #\a, and “!” is printed as #\!.
- If it’s a whitespace character (or backspace), a symbolic representation based on the character’s name. For example, the space character is printed as #\space, and newline as #\newline.
- Otherwise, the character’s numeric value in hexadecimal, prefixed by “x”. For example, the character code of ASCII DEL is 7F in hexadecimal (127 in decimal), so it’s printed as #\x7F.
Here it is:
    static void print_character(FILE *stream, value_t value)
    {
        int c = character_data(value);
        fputs("#\\", stream);
        if (isgraph(c)) {
            putc(c, stream);
            return;
        }
        switch (c) {
        case '\b': fputs("backspace", stream); return;
        case '\t': fputs("tab", stream); return;
        case '\n': fputs("newline", stream); return;
        case '\v': fputs("vtab", stream); return;
        case '\f': fputs("formfeed", stream); return;
        case '\r': fputs("return", stream); return;
        case ' ':  fputs("space", stream); return;
        default:   fprintf(stream, "x%02X", c); return;
        }
    }
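To make the three cases concrete, here is a self-contained sketch that formats into a buffer instead of a stream. The name char_literal is hypothetical, only a few of the named characters are included, and the cast to unsigned char is my own assumption: the ctype.h functions are only defined for values representable as unsigned char (or EOF), so the cast guards against out-of-range input.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper (not part of Sky): writes the literal syntax for the
   character c into buf and returns snprintf's result. */
static int char_literal(char *buf, size_t size, int c)
{
    /* Assumption: c is a single-byte character code. */
    unsigned char uc = (unsigned char)c;

    /* Case 1: graphical, non-whitespace characters print as themselves. */
    if (isgraph(uc))
        return snprintf(buf, size, "#\\%c", uc);

    switch (uc) {
    /* Case 2: named characters (a subset of the names used above). */
    case ' ':  return snprintf(buf, size, "#\\space");
    case '\n': return snprintf(buf, size, "#\\newline");
    case '\t': return snprintf(buf, size, "#\\tab");
    /* Case 3: everything else falls back to hexadecimal. */
    default:   return snprintf(buf, size, "#\\x%02X", (unsigned)uc);
    }
}
```

In the default "C" locale, passing 'a' should produce #\a, a space should produce #\space, and 127 (ASCII DEL) should produce #\x7F.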
The functions to print strings and symbols both call out to print_string_1 and pass a bool argument, symbol, to identify whether the value should be printed as a symbol. The only difference here is that strings are printed within double quotes while symbols aren’t[2].
Like in print_character, we have a switch statement to control how certain characters are represented. This time we print a space as, well, a space, and other whitespace characters in their standard backslash-escaped string syntax. Because we want what we print to be valid syntax (so it can be read back in), we also backslash-escape double quotes to keep them from terminating the string prematurely, and backslash characters to keep them from being interpreted as escapes. Characters without printable representations are again represented in hex, like \x7F.
    static void print_string_1(FILE *stream, value_t value, bool symbol)
    {
        ptrdiff_t len = string_length(value);
        if (!symbol) putc('"', stream);
        for (ptrdiff_t i = 0; i < len; i++) {
            int c = string_ref(value, i);
            switch (c) {
            case '\b': fputs("\\b", stream); break;
            case '\t': fputs("\\t", stream); break;
            case '\n': fputs("\\n", stream); break;
            case '\v': fputs("\\v", stream); break;
            case '\f': fputs("\\f", stream); break;
            case '\r': fputs("\\r", stream); break;
            case '"':  fputs("\\\"", stream); break;
            case '\\': fputs("\\\\", stream); break;
            default:
                if (isprint(c))
                    putc(c, stream);
                else
                    fprintf(stream, "\\x%02X", c);
                break;
            }
        }
        if (!symbol) putc('"', stream);
    }
Then we have the actual print_string and print_symbol functions, which both simply call print_string_1 with appropriate arguments.
    static void print_string(FILE *stream, value_t value)
    {
        print_string_1(stream, value, false);
    }

    static void print_symbol(FILE *stream, value_t value)
    {
        value_t name = symbol_name(value);
        print_string_1(stream, name, true);
    }
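As a self-contained illustration of the escaping rules, the sketch below applies them to a plain C character array and writes the quoted result into a buffer. The name escape_string, the buffer-based interface, and the -1 return on overflow are all assumptions made for the example, and only a subset of the escapes is handled.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Sky): writes the double-quoted, escaped
   form of s[0..len) into out. Returns the number of characters written
   (excluding the terminator), or -1 if out is too small. */
static int escape_string(char *out, size_t size, const char *s, size_t len)
{
    size_t n = 0;
    int w = snprintf(out, size, "\"");          /* opening quote */
    if (w < 0 || (size_t)w >= size) return -1;
    n += (size_t)w;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];
        char *p = out + n;
        size_t left = size - n;
        switch (c) {
        case '\n': w = snprintf(p, left, "\\n");  break;
        case '\t': w = snprintf(p, left, "\\t");  break;
        case '"':  w = snprintf(p, left, "\\\""); break;  /* keep the string open */
        case '\\': w = snprintf(p, left, "\\\\"); break;  /* keep it a literal backslash */
        default:
            w = isprint(c) ? snprintf(p, left, "%c", c)
                           : snprintf(p, left, "\\x%02X", (unsigned)c);
            break;
        }
        if (w < 0 || (size_t)w >= left) return -1;
        n += (size_t)w;
    }

    w = snprintf(out + n, size - n, "\"");      /* closing quote */
    if (w < 0 || (size_t)w >= size - n) return -1;
    return (int)(n + (size_t)w);
}
```

Given the nine characters say "hi" followed by a newline, this should write "say \"hi\"\n" (with the backslashes as literal characters), which is the kind of text a reader can turn back into the original string.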
Last but not least we have print_list. In Lisps, lists are printed delimited by parentheses, with a space between adjacent elements. A list containing the strings “cool” and “list” will therefore look like ("cool" "list"). Besides printing those delimiters, it just loops through the list and invokes print on each element.
    static void print_list(FILE *stream, value_t value)
    {
        putc('(', stream);
        while (value != NIL) {
            value_t fst = first(value);
            print(stream, fst);
            value = rest(value);
            if (value != NIL)
                putc(' ', stream);
        }
        putc(')', stream);
    }
Error handling and the lack thereof
C I/O functions like putc, fputs, and fprintf can all report errors via their return value. We’re ignoring that for now, but a production language would certainly need to pay attention and respond appropriately.
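For a sense of what paying attention could look like, here is a minimal sketch of a print helper that propagates failures to its caller. The function print_int_line and its 0/-1 return convention are hypothetical and are not how Sky’s printing functions are written.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Sky): prints an integer and a newline,
   returning 0 on success and -1 if any write reports an error. */
static int print_int_line(FILE *stream, long i)
{
    /* fprintf returns a negative value on failure. */
    if (fprintf(stream, "%ld", i) < 0)
        return -1;
    /* putc returns EOF on failure. */
    if (putc('\n', stream) == EOF)
        return -1;
    return 0;
}
```

Because C streams are usually buffered, a write error may only surface when the buffer is flushed, so a caller that really cares would also check the result of fflush or fclose, or test ferror(stream) afterwards.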
Convenient printing in GDB
If you use these printing functions in GDB, you will quickly get tired of typing call println(stdout, value). Luckily, GDB supports user-defined commands, so we can make things more convenient.
I added a file .gdbinit in Sky’s src directory which defines two commands:
    define pp
      set $tmp = $arg0
      call println(stdout, $tmp)
    end

    define pr
      pp $
    end
The command pp prints its argument, and pr calls pp on the last output (GDB’s $ refers to the most recent entry in its value history).
Here’s a quick demo:
~/code/sky/src/> gdb sky
(gdb) source .gdbinit
(gdb) br main
(gdb) r
(gdb) set $s1 = make_string("cool", 4)
(gdb) set $s2 = make_string("list", 4)
(gdb) set $list = cons($s1, cons($s2, NIL))
(gdb) pp $list
("cool" "list")
(gdb) print make_symbol(make_string("sky", 3))
$1 = 6317808
(gdb) pr
sky
If you add the src directory to GDB’s auto-loading safe path, it will source .gdbinit automatically (meaning you won’t need to type source .gdbinit as I did in the first GDB command above).
Next time
That’s all for now. Next time we’ll add the functionality to read Sky data.
[1] This is also called serialization to differentiate it from writing arbitrary bytes. Different languages make this distinction in different ways. In Common Lisp terms, we’re writing prin1 as opposed to write or princ. In Python terms, we’re writing an equivalent to print("%r" % obj) as opposed to print("%s" % obj).
[2] Note that there’s a problem with this. A symbol can have any string as its name, and in this implementation symbols with certain names can’t be read back in correctly. For instance, a symbol with the name "foo bar" would be printed foo bar, but that would be read back in as two symbols (foo and bar). We’ll come back to this later.