Overview

Serialization in C

Tpl is a library for serializing C data. The data is stored in its natural binary form. The API is small and tries to stay "out of the way". Compared to using XML, tpl is faster and easier to use in C programs. Tpl can serialize many C data types, including structures.

Applications

Here are some ways that tpl can be used:

Tip
A convenient file format
Tpl makes a convenient file format. For example, suppose a program creates 2D polygons, where each one is a list of x,y coordinates and an RGB color. A tpl having the format string A(A(ff)ccc) could be used as the file format. The tpl API makes it easy to read and write (serialize) such structured data.

Typed data storage

The "data type" of a tpl is explicitly stated as a format string. There is never any ambiguity about the type of data stored in a tpl. Some examples:

The tpl image

A tpl image is the serialized form of a tpl, stored in a memory buffer or file.

XML

While a tpl image is a binary entity, you can view any tpl image in XML format using the included tplxml utility, located in the lang/perl directory.

tplxml file.tpl > file.xml

The utility is bidirectional: you can convert XML back to tpl the same way. The file extension is not important; tplxml inspects its input to see if it's tpl or XML. You can also pipe data into it instead of giving it a filename. The tplxml utility is slow. Its purpose is twofold: debugging (manual inspection of the data in a tpl), and interoperability with XML-based programs. The resulting XML is often ten times the size of the original binary tpl image.

Perl

There is a Perl module in lang/perl/Tpl.pm. The Perl API is convenient for writing Perl scripts that interoperate with C programs, and need to pass structured data back and forth. It is written in pure Perl.

Platforms

The tpl software was developed for POSIX systems and has been tested on:

32-bit and 64-bit support

Tpl supports compilation in either 32-bit or 64-bit memory models.

BSD licensed

This software is made available under the BSD license. It is free and open source.

Obtaining tpl

Please follow the link to download on the tpl website.

Getting help

You can email the tpl mailing list at tpl-discuss@lists.sourceforge.net or the author at troydhanson@comcast.net for any questions.

Build and install

As source

The simplest way to use tpl is to copy the source files tpl.h and tpl.c (from the src/ directory) right into your project, and build them with the rest of your source files. No special compiler flags are required. (Note to MinGW and Cygwin users: tpl must be built as a library on these platforms.)

As a library

Alternatively, to build tpl as a library, from the top-level directory, run:

./configure
make
make install

This installs a static library libtpl.a and a shared library (e.g., libtpl.so), if your system supports them, in standard places. The installation directory can be customized using ./configure --prefix=/some/directory. Run configure --help for further options.

Test suite

You can compile and run the built-in test suite by running:

cd tests/
make

API concepts

To use tpl, you need to know the order in which to call the API functions, and the background concepts of format string, arrays and index numbers.

Order of functions

Creating a tpl is always the first step, and freeing it is the last step. In between, you either pack and dump the tpl (if you're serializing data) or you load a tpl image and unpack it (if you're deserializing data).

Table: Order of usage
Step  If you're serializing…  If you're deserializing…
1.    tpl_map()               tpl_map()
2.    tpl_pack()              tpl_load()
3.    tpl_dump()              tpl_unpack()
4.    tpl_free()              tpl_free()

Format string

When a tpl is created using tpl_map(), its data type is expressed as a format string. Each character in the format string requires an argument of the correct type, shown in the table below. An example:

tpl_node *tn;
char c;
int i[10];
tn = tpl_map("ci#", &c, i, 10);  /* ci# is our format string */

Table: Supported format characters
Type  Description                         Required argument type
i     32-bit signed int                   int32_t* or equivalent
u     32-bit unsigned int                 uint32_t* or equivalent
I     64-bit signed int                   int64_t* or equivalent
U     64-bit unsigned int                 uint64_t* or equivalent
c     character (byte)                    char*
s     string                              char**
f     64-bit double precision float       double* (on some platforms)
#     length; modifies preceding iuIUcsf  int
B     binary buffer (arbitrary-length)    tpl_bin*
S     structure (…)                       struct *
A     array (…)                           none

Data type sizes

The sizes of data types such as long and double vary by platform. This must be kept in mind because most tpl format characters require a pointer argument to a specific-sized type, listed above. You can use explicit-sized types such as int32_t (defined in inttypes.h) in your program if you find this helpful.

#include <inttypes.h>
...
int32_t i;
tpl_node *tn;
tn = tpl_map("i", &i);

The trouble with double

Unfortunately there are no standard explicit-sized floating-point types (there is no float64_t, for example). If you plan to serialize double on your platform using tpl's f format character, first be sure that your double is 64 bits. Second, if you plan to deserialize it on a different kind of CPU, be sure that both CPUs use the same floating-point representation, such as IEEE 754.
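Both conditions can be checked in a few lines of standard C. The following is a minimal sketch, not part of tpl: double_is_binary64 is a hypothetical helper that tests the width of double, the parameters reported by float.h, and the bit pattern of 1.0 against the IEEE 754 binary64 encoding. It says nothing about the CPU on the other end, which has to be checked separately.

```c
#include <float.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of tpl): returns 1 if double on this
 * platform looks like a 64-bit IEEE 754 binary64 value, 0 otherwise. */
int double_is_binary64(void) {
    double one = 1.0;
    uint64_t bits;

    if (sizeof(double) * CHAR_BIT != 64) return 0;  /* must be 64 bits wide */
    if (FLT_RADIX != 2) return 0;                   /* binary floating point */
    if (DBL_MANT_DIG != 53) return 0;               /* 52 stored bits + 1 implicit */
    if (DBL_MAX_EXP != 1024) return 0;              /* binary64 exponent range */

    memcpy(&bits, &one, sizeof bits);               /* inspect the raw bits */
    return bits == UINT64_C(0x3FF0000000000000);    /* binary64 encoding of 1.0 */
}
```

A program could call this once at startup and refuse to pack or unpack f data if it returns 0.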

Arrays

Arrays come in two kinds:

  1. fixed-length

  2. variable-length

Before explaining all the concepts, it's illustrative to see how both kinds of arrays are used. Let's pack the integers 0 through 9 both ways.

Example: Packing 0-9 as a fixed-length array
#include "tpl.h"
int main() {
    tpl_node *tn;
    int x[] = {0,1,2,3,4,5,6,7,8,9};

    tn = tpl_map("i#", x, 10);
    tpl_pack(tn,0);                         /* pack all 10 elements at once */
    tpl_dump(tn, TPL_FILE, "/tmp/fixed.tpl");
    tpl_free(tn);
}

Note that the length of the fixed-length array (10) was passed as an argument to tpl_map(). Now let's see how we would pack 0-9 as a variable-length array:

Example: Packing 0-9 as a variable-length array
#include "tpl.h"
int main() {
    tpl_node *tn;
    int x;

    tn = tpl_map("A(i)", &x);
    for(x = 0; x < 10; x++) tpl_pack(tn,1);  /* pack one element at a time */
    tpl_dump(tn, TPL_FILE, "/tmp/variable.tpl");
    tpl_free(tn);
}

Notice how we called tpl_pack in a loop, once for each element 0-9. You might also notice that this time, we passed 1 as the final argument to tpl_pack. This is an index number that designates which variable-length array we're packing (in this case, there is only one).

Index numbers

Index numbers identify a particular variable-length array in the format string. Each A(…) in a format string has its own index number. The index numbers are assigned left-to-right starting from 1. Examples:

A(i)        /* index number 1 */
A(i)A(i)    /* index numbers 1 and 2 */
A(A(i))     /* index numbers 1 and 2 (nesting doesn't matter) */

Special index number 0

The special index number 0 designates all the format characters that are not inside an A(…). Examples of what index 0 does (and does not) designate:

S(ius)      /* index 0 designates the whole thing */
iA(c)u      /* index 0 designates the i and the u */
c#A(i)S(ci) /* index 0 designates the c# and the S(ci) */

An index number is passed to tpl_pack and tpl_unpack to specify which variable-length array (or non-array, in the case of index number 0) to act upon.
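Because every A(…) receives the next index number in left-to-right order regardless of nesting, the highest valid index number for a format string equals the number of A characters it contains. A small sketch of that rule follows; highest_index_number is a hypothetical helper for illustration and is not part of the tpl API.

```c
/* Hypothetical helper (not part of tpl): returns the highest index number
 * used by a format string, i.e. the count of A(...) arrays in it.
 * A result of 0 means only the special index number 0 applies. */
int highest_index_number(const char *fmt) {
    int n = 0;
    for (; *fmt != '\0'; fmt++) {
        if (*fmt == 'A') n++;   /* each A(...) takes the next index number */
    }
    return n;
}
```

For the format strings shown above, this gives 1 for A(i), 2 for both A(i)A(i) and A(A(i)), and 0 for S(ius).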

Strings

Tpl can serialize C strings:

Example: Packing a string
    #include "tpl.h"

    int main() {
        tpl_node *tn;
        char *s = "hello, world!";
        tn = tpl_map("s", &s);
        tpl_pack(tn,0);  /* copies "hello, world!" into the tpl */
        tpl_dump(tn,TPL_FILE,"string.tpl");
        tpl_free(tn);
    }

When deserializing (unpacking) a C string, space for it will be allocated automatically, but you are responsible for freeing it:

Example: Unpacking a string
    #include "tpl.h"
    #include <stdio.h>   /* printf */
    #include <stdlib.h>  /* free */

    int main() {
        tpl_node *tn;
        char *s;
        tn = tpl_map("s", &s);
        tpl_load(tn,TPL_FILE,"string.tpl");
        tpl_unpack(tn,0);   /* allocates space, points s to "hello, world!" */
        printf("unpacked %s\n", s);
        free(s);            /* our responsibility to free s */
        tpl_free(tn);
    }

Note that we had to free(s). It's easy to forget since tpl allocated it for us, but anytime you unpack an s format character, you are taking responsibility for ultimately freeing it.

Note
char* vs char[ ]
The s format character is only for use with char* types. In the example above, s is a char*. If it had been a char s[14], we would use the format characters c# to pack or unpack it, as a fixed-length character array. (This unpacks the characters "in-place", instead of into a dynamically allocated buffer). Also, a fixed-length buffer described by c# need not be NUL-terminated.
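Since a c# buffer need not be NUL-terminated, it cannot always be handed directly to functions that expect a C string. The sketch below shows one way to turn such a buffer into a proper C string after unpacking. fixed_to_cstring is a hypothetical helper, not part of tpl, and it assumes the caller supplies an output buffer of at least n + 1 bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of tpl): copies an n-byte fixed-length
 * character array, which may lack a terminating NUL, into out as a
 * C string. out must have room for n + 1 bytes.
 * Returns the length of the resulting string. */
size_t fixed_to_cstring(const char *buf, size_t n, char *out) {
    size_t len = 0;
    while (len < n && buf[len] != '\0') len++;  /* stop at an embedded NUL, if any */
    memcpy(out, buf, len);
    out[len] = '\0';                            /* always terminate the copy */
    return len;
}
```

For one-off printing, printf("%.*s", (int)n, buf) avoids the copy altogether.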

Binary buffers

Packing an arbitrary-length binary buffer (tpl format character B) makes use of the tpl_bin structure. You must declare this structure and populate it with the address and length of the binary buffer to be packed.

Example: Packing a binary buffer
    #include "tpl.h"
    #include <sys/time.h>

    int main() {
        tpl_node *tn;
        tpl_bin tb;

        /* we'll use a timeval as our guinea pig */
        struct timeval tv;
        gettimeofday(&tv,NULL);

        tn = tpl_map( "B", &tb );
        tb.sz = sizeof(struct timeval);  /* size of buffer to pack */
        tb.addr = &tv;                   /* address of buffer to pack */
        tpl_pack( tn, 0 );
        tpl_dump(tn, TPL_FILE, "bin.tpl");
        tpl_free(tn);
    }

When you unpack a binary buffer, tpl will automatically allocate it, and will populate your tpl_bin structure with its address and length. You are responsible for eventually freeing the buffer.

Example: Unpacking a binary buffer
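    #include "tpl.h"
    #include <stdio.h>   /* printf */
    #include <stdlib.h>  /* free */

    int main() {
        tpl_node *tn;
        tpl_bin tb;

        tn = tpl_map( "B", &tb );
        tpl_load( tn, TPL_FILE, "bin.tpl" );
        tpl_unpack( tn, 0 );
        tpl_free(tn);

        /* cast the length so the printf format matches on any platform */
        printf("binary buffer of length %lu at address %p\n",
               (unsigned long)tb.sz, tb.addr);
        free(tb.addr);  /* our responsibility to free it */
    }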

Structures

There is a standard way and a shortcut way to pack and unpack structures. The standard way is to ignore the fact that it's a structure, and to just map variables to the individual structure fields. The shortcut way uses the S(…) format specification, and passes only the structure's address rather than the address of each individual field.

struct my_data {
    char c;
    int i;
} md;
tn = tpl_map("ci", &md.c, &md.i);    /* standard way */
tn = tpl_map("S(ci)", &md);          /* shortcut way */

When using the S(…) shortcut, the only allowed characters inside the parentheses are iucsfIU#. The tpl software calculates the address of each field in the structure based on the sizes of the preceding fields and the padding required for each field to be aligned. When using S(…), omit the normally-required arguments for the parenthesized format characters. The one exception is for fixed-length arrays: when # occurs inside an S(…), pass the length argument as usual, e.g., tpl_map("S(f#i)", &mf, 10);
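The kind of address calculation described here can be illustrated with the my_data structure from above. This is only a sketch of the general idea (size of the preceding field, rounded up to the alignment of the next field); how tpl performs the calculation internally is not shown in this guide and may differ. The helpers align_up and predicted_offset_of_i are hypothetical, and _Alignof assumes a C11 compiler.

```c
#include <stddef.h>

struct my_data {
    char c;
    int i;
};

/* Rounds offset up to the next multiple of align (align must be > 0). */
size_t align_up(size_t offset, size_t align) {
    return (offset + align - 1) / align * align;
}

/* Predicts where field i starts: just past field c, then padded so that
 * the int is properly aligned. */
size_t predicted_offset_of_i(void) {
    size_t after_c = offsetof(struct my_data, c) + sizeof(char);
    return align_up(after_c, _Alignof(int));
}
```

On common platforms the prediction matches offsetof(struct my_data, i), which is why the field types in the format string must agree exactly with the structure definition.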

Wildcard structure unpacking

In the special case of unpacking a tpl whose full format string is S(…), it is permissible to omit the contents inside the parentheses, and use * instead. In this special mode, the actual format string will be extracted from the tpl that is being unpacked, including the lengths of any fixed-length arrays.

tn = tpl_map( "S(*)", &mf);

A tpl created with the special S(*) format string can only be used for unpacking. In most cases, wildcard unpacking doesn't help you much: by saving a few keystrokes (omitting the full format string), you sacrifice type-safety. But there are a few scenarios in which it might be favorable:

  1. when the packing and unpacking programs share the same structure definition (say, in a common #include), a change to the structure would only have to be reflected in a change to the format string in one place (the packing code)

  2. when unpacking C "unions", where the packed tpl may have one of several format strings, the unpacking code automatically selects the right one

  3. as a variation of the preceding "union" scenario, a message-passing program could accommodate different "versions" of a message. For example, imagine a client-server scenario, where older clients send messages (tpl images) with the format S(i), but newer clients send S(if). The server uses S(*) to unpack either kind of message into a structure that can accommodate either.

API

tpl_map

The only way to create a tpl is to call tpl_map(). The first argument is the format string. This is followed by a list of arguments as required by the particular characters in the format string. For example:

tpl_node *tn;
int i;
tn = tpl_map( "A(i)", &i );

The function creates a mapping between the items in the format string and the C program variables whose addresses are given. Later, the C variables will be read or written as the tpl is packed or unpacked.

This function returns a tpl_node* on success, or NULL on failure.

tpl_pack

The function tpl_pack() packs data into a tpl. The arguments to tpl_pack() are a tpl_node* and an index number.

tn = tpl_map("A(i)A(c)", &i, &c);
for(i=0; i<10; i++) tpl_pack(tn, 1);    /* pack 0-9 into index 1 */
for(c='a'; c<='z'; c++) tpl_pack(tn, 2); /* pack a-z into index 2 */

Index number 0

It is necessary to pack index number 0 only if the format string contains characters that are not inside an A(…), such as the i in the format string iA(c).

Variable-length arrays

Adding elements to an array

To add elements to a variable-length array, call tpl_pack() repeatedly. Each call adds another element to the array.

Zero-length arrays are ok

It's perfectly acceptable to pack nothing into a variable-length array, resulting in a zero-length array.

Packing nested arrays

In a format string containing a nested, variable-length array, such as A(A(s)), the inner, child array should be packed prior to the parent array.

When you pack a parent array, a "snapshot" of the current child array is placed into the parent's new element. Packing a parent array also empties the child array. This way, you can pack new data into the child, then pack the parent again. This creates distinct parent elements which each contain distinct child arrays.

Tip
When dealing with nested arrays like A(A(i)), pack them from the "inside out" (child first), but unpack them from the "outside in" (parent first).

The example below creates a tpl having the format string A(A(c)).

Example: Packing nested arrays
#include "tpl.h"

int main() {
    char c;
    tpl_node *tn;

    tn = tpl_map("A(A(c))", &c);

    for(c='a'; c<'c'; c++) tpl_pack(tn,2);  /* pack child (twice) */
    tpl_pack(tn, 1);                        /* pack parent */

    for(c='1'; c<'4'; c++) tpl_pack(tn,2);  /* pack child (three times) */
    tpl_pack(tn, 1);                        /* pack parent */

    tpl_dump(tn, TPL_FILE, "test40.tpl");
    tpl_free(tn);
}

This creates a nested array in which the parent has two elements: the first element is the two-element nested array a, b; and the second element is the three-element nested array 1, 2, 3. The nested unpacking example shows how this tpl is unpacked.

tpl_dump

After packing a tpl, tpl_dump() is used to write the tpl image to a file, memory buffer or file descriptor. There are three corresponding modes:

tpl_dump( tn, TPL_FILE, "file.tpl" );  /* write to file */
tpl_dump( tn, TPL_MEM, &addr, &len );  /* write to memory  */
tpl_dump( tn, TPL_FD, 2);              /* write to file descriptor */

The first argument is the tpl_node* and the second is one of these constants:

TPL_FILE

Writes the tpl to a file whose name is given in the following argument. The file is created with permissions 664 (rw-rw-r--) unless further restricted by the process umask.

TPL_MEM

Writes the tpl to a memory buffer. The following two arguments must be a void** and a size_t*. The function will allocate a buffer and store its address and length into these locations. The caller is responsible for calling free() on the buffer when done using it.

TPL_FD

Writes the tpl to the file descriptor given in the following argument. The descriptor can be either blocking or non-blocking, but will busy-loop if non-blocking and the contents cannot be written immediately.

The return value is 0 on success, or -1 on error.

The tpl_dump() function does not free the tpl. Use tpl_free() to release the tpl's resources when done.
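For TPL_FILE, the phrase "unless further restricted by the process umask" refers to the usual POSIX rule: any permission bit set in the umask is cleared from the requested mode. The sketch below shows only that arithmetic; effective_mode is a hypothetical helper for illustration and is not part of tpl.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical helper (not part of tpl): returns the permission bits a
 * newly created file ends up with when `requested` is asked for and the
 * process umask is `mask`. */
mode_t effective_mode(mode_t requested, mode_t mask) {
    return requested & ~mask & 0777;   /* umask bits are removed, never added */
}
```

With the common umask of 022, a file requested as 664 is created as 644 (rw-r--r--); with a umask of 002 it keeps the full 664.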

Tip
Back-to-back tpl images require no delimiter
If you want to store a series of tpl images, or transmit sequential tpl images over a socket (perhaps as messages to another program), you can simply dump them sequentially without needing to add any delimiter for the individual tpl images. Tpl images are internally delimited, so tpl_load will read just one at a time even if multiple images are contiguous.

tpl_load

This API function reads a previously-dumped tpl image from a file, memory buffer or file descriptor, and prepares it for subsequent unpacking. The format string specified in the preceding call to tpl_map() will be cross-checked for equality with the format string stored in the tpl image.

tn = tpl_map( "A(i)", &i );
tpl_load( tn, TPL_FILE, "demo.tpl" );

The first argument to tpl_load() is the tpl_node*. The second argument is one of the constants:

TPL_FILE

Loads the tpl from the file named in the following argument.

TPL_MEM

Loads the tpl from a memory buffer. The following two arguments must be a void* and a size_t, specifying the buffer address and size, respectively. The caller must not free the memory buffer until after freeing the tpl with tpl_free(). (If the caller wishes to hand over responsibility for freeing the memory buffer, so that it's automatically freed along with the tpl when tpl_free() is called, the constant TPL_UFREE may be bitwise-OR'd with TPL_MEM to achieve this.)

TPL_FD

Loads the tpl from the file descriptor given in the following argument. The descriptor is read until one complete tpl image is loaded; no bytes past the end of the tpl image will be read. The descriptor can be either blocking or non-blocking, but will busy-loop if non-blocking and the contents cannot be read immediately.
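If the busy-loop on a non-blocking descriptor is a concern, one option is to switch the descriptor to blocking mode before calling tpl_load() or tpl_dump() with TPL_FD. The following is a minimal sketch using POSIX fcntl(); make_blocking is a hypothetical helper, not part of tpl. Note that O_NONBLOCK belongs to the open file description, so the change is also seen through any duplicated descriptors.

```c
#include <fcntl.h>

/* Hypothetical helper (not part of tpl): clears O_NONBLOCK on fd.
 * Returns 0 on success, -1 on error. */
int make_blocking(int fd) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;                 /* e.g. invalid descriptor */
    if ((flags & O_NONBLOCK) == 0) return 0;    /* already blocking */
    return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 0;
}
```

Event-driven programs that must keep the descriptor non-blocking may be better served by tpl_gather, described later in this guide.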

During loading, the tpl image will be extensively checked for internal validity.

This function returns 0 on success or -1 on error.

tpl_unpack

The tpl_unpack() function unpacks data from the tpl. When data is unpacked, it is copied to the C program variables originally specified in tpl_map(). The first argument to tpl_unpack is the tpl_node* for the tpl and the second argument is an index number.

tn = tpl_map( "A(i)A(c)", &i, &c );
tpl_load( tn, TPL_FILE, "nested.tpl" );
while (tpl_unpack( tn, 1) > 0) printf("i is %d\n", i); /* unpack index 1 */
while (tpl_unpack( tn, 2) > 0) printf("c is %c\n", c); /* unpack index 2 */

Index number 0

It is necessary to unpack index number 0 only if the format string contains characters that are not inside an A(…), such as the i in the format string iA(c).

Variable-length arrays

Unpacking elements from an array

For variable-length arrays, each call to tpl_unpack() unpacks another element. The return value can be used to tell when you're done: if it's positive, an element was unpacked; if it's 0, nothing was unpacked because there are no more elements. A negative return value indicates an error (e.g. invalid index number). In this document, we usually unpack variable-length arrays using a while loop:

while( tpl_unpack( tn, 1 ) > 0 ) {
    /* got another element */
}

Array length

When unpacking a variable-length array, it may be convenient to know ahead of time how many elements will need to be unpacked. You can use tpl_Alen() to get this number.

Unpacking nested arrays

In a format string containing a nested variable-length array such as A(A(s)), unpack the outer, parent array before unpacking the child array.

When you unpack a parent array, it prepares the child array for unpacking. After unpacking the elements of the child array, the program can repeat the process by unpacking another parent element, then the child elements, and so on. The example below unpacks a tpl having the format string A(A(c)).

Example: Unpacking nested arrays
#include "tpl.h"
#include <stdio.h>

int main() {
    char c;
    tpl_node *tn;

    tn = tpl_map("A(A(c))", &c);

    tpl_load(tn, TPL_FILE, "test40.tpl");
    while (tpl_unpack(tn,1) > 0) {
        while (tpl_unpack(tn,2) > 0) printf("%c ",c);
        printf("\n");
    }
    tpl_free(tn);
}

The file test40.tpl is from the nested packing example. When run, this program prints:

a b
1 2 3

tpl_free

The final step for any tpl is to release it using tpl_free(). Its only argument is the tpl_node* to free.

tpl_free( tn );

This function does not return a value (it is void).

tpl_Alen

This function takes a tpl_node* and an index number and returns an int specifying the number of elements in the variable-length array.

num_elements = tpl_Alen(tn, index);

This is mainly useful for programs that unpack data and need to know ahead of time the number of elements that will need to be unpacked. (It returns the current number of elements; it will decrease as elements are unpacked).

tpl_peek

This function peeks into a file or a memory buffer containing a tpl image and returns a copy of its format string, which the caller must free. There are two forms:

fmt = tpl_peek(TPL_FILE, "file.tpl");
fmt = tpl_peek(TPL_MEM, addr, sz);

It returns NULL on error (such as a non-existent file, or an invalid tpl image). A program that processes tpl images having varying format strings could use this function to switch to an appropriate unpacking routine.
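The switching step might look like the sketch below. The handler names and the two format strings are hypothetical choices for illustration; only the idea of comparing the peeked format string comes from the text above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical set of unpacking routines a program might provide. */
enum handler { HANDLER_UNKNOWN, HANDLER_INT_ARRAY, HANDLER_STRING };

/* Hypothetical dispatcher: maps a format string, such as one returned by
 * tpl_peek(), to the routine that knows how to unpack it. */
enum handler choose_handler(const char *fmt) {
    if (fmt == NULL) return HANDLER_UNKNOWN;              /* tpl_peek() failed */
    if (strcmp(fmt, "A(i)") == 0) return HANDLER_INT_ARRAY;
    if (strcmp(fmt, "s") == 0) return HANDLER_STRING;
    return HANDLER_UNKNOWN;                               /* unexpected format */
}
```

A caller would pass the result of tpl_peek() to choose_handler(), free the format string, and then call tpl_map() with the matching format before loading the image.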

tpl_hook

A program can customize tpl's memory management and error reporting by setting built-in "hooks" to non-default values. A global structure called tpl_hook encapsulates the hooks. A program can reconfigure any hook by specifying an alternative function whose prototype matches the default. For example:

#include "tpl.h"
extern tpl_hook_t tpl_hook;
int main() {
    tpl_hook.oops = printf;
    ...
}

Table: Configurable hooks
Hook                 Description                 Default
tpl_hook.oops        log error messages          tpl_oops (printf prototype)
tpl_hook.malloc      allocate memory             malloc
tpl_hook.realloc     reallocate memory           realloc
tpl_hook.free        free memory                 free
tpl_hook.fatal       log fatal message and exit  tpl_fatal (printf prototype)
tpl_hook.gather_max  tpl_gather max image size   0 (unlimited)

By default the oops and fatal hooks point to built-in functions which write the error message to stderr. The fatal hook must not return; it must exit.
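As an alternative to pointing the oops hook at printf, a program can supply its own function with the printf prototype. The sketch below keeps the most recent message in a buffer instead of printing it. The names capture_oops and last_oops_message are hypothetical. Because this function returns normally, it is suitable for the oops hook only, not for the fatal hook.

```c
#include <stdarg.h>
#include <stdio.h>

static char last_message[256];  /* most recent message; empty until first call */

/* Hypothetical replacement for the oops hook. It has the printf prototype,
 * so a program linked with tpl could install it with:
 *     tpl_hook.oops = capture_oops;
 * Returns the length the full message would have, as vsnprintf does. */
int capture_oops(const char *fmt, ...) {
    va_list ap;
    int n;
    va_start(ap, fmt);
    n = vsnprintf(last_message, sizeof last_message, fmt, ap);
    va_end(ap);
    return n;
}

/* Returns the most recently captured message. */
const char *last_oops_message(void) {
    return last_message;
}
```

The program can then inspect last_oops_message() after a tpl call fails, for example to forward the text to its own logging facility.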

tpl_gather

The prototype for this function is:

int tpl_gather( int mode, ...);

The mode argument is one of three constants listed below, which must be followed by the mode-specific required arguments:

TPL_GATHER_BLOCKING,    int fd, void **img, size_t *sz
TPL_GATHER_NONBLOCKING, int fd, tpl_gather_t **gs, tpl_gather_cb *cb, void *data
TPL_GATHER_MEM,         void *addr, size_t sz, tpl_gather_t **gs, tpl_gather_cb *cb, void *data

Note
tpl_hook.gather_max
All modes honor tpl_hook.gather_max, specifying the maximum byte size for a tpl image to be gathered (the default is unlimited, signified by 0). If a source attempts to send a tpl image larger than this maximum, whatever partial image has been read will be discarded, and no further reading will take place; in this case tpl_gather will return a negative (error) value to inform the caller that it should stop gathering from this source, and close the originating file descriptor if there is one. (The whole idea is to prevent untrusted sources from sending extremely large tpl images which would consume too much memory.)

TPL_GATHER_BLOCKING

In this mode, tpl_gather blocks while reading file descriptor fd until one complete tpl image is read. No bytes past the end of the tpl image will be read. The address of the buffer containing the image is returned in img and its size is placed in sz. The caller is responsible for eventually freeing the buffer. The function returns 1 on success, 0 on end-of-file, or a negative number on error.

TPL_GATHER_NONBLOCKING

This mode is for non-blocking, event-driven programs that implement their own file descriptor readability testing using select() or the like. In this mode, tpl images are gathered in chunks as data becomes readable. Whenever a full tpl image has been gathered, tpl_gather invokes a caller-specified callback to do something with the image. The arguments are: the file descriptor fd, which the caller has determined to be readable and which must be in non-blocking mode; a pointer to a file-descriptor-specific handle which the caller has declared (explained below); a callback to invoke when a tpl image has been read; and an opaque pointer that will be passed to the callback.

For each file descriptor on which tpl_gather will be used, the caller must declare a tpl_gather_t* and initialize it to NULL. Thereafter it will be used internally by tpl_gather whenever data is readable on the descriptor.

The callback will only be invoked whenever tpl_gather() has accumulated one complete tpl image. It must have this prototype:

int (tpl_gather_cb)(void *img, size_t sz, void *data);

The callback can do anything with the tpl image but it must not free it. It can be copied if it needs to survive past the callback's return. The callback should return 0 under normal circumstances, or a negative number to abort; that is, returning a negative number causes tpl_gather itself to discard any remaining full or partial tpl images that have been read, and to return a negative number (-4 in particular) to signal its caller to close the file descriptor.
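A callback that needs the image to outlive its own return can copy it, as in the following sketch. struct saved_image and save_image_cb are hypothetical names; the function signature follows the tpl_gather_cb prototype shown above, and the opaque data pointer is assumed to point at a struct saved_image owned by the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for one saved image; not a tpl type. */
struct saved_image {
    void *buf;
    size_t sz;
};

/* Matches the tpl_gather_cb prototype. Copies the image into memory owned
 * by the struct saved_image passed through the opaque data pointer. The
 * original img is left untouched and is never freed here. */
int save_image_cb(void *img, size_t sz, void *data) {
    struct saved_image *out = data;
    void *copy = malloc(sz ? sz : 1);
    if (copy == NULL) return -1;     /* negative return asks tpl_gather to abort */
    memcpy(copy, img, sz);
    free(out->buf);                  /* drop any previously saved image */
    out->buf = copy;
    out->sz = sz;
    return 0;                        /* normal return */
}
```

The caller initializes the struct to { NULL, 0 }, passes its address as the last argument to tpl_gather(), and eventually frees buf itself.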

The return value of tpl_gather() is negative if an error occurred or 0 if a normal EOF was encountered. Both cases require that the caller close the file descriptor (and stop monitoring it for readability). If the return value is positive, the function succeeded in gathering whatever data was currently readable, which may have been a partial tpl image, or one or more complete images.

Typical Usage

The program will have established a file descriptor in non-blocking mode and be monitoring it for readability, using select(). Whenever it's readable, the program calls tpl_gather(). In skeletal terms:

tpl_gather_t *gt=NULL;
int rc;
int callback( void *img, size_t sz, void *data );  /* defined below */
void fd_is_readable(int fd) {
  rc = tpl_gather( TPL_GATHER_NONBLOCKING, fd, &gt, callback, NULL );
  if (rc <= 0) {
      close(fd);               /* got eof or fatal */
      stop_watching_fd(fd);
  }
}
int callback( void *img, size_t sz, void *data ) {
  printf("got a tpl image\n"); /* do something with img. do not free it. */
  return 0;                    /* normal (no error) */
}

TPL_GATHER_MEM

This mode is identical to TPL_GATHER_NONBLOCKING except that it gathers from a memory buffer instead of from a file descriptor. In other words, if some other layer of code (say, a decryption function that is decrypting fixed-size blocks) produces tpl fragments one-by-one, this mode can be used to reconstitute the tpl images and invoke the callback for each one. Its parameters are the same as for the TPL_GATHER_NONBLOCKING mode except that instead of a file descriptor, it takes a buffer address and size. The return values are also the same as for TPL_GATHER_NONBLOCKING, except that there is no file descriptor to close on a non-positive return value.