The DA file format specification

DA
Filename extension: .da
Shebangsat line (optional): #!/@ -tda
Magic: (see shebangsat)
Encoding: text (UTF-8)
Type of format: static database
Container for: anything
Contained by: compress, gzip, bzip2
Expanded from: classic name-colon-value
Expanded to: TDA
Applied to: SA

Introduction

DA is a formalization of the classic name-colon-value format. It is a text-based file format that represents information as key-value pairs. It is designed both for human editing by means of a text editor, and to be read and generated programmatically.

DA extends the classic format by allowing values to contain multiline and binary data. For a quick idea of what DA looks like, see the example at the end of this document.

Basic structure

If the first byte of the file is a number sign (#), all bytes up to and including the first newline are ignored.
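As an illustration of this rule, a reader might strip the optional first line as in the following minimal sketch. The function name skip_shebang is hypothetical, and the handling of a "#" line with no newline at all (the whole file is treated as ignored) is an assumption, since the specification does not cover that case.

```python
def skip_shebang(data: bytes) -> bytes:
    """Drop a leading '#' line, if present, and return the remaining bytes."""
    if data[:1] != b"#":
        return data
    end = data.find(b"\n")
    # Assumption: with no newline anywhere, the whole file is the ignored line.
    return b"" if end == -1 else data[end + 1:]
```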

A DA file consists of a sequence of entries of the form

	<name>:<type><value>

where the three fields are defined as follows.

Name

Lines preceding <name> that consist of nothing but whitespace are ignored.

<name> is an array of bytes of any value except the colon (:) character.

Exception: The backslash (\) acts as an escape character. The backslash itself is ignored and the byte immediately following it is stripped of any special meaning. To include a colon in <name>, precede the colon with a backslash. To include a backslash in <name>, precede the backslash with a backslash.

To have on the first line a <name> that starts with a number sign, precede it with a backslash.
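For writers, the escaping rules above could be applied as in this sketch. The function name escape_name and its first_line parameter are hypothetical and not part of the format.

```python
def escape_name(name: bytes, first_line: bool = False) -> bytes:
    """Escape a raw <name> so that a reader recovers it unchanged."""
    # Backslashes first, so the backslashes added for colons are not doubled.
    escaped = name.replace(b"\\", b"\\\\").replace(b":", b"\\:")
    # On the first line a leading '#' would be taken for the ignored line.
    if first_line and escaped.startswith(b"#"):
        escaped = b"\\" + escaped
    return escaped
```

For example, the raw name "price:list" would be written as "price\:list".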

By convention, <name> fields consist of printable UTF-8 characters. But readers should be prepared for any byte values.

By convention, the <name> “#” (immediately followed by the terminating colon) denotes a comment entry whose value is unstructured meta information intended for human eyes only.
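The rules for <name> can be sketched from the reader's side as follows. This is one possible reading, not a reference implementation: the function name read_name is hypothetical, "whitespace" is taken to mean ASCII whitespace, and a missing colon is treated as an error, which the specification does not state explicitly.

```python
def read_name(data: bytes, pos: int = 0):
    """Return (name, index just past the terminating colon), or None at EOF."""
    # Skip lines that consist of nothing but whitespace.
    while pos < len(data):
        eol = data.find(b"\n", pos)
        line_end = len(data) if eol == -1 else eol + 1
        if data[pos:line_end].strip():
            break
        pos = line_end
    if pos >= len(data):
        return None
    name = bytearray()
    while pos < len(data):
        byte = data[pos]
        if byte == ord("\\") and pos + 1 < len(data):
            name.append(data[pos + 1])  # the escaped byte is taken literally
            pos += 2
        elif byte == ord(":"):
            return bytes(name), pos + 1
        else:
            name.append(byte)
            pos += 1
    # Assumption: a name with no terminating colon is an error.
    raise ValueError("unterminated <name>: no colon found")
```

The returned index points at the first byte of <type>.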

Type

<type> is a variable length field that specifies the encoding of <value>.

Value: plain

If the first byte of <type> is a space (ASCII 32), then the length of <type> is 1 byte.

<value> consists of all bytes verbatim until a newline or EOF is encountered. If <value> ends in a newline (as opposed to EOF), the newline character is included in <value>.
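A plain value can be read with a single search for the next newline, as in this small sketch. The function name read_plain is hypothetical.

```python
def read_plain(data: bytes, pos: int):
    """Read a plain value; pos points just past the space <type> byte.

    Returns (value, index of the next entry).
    """
    eol = data.find(b"\n", pos)
    # The terminating newline, when present, is part of the value.
    end = len(data) if eol == -1 else eol + 1
    return data[pos:end], end
```

Note that the value of "title: Unix" followed by a newline is "Unix" plus that newline; callers that want the classic behaviour would strip it themselves.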

Value: C string

If the first byte of <type> is a double quote ("), then the length of <type> is 1 byte.

<value> is interpreted as a C string literal ending on the first unescaped double quote.

The following escape sequences are supported:

\n          newline
\t          horizontal tab
\v          vertical tab
\b          backspace
\r          carriage return
\f          form feed
\a          bell
\\          backslash
\"          double quote
\000        octal value
\xhh        hex value
\<newline>  line continuation

Any whitespace after the ending double quote and up to the next newline (or EOF) is ignored.
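A decoder for C string values might look like the sketch below. Several points are assumptions where the text above is silent: octal escapes take one to three digits and hex escapes one or two, octal values above 255 are truncated to one byte, an unknown escape or a missing closing quote raises an error, and non-whitespace bytes after the closing quote are left for the caller. The name read_cstring is hypothetical.

```python
import re

# Single-character escapes from the table above, as byte values.
_SIMPLE = dict(zip(b'ntvbrfa\\"', b'\n\t\v\b\r\f\a\\"'))
_NUMERIC = re.compile(rb"([0-7]{1,3})|x([0-9A-Fa-f]{1,2})")
_TRAILING = re.compile(rb"[ \t\r\v\f]*\n?")

def read_cstring(data: bytes, pos: int):
    """Decode a C string value; pos points just past the opening quote.

    Returns (value, index of the next entry).
    """
    out = bytearray()
    while pos < len(data):
        byte = data[pos]
        pos += 1
        if byte == ord('"'):
            # Unescaped quote: skip trailing whitespace through the newline.
            return bytes(out), _TRAILING.match(data, pos).end()
        if byte != ord("\\"):
            out.append(byte)
            continue
        if pos >= len(data):
            break
        esc = data[pos]
        numeric = _NUMERIC.match(data, pos)
        if esc in _SIMPLE:
            out.append(_SIMPLE[esc])
            pos += 1
        elif esc == ord("\n"):
            pos += 1  # line continuation: backslash and newline both vanish
        elif numeric:
            if numeric.group(1) is not None:
                code = int(numeric.group(1), 8)
            else:
                code = int(numeric.group(2), 16)
            out.append(code & 0xFF)  # assumption: truncate to one byte
            pos = numeric.end()
        else:
            raise ValueError("unsupported escape sequence")
    raise ValueError("unterminated C string")
```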

Value: hexstring

If the first byte of <type> is a less-than sign (<) and the byte immediately following it is not a less-than sign, then the length of <type> is 1 byte.

<value> is interpreted as an ASCII hexstring ending on the first greater-than sign (>).

Within <value>, bytes not matching the regular expression [0-9A-Fa-f] are ignored.

Any whitespace after the ending greater-than sign and up to the next newline (or EOF) is ignored.
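The hexstring rules can be sketched as follows. The specification does not say what happens with an odd number of hex digits; this sketch pads the final digit with a zero, a convention borrowed from PostScript hex strings, and that choice is an assumption. A missing greater-than sign is treated as an error, which is also an assumption. The name read_hexstring is hypothetical.

```python
import re

_NOT_HEX = re.compile(rb"[^0-9A-Fa-f]")
_TRAILING = re.compile(rb"[ \t\r\v\f]*\n?")

def read_hexstring(data: bytes, pos: int):
    """Decode a hexstring value; pos points just past the opening '<'.

    Returns (value, index of the next entry).
    """
    end = data.find(b">", pos)
    if end == -1:
        raise ValueError("unterminated hexstring")  # assumption
    # Every byte that is not a hex digit is ignored, newlines included.
    digits = _NOT_HEX.sub(b"", data[pos:end])
    if len(digits) % 2:
        digits += b"0"  # assumption: pad an odd final digit, as PostScript does
    value = bytes.fromhex(digits.decode("ascii"))
    return value, _TRAILING.match(data, end + 1).end()
```

Because non-hex bytes are skipped, writers are free to wrap long values across lines or group digits with spaces.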

Value: here document

If the first byte of <type> is a less-than sign (<) and the byte immediately following it is also a less-than sign, then <type> extends up to the next newline.

The subsequent bytes of <type>, following the initial “<<” and before the first whitespace character, define a delimiting identifier.

<value> consists of all bytes verbatim until the first line consisting of nothing but the delimiting identifier. To be recognized as a delimiter, the identifier must start at the beginning of the line and must be immediately followed by a newline. The delimiting line is not included in <value>.

A missing delimiter, causing <value> to extend to the end of the file, is not considered an error. By convention, when there is no collision with <value>, a delimiting identifier of “EOF” is used to communicate this intention, but any other string is legal too.
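A here document reader following these rules might look like this sketch. It reads the rules strictly: a final line that matches the identifier but lacks a trailing newline is not treated as a delimiter. The name read_heredoc is hypothetical.

```python
import re

def read_heredoc(data: bytes, pos: int):
    """Read a here document; pos points just past the initial '<<'.

    Returns (value, index of the next entry).
    """
    eol = data.find(b"\n", pos)
    header_end = len(data) if eol == -1 else eol
    # The identifier runs up to the first whitespace byte of the <type> line.
    delimiter = re.match(rb"\S*", data[pos:header_end]).group()
    start = min(header_end + 1, len(data))
    cursor = start
    while cursor < len(data):
        eol = data.find(b"\n", cursor)
        if eol == -1:
            break  # a last line without a newline cannot be the delimiter
        if data[cursor:eol] == delimiter:
            return data[start:cursor], eol + 1
        cursor = eol + 1
    # Missing delimiter: the value extends to the end of the file.
    return data[start:], len(data)
```

Lines that merely contain the identifier, or carry leading or trailing whitespace around it, stay part of the value.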

Value: error

A <type> field starting with any byte value other than those specified above is an error. Readers should terminate processing and issue an error message.
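Taken together, the choice of value encoding depends on at most the first two bytes of <type>. The dispatch, including the error case, could be sketched like this. The name classify_type and the returned labels are hypothetical.

```python
def classify_type(data: bytes, pos: int):
    """Classify <type>; pos points just past the colon that ends <name>.

    Returns (kind, index where parsing of the entry continues).
    """
    first = data[pos:pos + 1]
    if first == b" ":
        return "plain", pos + 1
    if first == b'"':
        return "cstring", pos + 1
    if first == b"<":
        if data[pos + 1:pos + 2] == b"<":
            # The rest of <type> (the delimiting identifier) follows "<<".
            return "heredoc", pos + 2
        return "hexstring", pos + 1
    # Any other byte, or end of file, is an error per the rule above.
    raise ValueError(f"invalid <type> at offset {pos}")
```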

Example file

    #!/@ -tda
    #: Example DA file
    #: 2008-03-20 / ttl

    #: plain (classic) name-value entries

    title: Unix Programming Environment
    author: Brian W. Kernighan, Rob Pike

    #: plain entries with hierarchy in names

    price/list: $52.00
    price/Amazon.com: $32.76
    price/Amazon.co.uk: £30.99

    #: C string

    average-customer-review:"5 star: 25\n4 star: 6\n2 star: 2\n 2 star: 1\n"

    #: binary data encoded as hex

    image:<457676664e376987ebfed345de76987ed457645763458876345ededca3
    aadd3387ebfed345de76987ed457645763458876345ededca3239487239487234>

    #: multiline entry

    back-cover-text:<<EOD
    Designed for first-time and experienced users, this book describes
    the UNIX® programming environment and philosophy in detail.
    Readers will gain an understanding not only of how to use the system,
    its components, and the programs, but also how these fit into the
    total environment.
    EOD