The DA file format specification
| DA | |
|---|---|
| Filename extension: | .da |
| Shebangsat line (optional): | #!/@ -tda |
| Magic: | (see shebangsat) |
| Encoding: | text (UTF-8) |
| Type of format: | static database |
| Container for: | anything |
| Contained by: | compress, gzip, bzip2 |
| Expanded from: | classic name-colon-value |
| Expanded to: | TDA |
| Applied to: | SA |
Introduction
DA is a formalization of the classic name-colon-value format. This is a text-based file format representing information in key-value pairs. It is designed both for human editing by means of a text editor, and to be read and generated programmatically.
DA extends the classic format by allowing values to contain multiline and binary data. For a quick idea of what DA looks like, see the example at the end of this document.
Basic structure
If the first byte of the file is a number sign (#), all bytes up to and including the first newline are ignored.
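The rule above can be sketched in a few lines of Python. The function name `skip_first_line` is hypothetical, and treating a file that starts with a number sign but contains no newline as entirely ignored is an assumption, since the text does not cover that case.

```python
def skip_first_line(data: bytes) -> bytes:
    """Drop the optional leading '#' line (hypothetical helper)."""
    if data[:1] != b"#":
        return data  # no leading number sign: nothing to ignore
    end = data.find(b"\n")
    # Assumption: with no newline at all, the whole file is the ignored line.
    return b"" if end == -1 else data[end + 1:]
```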
A DA file consists of a sequence of entries of the form
<name>:<type><value>
where
Name
Lines preceding <name> that consist of nothing but whitespace are ignored.
<name> is an array of bytes of any value except the colon (:) character.
Exception: The backslash (\) acts as an escape character. The backslash itself is ignored and the byte immediately following it is stripped of any special meaning. To include a colon in <name>, precede the colon with a backslash. To include a backslash in <name>, precede the backslash with a backslash.
To have on the first line a <name> that starts with a number sign, precede it with a backslash.
By convention, <name> fields consist of printable UTF-8 characters, but readers should be prepared for any byte value.
By convention, the <name> “#” (immediately followed by the terminating colon) denotes a comment entry whose value is unstructured meta information intended for human eyes only.
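As an illustration of the <name> rules, the following Python sketch skips whitespace-only lines, resolves backslash escapes, and stops at the first unescaped colon. The function name `read_name` and its return shape are hypothetical. A backslash that is the very last byte of the input is kept literally, and input that ends before any colon is reported as an error; both are assumptions not covered by the text above.

```python
def read_name(data: bytes, pos: int = 0):
    """Return (name, index just past the terminating colon)."""
    # Skip preceding lines that consist of nothing but whitespace.
    while True:
        eol = data.find(b"\n", pos)
        if eol == -1 or data[pos:eol].strip() != b"":
            break
        pos = eol + 1
    name = bytearray()
    while pos < len(data):
        byte = data[pos]
        if byte == 0x5C and pos + 1 < len(data):
            # Backslash: drop it and take the next byte literally.
            name.append(data[pos + 1])
            pos += 2
        elif byte == 0x3A:
            # Unescaped colon terminates <name>.
            return bytes(name), pos + 1
        else:
            name.append(byte)
            pos += 1
    raise ValueError("no terminating colon found")
```

For example, the input `a\:b:` yields the name `a:b`, and `\#x:` yields `#x`.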
Type
<type> is a variable length field that specifies the encoding of <value>.
Value: plain
If the first byte of <type> is a space (ASCII 32), then the length of <type> is 1 byte.
<value> consists of all bytes verbatim until a newline or EOF is encountered. If <value> ends in a newline (as opposed to EOF), the newline character is included in <value>.
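A plain value can be read with a single search for the next newline. In this minimal sketch the name `read_plain` is hypothetical, and `pos` is assumed to point just past the space that forms <type>.

```python
def read_plain(data: bytes, pos: int):
    """Return (value, index of the next entry) for a plain value."""
    eol = data.find(b"\n", pos)
    # The terminating newline, if present, is part of the value.
    end = len(data) if eol == -1 else eol + 1
    return data[pos:end], end
```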
Value: C string
If the first byte of <type> is a double quote ("), then the length of <type> is 1 byte.
<value> is interpreted as a C string literal ending on the first unescaped double quote.
The following escape sequences are supported:
| \n | newline |
| \t | horizontal tab |
| \v | vertical tab |
| \b | backspace |
| \r | carriage return |
| \f | form feed |
| \a | bell |
| \\ | backslash |
| \" | double quote |
| \ooo | octal value |
| \xhh | hex value |
| \<newline> | line continuation |
Any whitespace after the ending double quote and up to the next newline (or EOF) is ignored.
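The escape handling can be sketched as follows. The name `read_c_string` is hypothetical and `pos` is assumed to point just past the opening double quote. Several details are assumptions rather than part of the text above: octal escapes take one to three digits and are truncated to a single byte, hex escapes take one or two digits, an unknown escape is treated as an error, and a non-whitespace byte after the closing quote is left for the caller to deal with.

```python
SIMPLE = {
    ord("n"): 0x0A, ord("t"): 0x09, ord("v"): 0x0B, ord("b"): 0x08,
    ord("r"): 0x0D, ord("f"): 0x0C, ord("a"): 0x07,
    ord("\\"): 0x5C, ord('"'): 0x22,
}
OCTAL = b"01234567"
HEX = b"0123456789abcdefABCDEF"

def read_c_string(data: bytes, pos: int):
    """Return (value, index after the entry) for a C string value."""
    out = bytearray()
    while pos < len(data):
        byte = data[pos]
        if byte == 0x22:  # unescaped closing quote
            pos += 1
            # Ignore trailing whitespace up to and including the newline.
            while pos < len(data) and data[pos] in b" \t\r\v\f":
                pos += 1
            if pos < len(data) and data[pos] == 0x0A:
                pos += 1
            return bytes(out), pos
        if byte != 0x5C:  # ordinary byte
            out.append(byte)
            pos += 1
            continue
        pos += 1  # step over the backslash
        if pos >= len(data):
            break
        esc = data[pos]
        if esc in SIMPLE:
            out.append(SIMPLE[esc])
            pos += 1
        elif esc == 0x0A:  # backslash-newline: line continuation
            pos += 1
        elif esc in OCTAL:
            end = pos
            while end < pos + 3 and end < len(data) and data[end] in OCTAL:
                end += 1
            # Assumption: values above 0o377 are truncated to one byte.
            out.append(int(data[pos:end], 8) & 0xFF)
            pos = end
        elif esc == ord("x"):
            end = pos + 1
            while end < pos + 3 and end < len(data) and data[end] in HEX:
                end += 1
            if end == pos + 1:
                raise ValueError("\\x needs at least one hex digit")
            out.append(int(data[pos + 1:end], 16))
            pos = end
        else:
            raise ValueError("unknown escape sequence")
    raise ValueError("unterminated C string")
```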
Value: hexstring
If the first byte of <type> is a less-than sign (<) and the byte immediately following it is not a less-than sign, then the length of <type> is 1 byte.
<value> is interpreted as an ASCII hexstring ending on the first greater-than sign (>).
Within <value>, bytes not matching the regular expression [0-9A-Fa-f] are ignored.
Any whitespace after the ending greater-than sign and up to the next newline (or EOF) is ignored.
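A hexstring reader might look like the sketch below. The name `read_hexstring` is hypothetical and `pos` is assumed to point just past the opening less-than sign. The text above does not say what happens with an odd number of hex digits; this sketch pads a final lone digit with a zero, a convention borrowed from PostScript hexstrings, which is an assumption. A missing greater-than sign is treated as an error, also an assumption.

```python
HEX_DIGITS = b"0123456789abcdefABCDEF"

def read_hexstring(data: bytes, pos: int):
    """Return (value, index after the entry) for a hexstring value."""
    end = data.find(b">", pos)
    if end == -1:
        raise ValueError("unterminated hexstring")
    # Bytes that are not hex digits (spaces, newlines, ...) are ignored.
    digits = bytes(b for b in data[pos:end] if b in HEX_DIGITS)
    if len(digits) % 2:
        digits += b"0"  # assumption: pad a lone trailing digit with zero
    value = bytes.fromhex(digits.decode("ascii"))
    pos = end + 1
    # Ignore trailing whitespace up to and including the newline.
    while pos < len(data) and data[pos] in b" \t\r\v\f":
        pos += 1
    if pos < len(data) and data[pos] == 0x0A:
        pos += 1
    return value, pos
```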
Value: here document
If the first byte of <type> is a less-than sign (<) and the byte immediately following it is also a less-than sign, then <type> extends up to the next newline.
The subsequent bytes of <type>, following the initial “<<”, and before the first whitespace character, define a delimiting identifier.
<value> consists of all bytes verbatim up to the first line consisting of nothing but the delimiting identifier. To be recognized as a delimiter, the identifier must start at the beginning of the line and must be immediately followed by a newline. The delimiting line is not included in <value>.
A missing delimiter, causing <value> to extend to the end of the file, is not considered an error. By convention, when there is no collision with <value>, a delimiting identifier of “EOF” is used to communicate this intention, but any other string is legal too.
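The here document rules can be sketched as a line-by-line scan. The name `read_heredoc` is hypothetical and `pos` is assumed to point just past the initial “<<”. Following a strict reading of the text, a final line that matches the identifier but lacks a trailing newline is not recognized as a delimiter and becomes part of the value.

```python
WHITESPACE = b" \t\r\n\v\f"

def read_heredoc(data: bytes, pos: int):
    """Return (value, index after the delimiting line)."""
    eol = data.find(b"\n", pos)
    line_end = len(data) if eol == -1 else eol
    # The identifier runs from pos to the first whitespace byte of <type>.
    id_end = pos
    while id_end < line_end and data[id_end] not in WHITESPACE:
        id_end += 1
    delimiter = data[pos:id_end]
    start = len(data) if eol == -1 else eol + 1
    cursor = start
    while cursor < len(data):
        newline = data.find(b"\n", cursor)
        if newline == -1:
            break  # last line has no newline, so it cannot be a delimiter
        if data[cursor:newline] == delimiter:
            return data[start:cursor], newline + 1
        cursor = newline + 1
    # Missing delimiter: the value extends to the end of the file.
    return data[start:], len(data)
```

Note that a line such as `EOD ` (with a trailing space) does not match the identifier `EOD` and is kept as part of the value.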
Value: error
A <type> field starting with any byte value other than those specified above is an error. Readers should terminate processing and issue an error message.
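Taken together, the four value types and the error case amount to a dispatch on the first one or two bytes of <type>. In this sketch the function name `classify_type` and the returned labels are hypothetical, and a file that ends directly after the colon is treated as an error, which is an assumption.

```python
def classify_type(data: bytes, pos: int) -> str:
    """Name the value type that starts at pos, just past the colon."""
    first = data[pos:pos + 1]
    if first == b" ":
        return "plain"
    if first == b'"':
        return "cstring"
    if first == b"<":
        # A second less-than sign selects the here document form.
        return "heredoc" if data[pos + 1:pos + 2] == b"<" else "hexstring"
    raise ValueError(f"invalid <type> at byte offset {pos}")
```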
Example file
#!/@ -tda
#: Example DA file
#: 2008-03-20 / ttl
#: plain (classic) name-value entries
title: Unix Programming Environment
author: Brian W. Kernighan, Rob Pike
#: plain entries with hierarchy in names
price/list: $52.00
price/Amazon.com: $32.76
price/Amazon.co.uk: £30.99
#: C string
average-customer-review:"5 star: 25\n4 star: 6\n2 star: 2\n 2 star: 1\n"
#: binary data encoded as hex
image:<457676664e376987ebfed345de76987ed457645763458876345ededca3
aadd3387ebfed345de76987ed457645763458876345ededca3239487239487234>
#: multiline entry
back-cover-text:<<EOD
Designed for first-time and experienced users, this book describes
the UNIX® programming environment and philosophy in detail.
Readers will gain an understanding not only of how to use the system,
its components, and the programs, but also how these fit into the
total environment.
EOD