April 6, 2014

Proposal for FEN-like notations for tafl-derived games: TSN and TEN

Having taken an interest in tafl (table) games, I wanted a way to concisely represent the state of a game in a format which can be easily consumed by both humans and computers. The Forsyth-Edwards Notation (FEN) for chess captures the initial board state (starting positions) as:
rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1
Inspired by FEN, I propose the "Tafl Standard Notation" (TSN) for all tafl-derived games, which captures the essential information in a simple format. Many tafl games have no documented rule sets, so this notation is designed to be flexible enough to handle locale-specific customizations such as size and initial board positions. A standard-formatted Ard Ri initial board state is:
7 8 2aaa2/3a3/a1ddd1a/aadkdaa/a1ddd1a/3a3/2aaa2 a 1 0
This form contains six fields using UTF-8 encoding, each separated by a single space (ASCII #32):
  1. The board size (7) — Since all tafl boards are square, the field will be a single integer representing the length of each side (or, alternately, the height and width) of the board. The (mathematical) square of this number represents the total number of spaces/positions on the board. This must be a positive integer and is provided first so parsers can easily validate the row descriptions which follow.
  2. The number of defenders at the start of the game (8) — Tafl games are asymmetric in nature and the attackers always outnumber the defenders two to one. This must be a positive integer and must be evenly divisible by four, since the defenders will surround the king on all four sides and each "quadrant" of the board lays pieces out in an equivalent pattern. A value of 8 indicates there are sixteen attackers. This value can be used to derive how many pieces from each side have been captured.
  3. The piece positions (2aaa2/3a3/a1ddd1a/aadkdaa/a1ddd1a/3a3/2aaa2) — This field itself contains sub-fields, separated by a forward slash (/, ASCII #47). Each sub-field describes one horizontal row (rank), ordered from bottom to top. Each character in a sub-field describes one or more vertical columns (files), ordered from left to right. In the Ard Ri example, the first character of the first sub-field describes algebraic chess notation's a1 position. Each row description consists of one or more of:
    • A lower-case a (ASCII #97) — An attacker piece occupying a single square.
    • A lower-case d (ASCII #100) — A defender piece occupying a single square.
    • A lower-case k (ASCII #107) — The king piece occupying a single square.
    • A single digit (1 through 9, ASCII #49 through #57) or an upper-case letter of the English alphabet (A through Z, ASCII #65 through #90) — The number of empty squares until a piece or the end of the row, in "packed" form using base/radix 36. 0 (ASCII #48) is not allowed as it does not describe at least one square. For example, "1" represents a single empty square, "9" represents nine empty squares, "A" represents ten empty squares, and "Z" represents thirty-five empty squares. The form "111" is allowed, but "3" is preferred.
    Only one king piece may be defined. The number of squares described by each sub-field (row) must equal the width of the board. The number of rows must equal the height of the board.
  4. The active side (a) — The side which will move next, given the currently positioned pieces. Allowed values are a (ASCII #97) for the attackers, and d (ASCII #100) for the defenders.
  5. The current ply (1) — A positive integer representing the number of the current ply (or half-move), counted from the start of the game. The first (initial) ply is always "1" (ASCII #49). The current turn can be computed from the current ply by dividing by the number of sides (two) and rounding up to the nearest integer.
  6. The number of plies since the last capture (0) — A non-negative integer which starts at "0" for the initial ply and is incremented upon each ply. Upon capture, the value is reset to "0" (ASCII #48) for the next ply. Under some game rules, if a specific number of plies is reached without a capture, the game ends as a draw.
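To make the six fields concrete, here is a minimal Python sketch of a TSN reader that applies the validation rules listed above. It is only an illustration, not a reference implementation: the names TsnState, expand_row and parse_tsn are hypothetical, the "." marker for an empty square is an assumption of the sketch, and the reader is lenient about whitespace (it splits on any run of spaces rather than insisting on exactly one).

```python
import string
from dataclasses import dataclass
from typing import List, Tuple

EMPTY = "."  # assumption of this sketch: marks an empty square after expansion
RUN_CHARS = "123456789" + string.ascii_uppercase  # base-36 digits, "0" excluded


def expand_row(row: str) -> str:
    """Expand one packed row, e.g. 'a1ddd1a' -> 'a.ddd.a'."""
    squares = []
    for ch in row:
        if ch in ("a", "d", "k"):
            squares.append(ch)
        elif ch in RUN_CHARS:
            squares.append(EMPTY * int(ch, 36))  # 'A' -> 10 ... 'Z' -> 35
        else:
            raise ValueError(f"invalid character {ch!r} in row {row!r}")
    return "".join(squares)


@dataclass
class TsnState:
    size: int
    defenders_at_start: int
    rows: List[str]  # expanded rows; rows[0] is the bottom rank
    active_side: str
    ply: int
    plies_since_capture: int

    @property
    def turn(self) -> int:
        # Divide the ply by the number of sides (two), rounding up.
        return (self.ply + 1) // 2

    def captured(self) -> Tuple[int, int]:
        """Return (attackers captured, defenders captured)."""
        board = "".join(self.rows)
        attackers = 2 * self.defenders_at_start - board.count("a")
        defenders = self.defenders_at_start - board.count("d")
        return attackers, defenders


def parse_tsn(text: str) -> TsnState:
    fields = text.split()  # lenient: any whitespace; extra fields are ignored
    if len(fields) < 6:
        raise ValueError("expected at least six fields")
    size, defenders = int(fields[0]), int(fields[1])
    ply, since_capture = int(fields[4]), int(fields[5])
    if size < 1:
        raise ValueError("board size must be a positive integer")
    if defenders < 1 or defenders % 4 != 0:
        raise ValueError("defender count must be a positive multiple of four")
    rows = [expand_row(r) for r in fields[2].split("/")]
    if len(rows) != size or any(len(r) != size for r in rows):
        raise ValueError("piece positions do not match the board size")
    if "".join(rows).count("k") > 1:
        raise ValueError("only one king piece may be defined")
    if fields[3] not in ("a", "d"):
        raise ValueError("active side must be 'a' or 'd'")
    if ply < 1 or since_capture < 0:
        raise ValueError("ply counters out of range")
    return TsnState(size, defenders, rows, fields[3], ply, since_capture)
```

Calling parse_tsn on the Ard Ri string above gives a 7x7 board on turn 1 with no captures on either side. The captured method relies on the two-to-one ratio from field 2 to derive the attackers' starting count.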
Examples for other initial board states:
  • Alea Evangelii
    19 24 2a2a7a2a2/J/a4a7a4a/7a1a1a7/6a1d1d1a6/a1a2a7a2a1a/4a4d4a4/3a4d1d4a3/4d2d1d1d2d4/3a2d1dkd1d2a3/4d2d1d1d2d4/3a4d1d4a3/4a4d4a4/a1a2a7a2a1a/6a1d1d1a6/7a1a1a7/a4a7a4a/J/2a2a7a2a2 d 1 0
  • Hnefatafl
    11 12 3aaaaa3/5a5/B/a4d4a/a3ddd3a/aa1ddkdd1aa/a3ddd3a/a4d4a/B/5a5/3aaaaa3 a 1 0
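The examples above use "J" and "B" for fully empty rows of 19 and 11 squares. As a rough sketch of the encoding direction, the following Python packs an expanded row into the preferred shortest form. The function names pack_run, pack_row and pack_board are hypothetical, and using "." for an empty square is an assumption of this sketch. Runs longer than 35 squares are split across several characters, which the notation permits in the same way that "111" is permitted.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # base-36 digit table


def pack_run(count: int) -> str:
    """Encode a run of empty squares; runs above 35 use several characters."""
    out = []
    while count > 0:
        chunk = min(count, 35)
        out.append(DIGITS[chunk])
        count -= chunk
    return "".join(out)


def pack_row(squares: str, empty: str = ".") -> str:
    """Pack an expanded row, e.g. '..aaa..' -> '2aaa2'."""
    out = []
    run = 0
    for ch in squares:
        if ch == empty:
            run += 1
        elif ch in ("a", "d", "k"):
            out.append(pack_run(run))  # flush any pending empty run first
            run = 0
            out.append(ch)
        else:
            raise ValueError(f"unexpected square {ch!r}")
    out.append(pack_run(run))  # trailing empty squares up to the end of the row
    return "".join(out)


def pack_board(rows) -> str:
    """Join packed rows (bottom rank first) into the piece-position field."""
    return "/".join(pack_row(r) for r in rows)
```

Because each run is flushed only when a piece or the end of the row is reached, the output never contains adjacent run characters unless a single run exceeds 35 squares.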
For the standard format, any trailing spaces or fields can be ignored. To allow extension of the standard format to include additional data (such as custom rule sets), a seventh field, separated from the preceding field by a space (ASCII #32), may be provided in "Tafl Extended Notation" or "Tafl Extensible Notation" (TEN). The field uses UTF-8 encoding. The simplest extended form for the Ard Ri example provided above is:
7 8 2aaa2/3a3/a1ddd1a/aadkdaa/a1ddd1a/3a3/2aaa2 a 1 0 -
This is the only defined variant of TEN, whose seventh field is:
  • A dash (-, ASCII #45) indicating no custom data is available — This would be provided for compatibility with parsers which expect seven fields.
To allow easy parsing and versioning of user-specific data contained within the seventh field, an additional variant is recommended:
7 8 2aaa2/3a3/a1ddd1a/aadkdaa/a1ddd1a/3a3/2aaa2 a 1 0 example:1:escape=corners
In this variant, the field is defined as:
  • A prefixed set of custom data (example:1:escape=corners) — The prefix should include an identifier (example), a version (1), and the custom data (escape=corners) separated by colons (:, ASCII #58).
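As an illustration of how a reader might separate the extension from the six standard fields, here is a small Python sketch. The names Extension and split_ten are hypothetical. The sketch also makes two assumptions that the proposal leaves open: the version is kept as a string, and everything after the sixth space is treated as the extension, so custom data may itself contain spaces.

```python
from typing import NamedTuple, Optional, Tuple


class Extension(NamedTuple):
    identifier: str
    version: str  # kept as text; the proposal does not require an integer
    data: str


def split_ten(text: str) -> Tuple[str, Optional[Extension]]:
    """Split a TEN string into its TSN part and an optional extension."""
    fields = text.split(" ", 6)  # at most seven parts; the last keeps the rest
    if len(fields) < 6:
        raise ValueError("expected at least six fields")
    tsn = " ".join(fields[:6])
    extra = fields[6].strip() if len(fields) == 7 else ""
    if extra in ("", "-"):
        return tsn, None  # plain TSN, or the dash meaning "no custom data"
    parts = extra.split(":", 2)  # the custom data may contain further colons
    if len(parts) != 3:
        raise ValueError("extension must look like identifier:version:data")
    return tsn, Extension(*parts)
```

Splitting on the first two colons only means the custom data can use colons freely, which keeps the identifier and version easy to check before a parser commits to interpreting the rest.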
Revised on August 10, 2014.