# Difference between revisions of "Projects/Liberty/Appendix 3"

Purple-bobby (talk | contribs) |
Purple-bobby (talk | contribs) (→Huffman Style Processing: omission from Sliding-Window Pointer decode) |
||

Line 157: | Line 157: | ||

Decode Sliding-window Pointer | Decode Sliding-window Pointer | ||

Let pointer := Huff-Decode( P-tree ) | Let pointer := Huff-Decode( P-tree ) | ||

+ | If pointer <> 0 Then | ||

+ | Let pointer := pointer - 1 | ||

+ | Let pointer := 2^pointer + Get-Bits( pointer ) | ||

+ | End if | ||

Return pointer | Return pointer | ||

## Latest revision as of 02:50, 10 January 2013

## Appendix 3 - HUS/VIP Decompression

In the .hus and .vip format files, the attribute, X displacement, and Y displacement of each stitch are held compressed using a version of the ARJ compression. An attribute has the values: 0x80 (stitch), 0x81 (jump), 0x84 (thread stop), and 0x90 (end stop); that is bit 7 is always set, bit 0 indicates jump without stitch, bit 2 indicates thread change stop, and bit 4 end of pattern stop; bits 1, 3, 5, 6 appear unused.

The compression uses a sliding window of past uncompressed values to source repeated strings of bytes, instead of outputting the string of bytes, a pointer (and length) back to an earlier version of the string is (coded and) output. A feature of this is, a self repeating string can be copied before it is fully output, for example, if 'ABC' has just been output and there is now a string 'ABCABCABC', the length would be 9, but the pointer would only point back 3 bytes (to the 'A') as the 'ABC' is copied to the output, they become available to copy again.

When there are no bytes to copy in the recent output, the new byte has to be output as is, a length of zero (or less than 3) indicates verbatim byte and no pointer. This is encoded as values 0 to 255 indicate verbatim byte and no pointer, values 256 or greater indicate length, but the true length is the value minus 256 plus 3.

### How to Decompress

Allocate an output buffer of enough bytes to hold the uncompressed data, access is needed to some of the past decoded bytes. Either fetch bytes and add them to the output, or fetch length and pointer, then copy past bytes to the output. Repeat until output buffer is full.

Pseudo-code of Sliding Window Decompression:

Let uncompressed := new vector[0 to uncompressed-size - 1] of byte -- allocate output buffer Let count := 0 -- the number of bytes output, index to output buffer While count < uncompressed-size Do Let code := Fetch-Byte-or-Length If code < 256 Then -- is byte Let uncompressed[count] := code Let count := count + 1 Else Let length := code - 256 + 3 -- min length is 3 Let pointer := Fetch-Pointer For length Times Do If count < uncompressed-size Then -- not buffer overrun If count > pointer Then -- not buffer under-run Let uncompressed[count] := uncompressed[count - pointer - 1] Else Let uncompressed[count] := 0 End if End if Let count := count + 1 End for End if End while

### Huffman Coding of Bytes or Lengths and Pointers

The sliding-window byte/length and pointer codes are packaged, the first 16 bits gives the package length in sliding-window codes.

To reduce the bit length of the byte or length symbols and the pointers, they are Huffman coded to reduce the length of the most frequent values. The Huffman tree is passed to the compressed output by sending the length of the Huffman code for each symbol in the symbol-set, run of zero encoding is used to fill in the length table of the many unused codes (of zero length). For the byte/length symbol Huffman code lengths, these are themselves Huffman coded.

The number of Huffman code byte/length symbol lengths is given in the next 5 bits (value 0 to 31, but actually limited to 19). If the number is zero then there is only one symbol which is contained in the next 5 bits and its code length is zero. After the third symbol, the next 2 bits provide a 0 to 3 length run of zeros. After the 'number' of symbols the rest of the lengths are set to zero.

Each symbol is processed thus, the next 3 bits give a value 0 to 7; the values 0 to 6 are used directly, but if the value is 7, then a series of next bits, up to 13. The series ends after the first 0 value bit or after the 13th bit, each 1 value bit increments the (length) value (giving a range 7 to 20); I suspect any value over 18 is an error, so a maximum of eleven 1 bits followed by a twelfth bit of 0. The value 0 to 18 is used to set the length of the next code.

A (virtual) Huffman tree is built using these 19 ordered length and symbol values. The next variable length codes can be used to uniquely identify a value.

The next 9 bits give the number byte/length symbols defined (and is typically 511 since the end code is coded as 510 - a bug?), again a 'number' of zero implies the next 9 bits give the only code used and its Huffman code does not need to be transmitted. For 'number' of times, a variable length Huffman code appear in the next bits and gives a value 0 to 18.

There are up to 19 symbols used to compress the byte/length symbol lengths. The symbol values 0 to 2 allow for a run of zeros: 0 implies run-length of one, 1 implies 3 plus the value of the next 4 bits, and 2 implies 20 plus the value of the next 9 bits. The remaining 16 values give the Huffman code length 1 to 16 bits (that is symbol value minus 2).

A (virtual) Huffman tree is built using these 511 ordered length and symbol values.

There are 511 byte/length symbols, byte ranges from 0 to 255 and usable length ranges from 3 to 256, where a length of 257 marks the end of the compressed data.

The pointer values are Huffman encoded and appear next, similar to above, the next 5 bits gives the number pointer symbols ranging from 0 to 19 (or 15). Again, 0 implies there is only one symbol and it is in the next 5 bits. Otherwise, the next 3 bits give a code length 0 to 6, and 7 again is used to increment length by each 1 bit until a 0 bit. The end of the table is filled with 0.

### Fetch Byte or Length

Use the next variable length code to find the value of the Byte or Length, by descending the Huffman tree.

### Fetch Pointer

Use the next variable length code to find the value of the Pointer, by descending the Huffman tree.

### Huffman Style Processing

Initialize

Let NT = 19 Let NC = 511 Let NP = 15 Let package-remaining := 0 -- symbol/length code-length code-lengths Let T-len := new vector[0 to NT - 1] of (0 to 16) -- symbol/length code-lengths Let C-len := new vector[0 to NC - 1] of (0 to 16) -- pointer code-lengths Let P-len := new vector[0 to NP - 1] of (0 to 16)

Decode Sliding-window Byte/Length

-- prepare Huffman decoding trees If package-remaining = 0 Then Let package-remaining := Get-Bits( 16 ) Let T-tree := Build-T-tree Let C-tree := Build-C-tree( T-tree ) Let P-tree := Build-P-tree End if Let package-remaining := package-remaining - 1 Let byte-length := Huff-Decode( C-tree ) Return byte-length

Decode Sliding-window Pointer

Let pointer := Huff-Decode( P-tree ) If pointer <> 0 Then Let pointer := pointer - 1 Let pointer := 2^pointer + Get-Bits( pointer ) End if Return pointer

Build T tree

T-tree.Delete-All Let n := Get-Bits( 5 ) Let n := Min( n, NT ) If n = 0 Then Let c := Get-Bits( 5 ) T-tree.Append( length := 0, bits := 0, symbol := c ) Else Let i := 0 While i < n Do Let c := Get-Bits( 3 ) If c = 7 Then Let b := Get-Bits( 1 ) While b = 1 Do Let c := c + 1 Let b := Get-Bits( 1 ) End while End if Let T-len[i] := c Let i := i + 1 If i = 3 Then Let c := Get-Bits( 2 ) While c > 0 Do Let T-len[i] := 0 Let i := i + 1 Let c := c - 1 End while End if End while While i < NT Do Let T-len[i] := 0 Let i := i + 1 End while Let b := 0 For len := 1 to 16 Do Let b := b * 2 -- shift a 0 into LSB For c := 0 to NT-1 Do If T-len[c] = len Then T-tree.Append( length := len, bits := b, symbol := c ) Let b := b + 1 End If End for End for End If Return T-tree

Build C tree ( T-tree )

C-tree.Delete-All Let n := Get-Bits( 9 ) Let n := Min( n, NC ) If n = 0 Then Let c := Get-Bits( 9 ) C-tree.Append( length := 0, bits := 0, symbol := c ) Else Let i := 0 While i < n Do Let c := Huff-Decode( T-tree ) If c <= 2 Then If c = 0 Then Let c := 1 Else If c = 1 Then Let c := 3 + Get-Bits( 4 ) Else Let c := 20 + Get-Bits( 9 ) End if While c > 0 Do Let C-len[i] := 0 Let i := i + 1 Let c := c - 1 End while Else Let C-len[i] := c - 2 Let i := i + 1 End if End while While i < NC Do Let C-len[i] := 0 Let i := i + 1 End while Let b := 0 For len := 1 to 16 Do Let b := b * 2 -- shift a 0 into LSB For c := 0 to NC-1 Do If C-len[c] = len Then C-tree.Append( length := len, bits := b, symbol := c ) Let b := b + 1 End If End for End for End If T-tree.Delete-All Return C-tree

Build P tree

P-tree.Delete-All Let n := Get-Bits( 5 ) Let n := Min( n, NP ) If n = 0 Then Let c := Get-Bits( 5 ) P-tree.Append( length := 0, bits := 0, symbol := c ) Else Let i := 0 While i < n Do Let c := Get-Bits( 3 ) If c = 7 Then Let b := Get-Bits( 1 ) While b = 1 Do Let c := c + 1 Let b := Get-Bits( 1 ) End while End if Let P-len[i] := c Let i := i + 1 End while While i < NP Do Let P-len[i] := 0 Let i := i + 1 End while Let b := 0 For len := 1 to 16 Do Let b := b * 2 -- shift a 0 into LSB For c := 0 to NP-1 Do If P-len[c] = len Then P-tree.Append( length := len, bits := b, symbol := c ) Let b := b + 1 End If End for End for End If Return P-tree

Huff Decode ( tree )

Let sym := 0 Let b := 0 -- bits Let i := 0 -- current tree entry Let n := tree.length Let l := 0 -- code length For i := 0 to n - 1 If l < tree[i].length Then Let b := b * 2^(tree[i].length - l) Let b := b + Get-Bits( tree[i].length - l ) Let l := tree[i].length End if If tree[i].length = l And tree[i].bits = b Then Let sym := tree[i].symbol Exit for End if End for Return sym