Decoding LZW-encoded data is simpler and easier to be done efficiently because dictionary searches are not needed. Thus the decoder does not need the extra index values in the string elements as the encoder did for the fast search.
The encoding step wrote only index values to the encoded result. When decoding, one of the crucial steps is, naturally, to recreate the original string from the dictionary using such an index value.
The index value points to a string in the dictionary. However, as we remember, the "string" in the dictionary only consists of the last byte of the string and an index to the prefix for that string. Likewise the "string" at that prefix index only contains the last byte and an index to the rest of the prefix.
Thus it becomes clear that to reconstruct a string from an index we need to traverse the dictionary strings backwards, following each successive prefix index until this prefix index is the empty index.
We can build the reversed string into a temporary array and then output the byte values by traversing the array backwards. This temporary array can be reused for all the string reconstructions in order to avoid repeated allocations.
While this temporary array requires memory, we have to remember that the dictionary itself now takes less memory, and thus the total memory consumption is not larger than when encoding (in fact, it's less).
Again, we assume that all 256 possible byte values are used. If the encoding used less, then the decoding should do the same. (How many values were used when encoding can be calculated from the byte value map created by the encoder.)
If the encoder uses a freely specifiable maximum bitsize for the dictionary size then the encoder should write that value to the output so that the decoder can read it and act accordingly.
Note that "first byte of the string at <index>
" in
the pseudocode refers to the decoded string. In other words, the last byte
found when recreating the string (which becomes the first byte when writing
it to the output).
<index>
← first index value in the input.
<index>
to the result.
<old>
← <index>
<index>
← next index value in the input.
<index>
exist in the dictionary?
<index>
to the
result.
B
← first byte of the string at
<index>
<old>B
to the dictionary.
B
← first byte of the string at
<old>
<old>B
to the dictionary.
<old>B
to the output.
<old>
← <index>
Decoding with dynamic bit sizes is very similar to how it was done in the encoding step. Notice, however, that there's one twist:
In the encoding step the maximum dictionary index value after which
the bitsize had to be incremented was calculated as the maximum value
which could be represented with that many bits (ie.
"(1 << bitsize) - 1
").
However, in the decoding stage this value must be one less (that is,
"(1 << bitsize) - 2
") for
the decoding to work properly. In other words, when the dictionary reaches
this size, the current bitsize must be incremented and the new maximum
size value calculated again with that formula.
Copyright 2007: Juha Nieminen