An efficient LZW implementation

5. The basic LZW decoding algorithm

Decoding LZW-encoded data is simpler and easier to be done efficiently because dictionary searches are not needed. Thus the decoder does not need the extra index values in the string elements as the encoder did for the fast search.

Recreating a string from an index

The encoding step wrote only index values to the encoded result. When decoding, one of the crucial steps is, naturally, to recreate the original string from the dictionary using such an index value.

The index value points to a string in the dictionary. However, as we remember, the "string" in the dictionary only consists of the last byte of the string and an index to the prefix for that string. Likewise the "string" at that prefix index only contains the last byte and an index to the rest of the prefix.

Thus it becomes clear that to reconstruct a string from an index we need to traverse the dictionary strings backwards, following each successive prefix index until this prefix index is the empty index.

We can build the reversed string into a temporary array and then output the byte values by traversing the array backwards. This temporary array can be reused for all the string reconstructions in order to avoid repeated allocations.

While this temporary array requires memory, we have to remember that the dictionary itself now takes less memory, and thus the total memory consumption is not larger than when encoding (in fact, it's less).

The basic decoding algorithm

Again, we assume that all 256 possible byte values are used. If the encoding used less, then the decoding should do the same. (How many values were used when encoding can be calculated from the byte value map created by the encoder.)

If the encoder uses a freely specifiable maximum bitsize for the dictionary size then the encoder should write that value to the output so that the decoder can read it and act accordingly.

Note that "first byte of the string at <index>" in the pseudocode refers to the decoded string. In other words, the last byte found when recreating the string (which becomes the first byte when writing it to the output).

Initialize the dictionary (with the first 256 entries).
<index> ← first index value in the input.
Write the string at <index> to the result.
<old> ← <index>
<index> ← next index value in the input.
Does <index> exist in the dictionary?
- Yes:
  1. Write the string at <index> to the result.
  2. B ← first byte of the string at <index>
  3. Add <old>B to the dictionary.
- No:
  1. B ← first byte of the string at <old>
  2. Add <old>B to the dictionary.
  3. Write the string for <old>B to the output.
<old> ← <index>
If there are indices left in the input, jump to step 5.

Decoding with dynamic bit sizes

Decoding with dynamic bit sizes is very similar to how it was done in the encoding step. Notice, however, that there's one twist:

In the encoding step the maximum dictionary index value after which the bitsize had to be incremented was calculated as the maximum value which could be represented with that many bits (ie. "(1 << bitsize) - 1").

However, in the decoding stage this value must be one less (that is, "(1 << bitsize) - 2") for the decoding to work properly. In other words, when the dictionary reaches this size, the current bitsize must be incremented and the new maximum size value calculated again with that formula.

Part 6: Making a gif codec

Return to the main page.