Stanford EE274: Data Compression I 2023 I Lecture 4 - Huffman Codes

Huffman Coding

  • Huffman codes are prefix codes; this lecture covers their construction and optimality.
  • Huffman coding is a greedy algorithm for constructing a prefix code for a given set of probabilities.
  • The algorithm works by repeatedly merging the two nodes with the smallest probabilities until only one node is left.
  • The expected code length of a Huffman code is between the entropy of the source and the entropy plus 1.
  • Huffman coding can be implemented efficiently using a heap or priority queue.
  • Table-based decoding is a low-level technique used in compression algorithms, especially for large volumes of data such as videos and logs.
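The greedy merge-the-two-smallest construction described above can be sketched with a priority queue. This is a minimal illustration, not the lecture's own code; the helper name `huffman_code` and the dict-based tree representation are choices made here for brevity.

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code for a dict mapping symbol -> probability.

    Greedy construction: repeatedly merge the two least probable
    nodes until a single root remains, prepending one bit per merge.
    """
    # Each heap entry: (probability, unique tie-breaker, {symbol: codeword-so-far}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        # Distinguish the two merged subtrees by a leading bit.
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

# Dyadic probabilities: expected length matches the entropy exactly (1.75 bits).
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
code = huffman_code(probs)
avg_len = sum(probs[s] * len(w) for s, w in code.items())
```

The unique integer tie-breaker in each heap entry keeps the tuple comparison from ever reaching the (non-comparable) dicts when two probabilities are equal.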

Entropy and Related Concepts

  • Recap of previous lecture:
    • Kraft's inequality: A necessary and sufficient condition on codeword lengths for a prefix code to exist.
    • Entropy: A fundamental quantity for lossless compression, defined as the expected number of bits per symbol.
    • Joint entropy for IID variables: Can be calculated in two steps using independence and identical distribution.
    • KL (Kullback-Leibler) divergence: A distance measure between two probability distributions, greater than or equal to zero, with equality if and only if the distributions are equal.
    • Main theorem of compression: For any prefix code, the expected code length is lower bounded by the entropy.
    • Shannon codes: Not optimal, but can achieve entropy or get arbitrarily close to it when working with blocks.
  • Quiz review:
    • Entropy of a Bernoulli random variable:
    • H(X) = -p log p - (1-p) log (1-p)
    • H(0) = H(1) = 0
    • H(1/2) = 1
    • KL divergence between two Bernoulli random variables:
    • D(p || q) = p log (p/q) + (1-p) log ((1-p)/(1-q))
  • The Kullback-Leibler (KL) divergence can be infinite, e.g., for Bernoulli distributions with p = 1 and q = 0, where Q assigns zero probability to an outcome P considers certain.
  • Shannon codes are not optimal for highly skewed distributions, resulting in an expected code length that is much larger than the entropy.
  • Block coding can improve the efficiency of Shannon codes by grouping symbols together and assigning shorter codewords to more probable pairs.
  • The optimal prefix code for a given distribution can be found by satisfying two conditions:
    • Symbols with higher probabilities should have shorter or equal length codewords compared to symbols with lower probabilities.
    • The two longest codewords should have the same length.
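The Bernoulli entropy and KL divergence formulas from the quiz review can be checked numerically. This is a small sketch with hypothetical helper names; the edge-case handling follows the convention 0 log 0 = 0 and the observation above that the divergence is infinite when q is degenerate but p is not.

```python
import math

def bernoulli_entropy(p):
    """H(p) = -p log2 p - (1-p) log2 (1-p), with the convention 0 log 0 = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bernoulli_kl(p, q):
    """D(p || q) = p log2 (p/q) + (1-p) log2 ((1-p)/(1-q)).

    Infinite when q is 0 or 1 but p is not, matching the note above.
    """
    if q in (0.0, 1.0) and p != q:
        return math.inf
    if p == 0.0:
        return math.log2(1 / (1 - q))
    if p == 1.0:
        return math.log2(1 / q)
    return p * math.log2(p / q) + (1 - p) * math.log2((1 - p) / (1 - q))
```

For example, `bernoulli_entropy(0.5)` is 1 bit and `bernoulli_kl(1.0, 0.0)` is infinite, matching the quiz answers.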

Practical Considerations and Examples

  • A distribution with probabilities proportional to the Fibonacci sequence is an adversarial case for Huffman coding, producing long codewords and a tree depth close to the number of symbols.
  • A practical example of Huffman coding is shown using the frequency of letters on a keyboard, where commonly used letters have shorter code words.
  • Huffman coding was used to compress various files, including a book, a Java library, and an encrypted file.
  • The compression rate for the book was about 44%, and for the Java library around 40%.
  • The encrypted file, however, could not be compressed because encryption algorithms typically output uniformly distributed data, making it appear random to compressors.
  • Compressing encrypted data is not effective because encryption destroys the data's statistical structure, yielding a compression rate of about one (no size reduction).
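The contrast between structured and random-looking data can be reproduced with Python's standard `zlib` module (whose DEFLATE format uses Huffman coding internally). Here `os.urandom` stands in for encrypted output, since both look like uniformly random bytes to a compressor; the specific inputs are illustrative, not the files used in the lecture.

```python
import os
import zlib

structured = b"the quick brown fox jumps over " * 500  # repetitive text
random_like = os.urandom(10_000)                       # stand-in for encrypted data

rate_structured = len(zlib.compress(structured)) / len(structured)
rate_random = len(zlib.compress(random_like)) / len(random_like)
# Structured data compresses well; random-looking bytes do not
# (the rate stays near or even slightly above 1 due to format overhead).
```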

Next Lecture

  • The next lecture will cover theoretical concepts such as entropy and block coding, exploring their importance and their interaction with probability and the law of large numbers.
  • Professor Tsachy Weissman will lead the next lecture, followed by discussions on practical block and stream codes used in state-of-the-art compressors.