Stanford EE274: Data Compression I 2023 I Lecture 8 - Beyond IID distributions: Conditional entropy

Entropy Coding

  • Huffman coding is fast and optimal for single symbols or small blocks, but the size of its codebook grows exponentially with the block length (see the sketch after this list).
  • Arithmetic coding achieves entropy efficiently in linear time but involves expensive division operations.
  • The ANS (Asymmetric Numeral Systems) variants, rANS and tANS, are faster than arithmetic coding and compress better than Huffman coding.
  • rANS decodes faster than arithmetic coding, making it suitable for applications where decoding time is crucial.
  • When the application requires the best compression ratio and is willing to sacrifice encoding or decoding speeds, arithmetic coding is the best choice.
  • If extremely fast encoding and decoding are required, Huffman coding is the best option.
  • When the probability model is adapted on the fly during encoding and decoding (adaptive coding), arithmetic coding is the best choice.
  • To achieve close to optimal compression with good speed, the ANS variants (rANS/tANS) can be used.
  • rANS in particular can exploit the parallel (vectorized) execution available on modern processors for faster encoding and decoding.
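
As a rough illustration of the trade-off above, here is a small Python sketch (not from the lecture; the skewed two-symbol source and block sizes are made up) that builds Huffman codes over blocks of k symbols: the average rate approaches the entropy as k grows, while the number of codewords grows exponentially (2^k here, alphabet-size^k in general).

```python
import heapq
import itertools
from math import log2

def huffman_code_lengths(pmf):
    """Return {symbol: codeword length} for a Huffman code over the pmf."""
    # Heap entries: (subtree probability, tie-breaker, symbols in the subtree).
    heap = [(p, i, [s]) for i, (s, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(pmf, 0)
    counter = len(pmf)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:        # merging two subtrees adds one bit to every symbol below
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

pmf = {"a": 0.9, "b": 0.1}             # a skewed source where per-symbol Huffman is poor
entropy = -sum(p * log2(p) for p in pmf.values())

for k in (1, 2, 4, 8):
    # Block source: treat every length-k string as one super-symbol (IID assumption).
    block_pmf = {}
    for blk in itertools.product(pmf, repeat=k):
        p = 1.0
        for s in blk:
            p *= pmf[s]
        block_pmf[blk] = p
    lengths = huffman_code_lengths(block_pmf)
    bits_per_symbol = sum(block_pmf[b] * lengths[b] for b in block_pmf) / k
    print(f"block size {k}: {len(block_pmf):4d} codewords, "
          f"{bits_per_symbol:.3f} bits/symbol (entropy = {entropy:.3f})")
```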

Non-IID Data

  • Entropy coding assumes IID (independent and identically distributed) data, but real-life data is often non-IID.
  • Non-IID data has dependencies between symbols, such as the probability of a symbol being influenced by previous symbols.
  • Concatenating files with different symbol distributions and coding the result with a single mixture distribution is suboptimal compared with coding each file under its own distribution.
  • A (first-order) Markov chain is a stochastic process in which the probability of the next symbol depends only on the current symbol, not on any earlier ones (see the first sketch after this list).
  • A K-order Markov chain generalizes this: the next symbol depends only on the previous K symbols.
  • A stationary stochastic process is one whose joint statistics do not change under time shifts (not merely the mean); a non-stationary process is one whose statistics drift over time.
  • IID (independent and identically distributed) processes are a special case of stationary processes in which all symbols are independent and share the same distribution.
  • Marginalization is summing a joint distribution over the values of one random variable to obtain the distribution of another.
  • A stationary Markov chain is one whose state distribution does not change over time.
  • A common way to turn a non-stationary process into an (approximately) stationary one is to take differences between consecutive values.
  • The conditional entropy of U given V, H(U|V) = Σ_v P(v) H(U|V=v), is the average uncertainty remaining in U once V is known.
  • For a particular observation V = v, the entropy H(U|V=v) can be larger or smaller than H(U), but on average (over V) knowing V reduces the uncertainty about U.
  • The chain rule of entropy: the joint entropy of U and V equals the entropy of U plus the entropy of V given U, H(U,V) = H(U) + H(V|U) (see the second sketch after this list).
  • Conditioning reduces entropy: H(U|V) ≤ H(U), with equality if and only if U and V are independent.
  • In terms of compression, the chain rule says the best way to jointly compress U and V is to first compress U on its own and then compress V given the knowledge of U.
  • Joint entropy is also sub-additive: H(U,V) ≤ H(U) + H(V), with equality if and only if U and V are independent.
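
The Markov-chain and marginalization bullets above can be made concrete with a small NumPy sketch (the transition probabilities are made up for illustration): it finds the stationary distribution by repeatedly applying the transition matrix, then simulates the chain to show that the next-symbol statistics depend on the current symbol, i.e. the data is stationary but not IID.

```python
import numpy as np

rng = np.random.default_rng(0)

# First-order Markov chain over {0, 1}: P[i, j] = P(next = j | current = i).
# The transition probabilities are made up for illustration.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# The stationary distribution pi satisfies pi = pi @ P.  Repeatedly applying P
# (marginalizing out the past one step at a time) converges to it.
pi = np.array([0.5, 0.5])
for _ in range(500):
    pi = pi @ P
print("stationary distribution:", pi)            # approximately [0.75, 0.25]

# Simulate the chain and check that the empirical next-symbol frequencies
# depend on the current symbol, i.e. the data is not IID.
n = 100_000
x = np.empty(n, dtype=int)
x[0] = 0
for t in range(1, n):
    x[t] = rng.choice(2, p=P[x[t - 1]])
for s in (0, 1):
    following = x[1:][x[:-1] == s]
    print(f"P(next = 1 | current = {s}) ~= {np.mean(following == 1):.3f}")
```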

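The conditional-entropy identities above are easy to check numerically. Below is a short sketch with a made-up joint pmf for (U, V) that computes the marginal, joint, and conditional entropies directly from their definitions and verifies the chain rule, "conditioning reduces entropy", and sub-additivity.

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a pmf given as an array of probabilities."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Made-up joint pmf P(U = u, V = v); rows index u, columns index v.
P_UV = np.array([[0.30, 0.10],
                 [0.05, 0.55]])

P_U = P_UV.sum(axis=1)       # marginalize out V
P_V = P_UV.sum(axis=0)       # marginalize out U

H_U, H_V, H_UV = H(P_U), H(P_V), H(P_UV.ravel())

# Conditional entropies from the definition: average, over the conditioning
# variable, of the entropy of the conditional pmf.
H_V_given_U = sum(P_U[u] * H(P_UV[u, :] / P_U[u]) for u in range(P_UV.shape[0]))
H_U_given_V = sum(P_V[v] * H(P_UV[:, v] / P_V[v]) for v in range(P_UV.shape[1]))

print(f"chain rule:   H(U,V) = {H_UV:.3f}  vs  H(U) + H(V|U) = {H_U + H_V_given_U:.3f}")
print(f"conditioning: H(U|V) = {H_U_given_V:.3f}  <=  H(U) = {H_U:.3f}")
print(f"sub-additive: H(U,V) = {H_UV:.3f}  <=  H(U) + H(V) = {H_U + H_V:.3f}")
```
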
Entropy Rate

  • The entropy rate of a stationary process is defined as the limit of H(U_1, ..., U_n)/n as n grows; operationally, it is the limiting average number of bits per symbol required to compress the process.
  • For IID sequences, the entropy rate is equal to the entropy of each symbol.
  • For a stationary K-order Markov process, the entropy rate equals the entropy of the (K+1)th symbol given the previous K symbols, H(U_{K+1} | U_1, ..., U_K).
  • English text has an entropy rate of 4.76 bits per letter when modeled as an IID process, 4.03 bits per letter when modeled as a first-order Markov process, 2.8 bits per letter when modeled as a fourth-order Markov process, and 1.3 bits per letter when modeled using a human predictor.
  • The best achievable compression rate for a stationary first-order Markov source is H(U_2|U_1) bits per symbol (see the sketch after this list).
  • For a general stationary source, the best compression is the entropy rate.
  • The Shannon-McMillan theorem (asymptotic equipartition) states that, for large n, typical blocks of n symbols have probability roughly 2^(-n·H), where H is the entropy rate.
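
As a sketch of the first-order Markov case (reusing the made-up chain from the earlier sketch), the entropy rate can be computed as H(U_2|U_1) = Σ_s π(s) H(P(·|s)) and compared with the entropy of the stationary marginal, which is the rate an IID model of the same source would pay per symbol.

```python
import numpy as np

def H(p):
    """Shannon entropy (bits) of a pmf."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

# Same made-up first-order Markov chain as in the earlier sketch.
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Stationary distribution via power iteration (pi = pi @ P).
pi = np.array([0.5, 0.5])
for _ in range(500):
    pi = pi @ P

iid_rate = H(pi)                                              # bits/symbol if memory is ignored
entropy_rate = sum(pi[s] * H(P[s]) for s in range(len(pi)))   # H(U_2 | U_1)

print(f"IID model of the marginal: {iid_rate:.3f} bits/symbol")
print(f"entropy rate H(U_2|U_1):   {entropy_rate:.3f} bits/symbol")
```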
