Stanford EE274: Data Compression I 2023 I Lecture 6 - Arithmetic Coding
18 Apr 2024 (5 months ago)
Arithmetic Coding
- Arithmetic coding is a data compression technique that assigns shorter binary representations to more probable sequences.
- It treats the entire data as a single block, making it a streaming algorithm with linear complexity of O(n).
- The algorithm maps input sequences to a range on the number line and communicates that range to the decoder.
- Shorter intervals on the number line require more bits to describe, while longer intervals require fewer bits.
- More probable sequences correspond to larger intervals and hence shorter bit lengths.
- Arithmetic coding achieves an expected code length that is within two bits of the entropy of the source.
- Encoding and decoding with arithmetic coding are linear time processes.
- Unlike Huffman coding, arithmetic coding does not require a pre-constructed code book.
- Arithmetic coding can handle varying probabilities at each step.
- Precision issues can arise as the intervals become very small, which can be addressed by releasing common initial digits.
Comparison with Block Coding
- Block codes can achieve entropy as the block size increases, but the convergence rate of 1/B is not ideal.
- Exponential complexity in codebook size makes block codes impractical for large block sizes.
- The overhead of 1/B can be significant compared to the entropy, especially when the entropy is small.
Comparison with Huffman Coding
- Arithmetic coding is more efficient than Huffman coding in terms of compression, but it is slower due to the arithmetic operations involved.
Modern Codes
- Two modern codes, developed in the last 10-20 years, are TS and range coding, which offer faster decompression speeds while still achieving similar compression ratios to arithmetic coding.