Full notes can be viewed here: rendered version and LaTeX source code
Entropy
- Calculation of entropy, joint entropy, conditional entropy, differential entropy for continuous pdfs, and relative entropy (see the sketch after this list).
- Remember that entropy and mutual information are non-negative.
- Usage of the chain rule for entropy.
- Entropy is the minimum/limit of the expected length of a uniquely decodable code. “Fundamental Limit of Compression”
- Fact (conditional independence): if X and Y are conditionally independent given Z, then once Z is given, knowing Y tells us nothing more about X (and vice versa), i.e. H(X|Y,Z) = H(X|Z) \qquad H(Y|X,Z) = H(Y|Z), since conditioning on an independent variable does not reduce the uncertainty (entropy).
- Important Inequality: \ln x \le x-1 for x > 0
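A minimal numerical sketch of these quantities (not from the original notes), using a small assumed joint pmf P_{XY}: it computes the marginal, joint, and conditional entropies, the mutual information, and a relative entropy, so the chain rule and non-negativity facts above can be checked directly.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits; zero-probability entries contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Hypothetical joint distribution of (X, Y), chosen only for illustration.
P_XY = np.array([[0.3, 0.1],
                 [0.2, 0.4]])

P_X = P_XY.sum(axis=1)              # marginal of X
P_Y = P_XY.sum(axis=0)              # marginal of Y

H_X, H_Y, H_XY = entropy(P_X), entropy(P_Y), entropy(P_XY)
H_Y_given_X = H_XY - H_X            # chain rule: H(X,Y) = H(X) + H(Y|X)
I_XY = H_X + H_Y - H_XY             # mutual information, always >= 0

# Relative entropy D(P_X || Q) against a hypothetical reference pmf Q.
Q = np.array([0.5, 0.5])
D_PQ = np.sum(P_X * np.log2(P_X / Q))

print(f"H(X)={H_X:.3f}, H(Y)={H_Y:.3f}, H(X,Y)={H_XY:.3f}")
print(f"H(Y|X)={H_Y_given_X:.3f}, I(X;Y)={I_XY:.3f}, D(P_X||Q)={D_PQ:.3f}")
```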
Fano’s Inequality
- For any estimator \hat{X} such that X-Y-\hat{X} forms a Markov chain, the probability of error P_e = Pr(\hat{X} \ne X) satisfies: P_e \ge \frac{H(X|Y) - 1}{\log(|\mathcal{X}|)}
- Since an error means \hat{X} \ne X, the true X is confined to the remaining |\mathcal{X}| - 1 values, so we can replace |\mathcal{X}| by |\mathcal{X}| - 1 and obtain a stronger version of Fano’s inequality (checked numerically below): P_e = Pr(\hat{X} \ne X) \ge \frac{H(X|Y) - 1}{\log(|\mathcal{X}| - 1)}
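A quick numeric check of both versions of the bound, using illustrative values for H(X|Y) and |\mathcal{X}| that are not taken from the notes:

```python
import math

H_X_given_Y = 1.2    # assumed conditional entropy H(X|Y) in bits
alphabet_size = 4    # assumed alphabet size |X|

# The weak form divides by log|X|; the strong form divides by log(|X| - 1),
# which is smaller, so the resulting lower bound on P_e is larger.
weak_bound = (H_X_given_Y - 1) / math.log2(alphabet_size)
strong_bound = (H_X_given_Y - 1) / math.log2(alphabet_size - 1)

print(f"P_e >= {weak_bound:.3f} (weak form)")
print(f"P_e >= {strong_bound:.3f} (strong form)")
```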
Practical prefix-free coding
Arithmetic Coding
- Start by dividing the interval [0,1) (this means that 0 is inclusive and 1 is exclusive) using the probability of each symbol
- Assume the symbols have the following probabilities: P(a) = 0.45, P(b)=0.3, P(c)=0.25
- Then we divide the interval into 3 sub-intervals.
- [0,0.45)\quad[0.45,0.75)\quad[0.75, 1)
- Each sub-interval corresponds to one of the symbols
- After dividing the interval, choose the sub-interval according to the first symbol X_1: if the symbol is a, choose [0,0.45), and so on.
- Then subdivide the chosen interval (e.g. [0,0.45)) in the same proportions, according to the same probability distribution.
- Repeat until the last symbol X_n has been processed, and keep a record of the final probability interval.
- Find the largest dyadic interval of the form [\frac{j}{2^l}, \frac{j+1}{2^l}) that lies inside the final probability interval.
- The codeword is the binary representation of j, and l is the code length.
- The length of the binary codeword is at most \lceil \log_2 \frac{1}{P(X_1,…,X_n)} \rceil + 1. The expected code length per source symbol L therefore satisfies L < H(X) + 2/N, where N is the length of the sequence (see the sketch after this list).
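A minimal sketch of the encoder just described, using the example distribution P(a) = 0.45, P(b) = 0.3, P(c) = 0.25. It uses floating point for readability; a practical arithmetic coder would use integer arithmetic with interval rescaling to avoid precision problems.

```python
from math import ceil, log2

probs = {"a": 0.45, "b": 0.3, "c": 0.25}

def encode(sequence):
    # Refine [0, 1) symbol by symbol, keeping the sub-interval of each symbol.
    low, high = 0.0, 1.0
    for symbol in sequence:
        width = high - low
        cum = 0.0
        for s, p in probs.items():     # fixed symbol order a, b, c
            if s == symbol:
                high = low + (cum + p) * width
                low = low + cum * width
                break
            cum += p
    # Codeword length l = ceil(log2(1 / P(x^n))) + 1 guarantees that some
    # dyadic interval [j/2^l, (j+1)/2^l) fits inside the final interval.
    l = ceil(log2(1.0 / (high - low))) + 1
    j = ceil(low * 2 ** l)
    assert low <= j / 2 ** l and (j + 1) / 2 ** l <= high
    return format(j, f"0{l}b")         # codeword = l-bit binary expansion of j

print(encode("aba"))
```

For instance, encode("aba") narrows the interval to roughly [0.2025, 0.26325) and emits a 6-bit codeword.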
Arithmetic vs Huffman
- If the probabilities are not dyadic (integer powers of 1/2), Huffman coding requires longer blocks of symbols to approach the entropy, whereas arithmetic coding approaches it directly on the full sequence.
- The complexity of Huffman coding grows exponentially with the block length k, while arithmetic coding grows only linearly with the sequence length, which makes arithmetic coding more practical whenever long blocks are needed (a sketch of this trade-off follows).
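A small sketch of this trade-off, assuming a skewed non-dyadic source with P(a) = 0.9 and P(b) = 0.1 (an illustrative choice, not the distribution from the arithmetic-coding example): Huffman coding over blocks of k source symbols needs a codebook of 2^k entries, and its expected length per source symbol moves toward H(X) as k grows, so the improvement is paid for with exponential codebook growth.

```python
import heapq
import itertools
import math

probs = {"a": 0.9, "b": 0.1}                 # assumed skewed, non-dyadic source
H = -sum(p * math.log2(p) for p in probs.values())

def huffman_expected_length(pmf):
    """Expected codeword length (in bits) of a binary Huffman code for pmf."""
    # Heap entries: (subtree probability, tie-breaker, {symbol: current depth}).
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(pmf.items())]
    heapq.heapify(heap)
    counter = itertools.count(len(heap))
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next(counter), merged))
    depths = heap[0][2]
    return sum(pmf[s] * depths[s] for s in pmf)

for k in (1, 2, 4, 8):
    block_pmf = {"".join(t): math.prod(probs[s] for s in t)
                 for t in itertools.product(probs, repeat=k)}
    L = huffman_expected_length(block_pmf) / k   # bits per source symbol
    print(f"k={k}: {len(block_pmf)} codewords, L = {L:.3f} bits/symbol (H = {H:.3f})")
```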
Channel Coding Theorem and Capacity
DMC - Discrete Memoryless Channel
- Channel Capacity: C = \max_{P_X}I(X;Y) (a numerical sketch follows this list)
- Define the channel rate R; any R < C enables the probability of error P_e^{(n)} \rightarrow 0 as n \rightarrow \infty.
- Use the channel n times to transmit a k-bit message: R = \frac{k}{n} bits/transmission, so the total number of messages is 2^k=2^{nR}
- Construct a codebook of 2^{nR} codewords, each of length n, generated i.i.d. according to the P_X that maximizes the mutual information. Label the codebook \{ X^n(1),…,X^n(2^{nR})\}.
- A message i\in\{1,2,…,2^{nR}\} is transmitted as X^n(i).
- The receiver receives Y^n and uses joint typicality decoding: it declares \hat{W} if X^n(\hat{W}) and Y^n are jointly typical w.r.t. P_{XY}.
- If no such codeword \hat{W} is found (or more than one is found), an error is declared.
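As a concrete instance of C = \max_{P_X} I(X;Y), here is a brute-force sketch for a binary symmetric channel with crossover probability 0.1 (an assumed example channel, not one from the notes). Since the BSC capacity is achieved by a uniform input, the grid search should recover C = 1 - H(0.1) \approx 0.531 bits per transmission.

```python
import numpy as np

eps = 0.1
# Channel transition matrix W[x, y] = P(Y = y | X = x) for the BSC.
W = np.array([[1 - eps, eps],
              [eps, 1 - eps]])

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability entries."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(p_x):
    P_XY = p_x[:, None] * W            # joint pmf P(x, y) = P_X(x) P(y|x)
    p_y = P_XY.sum(axis=0)
    return entropy(p_x) + entropy(p_y) - entropy(P_XY)

# Brute-force search over input distributions P_X = (a, 1 - a).
grid = np.linspace(0.001, 0.999, 999)
C_numeric = max(mutual_information(np.array([a, 1 - a])) for a in grid)

print(f"numerical capacity   ~ {C_numeric:.4f} bits/transmission")
print(f"closed form 1-H(eps) = {1 - entropy(np.array([eps, 1 - eps])):.4f}")
```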