
3F7 Information Theory Tripos Revision

Posted by Jingbiao on April 10, 2021. Reading time: 3 minutes.

Full notes can be viewed here: rendered version and LaTeX Source Code

Entropy


  • Calculation of entropy, joint entropy, conditional entropy, differential entropy for continuous pdfs, and relative entropy.
  • Remember that entropy and mutual information are non-negative (checked numerically in the sketch after this list).
  • Usage of the chain rule for entropy.
  • Entropy is the lower limit on the expected length of a uniquely decodable code: the “fundamental limit of compression”.
  • Fact (conditional independence): if X and Y are conditionally independent given Z, then once Z is given, Y carries no further information about X (and vice versa), i.e. $H(X|Y,Z) = H(X|Z)$ and $H(Y|X,Z) = H(Y|Z)$,
    since conditioning on an independent variable does not reduce the uncertainty (entropy).
  • Important inequality: $\ln x \le x - 1$
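
A minimal numerical sketch in Python of these quantities (the joint pmf below is a made-up example, not from the notes), checking the chain rule and the non-negativity of mutual information:

```python
import numpy as np

# Hypothetical joint pmf P(X, Y) over a 2x3 alphabet (rows: X, cols: Y).
P_xy = np.array([[0.20, 0.10, 0.15],
                 [0.05, 0.30, 0.20]])

def H(p):
    """Entropy in bits of a pmf given as an array (zeros are ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

P_x = P_xy.sum(axis=1)          # marginal of X
P_y = P_xy.sum(axis=0)          # marginal of Y

H_xy = H(P_xy.flatten())        # joint entropy H(X,Y)
H_x, H_y = H(P_x), H(P_y)
H_y_given_x = H_xy - H_x        # chain rule: H(Y|X) = H(X,Y) - H(X)
I_xy = H_x + H_y - H_xy         # mutual information I(X;Y)

print(f"H(X)   = {H_x:.4f} bits")
print(f"H(Y|X) = {H_y_given_x:.4f} bits")
print(f"H(X,Y) = {H_xy:.4f} bits  (= H(X) + H(Y|X))")
print(f"I(X;Y) = {I_xy:.4f} bits  (non-negative)")
```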

Fano’s Inequality

  • For any estimator $\hat{X}$ such that $X \to Y \to \hat{X}$ forms a Markov chain, the probability of error $P_e$ satisfies: $P_e = \Pr(\hat{X} \neq X) \ge \frac{H(X|Y) - 1}{\log |\mathcal{X}|}$
  • Since an error means $\hat{X} \neq X$, the true $X$ can take at most $|\mathcal{X}| - 1$ values once $\hat{X}$ is known, so we can replace $|\mathcal{X}|$ by $|\mathcal{X}| - 1$ and obtain a stronger version of Fano’s inequality (checked numerically below): $P_e = \Pr(\hat{X} \neq X) \ge \frac{H(X|Y) - 1}{\log(|\mathcal{X}| - 1)}$
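
A small check of both forms of the bound in Python (the numbers, alphabet size 8 and $H(X|Y) = 2.5$ bits, are made up for illustration); the stronger version has the smaller denominator and hence the larger lower bound on the error probability:

```python
import numpy as np

def fano_lower_bound(H_X_given_Y, alphabet_size, strong=True):
    """Lower bound on P(error) = Pr(X_hat != X) from Fano's inequality."""
    denom = np.log2(alphabet_size - 1) if strong else np.log2(alphabet_size)
    return (H_X_given_Y - 1) / denom

# Hypothetical numbers: X over an alphabet of size 8, H(X|Y) = 2.5 bits.
print(fano_lower_bound(2.5, 8, strong=False))  # (2.5 - 1)/log2(8)  = 0.5
print(fano_lower_bound(2.5, 8, strong=True))   # (2.5 - 1)/log2(7) ~= 0.534
```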

Practical prefix-free coding


Arithmetic Coding

  1. Start by dividing the interval [0, 1) (0 inclusive, 1 exclusive) according to the probability of each symbol.
    • Assume the symbols have the following probabilities: P(a) = 0.45, P(b) = 0.3, P(c) = 0.25
    • Then we divide the interval into 3 sub-intervals:
      • [0, 0.45), [0.45, 0.75), [0.75, 1)
      • Each interval corresponds to one of the symbols.
  2. After dividing the interval, choose the sub-interval corresponding to the first symbol $X_1$: if the symbol is a, then [0, 0.45), etc.
  3. Then divide the chosen interval, e.g. [0, 0.45), into sub-intervals using the same probability distribution.
  4. Repeat until the last symbol $X_n$ has been processed, and keep a record of the final probability interval.
  5. Find the largest dyadic interval of the form $\left[\frac{j}{2^l}, \frac{j+1}{2^l}\right)$ that
    lies inside the final probability interval.
  6. The codeword is the binary representation of $j$ written with $l$ bits, so $l$ is the code length.
  • The length of the binary codeword is at most $\lceil \log_2 \frac{1}{P(X_1, \ldots, X_n)} \rceil + 1$. The expected code length per symbol is therefore $L < H(X) + 2/n$,
    where $n$ is the length of the sequence (a sketch of the encoder follows).
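
A sketch of the encoding steps above in Python, using the three-symbol distribution from the example. Interval arithmetic is done with plain floats for clarity, so this ignores the precision and renormalisation issues a real arithmetic coder must handle:

```python
from math import ceil, log2

# Example distribution from the notes.
probs = {'a': 0.45, 'b': 0.30, 'c': 0.25}

def arithmetic_encode(sequence, probs):
    # Steps 1-4: shrink [low, high) according to each symbol in turn.
    low, high = 0.0, 1.0
    for sym in sequence:
        width = high - low
        cum = 0.0
        for s, p in probs.items():
            if s == sym:
                high = low + (cum + p) * width
                low = low + cum * width
                break
            cum += p
    # Step 5: find the smallest l such that some dyadic interval
    # [j/2^l, (j+1)/2^l) fits inside [low, high).
    l = 1
    while True:
        j = ceil(low * 2**l)
        if (j + 1) / 2**l <= high:
            break
        l += 1
    # Step 6: the codeword is j written with l bits.
    return format(j, f'0{l}b')

seq = "abca"
code = arithmetic_encode(seq, probs)
p_seq = 0.45 * 0.30 * 0.25 * 0.45             # P(a, b, c, a)
print(code, len(code), log2(1 / p_seq) + 2)   # length stays within log2(1/P) + 2
```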

Arithmetic vs Huffman

  1. If the probabilities are not dyadic (integer powers of 1/2), arithmetic coding gets close to the entropy with short blocks, whereas Huffman coding requires longer blocks of symbols (see the comparison below).
  2. The complexity of Huffman coding grows exponentially with the block length k, while arithmetic coding grows only linearly with the sequence length, which makes arithmetic coding the more practical choice compared to Huffman.
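
A quick illustration of point 1 in Python, reusing the three-symbol distribution from the arithmetic-coding example (the huffman_lengths helper is a small sketch written here, not a library routine): a Huffman code on single symbols overshoots the entropy for non-dyadic probabilities, and closing that gap needs longer blocks.

```python
import heapq
from math import log2

probs = {'a': 0.45, 'b': 0.30, 'c': 0.25}

def huffman_lengths(probs):
    """Return codeword lengths of a Huffman code for the given pmf."""
    # Each heap entry: (probability, tiebreak id, {symbol: length so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)   # two least probable subtrees
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: l + 1 for s, l in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

lengths = huffman_lengths(probs)                        # {'a': 1, 'b': 2, 'c': 2}
L_huff = sum(probs[s] * l for s, l in lengths.items())  # 1.55 bits/symbol
H = -sum(p * log2(p) for p in probs.values())           # ~1.539 bits/symbol
print(lengths, L_huff, H)
```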

Channel Coding Theorem and Capacity


DMC - Discrete Memoryless Channel

  • Channel capacity: $C = \max_{P_X} I(X;Y)$
  • Define the channel rate $R$; any $R < C$ enables the probability of error $P_e^{(n)} \to 0$ as $n \to \infty$.
  • Use the channel $n$ times to transmit $k$ bits of message: $R = k/n$ bits/transmission. The total number of messages is $2^k = 2^{nR}$.
  • Construct a codebook of $2^{nR}$ codewords, each of length $n$, generated i.i.d. according to the $P_X$ that maximises the mutual information. Label the codebook as $\{X^n(1), \ldots, X^n(2^{nR})\}$.
  • A message $i \in \{1, 2, \ldots, 2^{nR}\}$ is transmitted as $X^n(i)$.
  • The receiver receives $Y^n$ and uses joint typicality decoding: it decodes to $\hat{W}$ if $X^n(\hat{W})$ and $Y^n$ are jointly typical w.r.t. $P_{XY}$ (and $\hat{W}$ is the unique such index).
  • If no such codeword $\hat{W}$ is found, or more than one is, an error is declared.
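
For the capacity definition above, a small numerical sketch in Python (the binary symmetric channel with crossover probability 0.1 is an assumed example, not from the notes): brute-force the maximisation of $I(X;Y)$ over binary input distributions and compare with the closed form $C = 1 - H_2(p)$, which is attained by the uniform input.

```python
import numpy as np

p = 0.1                                 # BSC crossover probability (example)
W = np.array([[1 - p, p],               # channel matrix P(Y|X): rows are X
              [p, 1 - p]])

def mutual_information(P_x, W):
    """I(X;Y) in bits for input pmf P_x and channel P(Y|X) = W."""
    P_xy = P_x[:, None] * W             # joint pmf P(x, y)
    P_y = P_xy.sum(axis=0)
    mask = P_xy > 0
    return np.sum(P_xy[mask] * np.log2(P_xy[mask] /
                                       (P_x[:, None] * P_y[None, :])[mask]))

# Brute-force maximisation over P_X = (q, 1-q).
qs = np.linspace(0.001, 0.999, 999)
I_vals = [mutual_information(np.array([q, 1 - q]), W) for q in qs]
C_numeric = max(I_vals)
H2 = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
print(C_numeric, 1 - H2)   # both are roughly 0.531 bits/use
```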

