Gaussian Process with GPML toolbox

Statistical machine learning

Posted by Jingbiao on October 25, 2021, Reading time: 4 minutes.

\( \require{amstext} \require{amsmath} \require{amssymb} \require{amsfonts} \)

Function initialization

meanfunc = @meanZero;      % zero mean function
covfunc = @covSEiso;       % Squared Exponential covariance function
likfunc = @likGauss;       % Gaussian likelihood
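
A quick way to sanity-check this setup: when called with no inputs, each GPML mean/covariance/likelihood function returns the number of hyperparameters it expects as a string expression:

feval(meanfunc)   % '0' -> meanZero has no hyperparameters
feval(covfunc)    % '2' -> covSEiso expects [log(ell); log(sf)]
feval(likfunc)    % '1' -> likGauss expects the log noise standard deviation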

Hyperparameter initialization

% initial values for the log hyperparameters
hyp = struct('mean', [], 'cov', [-1 0], 'lik', 0);
  • mean: empty here, since the zero mean function has no hyperparameters
  • cov [log(ell), log(sf)]:
    • ell is the characteristic length-scale $l$
    • sf is the signal standard deviation
  • lik: log of the noise standard deviation, which measures the ‘uncertainty’ of the training points (see the sketch below)
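
Because all three fields hold log hyperparameters, natural-scale values must be passed through log first. A minimal sketch, with illustrative natural-scale numbers:

ell = 0.37; sf = 1.0; sn = 1.0;   % illustrative length-scale, signal std, noise std
hyp = struct('mean', [], ...
             'cov', [log(ell) log(sf)], ... % approximately [-1 0] as above
             'lik', log(sn));               % 0 as above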

Covariance functions

Covariance functions (also called kernels) are the key components in Gaussian processes. They encode all assumptions about the form of the function that we are modelling. In general, covariance represents some form of distance or similarity. Consider two input points (locations) $x_i$ and $x_j$ with corresponding observed values $y_i$ and $y_j$. If the inputs $x_i$ and $x_j$ are close to each other, we expect $y_i$ and $y_j$ to be close as well. This measure of similarity is embedded in the covariance function.
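
To make the similarity interpretation concrete, here is a small sketch (the inputs and hyperparameter values are made up) that evaluates covSEiso on three one-dimensional inputs:

hyp_cov = [log(1.0); log(1.0)];              % ell = 1, sf = 1 (illustrative)
K = feval(@covSEiso, hyp_cov, [0; 0.1; 3]);  % 3-by-3 covariance matrix
% K(1,2) is close to sf^2 = 1 (nearby inputs); K(1,3) is nearly 0 (distant inputs)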

Usage - Squared Exponential covariance function

A range of covariance functions is implemented in the GPML toolbox. An example using the Squared Exponential covariance function with an isotropic distance measure is shown here.

% Squared Exponential covariance with isotropic distance measure
covfunc = @covSEiso;
% initial values for the two log hyperparameters [log(ell) log(sf)]
hyp = struct('mean', [], 'cov', [-1 0], 'lik', 0);

The covariance is \( k(x,z) = \sigma_f^2 \exp\left(-\tfrac{1}{2}(x-z)^T P^{-1} (x-z)\right) \), where \( P = l^2 I \) in the isotropic case. Written out, and with the noise term contributed by the Gaussian likelihood, it becomes \( k(x,z) = \sigma_f^2 \exp\left(-\frac{(x-z)^T (x-z)}{2l^2}\right) + \sigma_n^2 \, \delta_{x,z} \), where the Kronecker delta restricts the noise to the diagonal.
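
As a sanity check, the formula can be evaluated by hand and compared with covSEiso; the scalar inputs below are made up:

ell = exp(-1); sf = exp(0);                % matches hyp.cov = [-1 0] above
x = 0.5; z = 1.2;
k_manual = sf^2 * exp(-(x - z)^2 / (2*ell^2));
k_gpml   = feval(@covSEiso, [log(ell); log(sf)], x, z);
% k_manual and k_gpml agree up to floating-point error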

Usage - Periodic covariance function

% Periodic covariance function
covfunc = @covPeriodic;
% initial values for the three log hyperparameters [log(ell) log(p) log(sf)]
hyp = struct('mean', [], 'cov', [0 0 0], 'lik', 0);
  • cov [log(ell), log(p), log(sf)]:
    • ell is the characteristic length-scale $l$
    • p is the period
    • sf is the signal standard deviation
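
To see the structure that this covariance encodes, here is a minimal sketch (hyperparameter values chosen arbitrarily) that draws one sample from a zero-mean GP prior with covPeriodic:

xs = linspace(0, 10, 200)';                % dense 1-D grid
hyp_cov = [log(0.5); log(2); log(1)];      % ell = 0.5, period p = 2, sf = 1
K = feval(@covPeriodic, hyp_cov, xs);
f = chol(K + 1e-6*eye(numel(xs)))' * randn(numel(xs), 1);  % jittered Cholesky sample
plot(xs, f);                               % the sample repeats with period 2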

Training to get optimized parameters

hyp2 = minimize(hyp, @gp, -100, @infGaussLik, meanfunc, covfunc, likfunc, x, y);
  • Minimizes the negative log marginal likelihood
  • Returns the updated hyperparameters
  • The third parameter is the length of the run: if positive, it gives the maximum number of line searches; if negative, its absolute value gives the maximum allowed number of function evaluations. A toy run is sketched after this list.
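
Putting the pieces together with the covSEiso setup from above, a minimal end-to-end sketch on made-up 1-D data (the data and the -100 budget are illustrative):

x = linspace(-3, 3, 20)';                  % toy training inputs
y = sin(x) + 0.1*randn(size(x));           % toy noisy targets
hyp2 = minimize(hyp, @gp, -100, @infGaussLik, meanfunc, covfunc, likfunc, x, y);
exp(hyp2.cov)                              % learned ell and sf on the natural scale
exp(hyp2.lik)                              % learned noise standard deviation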

Inference

There are three modes of calling gp:

[nlZ dnlZ          ] = gp(hyp, inf, mean, cov, lik, x, y); % Training 
[ymu ys2 fmu fs2   ] = gp(hyp, inf, mean, cov, lik, x, y, xs); % Prediction
[ymu ys2 fmu fs2 lp] = gp(hyp, inf, mean, cov, lik, x, y, xs, ys); % Prediction with ys
| Parameter | Description |
|-----------|-------------|
| hyp | struct of column vectors of mean/cov/lik hyperparameters |
| inf | function specifying the inference method |
| mean | prior mean function |
| cov | prior covariance function |
| lik | likelihood function |
| x | n by D matrix of training inputs |
| y | column vector of length n of training targets |
| xs | ns by D matrix of test inputs |
| ys | column vector of length ns of test targets |
| nlZ | returned value of the negative log marginal likelihood |
| dnlZ | struct of column vectors of partial derivatives of the negative log marginal likelihood w.r.t. mean/cov/lik hyperparameters |
| ymu | column vector (of length ns) of predictive output means |
| ys2 | column vector (of length ns) of predictive output variances |
| fmu | column vector (of length ns) of predictive latent means |
| fs2 | column vector (of length ns) of predictive latent variances |
| lp | column vector (of length ns) of log predictive probabilities |
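
Continuing the toy example from the training section, a sketch of the prediction mode (the test grid and plotting choices are illustrative):

xs = linspace(-4, 4, 100)';                % toy test inputs
[ymu, ys2] = gp(hyp2, @infGaussLik, meanfunc, covfunc, likfunc, x, y, xs);
% shade an approximate 95% predictive band, then overlay the mean and data
fill([xs; flipud(xs)], [ymu + 2*sqrt(ys2); flipud(ymu - 2*sqrt(ys2))], [7 7 7]/8);
hold on; plot(xs, ymu); plot(x, y, '+');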
