GPRegression

class numpy_ml.nonparametric.GPRegression(kernel='RBFKernel', alpha=1e-10)

A Gaussian Process (GP) regression model.

\[\begin{split}y \mid X, f &\sim \mathcal{N}( [f(x_1), \ldots, f(x_n)], \alpha I ) \\ f \mid X &\sim \text{GP}(0, K)\end{split}\]

for data \(D = \{(x_1, y_1), \ldots, (x_n, y_n) \}\) and a covariance matrix \(K_{ij} = \text{kernel}(x_i, x_j)\) for all \(i, j \in \{1, \ldots, n \}\).
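To make \(K_{ij} = \text{kernel}(x_i, x_j)\) concrete, here is a minimal NumPy sketch that builds the prior covariance matrix for a small dataset. It assumes a squared-exponential (RBF) kernel \(k(x, x') = \exp(-\lVert x - x' \rVert^2 / (2\ell^2))\) with a hypothetical `length_scale` argument for \(\ell\); the parameterization of the library's `RBFKernel` may differ.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A (N, M) and B (P, M)."""
    # Pairwise squared Euclidean distances via broadcasting, shape (N, P)
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(5, 2))  # N = 5 examples, M = 2 features

# Prior covariance of f at the training inputs: K[i, j] = kernel(x_i, x_j)
K = rbf_kernel(X, X)
```

For this kernel, K is symmetric with ones on its diagonal, and it is positive semi-definite, which is what makes it a valid covariance matrix.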

Parameters:
  • kernel (str) – The kernel to use in fitting the GP prior. Default is ‘RBFKernel’.
  • alpha (float) – An isotropic noise term for the diagonal in the GP covariance, K. Larger values correspond to the expectation of greater noise in the observed data points. Default is 1e-10.
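Besides modeling observation noise, a small alpha such as the default 1e-10 is commonly used as numerical "jitter": when two training inputs coincide, K has duplicate rows and is singular, but \(K + \alpha I\) is positive definite for any \(\alpha > 0\). The sketch below illustrates this, again assuming a hypothetical RBF kernel with unit length scale.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A (N, M) and B (P, M)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


# Training inputs with an exact duplicate row, which makes K singular
X = np.array([[0.0], [1.0], [1.0], [2.5]])
K = rbf_kernel(X, X)

alpha = 1e-10
Q = K + alpha * np.eye(len(X))  # noise term is added to the diagonal only

eig_K = np.linalg.eigvalsh(K)  # smallest eigenvalue is (numerically) zero
eig_Q = np.linalg.eigvalsh(Q)  # every eigenvalue is shifted up by alpha
L = np.linalg.cholesky(Q)      # succeeds because Q is positive definite
```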
fit(X, y)

Fit the GP prior to the training data.

Parameters:
  • X (ndarray of shape (N, M)) – A training dataset of N examples, each with dimensionality M.
  • y (ndarray of shape (N, O)) – A collection of real-valued training targets for the examples in X, each with dimension O.
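If the kernel hyperparameters are held fixed, fitting a GP involves no iterative optimization; one plausible reading is that fit caches X, y, and whatever factorization of \(K + \alpha I\) prediction will need. The class below is a hypothetical sketch, not the numpy_ml implementation (`TinyGP`, its RBF kernel, and `length_scale` are assumptions). It shows that a single Cholesky factorization serves all O target columns of y at once.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A (N, M) and B (P, M)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


class TinyGP:
    """Hypothetical minimal GP regressor used only for illustration."""

    def __init__(self, alpha=1e-10, length_scale=1.0):
        self.alpha = alpha
        self.length_scale = length_scale

    def fit(self, X, y):
        self.X = X
        y = y.reshape(len(X), -1)  # treat 1-D targets as O = 1
        K = rbf_kernel(X, X, self.length_scale)
        # Lower Cholesky factor of K + alpha I, shared by every target column
        self.L = np.linalg.cholesky(K + self.alpha * np.eye(len(X)))
        # weights = (K + alpha I)^{-1} y via two triangular solves, shape (N, O)
        self.weights = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))
        return self
```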
predict(X, conf_interval=0.95, return_cov=False)

Return the MAP estimate for \(y^*\), corresponding to the mean/mode of the posterior predictive distribution, \(p(y^* \mid x^*, X, y)\).

Notes

Under the GP regression model, the posterior predictive distribution is

\[y^* \mid x^*, X, y \sim \mathcal{N}(\mu^*, \text{cov}^*)\]

where

\[\begin{split}\mu^* &= K^{*\top} (K + \alpha I)^{-1} y \\ \text{cov}^* &= K^{**} - K^{*\top} (K + \alpha I)^{-1} K^*\end{split}\]

and

\[\begin{split}K &= \text{kernel}(X, X) \\ K^* &= \text{kernel}(X, X^*) \\ K^{**} &= \text{kernel}(X^*, X^*)\end{split}\]
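The formulas above translate almost line for line into NumPy. This sketch assumes an RBF kernel with unit length scale (a stand-in for whatever kernel the model uses) and, like the note that follows, forms \((K + \alpha I)^{-1}\) explicitly with np.linalg.inv. Since \(K^* = \text{kernel}(X, X^*)\) has shape (N, N*), it enters the mean and covariance transposed.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A (N, M) and B (P, M)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_posterior_predictive(X, y, X_star, alpha=1e-10):
    """Return mu* with shape (N*, O) and cov* with shape (N*, N*)."""
    K = rbf_kernel(X, X)               # K, shape (N, N)
    K_s = rbf_kernel(X, X_star)        # K^*, shape (N, N*)
    K_ss = rbf_kernel(X_star, X_star)  # K^{**}, shape (N*, N*)
    # Explicit inverse of (K + alpha I), written to mirror the formulas directly
    Q_inv = np.linalg.inv(K + alpha * np.eye(len(X)))
    mu = K_s.T @ Q_inv @ y
    cov = K_ss - K_s.T @ Q_inv @ K_s
    return mu, cov
```

With a tiny alpha the posterior mean interpolates the training targets and the posterior variance there collapses toward zero, while far from the data the prediction reverts to the prior (mean 0, variance 1 for this kernel).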

NB. This implementation uses the inefficient but general-purpose np.linalg.inv routine to invert \((K + \alpha I)\). A more efficient and numerically stable approach exploits the fact that K is symmetric positive semi-definite, so \(Q = K + \alpha I\) is symmetric positive definite for any \(\alpha > 0\) and has a lower Cholesky factor \(L\) with \(Q = L L^\top\). The inverse of \(Q\) can then be assembled from the inverse of that factor:

\[Q^{-1} = \text{cholesky}(Q)^{-1 \top} \text{cholesky}(Q)^{-1}\]

For more details on a production-grade implementation, see Algorithm 2.1 in Rasmussen & Williams (2006).
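Algorithm 2.1 in Rasmussen & Williams (2006) avoids forming \(Q^{-1}\) at all: it factors \(Q = K + \alpha I = L L^\top\) once and replaces every product with \(Q^{-1}\) by triangular solves. The sketch below follows that recipe for \(\mu^*\) and \(\text{cov}^*\), assuming an RBF kernel with unit length scale. To stay dependency-free it uses the general np.linalg.solve for the triangular systems; scipy.linalg.solve_triangular or scipy.linalg.cho_solve would exploit the triangular structure.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A (N, M) and B (P, M)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def gp_predict_cholesky(X, y, X_star, alpha=1e-10):
    """Posterior predictive mean and covariance via a Cholesky factorization."""
    K = rbf_kernel(X, X)
    K_s = rbf_kernel(X, X_star)  # K^*, shape (N, N*)
    # Lower-triangular L with K + alpha I = L L^T
    L = np.linalg.cholesky(K + alpha * np.eye(len(X)))
    # weights = (K + alpha I)^{-1} y = L^{-T} (L^{-1} y), no explicit inverse
    weights = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ weights
    # V = L^{-1} K^*, so that K^{*T} (K + alpha I)^{-1} K^* = V^T V
    V = np.linalg.solve(L, K_s)
    cov = rbf_kernel(X_star, X_star) - V.T @ V
    return mu, cov
```

The results agree with the explicit-inverse formulas up to round-off, and the identity \(Q^{-1} = L^{-\top} L^{-1}\) can be checked numerically the same way.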

Parameters:
  • X (ndarray of shape (N, M)) – The collection of datapoints on which to generate predictions.
  • conf_interval (float in (0, 1)) – The confidence level of the interval to return for each prediction (e.g., 0.95 for a 95% interval). If the scipy package is not available, this value is always set to 0.95. Default is 0.95.
  • return_cov (bool) – If True, also return the covariance (cov*) of the posterior predictive distribution for the points in X. Default is False.
Returns:

  • y_pred (ndarray of shape (N, O)) – The predicted values for each point in X, each with dimensionality O.
  • conf (ndarray of shape (N, O)) – The half-width of the conf_interval confidence interval for each prediction in y_pred. The confidence interval for the i’th prediction is [y_pred[i] - conf[i], y_pred[i] + conf[i]].
  • cov (ndarray of shape (N, N)) – The covariance (cov*) of the posterior predictive distribution for X. Only returned if return_cov is True.
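One common way to obtain such half-widths, shown here as an assumption rather than a description of the library's code, is to scale the posterior standard deviations \(\sqrt{\text{cov}^*_{ii}}\) by the two-sided standard-normal quantile for conf_interval. The hypothetical helper below uses the standard library's statistics.NormalDist().inv_cdf as a stand-in for scipy.stats.norm.ppf, and gives every output column the same width because all O columns share one covariance.

```python
from statistics import NormalDist

import numpy as np


def confidence_half_width(y_pred, cov, conf_interval=0.95):
    """Half-width of a central Gaussian interval for each entry of y_pred."""
    # Two-sided quantile, e.g. about 1.96 for conf_interval = 0.95
    z = NormalDist().inv_cdf(0.5 + conf_interval / 2)
    # Clip tiny negative variances caused by round-off before the square root
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # shape (N,)
    # Broadcast to (N, O): all output columns share the same covariance here
    return z * std[:, None] * np.ones(np.shape(y_pred))


y_pred = np.array([[0.5], [1.0]])
cov = np.array([[4.0, 0.1], [0.1, 0.25]])
conf = confidence_half_width(y_pred, cov)
lower, upper = y_pred - conf, y_pred + conf
```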

marginal_log_likelihood(kernel_params=None)

Compute the log of the marginal likelihood (i.e., the log model evidence), \(\log p(y \mid X, \theta)\), where \(\theta\) denotes the kernel parameters given by kernel_params.

Notes

Under the GP regression model, integrating out f leaves the training targets normally distributed:

\[y \mid X, \theta \sim \mathcal{N}(0, K + \alpha I)\]

Hence,

\[\log p(y \mid X, \theta) = -0.5 \log \det(K + \alpha I) - 0.5 y^\top (K + \alpha I)^{-1} y - \frac{n}{2} \log 2 \pi\]

where \(K = \text{kernel}(X, X)\), \(\theta\) is the set of kernel parameters, and n is the number of training examples (so K is \(n \times n\)).
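The formula can be evaluated directly for a single target column y of shape (N,). This sketch assumes an RBF kernel with a hypothetical `length_scale` argument, uses np.linalg.slogdet for a numerically safe log-determinant, and computes \(y^\top (K + \alpha I)^{-1} y\) with a linear solve instead of an explicit inverse. For O independent output columns sharing K, the joint log likelihood would be the sum of the per-column values (how the library handles multiple outputs is not stated here).

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A (N, M) and B (P, M)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def log_marginal_likelihood(X, y, alpha=1e-10, length_scale=1.0):
    """log N(y; 0, K + alpha I) for one target column y of shape (N,)."""
    n = len(X)
    Q = rbf_kernel(X, X, length_scale) + alpha * np.eye(n)
    # log det(Q); the sign is +1 because Q is positive definite
    _, logdet = np.linalg.slogdet(Q)
    quad = y @ np.linalg.solve(Q, y)  # y^T Q^{-1} y without forming Q^{-1}
    return -0.5 * logdet - 0.5 * quad - 0.5 * n * np.log(2 * np.pi)
```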

Parameters: kernel_params (dict) – Parameters for the kernel function. If None, calculate the marginal likelihood under the kernel parameters defined at model initialization. Default is None.
Returns: marginal_log_likelihood (float) – The log likelihood of the training targets given the kernel parameterized by kernel_params and the training inputs, marginalized over all functions f.
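Because kernel_params lets the evidence be compared across kernel settings, a typical use is hyperparameter selection. The sketch below runs a grid search over a hypothetical RBF `length_scale` in standalone NumPy; it does not use the library's kernel_params dictionary format, and the grid and noise level are arbitrary illustrative choices. The log-determinant is taken from the Cholesky factor as \(\log \det Q = 2 \sum_i \log L_{ii}\).

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A (N, M) and B (P, M)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def log_marginal_likelihood(X, y, alpha, length_scale):
    """log N(y; 0, K + alpha I) computed from a Cholesky factor."""
    n = len(X)
    L = np.linalg.cholesky(rbf_kernel(X, X, length_scale) + alpha * np.eye(n))
    weights = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # 0.5 * log det(Q) equals sum(log(diag(L)))
    return -0.5 * y @ weights - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)


X = np.linspace(0, 2 * np.pi, 20)[:, None]
y = np.sin(X[:, 0])
grid = [0.05, 0.1, 0.3, 1.0, 3.0]
scores = {ls: log_marginal_likelihood(X, y, alpha=1e-2, length_scale=ls) for ls in grid}
best_ls = max(scores, key=scores.get)  # length scale with the highest evidence
```

Very short length scales make K close to the identity, so the model explains the smooth sine as independent noise and pays a large penalty in the data-fit term; the evidence therefore favors longer length scales here.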
sample(X, n_samples=1, dist='posterior_predictive')

Sample functions from the GP prior or posterior predictive distribution.

Parameters:
  • X (ndarray of shape (N, M)) – The collection of datapoints at which to evaluate the sampled functions.
  • n_samples (int) – The number of samples to generate. Default is 1.
  • dist ({"posterior_predictive", "prior"}) – The distribution to draw samples from. Default is “posterior_predictive”.
Returns:

samples (ndarray of shape (n_samples, O, N)) – The generated samples for the points in X.
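A minimal sketch of how samples with shape (n_samples, O, N) could be drawn, assuming a zero mean and covariance K for the prior, and \(\mu^*\) and \(\text{cov}^*\) from predict for the posterior predictive. The helper `sample_gp` is hypothetical; it makes one multivariate-normal draw per output column with numpy.random.Generator.multivariate_normal. The small diagonal jitter added to K is likewise an assumption, included to keep the covariance safely positive definite.

```python
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A (N, M) and B (P, M)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dists / length_scale**2)


def sample_gp(mu, cov, n_samples=1, rng=None):
    """Draw samples of shape (n_samples, O, N) from N(mu[:, o], cov) per column."""
    rng = np.random.default_rng() if rng is None else rng
    # One draw per output column; each has shape (n_samples, N)
    draws = [rng.multivariate_normal(mu[:, o], cov, size=n_samples) for o in range(mu.shape[1])]
    # Stack the O columns on axis 1 to get (n_samples, O, N)
    return np.stack(draws, axis=1)


# Prior samples: zero mean, covariance K (plus jitter) at the points in X
X = np.linspace(-3, 3, 50)[:, None]
K = rbf_kernel(X, X) + 1e-8 * np.eye(len(X))
samples = sample_gp(np.zeros((len(X), 1)), K, n_samples=3, rng=np.random.default_rng(0))
```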