# GPRegression

class numpy_ml.nonparametric.GPRegression(kernel='RBFKernel', alpha=1e-10)

A Gaussian Process (GP) regression model.

$\begin{split}y \mid X, f &\sim \mathcal{N}( [f(x_1), \ldots, f(x_n)], \alpha I ) \\ f \mid X &\sim \text{GP}(0, K)\end{split}$

for data $$D = \{(x_1, y_1), \ldots, (x_n, y_n) \}$$ and a covariance matrix $$K_{ij} = \text{kernel}(x_i, x_j)$$ for all $$i, j \in \{1, \ldots, n \}$$.
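The generative model above can be illustrated with a short NumPy sketch: build $$K_{ij} = \text{kernel}(x_i, x_j)$$, draw one latent function $$f$$ from $$\text{GP}(0, K)$$ at the training inputs, then add isotropic Gaussian noise with variance alpha. The squared-exponential helper rbf_kernel and its length_scale argument are assumptions for illustration only; they are not the library's RBFKernel API, whose parameterization may differ.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0):
    """Hypothetical squared-exponential kernel: exp(-||x - x'||^2 / (2 * length_scale^2))."""
    sq_dists = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    # Clip tiny negative distances caused by floating-point cancellation
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def sample_generative_model(X, alpha=1e-2, length_scale=1.0, seed=0):
    """Draw f | X ~ GP(0, K) at the rows of X, then y | X, f ~ N(f, alpha * I)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    K = rbf_kernel(X, X, length_scale)  # K_ij = kernel(x_i, x_j)
    # A small jitter keeps K numerically positive definite for sampling
    f = rng.multivariate_normal(np.zeros(n), K + 1e-8 * np.eye(n))
    y = f + rng.normal(scale=np.sqrt(alpha), size=n)  # noise variance alpha
    return K, f, y


X = np.linspace(-3, 3, 50)[:, None]  # N = 50 examples, M = 1 feature
K, f, y = sample_generative_model(X)
print(K.shape, f.shape, y.shape)  # (50, 50) (50,) (50,)
```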

Parameters:

- kernel (str) – The kernel to use in fitting the GP prior. Default is ‘RBFKernel’.
- alpha (float) – An isotropic noise term added to the diagonal of the GP covariance, K. Larger values correspond to the expectation of greater noise in the observed data points. Default is 1e-10.

fit(X, y)

Fit the GP prior to the training data.

Parameters:

- X (ndarray of shape (N, M)) – A training dataset of N examples, each with dimensionality M.
- y (ndarray of shape (N, O)) – A collection of real-valued training targets for the examples in X, each with dimension O.

predict(X, conf_interval=0.95, return_cov=False)

Return the MAP estimate for $$y^*$$, corresponding to the mean/mode of the posterior predictive distribution, $$p(y^* \mid x^*, X, y)$$.

Notes

Under the GP regression model, the posterior predictive distribution is

$y^* \mid x^*, X, y \sim \mathcal{N}(\mu^*, \text{cov}^*)$

where

$\begin{split}\mu^* &= K^{*\top} (K + \alpha I)^{-1} y \\ \text{cov}^* &= K^{**} - K^{*\top} (K + \alpha I)^{-1} K^*\end{split}$

and

$\begin{split}K &= \text{kernel}(X, X) \\ K^* &= \text{kernel}(X, X^*) \\ K^{**} &= \text{kernel}(X^*, X^*)\end{split}$
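These equations translate almost line for line into NumPy. The sketch below mirrors the explicit-inverse approach described in the NB, with $$K^* = \text{kernel}(X, X^*)$$ of shape (N, N*), so the mean must use its transpose for the shapes to agree. rbf_kernel is again a hypothetical stand-in for whichever kernel the model was configured with.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0):
    # Hypothetical squared-exponential kernel used for illustration
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict_inv(X, y, X_star, alpha=1e-10, kernel=rbf_kernel):
    """Posterior predictive mean and covariance using an explicit inverse."""
    K = kernel(X, X)                   # (N, N)
    K_s = kernel(X, X_star)            # K^*, shape (N, N*)
    K_ss = kernel(X_star, X_star)      # K^{**}, shape (N*, N*)
    K_inv = np.linalg.inv(K + alpha * np.eye(len(X)))
    mu = K_s.T @ K_inv @ y             # (N*, O) when y has shape (N, O)
    cov = K_ss - K_s.T @ K_inv @ K_s   # (N*, N*)
    return mu, cov


X = np.array([[-2.0], [0.0], [1.5]])
y = np.sin(X)                          # shape (N, O) = (3, 1)
mu, cov = gp_predict_inv(X, y, np.linspace(-3, 3, 7)[:, None])
print(mu.shape, cov.shape)  # (7, 1) (7, 7)
```

With alpha close to zero, the mean interpolates the training targets and the predictive variance at the training inputs collapses to almost zero; a larger alpha lets the mean smooth over noisy targets instead.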

NB. This implementation uses the inefficient but general-purpose np.linalg.inv routine to invert $$(K + \alpha I)$$. A more efficient and numerically stable approach exploits the fact that K is symmetric positive semi-definite, so $$Q = K + \alpha I$$ is symmetric positive definite for any $$\alpha > 0$$. Q therefore has a lower-triangular Cholesky factor, and its inverse is the product of the inverted factors:

$Q^{-1} = \text{cholesky}(Q)^{-1 \top} \text{cholesky}(Q)^{-1}$

For more details on a production-grade implementation, see Algorithm 2.1 in Rasmussen & Williams (2006).
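A minimal sketch of the Cholesky route, in the spirit of Algorithm 2.1 in Rasmussen & Williams (2006): factor $$Q = K + \alpha I = L L^\top$$ once, then replace every product with $$Q^{-1}$$ by solves against $$L$$ and $$L^\top$$. It uses only numpy.linalg; np.linalg.solve does not exploit the triangular structure, so scipy.linalg.solve_triangular or scipy.linalg.cho_solve would be the natural swap in practice. The kernel helper is hypothetical.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0):
    # Hypothetical squared-exponential kernel used for illustration
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_predict_cholesky(X, y, X_star, alpha=1e-10, kernel=rbf_kernel):
    """Posterior predictive mean and covariance without forming inv(K + alpha*I)."""
    Q = kernel(X, X) + alpha * np.eye(len(X))
    L = np.linalg.cholesky(Q)                    # lower triangular, Q = L @ L.T
    K_s = kernel(X, X_star)                      # K^*, shape (N, N*)
    K_ss = kernel(X_star, X_star)                # K^{**}, shape (N*, N*)
    # Q^{-1} y = L^{-T} (L^{-1} y): two solves instead of an explicit inverse
    weights = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ weights
    V = np.linalg.solve(L, K_s)                  # L^{-1} K^*
    cov = K_ss - V.T @ V                         # K^{**} - K^{*T} Q^{-1} K^*
    return mu, cov


X = np.array([[-2.0], [0.0], [1.5]])
y = np.sin(X)
mu, cov = gp_predict_cholesky(X, y, np.linspace(-3, 3, 7)[:, None], alpha=1e-2)
```

Because $$V^\top V = K^{*\top} L^{-\top} L^{-1} K^* = K^{*\top} Q^{-1} K^*$$, the result matches the explicit-inverse formulas up to floating-point error.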

Parameters:

- X (ndarray of shape (N, M)) – The collection of datapoints to generate predictions on.
- conf_interval (float in (0, 1)) – The confidence level of the interval returned for each prediction. If the scipy package is not available, this value is always set to 0.95. Default is 0.95.
- return_cov (bool) – If True, also return the covariance (cov*) of the posterior predictive distribution for the points in X. Default is False.

Returns:

- y_pred (ndarray of shape (N, O)) – The predicted values for each point in X, each with dimensionality O.
- conf (ndarray of shape (N, O)) – The half-width of the conf_interval confidence interval for each y_pred: the interval for the i’th prediction is [y_pred[i] - conf[i], y_pred[i] + conf[i]].
- cov (ndarray of shape (N, N)) – The covariance (cov*) of the posterior predictive distribution for X. Only returned if return_cov is True.
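For a Gaussian predictive distribution, a symmetric interval at level conf_interval has half-width $$z \sigma_i$$, where $$\sigma_i$$ is the square root of the i’th diagonal entry of cov* and $$z$$ is the standard normal quantile at 0.5 + conf_interval / 2. The sketch below assumes conf is computed this way; the exact quantile the library uses is not stated here. It relies on the standard library's statistics.NormalDist, so no scipy dependency is needed.

```python
from statistics import NormalDist

import numpy as np


def confidence_halfwidth(cov, conf_interval=0.95):
    """Half-width of a two-sided Gaussian interval around each predictive mean.

    Assumes intervals of the form [mu_i - z * sd_i, mu_i + z * sd_i].
    """
    z = NormalDist().inv_cdf(0.5 + conf_interval / 2.0)  # about 1.96 at 0.95
    sd = np.sqrt(np.clip(np.diag(cov), 0.0, None))        # clip tiny negative variances
    return z * sd


cov = np.diag([0.25, 1.0, 4.0])
print(np.round(confidence_halfwidth(cov), 3))  # [0.98 1.96 3.92]
```

Since cov* does not depend on the targets, every output column shares the same half-width in this sketch; np.tile(hw[:, None], (1, O)) gives the (N, O) layout.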

marginal_log_likelihood(kernel_params=None)

Compute the log of the marginal likelihood $$p(y \mid X, \text{kernel\_params})$$, i.e., the log model evidence.

Notes

Under the GP regression model, the marginal distribution of y, with f integrated out, is Gaussian:

$y | X, \theta \sim \mathcal{N}(0, K + \alpha I)$

Hence,

$\log p(y \mid X, \theta) = -0.5 \log \det(K + \alpha I) - 0.5 y^\top (K + \alpha I)^{-1} y - \frac{n}{2} \log 2 \pi$

where $$K = \text{kernel}(X, X)$$, $$\theta$$ is the set of kernel parameters, and n is the number of training examples (so K is an $$n \times n$$ matrix).
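The log evidence can also be computed with a Cholesky factor instead of an explicit inverse and determinant, since $$\log \det(Q) = 2 \sum_i \log L_{ii}$$ for $$Q = K + \alpha I = L L^\top$$. This sketch assumes a single output (y of shape (N,)) and a hypothetical RBF kernel whose length_scale plays the role of $$\theta$$; comparing the evidence across candidate values is one way kernel parameters could be chosen.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0):
    # Hypothetical squared-exponential kernel; length_scale stands in for theta
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def log_marginal_likelihood(X, y, length_scale=1.0, alpha=1e-10):
    """log p(y | X, theta) for y ~ N(0, K + alpha * I), with y of shape (N,).

    alpha defaults to the class default; very small values can make the
    Cholesky factorization fail on closely spaced inputs.
    """
    n = X.shape[0]
    L = np.linalg.cholesky(rbf_kernel(X, X, length_scale) + alpha * np.eye(n))
    w = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + alpha I)^{-1} y
    data_fit = -0.5 * y @ w
    complexity = -np.sum(np.log(np.diag(L)))          # -0.5 * log det(K + alpha I)
    return data_fit + complexity - 0.5 * n * np.log(2.0 * np.pi)


# Compare candidate length scales on noisy samples of sin(x)
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, 30))[:, None]
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=30)
scores = {ls: log_marginal_likelihood(X, y, ls, alpha=1e-2) for ls in [0.05, 0.3, 1.0]}
best = max(scores, key=scores.get)
```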

Parameters:

- kernel_params (dict) – Parameters for the kernel function. If None, calculate the marginal likelihood under the kernel parameters defined at model initialization. Default is None.

Returns:

- marginal_log_likelihood (float) – The log likelihood of the training targets given the kernel parameterized by kernel_params and the training inputs, marginalized over all functions f.

sample(X, n_samples=1, dist='posterior_predictive')

Sample functions from the GP prior or posterior predictive distribution.

Parameters:

- X (ndarray of shape (N, M)) – The collection of datapoints to generate predictions on. Only used if dist = ‘posterior_predictive’.
- n_samples (int) – The number of samples to generate. Default is 1.
- dist ({"posterior_predictive", "prior"}) – The distribution to draw samples from. Default is “posterior_predictive”.

Returns:

- samples (ndarray of shape (n_samples, O, N)) – The generated samples for the points in X.
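A sketch of the sampling step, assuming a mean and covariance have already been computed (for example, the posterior predictive mean and cov*, or, in this sketch, zero mean and kernel(X, X) for prior draws at a set of points). It returns samples in the documented (n_samples, O, N) layout. The helper names and the 1e-8 jitter are illustrative choices, not the library's code.

```python
import numpy as np


def rbf_kernel(X1, X2, length_scale=1.0):
    # Hypothetical squared-exponential kernel used for illustration
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * X1 @ X2.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def sample_functions(mu, cov, n_samples=1, seed=None):
    """Draw n_samples functions per output from N(mu[:, o], cov).

    mu has shape (N,) or (N, O); cov has shape (N, N) and is shared by all outputs.
    Returns an array of shape (n_samples, O, N).
    """
    rng = np.random.default_rng(seed)
    if mu.ndim == 1:
        mu = mu[:, None]                       # a 1-D mean is a single output
    jitter = 1e-8 * np.eye(cov.shape[0])       # guards against round-off in cov
    draws = [
        rng.multivariate_normal(mu[:, o], cov + jitter, size=n_samples)  # (n_samples, N)
        for o in range(mu.shape[1])
    ]
    return np.stack(draws, axis=1)             # (n_samples, O, N)


# Prior-style draws at 20 points: zero mean, covariance kernel(X, X)
X = np.linspace(-3, 3, 20)[:, None]
prior = sample_functions(np.zeros(20), rbf_kernel(X, X), n_samples=5, seed=0)
print(prior.shape)  # (5, 1, 20)
```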