GPRegression¶

class numpy_ml.nonparametric.GPRegression(kernel='RBFKernel', alpha=1e-10)[source]¶

A Gaussian Process (GP) regression model.

\[\begin{split}y \mid X, f &\sim \mathcal{N}( [f(x_1), \ldots, f(x_n)], \alpha I ) \\ f \mid X &\sim \text{GP}(0, K)\end{split}\]

for data \(D = \{(x_1, y_1), \ldots, (x_n, y_n) \}\) and a covariance matrix \(K_{ij} = \text{kernel}(x_i, x_j)\) for all \(i, j \in \{1, \ldots, n \}\).
Parameters:
- kernel (str) – The kernel to use when fitting the GP prior. Default is 'RBFKernel'.
- alpha (float) – A value added to the diagonal of the kernel matrix K, corresponding to the variance of the additive Gaussian observation noise \(\alpha I\) in the model above. Default is 1e-10.
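As a concrete illustration of the covariance matrix above, the following sketch builds \(K_{ij} = \text{kernel}(x_i, x_j)\) with the textbook RBF kernel. The exact parameterization of numpy-ml's RBFKernel (e.g., its length-scale convention) is an assumption here; this is the standard form.

```python
import numpy as np

def rbf_kernel(X, Y, length_scale=1.0):
    # Textbook RBF kernel: k(x, y) = exp(-||x - y||^2 / (2 * ell^2)).
    # numpy-ml's actual RBFKernel parameterization may differ (assumption).
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2 * length_scale ** 2))

# Covariance matrix K_ij = kernel(x_i, x_j) over three 1-D training inputs
X = np.array([[0.0], [0.5], [1.0]])
K = rbf_kernel(X, X)  # shape (3, 3), symmetric, ones on the diagonal
```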
predict(X, conf_interval=0.95, return_cov=False)[source]¶

Return the MAP estimate for \(y^*\), corresponding to the mean/mode of the posterior predictive distribution, \(p(y^* \mid x^*, X, y)\).
Notes
Under the GP regression model, the posterior predictive distribution is
\[y^* \mid x^*, X, y \sim \mathcal{N}(\mu^*, \text{cov}^*)\]

where

\[\begin{split}\mu^* &= K^* (K + \alpha I)^{-1} y \\ \text{cov}^* &= K^{**} - K^{* \top} (K + \alpha I)^{-1} K^*\end{split}\]

and

\[\begin{split}K &= \text{kernel}(X, X) \\ K^* &= \text{kernel}(X, X^*) \\ K^{**} &= \text{kernel}(X^*, X^*)\end{split}\]

NB. This implementation uses the inefficient but general-purpose np.linalg.inv routine to invert \((K + \alpha I)\). A more efficient approach exploits the fact that \(K\) (and hence also \(K + \alpha I\)) is symmetric positive (semi-)definite and computes the inverse as the inner product of the inverse of its (lower) Cholesky decomposition:
\[Q^{-1} = \text{cholesky}(Q)^{-\top} \, \text{cholesky}(Q)^{-1}\]

For more details on a production-grade implementation, see Algorithm 2.1 in Rasmussen & Williams (2006).
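The identity above can be checked numerically. This sketch compares the general-purpose np.linalg.inv route against the Cholesky route; a production implementation would use a triangular solve (e.g., scipy.linalg.solve_triangular) rather than forming \(L^{-1}\) explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a symmetric positive definite matrix standing in for K + alpha * I
A = rng.standard_normal((5, 5))
Q = A @ A.T + 5.0 * np.eye(5)

# General-purpose inverse, as used by this implementation
Q_inv_general = np.linalg.inv(Q)

# Cholesky route: Q = L L^T  =>  Q^{-1} = L^{-T} L^{-1}
L = np.linalg.cholesky(Q)
L_inv = np.linalg.solve(L, np.eye(5))  # production code: use a triangular solve
Q_inv_chol = L_inv.T @ L_inv
```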
Parameters:
- X (ndarray of shape (N, M)) – The collection of datapoints to generate predictions on.
- conf_interval (float in (0, 1)) – The percentage confidence bound to return for each prediction. If the scipy package is not available, this value is always set to 0.95. Default is 0.95.
- return_cov (bool) – If True, also return the covariance (cov*) of the posterior predictive distribution for the points in X. Default is False.

Returns:
- y_pred (ndarray of shape (N, O)) – The predicted values for each point in X, each with dimensionality O.
- conf (ndarray of shape (N, O)) – The conf_interval confidence bound for each y_pred. The confidence interval for the i'th prediction is [y[i] - conf[i], y[i] + conf[i]].
- cov (ndarray of shape (N, N)) – The covariance (cov*) of the posterior predictive distribution for X. Only returned if return_cov is True.
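The posterior predictive equations above translate directly into NumPy. In this sketch the rbf helper is a hypothetical stand-in for the model's kernel, and \(K^*\) is transposed where needed so the matrix shapes line up with \(K^* = \text{kernel}(X, X^*)\).

```python
import numpy as np

def rbf(X, Y, ell=1.0):
    # Hypothetical RBF kernel helper (not numpy-ml's actual implementation)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

alpha = 1e-10
X = np.array([[0.0], [1.0], [2.0]])   # training inputs, shape (n, M)
y = np.sin(X).ravel()                 # training targets, shape (n,)
X_star = np.array([[0.5], [1.5]])     # test inputs, shape (n_star, M)

K = rbf(X, X)               # kernel(X, X),   shape (n, n)
K_star = rbf(X, X_star)     # kernel(X, X*),  shape (n, n_star)
K_ss = rbf(X_star, X_star)  # kernel(X*, X*), shape (n_star, n_star)

K_inv = np.linalg.inv(K + alpha * np.eye(len(X)))
mu_star = K_star.T @ K_inv @ y                 # posterior predictive mean
cov_star = K_ss - K_star.T @ K_inv @ K_star    # posterior predictive covariance
```

With alpha this small, the posterior mean at the training points themselves reproduces the training targets almost exactly, as expected for a near-noiseless GP.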
marginal_log_likelihood(kernel_params=None)[source]¶

Compute the log of the marginal likelihood (i.e., the log model evidence), \(p(y \mid X, \text{kernel_params})\).
Notes
Under the GP regression model, the marginal likelihood is normally distributed:
\[y \mid X, \theta \sim \mathcal{N}(0, K + \alpha I)\]

Hence,

\[\log p(y \mid X, \theta) = -\frac{1}{2} \log \det(K + \alpha I) - \frac{1}{2} y^\top (K + \alpha I)^{-1} y - \frac{n}{2} \log 2 \pi\]

where \(K = \text{kernel}(X, X)\), \(\theta\) is the set of kernel parameters, and \(n\) is the number of training examples (i.e., the dimension of \(K\)).
Parameters:
- kernel_params (dict) – Parameters for the kernel function. If None, calculate the marginal likelihood under the kernel parameters defined at model initialization. Default is None.

Returns:
- marginal_log_likelihood (float) – The log likelihood of the training targets given the kernel parameterized by kernel_params and the training inputs, marginalized over all functions f.
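The log marginal likelihood formula above can be evaluated directly. This sketch (again with a hypothetical rbf helper standing in for the kernel) also shows the numerically preferable Cholesky evaluation, which uses \(\log \det(K + \alpha I) = 2 \sum_i \log L_{ii}\) and \(y^\top (K + \alpha I)^{-1} y = \lVert L^{-1} y \rVert^2\).

```python
import numpy as np

def rbf(X, Y, ell=1.0):
    # Hypothetical RBF kernel helper (not numpy-ml's actual implementation)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

alpha = 1e-5
X = np.array([[0.0], [1.0], [2.0]])
y = np.sin(X).ravel()
n = len(X)

Sigma = rbf(X, X) + alpha * np.eye(n)  # K + alpha * I

# Direct evaluation of the formula; slogdet avoids under/overflow in det()
_, logdet = np.linalg.slogdet(Sigma)
mll = -0.5 * logdet - 0.5 * y @ np.linalg.inv(Sigma) @ y - 0.5 * n * np.log(2 * np.pi)

# Cholesky evaluation: -0.5 * logdet = -sum(log(diag(L))), quad form via z = L^{-1} y
L = np.linalg.cholesky(Sigma)
z = np.linalg.solve(L, y)
mll_chol = -np.log(np.diag(L)).sum() - 0.5 * z @ z - 0.5 * n * np.log(2 * np.pi)
```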
sample(X, n_samples=1, dist='posterior_predictive')[source]¶

Sample functions from the GP prior or posterior predictive distribution.
Parameters:
- X (ndarray of shape (N, M)) – The collection of datapoints to generate predictions on. Only used if dist = 'posterior_predictive'.
- n_samples (int) – The number of samples to generate. Default is 1.
- dist ({"posterior_predictive", "prior"}) – The distribution to draw samples from. Default is 'posterior_predictive'.

Returns:
- samples (ndarray of shape (n_samples, O, N)) – The generated samples for the points in X.
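For the dist = 'prior' case, sampling reduces to drawing from \(\mathcal{N}(0, K)\), typically via a Cholesky factor of K. A minimal sketch, assuming a one-dimensional output (O = 1) and a hypothetical rbf helper standing in for the kernel:

```python
import numpy as np

def rbf(X, Y, ell=1.0):
    # Hypothetical RBF kernel helper (not numpy-ml's actual implementation)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

rng = np.random.default_rng(42)

N = 10
X_star = np.linspace(0.0, 3.0, N)[:, None]  # points to sample at, shape (N, 1)
K = rbf(X_star, X_star) + 1e-6 * np.eye(N)  # jitter keeps the Cholesky stable

# Draw f ~ N(0, K) via f = L z with z ~ N(0, I), where K = L L^T
L = np.linalg.cholesky(K)
n_samples = 3
samples = (L @ rng.standard_normal((N, n_samples))).T  # shape (n_samples, N)
```

Sampling from the posterior predictive is identical in form, with \(\mu^*\) and \(\text{cov}^*\) from predict in place of 0 and K.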