# Nonparametric models

## K-Nearest Neighbors

The k-nearest neighbors (KNN) model is a nonparametric supervised learning approach that can be applied to classification or regression problems. In a classification context, the KNN model assigns a class label to a new datapoint by taking a majority vote among the labels of the k closest points (“neighbors”) in the training data. Similarly, in a regression context, the KNN model predicts the target value for a new datapoint by averaging the targets associated with the k closest points in the training data.
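To make the procedure concrete, here is a minimal brute-force sketch in NumPy; the function name `knn_predict` and its signature are illustrative, not a reference to any particular library's API:

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=5, classify=True):
    """Predict labels (classification) or targets (regression) for X_new
    via a brute-force k-nearest neighbors search under Euclidean distance."""
    preds = []
    for x in np.atleast_2d(X_new):
        # Distance from the query point to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        neighbors = y_train[np.argsort(dists)[:k]]  # labels of the k closest points
        if classify:
            # Majority vote among the k neighbors' labels
            vals, counts = np.unique(neighbors, return_counts=True)
            preds.append(vals[np.argmax(counts)])
        else:
            # Average the k neighbors' targets
            preds.append(neighbors.mean())
    return np.array(preds)
```

Note that this sketch ignores tie-breaking and distance weighting, both of which a practical implementation would need to address.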


## Gaussian Process Regression

A Gaussian process defines a prior distribution over functions mapping $$X \rightarrow \mathbb{R}$$, where X can be any finite (or even infinite!) dimensional set.

Let $$f(x_k)$$ be the random variable corresponding to the value of a function f at a point $$x_k \in X$$. Define the random vector $$z = [f(x_1), \ldots, f(x_N)]$$ for any finite set of points $$\{x_1, \ldots, x_N\} \subset X$$. If f is distributed according to a Gaussian process, it is the case that

$z \sim \mathcal{N}(\mu, K)$

for

$\begin{split}\mu &= [\text{mean}(x_1), \ldots, \text{mean}(x_N)] \\ K_{ij} &= \text{kernel}(x_i, x_j)\end{split}$

where mean is the mean function (in Gaussian process regression it is common to define mean(x) = 0), and kernel is a kernel / covariance function that determines the general shape of the GP prior over functions, p(f).
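As a quick illustration of the finite-dimensional view above, the sketch below draws a sample $$z \sim \mathcal{N}(0, K)$$ at a grid of input points using an RBF (squared-exponential) kernel; the kernel choice, length scale, and jitter term are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """RBF kernel: k(x, x') = exp(-||x - x'||^2 / (2 * length_scale^2))."""
    sq_dists = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2 * X1 @ X2.T
    )
    return np.exp(-0.5 * sq_dists / length_scale ** 2)

# Evaluate the GP prior at a finite grid of points: z = [f(x_1), ..., f(x_N)]
X = np.linspace(-3, 3, 50)[:, None]
K = rbf_kernel(X, X)

# mean(x) = 0, so z ~ N(0, K); the small jitter keeps K numerically PSD
z = np.random.multivariate_normal(np.zeros(len(X)), K + 1e-8 * np.eye(len(X)))
```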

In Gaussian process regression (also known as simple Kriging [Krige, 1951; Matheron, 1963]), a Gaussian process is used as a prior on functions and is combined with the Gaussian likelihood from the linear model via Bayes’ rule to compute a posterior over functions f:

$\begin{split}y \mid X, f &\sim \mathcal{N}( [f(x_1), \ldots, f(x_N)], \alpha I ) \\ f \mid X &\sim \text{GP}(0, K)\end{split}$

Due to the conjugacy of the Gaussian Process prior with the regression model’s Gaussian likelihood, the posterior will also be Gaussian and can be computed in closed form.
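Concretely, for test inputs $$X_*$$, the posterior over the function values $$f_*$$ is Gaussian with (Rasmussen & Williams, 2006)

$\begin{split}\mathbb{E}[f_* \mid X, y] &= K_*^\top (K + \alpha I)^{-1} y \\ \text{cov}(f_* \mid X, y) &= K_{**} - K_*^\top (K + \alpha I)^{-1} K_*\end{split}$

where $$K = \text{kernel}(X, X)$$, $$K_* = \text{kernel}(X, X_*)$$, and $$K_{**} = \text{kernel}(X_*, X_*)$$. Below is a minimal sketch of these equations, reusing the `rbf_kernel` helper from the previous snippet; a real implementation would prefer a Cholesky solve to the explicit matrix inverse used here:

```python
import numpy as np

def gp_posterior(X, y, X_star, kernel, alpha=1e-2):
    """Closed-form GP regression posterior mean and covariance at X_star."""
    K = kernel(X, X) + alpha * np.eye(len(X))  # training covariance + noise
    K_s = kernel(X, X_star)                    # train/test cross-covariance
    K_ss = kernel(X_star, X_star)              # test covariance
    K_inv = np.linalg.inv(K)
    mean = K_s.T @ K_inv @ y                   # posterior mean at X_star
    cov = K_ss - K_s.T @ K_inv @ K_s           # posterior covariance at X_star
    return mean, cov
```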


References

  Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA.
  Krige, D. G. (1951). “A statistical approach to some mine valuations and allied problems at the Witwatersrand”. Master’s thesis, University of the Witwatersrand.
  Matheron, G. (1963). “Principles of geostatistics”. Economic Geology, 58, 1246-1266.

## Kernel Regression

Kernel regression is another nonparametric approach to nonlinear regression. Like the Gaussian Process regression approach (or, more generally, all regression models), kernel regression attempts to learn a function f which captures the conditional expectation of some targets y given the data X, under the assumption that

$y_i = f(x_i) + \epsilon_i \ \ \ \ \text{where } \mathbb{E}[\epsilon | \mathbf{x}] = \mathbb{E}[\epsilon] = 0$

Unlike the Gaussian Process regression approach, however, kernel regression does not place a prior over f. Instead, it models $$f = \mathbb{E}[y | X] = \int_y \frac{p(X, y)}{p(X)} y \ \text{d}y$$ directly, using a kernel function, k, to estimate the smoothed data probabilities. For example, the Nadaraya-Watson estimator [Nadaraya, 1964; Watson, 1964] uses the following probability estimates:

$\begin{split}\hat{p}(X) &= \prod_{i=1}^N \hat{p}(x_i) = \prod_{i=1}^N \sum_{j=1}^N \frac{k(x_i - x_j)}{N} \\ \hat{p}(X, y) &= \prod_{i=1}^N \hat{p}(x_i, y_i) = \prod_{i=1}^N \sum_{j=1}^N \frac{k(x_i - x_j) k(y_i - y_j)}{N}\end{split}$
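Substituting these estimates into the expression for $$\mathbb{E}[y \mid X]$$ reduces the prediction at a point x to a kernel-weighted average of the training targets (assuming k is a normalized, zero-mean kernel):

$\hat{f}(x) = \frac{\sum_{j=1}^N k(x - x_j) y_j}{\sum_{j=1}^N k(x - x_j)}$

The sketch below implements this estimator for one-dimensional inputs; the Gaussian kernel and the `bandwidth` parameter are illustrative choices:

```python
import numpy as np

def nadaraya_watson(X_train, y_train, X_new, bandwidth=1.0):
    """Nadaraya-Watson kernel regression with a Gaussian kernel (1D inputs)."""
    def gaussian_kernel(u):
        return np.exp(-0.5 * (u / bandwidth) ** 2)

    preds = []
    for x in np.atleast_1d(X_new):
        w = gaussian_kernel(x - X_train)               # weight for each training point
        preds.append(np.sum(w * y_train) / np.sum(w))  # kernel-weighted average
    return np.array(preds)
```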


References

  Nadaraya, E. A. (1964). “On estimating regression”. Theory of Probability and Its Applications, 9 (1), 141-142.
  Watson, G. S. (1964). “Smooth regression analysis”. Sankhyā: The Indian Journal of Statistics, Series A, 26 (4), 359-372.