# Nonparametric models¶

## K-Nearest Neighbors

The k-nearest neighbors (KNN) model is a nonparametric supervised learning approach that can be applied to classification or regression problems. In a classification context, the KNN model assigns a class label for a new datapoint by taking a majority vote amongst the labels for the k closest points (“neighbors”) in the training data. Similarly, in a regression context, the KNN model predicts the target value associated with a new datapoint by taking the average of the targets associated with the k closes points in the training data.

**Models**

## Gaussian Process Regression

A Gaussian process defines a prior distribution over functions mapping \(X \rightarrow \mathbb{R}\), where X can be any finite (or infinite!)-dimensional set.

Let \(f(x_k)\) be the random variable corresponding to the value of a function f at a point \(x_k \in X\). Define a random variable \(z = [f(x_1), \ldots, f(x_N)]\) for any finite set of points \(\{x_1, \ldots, x_N\} \subset X\). If f is distributed according to a Gaussian Process, it is the case that

for

where mean is the mean function (in Gaussian process regression it is common to define mean(x) = 0), and kernel is a kernel / covariance function that determines the general shape of the GP prior over functions, p(f).

In Gaussian process regression (AKA simple Kriging [2] [3]), a Gaussian process is used as a prior on functions and is combined with the Gaussian likelihood from the linear model via Bayes’ rule to compute a posterior over functions f:

Due to the conjugacy of the Gaussian Process prior with the regression model’s Gaussian likelihood, the posterior will also be Gaussian and can be computed in closed form.

**Models**

**References**

[1] | Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA. |

[2] | Krige, D. G., (1951). “A statistical approach to some mine valuations and
allied problems at the Witwatersrand”, Master’s thesis of the University of
Witwatersrand. |

[3] | Matheron, G., (1963). “Principles of geostatistics”, Economic Geology, 58, 1246-1266. |

## Kernel Regression

Kernel regression is another nonparametric approach to nonlinear regression.
Like the Gaussian Process regression approach (or, more generally, all
regression models), kernel regression attempts to learn a function f which
captures the conditional expectation of some targets **y** given the data
**X**, under the assumption that

Unlike the Gaussian Process regression approach, however, kernel regression
does not place a prior over f. Instead, it models \(f = \mathbb{E}[y |
X] = \int_y \frac{p(X, y)}{p(X)} y \ \text{d}y\) using a kernel function, k, to estimate the smoothed data probabilities.
For example, the `Nadaraya-Watson`

estimator [4] [5] uses the following probability estimates:

**Models**

**References**

[4] | Nadaraya, E. A. (1964). “On estimating regression”. Theory of
Probability and Its Applications, 9 (1), 141-2. |

[5] | Watson, G. S. (1964). “Smooth regression analysis”. Sankhyā: The Indian
Journal of Statistics, Series A. 26 (4), 359–372. |

## See Also

The trees module contains other classic nonparametric approaches, including decision trees, random forests, and gradient boosted decision trees.