# LinearRegression¶

class numpy_ml.linear_models.LinearRegression(fit_intercept=True)[source]

An ordinary least squares regression model fit via the normal equation.

Notes

Given data matrix X and target vector y, the maximum-likelihood estimate for the regression coefficients, $$\beta$$, is:

$\hat{\beta} = \Sigma^{-1} \mathbf{X}^\top \mathbf{y}$

where $$\Sigma^{-1} = (\mathbf{X}^\top \mathbf{X})^{-1}$$.
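The normal equation is a one-line computation in plain NumPy. The sketch below (illustrative variable names, not the library's internals) recovers known coefficients from synthetic data:

```python
import numpy as np

# Toy data: N=50 examples, M=3 features, known coefficients
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_beta = np.array([2.0, -1.0, 0.5])
y = X @ true_beta + 0.01 * rng.normal(size=50)

# Normal equation: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

print(beta_hat)  # close to [2.0, -1.0, 0.5]
```

In practice `np.linalg.lstsq` is preferred over explicitly inverting $$\mathbf{X}^\top \mathbf{X}$$, which can be ill-conditioned; the explicit inverse is shown here only to mirror the equation above.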

Parameters: fit_intercept (bool) – Whether to fit an intercept term in addition to the model coefficients. Default is True.
update(X, y)[source]

Incrementally update the least-squares coefficients for a set of new examples.

Notes

The recursive least-squares algorithm [1] [2] is used to efficiently update the regression parameters as new examples become available. For a single new example $$(\mathbf{x}_{t+1}, \mathbf{y}_{t+1})$$, the parameter updates are

$\beta_{t+1} = \left( \mathbf{X}_{1:t}^\top \mathbf{X}_{1:t} + \mathbf{x}_{t+1}\mathbf{x}_{t+1}^\top \right)^{-1} \left( \mathbf{X}_{1:t}^\top \mathbf{Y}_{1:t} + \mathbf{x}_{t+1} \mathbf{y}_{t+1}^\top \right)$

where $$\beta_{t+1}$$ are the updated regression coefficients, and $$\mathbf{X}_{1:t}$$ and $$\mathbf{Y}_{1:t}$$ are the examples and targets observed from timesteps 1 through t.

In the single-example case, the RLS algorithm uses the Sherman-Morrison formula [3] to avoid re-inverting the covariance matrix on each new update. In the multi-example case (i.e., where $$\mathbf{X}_{t+1}$$ and $$\mathbf{y}_{t+1}$$ are matrices of N examples each), we use the generalized Woodbury matrix identity [4] to update the inverse covariance. This carries a performance cost, but is still cheaper than performing N single-example updates when N is large.
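A minimal sketch of the single-example case (illustrative, not the library's actual implementation): the Sherman-Morrison identity updates the inverse covariance without a fresh matrix inversion, and the result matches the batch normal-equation solution.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, 2.0, 3.0])

# Batch solution on the first 19 examples
A_inv = np.linalg.inv(X[:19].T @ X[:19])          # (X^T X)^{-1}
beta = A_inv @ X[:19].T @ y[:19]

# Sherman-Morrison:
#   (A + x x^T)^{-1} = A^{-1} - (A^{-1} x x^T A^{-1}) / (1 + x^T A^{-1} x)
x_new, y_new = X[19], y[19]
Ax = A_inv @ x_new
A_inv = A_inv - np.outer(Ax, Ax) / (1.0 + x_new @ Ax)

# RLS coefficient update with the refreshed inverse
beta = A_inv @ (X[:19].T @ y[:19] + x_new * y_new)

# Agrees with re-solving the normal equation on all 20 examples
batch = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.allclose(beta, batch))
```

The saving is that the update costs O(M^2) per example instead of the O(M^3) of a full inversion.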

References

 [1] Gauss, C. F. (1821). _Theoria combinationis observationum erroribus minimis obnoxiae_, Werke, 4. Göttingen.
Parameters:

- X (ndarray of shape (N, M)) – A dataset consisting of N examples, each of dimension M.
- y (ndarray of shape (N, K)) – The targets for each of the N examples in X, where each target has dimension K.
fit(X, y)[source]

Fit the regression coefficients via maximum likelihood.

Parameters:

- X (ndarray of shape (N, M)) – A dataset consisting of N examples, each of dimension M.
- y (ndarray of shape (N, K)) – The targets for each of the N examples in X, where each target has dimension K.
predict(X)[source]

Use the trained model to generate predictions on a new collection of data points.

Parameters:

- X (ndarray of shape (Z, M)) – A dataset consisting of Z new examples, each of dimension M.

Returns:

- y_pred (ndarray of shape (Z, K)) – The model predictions for the items in X.

# RidgeRegression¶

class numpy_ml.linear_models.RidgeRegression(alpha=1, fit_intercept=True)[source]

A ridge regression model fit via the normal equation.

Notes

Given data matrix X and target vector y, the penalized least-squares estimate for the ridge coefficients, $$\beta$$, is:

$\hat{\beta} = \left(\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{I} \right)^{-1} \mathbf{X}^\top \mathbf{y}$

It turns out that this estimate for $$\beta$$ also corresponds to the MAP estimate if we assume a multivariate Gaussian prior on the model coefficients:

$\beta \sim \mathcal{N}(\mathbf{0}, \frac{1}{2M} \mathbf{I})$

Note that this assumes that the data matrix X has been standardized and the target values y centered at 0.
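The ridge estimate is straightforward to compute directly from the equation above. The following NumPy sketch (synthetic data, illustrative names) also shows the characteristic effect of the penalty: the ridge solution has a smaller L2 norm than the OLS solution.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)

alpha = 1.0
M = X.shape[1]

# Ridge normal equation: beta_hat = (X^T X + alpha * I)^{-1} X^T y
beta_ridge = np.linalg.inv(X.T @ X + alpha * np.eye(M)) @ X.T @ y

# alpha = 0 recovers the OLS solution; larger alpha shrinks the norm
beta_ols = np.linalg.inv(X.T @ X) @ X.T @ y
print(np.linalg.norm(beta_ridge) < np.linalg.norm(beta_ols))
```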

Parameters:

- alpha (float) – L2 regularization coefficient. Higher values correspond to a larger penalty on the L2 norm of the model coefficients. Default is 1.
- fit_intercept (bool) – Whether to fit an intercept term in addition to the model coefficients. Default is True.
fit(X, y)[source]

Fit the ridge regression coefficients via the normal equation.

Parameters:

- X (ndarray of shape (N, M)) – A dataset consisting of N examples, each of dimension M.
- y (ndarray of shape (N, K)) – The targets for each of the N examples in X, where each target has dimension K.
predict(X)[source]

Use the trained model to generate predictions on a new collection of data points.

Parameters:

- X (ndarray of shape (Z, M)) – A dataset consisting of Z new examples, each of dimension M.

Returns:

- y_pred (ndarray of shape (Z, K)) – The model predictions for the items in X.

# LogisticRegression¶

class numpy_ml.linear_models.LogisticRegression(penalty='l2', gamma=0, fit_intercept=True)[source]

A simple logistic regression model fit via gradient descent on the penalized negative log likelihood.

Notes

For logistic regression, the penalized negative log likelihood of the targets y under the current model is

$- \log \mathcal{L}(\mathbf{b}, \mathbf{y}) = -\frac{1}{N} \left[ \left( \sum_{i=1}^N y_i \log(\hat{y}_i) + (1-y_i) \log(1-\hat{y}_i) \right) - R(\mathbf{b}, \gamma) \right]$

where

$\begin{split}R(\mathbf{b}, \gamma) = \left\{ \begin{array}{lr} \frac{\gamma}{2} ||\mathbf{b}||_2^2 & :\texttt{ penalty = 'l2'}\\ \gamma ||\mathbf{b}||_1 & :\texttt{ penalty = 'l1'} \end{array} \right.\end{split}$

is a regularization penalty, $$\gamma$$ is a regularization weight, N is the number of examples in y, and b is the vector of model coefficients.
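A minimal gradient-descent sketch of this objective with the L2 penalty (plain NumPy; the data, hyperparameters, and variable names are illustrative, not the library's internals):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(3)
N, M = 200, 2
X = rng.normal(size=(N, M))
y = (X[:, 0] - X[:, 1] > 0).astype(float)   # linearly separable labels

gamma = 0.1          # L2 regularization weight
lr = 0.1
b = np.zeros(M)
for _ in range(500):
    y_hat = sigmoid(X @ b)
    # Gradient of the penalized NLL (L2 case): (1/N) [X^T (y_hat - y) + gamma * b]
    grad = (X.T @ (y_hat - y) + gamma * b) / N
    b = b - lr * grad

acc = np.mean((sigmoid(X @ b) > 0.5) == y.astype(bool))
print(acc)  # high accuracy on this separable toy problem
```

Note that with the L1 penalty the objective is non-differentiable at 0, so plain gradient descent uses a subgradient there; the sketch covers only the smooth L2 case.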

Parameters:

- penalty ({'l1', 'l2'}) – The type of regularization penalty to apply to the coefficients b. Default is 'l2'.
- gamma (float) – The regularization weight. Larger values correspond to larger regularization penalties; a value of 0 indicates no penalty. Default is 0.
- fit_intercept (bool) – Whether to fit an intercept term in addition to the coefficients in b. If True, the estimates for b will have M + 1 dimensions, where the first dimension corresponds to the intercept. Default is True.
fit(X, y, lr=0.01, tol=1e-07, max_iter=10000000.0)[source]

Fit the regression coefficients via gradient descent on the negative log likelihood.

Parameters:

- X (ndarray of shape (N, M)) – A dataset consisting of N examples, each of dimension M.
- y (ndarray of shape (N,)) – The binary targets for each of the N examples in X.
- lr (float) – The gradient descent learning rate. Default is 0.01.
- tol (float) – The convergence tolerance: training stops when the change in loss between iterations falls below tol. Default is 1e-7.
- max_iter (float) – The maximum number of iterations to run the gradient descent solver. Default is 1e7.
predict(X)[source]

Use the trained model to generate prediction probabilities on a new collection of data points.

Parameters:

- X (ndarray of shape (Z, M)) – A dataset consisting of Z new examples, each of dimension M.

Returns:

- y_pred (ndarray of shape (Z,)) – The model prediction probabilities for the items in X.

# BayesianLinearRegressionUnknownVariance¶

class numpy_ml.linear_models.BayesianLinearRegressionUnknownVariance(alpha=1, beta=2, b_mean=0, b_V=None, fit_intercept=True)[source]

Bayesian linear regression model with unknown variance and conjugate Normal-Gamma prior on b and $$\sigma^2$$.

Notes

Uses a conjugate Normal-Gamma prior on b and $$\sigma^2$$. The joint and marginal posteriors over error variance and model parameters are:

$\begin{split}b, \sigma^2 &\sim \text{NG}(b_{mean}, b_{V}, \alpha, \beta) \\ \sigma^2 &\sim \text{InverseGamma}(\alpha, \beta) \\ b &\sim \mathcal{N}(b_{mean}, \sigma^2 \cdot b_V)\end{split}$
Parameters:

- alpha (float) – The shape parameter for the Inverse-Gamma prior on $$\sigma^2$$. Must be strictly greater than 0. Default is 1.
- beta (float) – The scale parameter for the Inverse-Gamma prior on $$\sigma^2$$. Must be strictly greater than 0. Default is 2.
- b_mean (ndarray of shape (M,) or float) – The mean of the Gaussian prior on b. If a float, assume b_mean is np.ones(M) * b_mean. Default is 0.
- b_V (ndarray of shape (M, M) or (M,) or None) – A symmetric positive definite matrix that, when multiplied element-wise by $$\sigma^2$$, gives the covariance matrix for the Gaussian prior on b. If a list, assume b_V = diag(b_V). If None, assume b_V is the identity matrix. Default is None.
- fit_intercept (bool) – Whether to fit an intercept term in addition to the coefficients in b. If True, the estimates for b will have M + 1 dimensions, where the first dimension corresponds to the intercept. Default is True.
fit(X, y)[source]

Compute the posterior over model parameters using the data in X and y.

Parameters:

- X (ndarray of shape (N, M)) – A dataset consisting of N examples, each of dimension M.
- y (ndarray of shape (N, K)) – The targets for each of the N examples in X, where each target has dimension K.
predict(X)[source]

Return the MAP prediction for the targets associated with X.

Parameters:

- X (ndarray of shape (Z, M)) – A dataset consisting of Z new examples, each of dimension M.

Returns:

- y_pred (ndarray of shape (Z, K)) – The model predictions for the items in X.

# BayesianLinearRegressionKnownVariance¶

class numpy_ml.linear_models.BayesianLinearRegressionKnownVariance(b_mean=0, b_sigma=1, b_V=None, fit_intercept=True)[source]

Bayesian linear regression model with known error variance and conjugate Gaussian prior on model parameters.

Notes

Uses a conjugate Gaussian prior on the model coefficients. The posterior over model parameters is

$b \mid b_{mean}, \sigma^2, b_V \sim \mathcal{N}(b_{mean}, \sigma^2 b_V)$

Ridge regression is a special case of this model where $$b_{mean} = 0$$, $$\sigma = 1$$, and $$b_V = I$$ (i.e., the prior on b is a zero-mean, unit-covariance Gaussian).
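With known error variance the posterior mean has a closed form via the standard Gaussian conjugacy result: for prior $$b \sim \mathcal{N}(b_{mean}, \sigma^2 b_V)$$ and likelihood $$y \sim \mathcal{N}(Xb, \sigma^2 I)$$, the posterior is $$\mathcal{N}(\mu_n, \sigma^2 V_n)$$ with $$V_n = (b_V^{-1} + X^\top X)^{-1}$$ and $$\mu_n = V_n (b_V^{-1} b_{mean} + X^\top y)$$. The sketch below assumes that textbook parameterization (not necessarily the library's exact internals) and checks the ridge correspondence noted above:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 3))
true_b = np.array([1.0, -1.0, 2.0])
sigma = 1.0
y = X @ true_b + sigma * rng.normal(size=50)

# Prior: b ~ N(b_mean, sigma^2 * b_V)
b_mean = np.zeros(3)
b_V = np.eye(3)

# Conjugate posterior update
V_n = np.linalg.inv(np.linalg.inv(b_V) + X.T @ X)
mu_n = V_n @ (np.linalg.inv(b_V) @ b_mean + X.T @ y)

# With b_mean = 0 and b_V = I this is exactly ridge regression with alpha = 1
ridge = np.linalg.inv(X.T @ X + np.eye(3)) @ X.T @ y
print(np.allclose(mu_n, ridge))
```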

Parameters:

- b_mean (ndarray of shape (M,) or float) – The mean of the Gaussian prior on b. If a float, assume b_mean is np.ones(M) * b_mean. Default is 0.
- b_sigma (float) – A scaling term for the covariance of the Gaussian prior on b. Default is 1.
- b_V (ndarray of shape (M, M) or (M,) or None) – A symmetric positive definite matrix that, when multiplied element-wise by b_sigma^2, gives the covariance matrix for the Gaussian prior on b. If a list, assume b_V = diag(b_V). If None, assume b_V is the identity matrix. Default is None.
- fit_intercept (bool) – Whether to fit an intercept term in addition to the coefficients in b. If True, the estimates for b will have M + 1 dimensions, where the first dimension corresponds to the intercept. Default is True.
fit(X, y)[source]

Compute the posterior over model parameters using the data in X and y.

Parameters:

- X (ndarray of shape (N, M)) – A dataset consisting of N examples, each of dimension M.
- y (ndarray of shape (N, K)) – The targets for each of the N examples in X, where each target has dimension K.
predict(X)[source]

Return the MAP prediction for the targets associated with X.

Parameters:

- X (ndarray of shape (Z, M)) – A dataset consisting of Z new examples, each of dimension M.

Returns:

- y_pred (ndarray of shape (Z, K)) – The MAP predictions for the targets associated with the items in X.