LinearRegression

class numpy_ml.linear_models.LinearRegression(fit_intercept=True)
An ordinary least squares regression model fit via the normal equation.
Notes
Given data matrix X and target vector y, the maximum likelihood estimate for the regression coefficients, \(\beta\), is:
\[\hat{\beta} = \Sigma^{-1} \mathbf{X}^\top \mathbf{y}\]
where \(\Sigma^{-1} = (\mathbf{X}^\top \mathbf{X})^{-1}\).
Parameters: fit_intercept (bool) – Whether to fit an intercept term in addition to the model coefficients. Default is True. 
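As a quick illustration of the normal equation, here is a minimal NumPy sketch (the helper name fit_ols is ours, not the library's; numpy-ml's actual implementation may differ):

    import numpy as np

    def fit_ols(X, y, fit_intercept=True):
        # Solve the normal equation: beta_hat = (X^T X)^{-1} X^T y
        if fit_intercept:
            # Prepend a column of ones so the first coefficient is the intercept
            X = np.c_[np.ones(X.shape[0]), X]
        # pinv is more numerically stable than inverting X^T X directly
        return np.linalg.pinv(X.T @ X) @ X.T @ y

    # Recover known coefficients (plus an intercept of 4) from noiseless data
    X = np.random.randn(100, 3)
    y = X @ np.array([2.0, -1.0, 0.5]) + 4.0
    beta_hat = fit_ols(X, y)  # approximately [4, 2, -1, 0.5]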
update(X, y)
Incrementally update the least-squares coefficients for a set of new examples.
Notes
The recursive least-squares algorithm [1] [2] is used to efficiently update the regression parameters as new examples become available. For a single new example \((\mathbf{x}_{t+1}, \mathbf{y}_{t+1})\), the parameter updates are
\[\beta_{t+1} = \left( \mathbf{X}_{1:t}^\top \mathbf{X}_{1:t} + \mathbf{x}_{t+1}\mathbf{x}_{t+1}^\top \right)^{-1} \left( \mathbf{X}_{1:t}^\top \mathbf{Y}_{1:t} + \mathbf{x}_{t+1} \mathbf{y}_{t+1} \right)\]
where \(\beta_{t+1}\) are the updated regression coefficients and \(\mathbf{X}_{1:t}\) and \(\mathbf{Y}_{1:t}\) are the examples observed from timestep 1 to t.
In the single-example case, the RLS algorithm uses the Sherman-Morrison formula [3] to avoid re-inverting the covariance matrix on each new update. In the multi-example case (i.e., where \(\mathbf{X}_{t+1}\) and \(\mathbf{y}_{t+1}\) are matrices of N examples each), we use the generalized Woodbury matrix identity [4] to update the inverse covariance. This comes at a performance cost, but is still more performant than doing multiple single-example updates if N is large.
References
[1] Gauss, C. F. (1821). Theoria combinationis observationum erroribus minimis obnoxiae. Werke, 4. Göttingen.
[2] https://en.wikipedia.org/wiki/Recursive_least_squares_filter
[3] https://en.wikipedia.org/wiki/Sherman%E2%80%93Morrison_formula
[4] https://en.wikipedia.org/wiki/Woodbury_matrix_identity
Parameters:  X (ndarray of shape (N, M)) – A dataset consisting of N new examples, each of dimension M.
 y (ndarray of shape (N,)) – The targets for each of the N new examples in X.
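As an illustration of the single-example update, here is a sketch of one Sherman-Morrison RLS step in NumPy (the function name and state layout are illustrative, not the library's internals):

    import numpy as np

    def rls_step(S_inv, Xty, x, y):
        # Sherman-Morrison rank-1 update of S_inv = (X^T X)^{-1}:
        #   (S + x x^T)^{-1} = S^{-1} - (S^{-1} x)(S^{-1} x)^T / (1 + x^T S^{-1} x)
        Sx = S_inv @ x
        S_inv = S_inv - np.outer(Sx, Sx) / (1.0 + x @ Sx)
        Xty = Xty + x * y        # accumulate X^T y with the new example
        beta = S_inv @ Xty       # updated least-squares coefficients
        return S_inv, Xty, beta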

RidgeRegression

class numpy_ml.linear_models.RidgeRegression(alpha=1, fit_intercept=True)
A ridge regression model fit via the normal equation.
Notes
Given data matrix X and target vector y, the maximum likelihood estimate for the ridge coefficients, \(\beta\), is:
\[\hat{\beta} = \left(\mathbf{X}^\top \mathbf{X} + \alpha \mathbf{I} \right)^{-1} \mathbf{X}^\top \mathbf{y}\]
It turns out that this estimate for \(\beta\) also corresponds to the MAP estimate if we assume a multivariate Gaussian prior on the model coefficients:
\[\beta \sim \mathcal{N}\left(\mathbf{0}, \frac{1}{2M} \mathbf{I}\right)\]
Note that this assumes that the data matrix X has been standardized and the target values y centered at 0.
Parameters:  alpha (float) – The L2 regularization weight. Larger values correspond to larger regularization penalties, and a value of 0 indicates no penalty. Default is 1.
 fit_intercept (bool) – Whether to fit an intercept term in addition to the model coefficients. Default is True.
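The only change from the OLS sketch above is the \(\alpha \mathbf{I}\) term inside the inverse. A minimal version (again an illustrative helper, not the library's code):

    import numpy as np

    def fit_ridge(X, y, alpha=1.0):
        # Solve beta_hat = (X^T X + alpha * I)^{-1} X^T y;
        # larger alpha shrinks the coefficients toward zero
        M = X.shape[1]
        return np.linalg.solve(X.T @ X + alpha * np.eye(M), X.T @ y)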
LogisticRegression

class numpy_ml.linear_models.LogisticRegression(penalty='l2', gamma=0, fit_intercept=True)
A simple logistic regression model fit via gradient descent on the penalized negative log likelihood.
Notes
For logistic regression, the penalized negative log likelihood of the targets y under the current model is
\[-\log \mathcal{L}(\mathbf{b}, \mathbf{y}) = -\frac{1}{N} \left[ \left( \sum_{i=0}^N y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right) - R(\mathbf{b}, \gamma) \right]\]
where
\[\begin{split}R(\mathbf{b}, \gamma) = \left\{ \begin{array}{lr} \frac{\gamma}{2} ||\mathbf{b}||_2^2 & :\texttt{ penalty = 'l2'} \\ \gamma ||\mathbf{b}||_1 & :\texttt{ penalty = 'l1'} \end{array} \right.\end{split}\]
is a regularization penalty, \(\gamma\) is a regularization weight, N is the number of examples in y, and b is the vector of model coefficients.
Parameters:  penalty ({'l1', 'l2'}) – The type of regularization penalty to apply on the coefficients beta. Default is ‘l2’.
 gamma (float) – The regularization weight. Larger values correspond to larger regularization penalties, and a value of 0 indicates no penalty. Default is 0.
 fit_intercept (bool) – Whether to fit an intercept term in addition to the coefficients in b. If True, the estimates for beta will have M + 1 dimensions, where the first dimension corresponds to the intercept. Default is True.
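As a sketch of what one gradient descent step on this objective looks like with the l2 penalty and the 1/N scaling above (illustrative helpers, not the library's internals):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def grad_step(b, X, y, gamma, lr=0.01):
        # Gradient of the penalized NLL (l2 penalty) w.r.t. b:
        #   X^T (y_hat - y) / N  +  gamma * b / N
        N = X.shape[0]
        y_hat = sigmoid(X @ b)
        grad = X.T @ (y_hat - y) / N + gamma * b / N
        return b - lr * grad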

fit(X, y, lr=0.01, tol=1e-7, max_iter=1e7)
Fit the regression coefficients via gradient descent on the negative log likelihood.
Parameters:  X (ndarray of shape (N, M)) – A dataset consisting of N examples, each of dimension M.
 y (ndarray of shape (N,)) – The binary targets for each of the N examples in X.
 lr (float) – The gradient descent learning rate. Default is 0.01.
 tol (float) – The convergence tolerance for the gradient descent solver. Default is 1e-7.
 max_iter (float) – The maximum number of iterations to run the gradient descent solver. Default is 1e7.

predict(X)
Use the trained model to generate prediction probabilities on a new collection of data points.
Parameters: X (ndarray of shape (Z, M)) – A dataset consisting of Z new examples, each of dimension M.
Returns: y_pred (ndarray of shape (Z,)) – The model prediction probabilities for the items in X.
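Putting fit and predict together on synthetic data (the data and hyperparameter choices here are purely illustrative):

    import numpy as np
    from numpy_ml.linear_models import LogisticRegression

    # Toy binary classification problem
    np.random.seed(0)
    X = np.random.randn(200, 2)
    y = (X[:, 0] + X[:, 1] > 0).astype(float)

    model = LogisticRegression(penalty="l2", gamma=0.1, fit_intercept=True)
    model.fit(X, y, lr=0.01, tol=1e-7)
    probs = model.predict(X)              # prediction probabilities in [0, 1]
    labels = (probs >= 0.5).astype(int)   # threshold to get hard labels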
BayesianLinearRegressionUnknownVariance

class numpy_ml.linear_models.BayesianLinearRegressionUnknownVariance(alpha=1, beta=2, b_mean=0, b_V=None, fit_intercept=True)
Bayesian linear regression model with unknown variance and conjugate Normal-Gamma prior on b and \(\sigma^2\).
Notes
Uses a conjugate Normal-Gamma prior on b and \(\sigma^2\). The joint and marginal posteriors over the error variance and model parameters are:
\[\begin{split}b, \sigma^2 &\sim \text{NG}(b_{mean}, b_{V}, \alpha, \beta) \\ \sigma^2 &\sim \text{InverseGamma}(\alpha, \beta) \\ b \mid \sigma^2 &\sim \mathcal{N}(b_{mean}, \sigma^2 \cdot b_V)\end{split}\]
Parameters:  alpha (float) – The shape parameter for the InverseGamma prior on \(\sigma^2\). Must be strictly greater than 0. Default is 1.
 beta (float) – The scale parameter for the InverseGamma prior on \(\sigma^2\). Must be strictly greater than 0. Default is 2.
 b_mean (ndarray of shape (M,) or float) – The mean of the Gaussian prior on b. If a float, assume b_mean is np.ones(M) * b_mean. Default is 0.
 b_V (ndarray of shape (N, N) or (N,) or None) – A symmetric positive definite matrix that, when multiplied elementwise by \(\sigma^2\), gives the covariance matrix for the Gaussian prior on b. If a list, assume b_V = diag(b_V). If None, assume b_V is the identity matrix. Default is None.
 fit_intercept (bool) – Whether to fit an intercept term in addition to the coefficients in b. If True, the estimates for b will have M + 1 dimensions, where the first dimension corresponds to the intercept. Default is True.
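The hierarchical structure of the prior can be read directly off the equations above: draw \(\sigma^2\) from the InverseGamma, then draw b conditional on it. A sketch of sampling from this prior (not library code; the helper name is ours):

    import numpy as np

    def sample_ng_prior(alpha, beta, b_mean, b_V, n_samples=1000):
        # sigma^2 ~ InverseGamma(alpha, beta), sampled as 1 / Gamma(alpha, scale=1/beta)
        sigma2 = 1.0 / np.random.gamma(alpha, 1.0 / beta, size=n_samples)
        # b | sigma^2 ~ N(b_mean, sigma^2 * b_V)
        bs = np.array([np.random.multivariate_normal(b_mean, s2 * b_V) for s2 in sigma2])
        return bs, sigma2

    # e.g., with the class defaults alpha=1, beta=2 and a 3-dimensional identity b_V
    bs, sigma2 = sample_ng_prior(1.0, 2.0, np.zeros(3), np.eye(3))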
BayesianLinearRegressionKnownVariance

class numpy_ml.linear_models.BayesianLinearRegressionKnownVariance(b_mean=0, b_sigma=1, b_V=None, fit_intercept=True)
Bayesian linear regression model with known error variance and conjugate Gaussian prior on the model parameters.
Notes
Uses a conjugate Gaussian prior on the model coefficients. The posterior over model parameters is
\[b \mid b_{mean}, \sigma^2, b_V \sim \mathcal{N}(b_{mean}, \sigma^2 b_V)\]
Ridge regression is a special case of this model where \(b_{mean} = \mathbf{0}\), \(\sigma = 1\), and \(b_V = \mathbf{I}\) (i.e., the prior on b is a zero-mean, unit covariance Gaussian).
Parameters:  b_mean (ndarray of shape (M,) or float) – The mean of the Gaussian prior on b. If a float, assume b_mean is np.ones(M) * b_mean. Default is 0.
 b_sigma (float) – A scaling term for the covariance of the Gaussian prior on b. Default is 1.
 b_V (ndarray of shape (N, N) or (N,) or None) – A symmetric positive definite matrix that, when multiplied elementwise by b_sigma^2, gives the covariance matrix for the Gaussian prior on b. If a list, assume b_V = diag(b_V). If None, assume b_V is the identity matrix. Default is None.
 fit_intercept (bool) – Whether to fit an intercept term in addition to the coefficients in b. If True, the estimates for b will have M + 1 dimensions, where the first dimension corresponds to the intercept. Default is True.
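The ridge correspondence noted above can be checked numerically: with \(b_{mean} = \mathbf{0}\), \(\sigma = 1\), and \(b_V = \mathbf{I}\), the posterior mean coincides with the ridge solution at \(\alpha = 1\). A sketch in plain NumPy, using the standard closed form for the conjugate Gaussian posterior mean rather than the library's internals:

    import numpy as np

    def posterior_mean(X, y, b_mean, b_V):
        # E[b | X, y] = (X^T X + b_V^{-1})^{-1} (X^T y + b_V^{-1} b_mean), with sigma = 1
        V_inv = np.linalg.inv(b_V)
        return np.linalg.solve(X.T @ X + V_inv, X.T @ y + V_inv @ b_mean)

    np.random.seed(0)
    X = np.random.randn(50, 4)
    y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * np.random.randn(50)

    beta_bayes = posterior_mean(X, y, np.zeros(4), np.eye(4))
    beta_ridge = np.linalg.solve(X.T @ X + np.eye(4), X.T @ y)  # ridge, alpha = 1
    assert np.allclose(beta_bayes, beta_ridge)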