GradientBoostedDecisionTree

class numpy_ml.trees.GradientBoostedDecisionTree(n_iter, max_depth=None, classifier=True, learning_rate=1, loss='crossentropy', step_size='constant')[source]

A gradient boosted ensemble of decision trees.

Notes

Gradient boosted machines (GBMs) fit an ensemble of m weak learners such that:

\[f_m(X) = b(X) + \eta w_1 g_1 + \ldots + \eta w_m g_m\]

where b is a fixed initial estimate for the targets, \(\eta\) is a learning rate parameter, and \(w_{\cdot}\) and \(g_{\cdot}\) denote the weights and learner predictions for subsequent fits.

We fit each w and g iteratively using a greedy strategy so that at each iteration i,

\[w_i, g_i = \arg \min_{w_i, g_i} L(Y, f_{i-1}(X) + w_i g_i)\]

On each iteration we fit a new weak learner to predict the negative gradient of the loss with respect to the previous prediction, \(f_{i-1}(X)\). The element-wise product of this weak learner's predictions, \(g_i\), with a weight, \(w_i\), gives the adjustment added to the model's prediction from the previous iteration:

\[f_i(X) := f_{i-1}(X) + w_i g_i\]
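The update above can be sketched numerically for squared-error loss, where the negative gradient is simply the residual. The snippet below is a minimal illustration, not numpy_ml's implementation: the "weak learner" is a hand-rolled depth-1 stump on feature 0, and the zero initial estimate \(b(X)\) is an assumption.

```python
import numpy as np

# One GBM iteration under squared-error loss (illustrative sketch).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = X[:, 0] + 0.1 * rng.normal(size=50)

eta = 0.5                            # learning rate (shrinkage)
f_prev = np.zeros_like(Y)            # b(X): assumed initial estimate of 0

# For L = 0.5 * (Y - f)^2, the negative gradient -dL/df is the residual.
neg_grad = Y - f_prev

# "Fit" a stump to the negative gradient: predict the mean residual
# on each side of the split X[:, 0] > 0 (stand-in for a decision tree).
mask = X[:, 0] > 0
g1 = np.where(mask, neg_grad[mask].mean(), neg_grad[~mask].mean())

w1 = 1.0                             # step_size="constant" uses w_i = 1
f_curr = f_prev + eta * w1 * g1      # f_1(X) = f_0(X) + eta * w_1 * g_1

mse_before = np.mean((Y - f_prev) ** 2)
mse_after = np.mean((Y - f_curr) ** 2)
```

Because the stump's predictions are a least-squares fit to the residuals, the update strictly lowers the training MSE for any learning rate in (0, 2).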
Parameters:
  • n_iter (int) – The number of iterations / weak estimators to use when fitting each dimension / class of Y.
  • max_depth (int) – The maximum depth of each decision tree weak estimator. Default is None.
  • classifier (bool) – Whether Y contains class labels or real-valued targets. Default is True.
  • learning_rate (float) – Value in [0, 1] controlling the amount each weak estimator contributes to the overall model prediction. Sometimes known as the shrinkage parameter in the GBM literature. Default is 1.
  • loss ({'crossentropy', 'mse'}) – The loss to optimize for the GBM. Default is ‘crossentropy’.
  • step_size ({"constant", "adaptive"}) – How to choose the weight for each weak learner. If “constant”, use a fixed weight of 1 for each learner. If “adaptive”, use a step size computed via line-search on the current iteration’s loss. Default is ‘constant’.
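The "adaptive" step-size option can be sketched as a one-dimensional line search over the current iteration's loss. The grid search below is an assumed stand-in for whatever line-search routine numpy_ml uses internally; `f_prev` and `g` are illustrative arrays, not fitted trees.

```python
import numpy as np

# Choose w_i by minimizing the loss of f_{i-1}(X) + w * g_i over a grid
# (a simple stand-in for a proper line search).
rng = np.random.default_rng(1)
Y = rng.normal(size=20)
f_prev = np.zeros_like(Y)            # predictions from the previous iteration
g = Y + 0.3 * rng.normal(size=20)    # current weak learner's predictions

def loss(w):
    """Squared-error loss of the boosted model with learner weight w."""
    return np.mean((Y - (f_prev + w * g)) ** 2)

candidates = np.linspace(0.0, 2.0, 201)
w_best = candidates[np.argmin([loss(w) for w in candidates])]
```

By construction the searched weight does at least as well as the fixed weight of 1 used by `step_size="constant"`.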
fit(X, Y)[source]

Fit the gradient boosted decision trees on a dataset.

Parameters:
  • X (ndarray of shape (N, M)) – The training data of N examples, each with M features.
  • Y (ndarray of shape (N,)) – An array of integer class labels for each example in X if self.classifier = True, otherwise the set of target values for each example in X.
predict(X)[source]

Use the trained model to classify or predict the examples in X.

Parameters: X (ndarray of shape (N, M)) – The input data of N examples, each with M features.
Returns: preds (ndarray of shape (N,)) – The integer class labels predicted for each example in X if self.classifier = True, otherwise the predicted target values.
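For a classifier, prediction amounts to accumulating \(b(X)\) plus the weighted learner outputs into per-class scores and taking the argmax. The sketch below uses random arrays as stand-ins for the fitted trees' outputs; the shapes and names are assumptions for illustration, not numpy_ml internals.

```python
import numpy as np

# Accumulate f_m(X) = b(X) + eta * sum_i w_i g_i, then argmax over classes.
N, K, n_iter = 4, 3, 2               # examples, classes, weak learners
eta = 1.0
b = np.zeros((N, K))                 # assumed initial per-class estimate
w = np.ones(n_iter)                  # constant step sizes w_i = 1

rng = np.random.default_rng(2)
g = rng.normal(size=(n_iter, N, K))  # stand-in per-iteration learner outputs

# Sum the weighted learner outputs over the iteration axis.
scores = b + eta * np.tensordot(w, g, axes=1)   # shape (N, K)
preds = scores.argmax(axis=1)                   # integer class labels, shape (N,)
```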