RandomForest

class numpy_ml.trees.RandomForest(n_trees, max_depth, n_feats, classifier=True, criterion='entropy')

An ensemble (forest) of decision trees where each split is calculated using a random subset of the features in the input.

Parameters:
  • n_trees (int) – The number of individual decision trees to use within the ensemble.
  • max_depth (int or None) – The depth at which to stop growing each decision tree. If None, grow each tree until the leaf nodes are pure.
  • n_feats (int) – The number of features to sample on each split.
  • classifier (bool) – If True, Y contains class labels; if False, Y contains real-valued targets. Default is True.
  • criterion ({'entropy', 'gini', 'mse'}) – The error criterion to use when calculating splits for each weak learner. When classifier is False, the only valid entry is 'mse'. When classifier is True, valid entries are 'entropy' and 'gini'. Default is 'entropy'.
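
The role of n_feats can be illustrated with a small standalone sketch. It is not the library's code: the helper name sample_feature_indices is hypothetical, and sampling without replacement is an assumption about how the per-split feature subset is drawn.

```python
import random


def sample_feature_indices(n_total_feats, n_feats, rng=random):
    """Pick candidate feature columns for one split (hypothetical helper)."""
    if not 1 <= n_feats <= n_total_feats:
        raise ValueError("n_feats must be between 1 and the number of features")
    # Assumption: sample without replacement so that no feature is
    # considered twice for the same split.
    return sorted(rng.sample(range(n_total_feats), n_feats))


# Each split in each tree would draw a fresh subset like this one.
rng = random.Random(0)
candidates = sample_feature_indices(n_total_feats=10, n_feats=3, rng=rng)
```

Only the columns listed in candidates would be searched for the best threshold at that split, which is what decorrelates the trees in the forest.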

fit(X, Y)

Draw n_trees bootstrapped samples from the training data and fit a separate decision tree to each one.
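
A minimal plain-Python sketch of the bootstrapping step follows. The helper names draw_bootstrap and make_bootstrap_sets are hypothetical, and the sketch assumes each bootstrapped sample has the same number of rows as the training set; it does not reproduce the library's internals.

```python
import random


def draw_bootstrap(X, Y, rng):
    """Draw len(X) rows with replacement, keeping X and Y aligned."""
    n = len(X)
    idxs = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idxs], [Y[i] for i in idxs]


def make_bootstrap_sets(X, Y, n_trees, seed=0):
    """Build one bootstrapped (X, Y) pair per tree in the ensemble."""
    rng = random.Random(seed)
    return [draw_bootstrap(X, Y, rng) for _ in range(n_trees)]


X = [[0.1, 1.0], [0.4, 0.9], [0.8, 0.2], [0.9, 0.1]]
Y = [0, 0, 1, 1]
samples = make_bootstrap_sets(X, Y, n_trees=3)
```

Each (X_s, Y_s) pair in samples would then be passed to a separate decision tree's fit method. Because rows are drawn with replacement, some training examples appear more than once in a sample and others not at all.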

predict(X)

Predict the target value for each entry in X.

Parameters:
  • X (ndarray of shape (N, M)) – The input data of N examples, each with M features.
Returns:
  • y_pred (ndarray of shape (N,)) – Model predictions for each entry in X.
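
The text above does not say how the per-tree predictions are combined into y_pred. A common convention for random forests, assumed here, is a majority vote for classification and an average for regression. The function combine_predictions is a hypothetical illustration, not part of the documented API.

```python
from collections import Counter
from statistics import mean


def combine_predictions(tree_preds, classifier=True):
    """Combine per-tree predictions into one prediction per example.

    tree_preds[t][i] is tree t's prediction for example i.
    """
    # Transpose so that each item holds every tree's vote for one example.
    per_example = list(zip(*tree_preds))
    if classifier:
        # Assumption: majority vote over class labels.
        return [Counter(votes).most_common(1)[0][0] for votes in per_example]
    # Assumption: arithmetic mean over real-valued predictions.
    return [mean(votes) for votes in per_example]


class_votes = [[0, 1, 1], [0, 0, 1], [1, 1, 1]]  # 3 trees, 3 examples
labels = combine_predictions(class_votes, classifier=True)

reg_votes = [[1.0, 2.0], [3.0, 4.0]]  # 2 trees, 2 examples
values = combine_predictions(reg_votes, classifier=False)
```

In both cases the result has one entry per example, matching the (N,) shape documented for y_pred.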