GMM

class numpy_ml.gmm.GMM(C=3, seed=None)

A Gaussian mixture model trained via the expectation maximization algorithm.

Parameters:
  • C (int) – The number of clusters / mixture components in the GMM. Default is 3.
  • seed (int) – Seed for the random number generator. Default is None.
Variables:
  • N (int) – The number of examples in the training dataset.
  • d (int) – The dimension of each example in the training dataset.
  • pi (ndarray of shape (C,)) – The cluster priors.
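  • Q (ndarray of shape (N, C)) – The variational distribution q(T) over the latent cluster assignments T; row n holds the cluster probabilities for training example n.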
  • mu (ndarray of shape (C, d)) – The cluster means.
  • sigma (ndarray of shape (C, d, d)) – The cluster covariance matrices.
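
As a rough illustration of the attribute shapes listed above, the following sketch builds placeholder arrays with NumPy. The sizes N, d, and C and the initial values are assumptions chosen for illustration; they are not taken from the library's own initialization.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
N, d, C = 6, 2, 3
rng = np.random.default_rng(0)

pi = np.full(C, 1.0 / C)               # (C,)      cluster priors, sum to 1
Q = np.full((N, C), 1.0 / C)           # (N, C)    one distribution over clusters per example
mu = rng.uniform(-5, 5, size=(C, d))   # (C, d)    one mean vector per cluster
sigma = np.stack([np.eye(d)] * C)      # (C, d, d) one covariance matrix per cluster

for name, arr in [("pi", pi), ("Q", Q), ("mu", mu), ("sigma", sigma)]:
    print(name, arr.shape)
```

Each row of Q and the vector pi are probability distributions, so both should sum to 1 along the cluster axis.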
likelihood_lower_bound()

Compute the likelihood lower bound (LLB) under the current GMM parameters. This is the same quantity that fit() refers to as the VLB.
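
One common form of this bound for a GMM is the sum over examples n and clusters c of Q[n, c] * (log pi[c] + log N(x_n | mu[c], sigma[c]) - log Q[n, c]). The sketch below evaluates that expression with NumPy. The helper names log_gaussian, log_joint, and lower_bound and the eps guard are assumptions made for illustration; they are not part of numpy_ml, and the library's own implementation may differ.

```python
import numpy as np

def log_gaussian(X, mu, sigma):
    """Log density of every row of X under a Gaussian N(mu, sigma)."""
    d = X.shape[1]
    diff = X - mu                                    # (N, d)
    _, logdet = np.linalg.slogdet(sigma)             # log |sigma|
    solved = np.linalg.solve(sigma, diff.T).T        # sigma^{-1} (x - mu), shape (N, d)
    maha = np.sum(diff * solved, axis=1)             # squared Mahalanobis distance, (N,)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def log_joint(X, pi, mu, sigma):
    """(N, C) array of log pi[c] + log N(x_n | mu[c], sigma[c])."""
    C = pi.shape[0]
    return np.stack(
        [np.log(pi[c]) + log_gaussian(X, mu[c], sigma[c]) for c in range(C)],
        axis=1,
    )

def lower_bound(X, pi, mu, sigma, Q, eps=1e-12):
    """Lower bound on the log-likelihood for a given variational distribution Q."""
    # eps guards against log(0) when a responsibility is exactly zero.
    return float(np.sum(Q * (log_joint(X, pi, mu, sigma) - np.log(Q + eps))))
```

When Q equals the exact posterior over cluster assignments, this bound coincides with the data log-likelihood; for any other Q it is smaller.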

fit(X, max_iter=100, tol=0.001, verbose=False)

Fit the parameters of the GMM on some training data.

Parameters:
  • X (ndarray of shape (N, d)) – A collection of N training data points, each with dimension d.
  • max_iter (int) – The maximum number of EM updates to perform before terminating training. Default is 100.
  • tol (float) – The convergence tolerance. Training terminates when the difference in the variational lower bound (VLB) between the current and previous iteration is less than tol. Default is 1e-3.
  • verbose (bool) – Whether to print the VLB at each training iteration. Default is False.
Returns:

success ({0, -1}) – 0 if training terminated without incident, or -1 if one of the mixture components collapsed and training was halted prematurely.
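
The control flow described above (at most max_iter EM updates, early stopping once the bound changes by less than tol, and a 0 / -1 status code) can be sketched in plain NumPy as follows. This is a minimal sketch, not the library's implementation: fit_sketch and log_gaussian are hypothetical names, and the initialization from random data points, the 1e-6 covariance ridge, and the collapse test on the effective cluster sizes are all assumptions.

```python
import numpy as np

def log_gaussian(X, mu, sigma):
    """Log density of every row of X under a Gaussian N(mu, sigma)."""
    d = X.shape[1]
    diff = X - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = np.sum(diff * np.linalg.solve(sigma, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def fit_sketch(X, C=3, max_iter=100, tol=1e-3, seed=None):
    """Hypothetical EM loop; returns (status, (pi, mu, sigma, Q))."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    pi = np.full(C, 1.0 / C)
    mu = X[rng.choice(N, size=C, replace=False)].astype(float)  # assumed init
    sigma = np.stack([np.eye(d)] * C)
    Q = np.full((N, C), 1.0 / C)
    prev_vlb = -np.inf

    for _ in range(max_iter):
        # E-step: set Q to the posterior over cluster assignments.
        log_p = np.stack(
            [np.log(pi[c]) + log_gaussian(X, mu[c], sigma[c]) for c in range(C)],
            axis=1,
        )
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        Q = np.exp(log_p - log_norm)

        # With Q equal to the posterior, the bound equals the log-likelihood.
        vlb = float(log_norm.sum())
        if not np.isfinite(vlb):
            return -1, (pi, mu, sigma, Q)
        if abs(vlb - prev_vlb) < tol:
            break  # converged
        prev_vlb = vlb

        # M-step: re-estimate priors, means, and covariances from Q.
        Nc = Q.sum(axis=0)
        if np.any(Nc < 1e-8):  # assumed test for a collapsed component
            return -1, (pi, mu, sigma, Q)
        pi = Nc / N
        mu = (Q.T @ X) / Nc[:, None]
        for c in range(C):
            diff = X - mu[c]
            sigma[c] = (Q[:, c, None] * diff).T @ diff / Nc[c] + 1e-6 * np.eye(d)

    return 0, (pi, mu, sigma, Q)

# Two well-separated synthetic blobs (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-5, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
status, (pi, mu, sigma, Q) = fit_sketch(X, C=2, seed=0)
print(status, pi)
```

With the documented class, the corresponding call would be `GMM(C=2, seed=0).fit(X)`, which returns only the status code; the fitted parameters are then available through the attributes listed under Variables.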