GMM

class numpy_ml.gmm.GMM(C=3, seed=None)[source]

A Gaussian mixture model trained via the expectation-maximization (EM) algorithm.

Parameters:
  • C (int) – The number of clusters / mixture components in the GMM. Default is 3.
  • seed (int) – Seed for the random number generator. Default is None.
Variables:
  • N (int) – The number of examples in the training dataset.
  • d (int) – The dimension of each example in the training dataset.
  • pi (ndarray of shape (C,)) – The cluster priors.
  • Q (ndarray of shape (N, C)) – The variational distribution q(T).
  • mu (ndarray of shape (C, d)) – The cluster means.
  • sigma (ndarray of shape (C, d, d)) – The cluster covariance matrices.
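The variational distribution Q above is what the E-step of EM produces: Q[i, c] is the posterior responsibility of component c for example i under the current parameters. A minimal NumPy sketch of this computation (the function names `gaussian_logpdf` and `e_step` are illustrative, not part of the numpy_ml API):

```python
import numpy as np

def gaussian_logpdf(X, mu_c, sigma_c):
    """Log density of N(mu_c, sigma_c) at each row of X."""
    d = X.shape[1]
    diff = X - mu_c
    # squared Mahalanobis distance for every example
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma_c), diff)
    _, logdet = np.linalg.slogdet(sigma_c)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def e_step(X, pi, mu, sigma):
    """Q[i, c] is proportional to pi_c * N(x_i | mu_c, sigma_c), normalized over c."""
    N, C = X.shape[0], pi.shape[0]
    log_r = np.zeros((N, C))
    for c in range(C):
        log_r[:, c] = np.log(pi[c]) + gaussian_logpdf(X, mu[c], sigma[c])
    log_r -= log_r.max(axis=1, keepdims=True)  # subtract max for numerical stability
    Q = np.exp(log_r)
    return Q / Q.sum(axis=1, keepdims=True)    # each row sums to 1
```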
likelihood_lower_bound(X)[source]

Compute the likelihood lower bound (the variational lower bound, VLB, used as the convergence criterion in fit) under the current GMM parameters.

fit(X, max_iter=100, tol=0.001, verbose=False)[source]

Fit the parameters of the GMM on some training data.

Parameters:
  • X (ndarray of shape (N, d)) – A collection of N training data points, each with dimension d.
  • max_iter (int) – The maximum number of EM updates to perform before terminating training. Default is 100.
  • tol (float) – The convergence tolerance. Training is terminated if the difference in VLB between the current and previous iteration is less than tol. Default is 1e-3.
  • verbose (bool) – Whether to print the VLB at each training iteration. Default is False.
Returns:

success ({0, -1}) – Whether training terminated without incident (0) or one of the mixture components collapsed and training was halted prematurely (-1).
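Each EM iteration alternates an E-step (recompute Q) with an M-step that re-estimates pi, mu, and sigma in closed form from the responsibilities; training stops once the VLB improves by less than tol. A hedged NumPy sketch of the M-step update (the function name `m_step` is illustrative, not the library's internal name):

```python
import numpy as np

def m_step(X, Q):
    """Closed-form parameter updates given responsibilities Q of shape (N, C)."""
    N, d = X.shape
    C = Q.shape[1]
    Nc = Q.sum(axis=0)                 # effective number of examples per cluster
    pi = Nc / N                        # updated cluster priors
    mu = (Q.T @ X) / Nc[:, None]       # responsibility-weighted cluster means
    sigma = np.zeros((C, d, d))
    for c in range(C):
        diff = X - mu[c]
        # responsibility-weighted covariance of cluster c
        sigma[c] = (Q[:, c, None] * diff).T @ diff / Nc[c]
    return pi, mu, sigma
```

If all responsibility mass drains out of a component (Nc[c] approaches zero), these updates become degenerate; that is the "mixture component collapsed" condition that makes fit return -1.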

predict(X, soft_labels=True)[source]

Return the log probability of each data point in X under each mixture component.

Parameters:
  • X (ndarray of shape (M, d)) – A collection of M data points, each with dimension d.
  • soft_labels (bool) – If True, return the log probabilities of the M data points in X under each mixture component. If False, return only the ID of the most probable mixture component. Default is True.
Returns:

y (ndarray of shape (M, C) or (M,)) – If soft_labels is True, y is a 2D array where entry (i, j) gives the log probability of the ith data point under the jth mixture component. If soft_labels is False, y is a 1D array where the ith entry contains the ID of the most probable mixture component.
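The two return modes can be sketched in plain NumPy: the soft labels are the per-component joint log probabilities log pi_c + log N(x_i | mu_c, sigma_c), and the hard labels are their argmax over components. The function name `predict_sketch` is hypothetical, not the library API:

```python
import numpy as np

def predict_sketch(X, pi, mu, sigma, soft_labels=True):
    """Log probability of each row of X under each component, or the argmax ID."""
    M, d = X.shape
    C = pi.shape[0]
    y = np.zeros((M, C))
    for c in range(C):
        diff = X - mu[c]
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(sigma[c]), diff)
        _, logdet = np.linalg.slogdet(sigma[c])
        # log pi_c + log N(x_i | mu_c, sigma_c)
        y[:, c] = np.log(pi[c]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    return y if soft_labels else y.argmax(axis=1)
```

Note that with soft_labels=True the rows are unnormalized joint log probabilities, not log posteriors; normalizing each row (e.g. with a log-sum-exp) would recover the responsibilities.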