# LDA

class numpy_ml.lda.LDA(T=10)

Vanilla (non-smoothed) LDA model trained using variational EM. Generates maximum-likelihood estimates for the model parameters alpha and beta.

Parameters:

- **T** (int) – Number of topics
- **D** (int) – Number of documents
- **N** (list of length D) – Number of words in each document
- **V** (int) – Number of unique word tokens across all documents
- **phi** (ndarray of shape (D, N[d], T)) – Variational approximation to the word-topic distribution
- **gamma** (ndarray of shape (D, T)) – Variational approximation to the document-topic distribution
- **alpha** (ndarray of shape (1, T)) – Parameter for the Dirichlet prior on the document-topic distribution
- **beta** (ndarray of shape (V, T)) – Word-topic distribution
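As a rough illustration of the parameter shapes listed above, the sketch below sets up arrays with the documented dimensions for a toy corpus. The random initializations are for illustration only and are not the library's actual initialization scheme; since phi has a ragged second dimension (N[d] varies per document), it is represented here as a list of per-document arrays:

```python
import numpy as np

# Toy dimensions: 3 documents, 4 topics, vocabulary of 6 token types
D, T, V = 3, 4, 6
N = [5, 2, 4]  # words per document (list of length D)

rng = np.random.default_rng(0)

# alpha: parameter for the Dirichlet prior on the document-topic
# distribution, shape (1, T)
alpha = rng.random((1, T))

# beta: word-topic distribution, shape (V, T); each column is a
# distribution over the vocabulary, so columns sum to 1
beta = rng.dirichlet(np.ones(V), size=T).T

# gamma: variational document-topic distribution, shape (D, T)
gamma = rng.random((D, T))

# phi: variational word-topic distribution; one (N[d], T) block per
# document, initialized uniformly over topics
phi = [np.full((N[d], T), 1 / T) for d in range(D)]

print(alpha.shape, beta.shape, gamma.shape, [p.shape for p in phi])
```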
VLB()

Return the variational lower bound associated with the current model parameters.
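For reference, the quantity being maximized is the standard evidence lower bound for vanilla LDA (as in Blei, Ng & Jordan, 2003); the exact term grouping used internally by this method is an assumption:

```latex
\mathcal{L}(\gamma, \phi;\, \alpha, \beta)
  = \mathbb{E}_q\!\left[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\right]
  - \mathbb{E}_q\!\left[\log q(\theta, \mathbf{z} \mid \gamma, \phi)\right]
```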

initialize_parameters()

Provide reasonable initializations for model and variational parameters.

train(corpus, verbose=False, max_iter=1000, tol=5)

Train the LDA model on a corpus of documents (bags of words).

Parameters:

- **corpus** (list of length D) – A list of lists, with each sublist containing the tokenized text of a single document.
- **verbose** (bool) – Whether to print the VLB at each training iteration. Default is False.
- **max_iter** (int) – The maximum number of training iterations to perform before breaking. Default is 1000.
- **tol** (int) – Break the training loop if the difference between the VLB on the current iteration and the previous iteration is less than tol. Default is 5.
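A minimal sketch of the expected corpus format: each document is a list of tokens, and D, N, and V follow from it as described above. The train call is shown as a comment because the exact token representation expected by the library (strings vs. integer ids) is an assumption here:

```python
# Three toy documents, each a sublist of tokenized text
corpus = [
    ["apple", "banana", "apple", "cherry"],
    ["banana", "cherry"],
    ["apple", "apple", "date"],
]

D = len(corpus)                                   # number of documents
N = [len(doc) for doc in corpus]                  # words per document
V = len({tok for doc in corpus for tok in doc})   # unique token types

print(D, N, V)

# Hypothetical usage (assumes numpy_ml is installed):
# from numpy_ml.lda import LDA
# model = LDA(T=5)
# model.train(corpus, verbose=True, max_iter=1000, tol=5)
```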