# LDA

class numpy_ml.lda.LDA(T=10)

Vanilla (non-smoothed) LDA model trained using variational EM. Generates maximum-likelihood estimates for the model parameters alpha and beta.

Parameters:

- **T** (int) – Number of topics
- **D** (int) – Number of documents
- **N** (list of length D) – Number of words in each document
- **V** (int) – Number of unique word tokens across all documents
- **phi** (ndarray of shape (D, N[d], T)) – Variational approximation to the word-topic distribution
- **gamma** (ndarray of shape (D, T)) – Variational approximation to the document-topic distribution
- **alpha** (ndarray of shape (1, T)) – Parameter for the Dirichlet prior on the document-topic distribution
- **beta** (ndarray of shape (V, T)) – Word-topic distribution
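As a rough illustration of the parameter shapes listed above, the sketch below sets up arrays with the documented dimensions for a toy corpus. The random initializations are for illustration only and are not the library's actual initialization scheme; since phi has a ragged second dimension (N[d] varies per document), it is represented here as a list of per-document arrays:

```python
import numpy as np

# Toy dimensions: 3 documents, 4 topics, vocabulary of 6 token types
D, T, V = 3, 4, 6
N = [5, 2, 4]  # words per document (list of length D)

rng = np.random.default_rng(0)

# alpha: parameter for the Dirichlet prior on the document-topic
# distribution, shape (1, T)
alpha = rng.random((1, T))

# beta: word-topic distribution, shape (V, T); each column is a
# distribution over the vocabulary, so columns sum to 1
beta = rng.dirichlet(np.ones(V), size=T).T

# gamma: variational document-topic distribution, shape (D, T)
gamma = rng.random((D, T))

# phi: variational word-topic distribution; one (N[d], T) block per
# document, initialized uniformly over topics
phi = [np.full((N[d], T), 1 / T) for d in range(D)]

print(alpha.shape, beta.shape, gamma.shape, [p.shape for p in phi])
```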
VLB()

Return the variational lower bound associated with the current model parameters.
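For reference, the quantity being maximized is the standard evidence lower bound for vanilla LDA (as in Blei, Ng & Jordan, 2003); the exact term grouping used internally by this method is an assumption:

```latex
\mathcal{L}(\gamma, \phi;\, \alpha, \beta)
  = \mathbb{E}_q\!\left[\log p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta)\right]
  - \mathbb{E}_q\!\left[\log q(\theta, \mathbf{z} \mid \gamma, \phi)\right]
```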

initialize_parameters()

Provide reasonable initializations for model and variational parameters.

train(corpus, verbose=False, max_iter=1000, tol=5)

Train the LDA model on a corpus of documents (bags of words).

Parameters:

- **corpus** (list of length D) – A list of lists, with each sublist containing the tokenized text of a single document.
- **verbose** (bool) – Whether to print the VLB at each training iteration. Default is False.
- **max_iter** (int) – The maximum number of training iterations to perform before breaking. Default is 1000.
- **tol** (int) – Break the training loop if the difference between the VLB on the current iteration and the previous iteration is less than tol. Default is 5.
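A minimal sketch of the expected corpus format: each document is a list of tokens, and D, N, and V follow from it as described above. The train call is shown as a comment because the exact token representation expected by the library (strings vs. integer ids) is an assumption here:

```python
# Three toy documents, each a sublist of tokenized text
corpus = [
    ["apple", "banana", "apple", "cherry"],
    ["banana", "cherry"],
    ["apple", "apple", "date"],
]

D = len(corpus)                                   # number of documents
N = [len(doc) for doc in corpus]                  # words per document
V = len({tok for doc in corpus for tok in doc})   # unique token types

print(D, N, V)

# Hypothetical usage (assumes numpy_ml is installed):
# from numpy_ml.lda import LDA
# model = LDA(T=5)
# model.train(corpus, verbose=True, max_iter=1000, tol=5)
```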