LDA
class numpy_ml.lda.LDA(T=10)

Vanilla (non-smoothed) LDA model trained using variational EM. Generates maximum-likelihood estimates for the model parameters alpha and beta.

Parameters:
- T (int) – Number of topics
Variables:
- D (int) – Number of documents
- N (list of length D) – Number of words in each document
- V (int) – Number of unique word tokens across all documents
- phi (ndarray of shape (D, N[d], T)) – Variational approximation to the word-topic distribution
- gamma (ndarray of shape (D, T)) – Variational approximation to the document-topic distribution
- alpha (ndarray of shape (1, T)) – Parameter for the Dirichlet prior on the document-topic distribution
- beta (ndarray of shape (V, T)) – Word-topic distribution
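To make the variable definitions above concrete, here is a minimal sketch that derives D, N, and V from a toy corpus. The corpus of integer word IDs is an illustrative assumption, not part of the library; only the shape conventions follow the list above.

```python
# Toy corpus: 3 documents, each a list of integer word IDs.
# (Assumption: integer IDs are used here purely to illustrate the
# D / N / V definitions; the library's expected token type may differ.)
corpus = [[0, 1, 2, 1], [2, 3], [0, 3, 3]]

D = len(corpus)                              # number of documents
N = [len(doc) for doc in corpus]             # words in each document
V = len({w for doc in corpus for w in doc})  # unique word tokens

print(D, N, V)  # → 3 [4, 2, 3] 4
```

Given these quantities, phi would have per-document slices of shape (N[d], T), gamma would be (D, T), and beta would be (V, T).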
initialize_parameters()

Provide reasonable initializations for model and variational parameters.
train(corpus, verbose=False, max_iter=1000, tol=5)

Train the LDA model on a corpus of documents (bags of words).
Parameters:
- corpus (list of length D) – A list of lists, with each sublist containing the tokenized text of a single document.
- verbose (bool) – Whether to print the VLB at each training iteration. Default is False.
- max_iter (int) – The maximum number of training iterations to perform before breaking. Default is 1000.
- tol (int) – Break the training loop if the difference between the VLB on the current iteration and the previous iteration is less than tol. Default is 5.
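Together, max_iter and tol define the stopping rule for training: iterate until the VLB changes by less than tol, or max_iter iterations elapse. A minimal sketch of that rule, where update_vlb is a hypothetical stand-in for one variational EM step (not the library's actual internals):

```python
def run_em(update_vlb, max_iter=1000, tol=5):
    """Loop until the VLB change falls below tol or max_iter is hit."""
    prev_vlb = -float("inf")
    for it in range(max_iter):
        vlb = update_vlb(it)          # one E-step + M-step, returns the VLB
        if abs(vlb - prev_vlb) < tol:  # converged: improvement smaller than tol
            break
        prev_vlb = vlb
    return it, vlb

# Toy VLB sequence whose improvement shrinks each iteration
n_iter, final_vlb = run_em(lambda i: -100.0 / (i + 1))
print(n_iter)  # → 5 (first step where the VLB improves by less than 5)
```

With the real class, a typical call following the signature above would be LDA(T=10).train(corpus, max_iter=1000, tol=5).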