SmoothedLDA

class numpy_ml.lda.SmoothedLDA(T, **kwargs)
Bases: object
A smoothed LDA model trained using collapsed Gibbs sampling. Generates posterior mean estimates for model parameters phi and theta.
Parameters: T (int) – Number of topics
Variables:  D (int) – Number of documents
 N (int) – Total number of words across all documents
 V (int) – Number of unique word tokens across all documents
 phi (ndarray of shape (N[d], T)) – The word-topic distribution
 theta (ndarray of shape (D, T)) – The document-topic distribution
 alpha (ndarray of shape (1, T)) – Parameter for the Dirichlet prior on the document-topic distribution
 beta (ndarray of shape (V, T)) – Parameter for the Dirichlet prior on the topic-word distribution
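
To illustrate what "posterior mean estimates" can look like in a collapsed Gibbs setting, here is a minimal sketch that turns topic-assignment counts into phi and theta by adding the Dirichlet parameters and normalizing. It is not taken from the numpy_ml source: the function name posterior_means and the count matrices C_wt (word-topic counts) and C_dt (document-topic counts) are hypothetical, and phi is assumed here to have one row per vocabulary item.

```python
import numpy as np


def posterior_means(C_wt, C_dt, alpha, beta):
    """Hypothetical helper: smoothed, normalized counts.

    C_wt  : (V, T) counts of each word assigned to each topic (assumed input)
    C_dt  : (D, T) counts of each topic within each document (assumed input)
    alpha : (1, T) Dirichlet parameter for the document-topic distribution
    beta  : (V, T) Dirichlet parameter for the topic-word distribution
    """
    # Add the prior pseudo-counts, then normalize over words for each topic.
    smoothed_wt = C_wt + beta
    phi = smoothed_wt / smoothed_wt.sum(axis=0, keepdims=True)

    # Add the prior pseudo-counts, then normalize over topics for each document.
    smoothed_dt = C_dt + alpha
    theta = smoothed_dt / smoothed_dt.sum(axis=1, keepdims=True)
    return phi, theta


# Toy counts: V = 3 words, T = 2 topics, D = 2 documents.
V, T, D = 3, 2, 2
C_wt = np.array([[3, 0], [1, 2], [0, 4]])
C_dt = np.array([[4, 1], [0, 5]])
alpha = np.full((1, T), 0.5)
beta = np.full((V, T), 0.1)

phi, theta = posterior_means(C_wt, C_dt, alpha, beta)
```

In this sketch each column of phi sums to 1 (a distribution over words for one topic) and each row of theta sums to 1 (a distribution over topics for one document).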

train(texts, tokens, n_gibbs=2000)
Trains a topic model on the documents in texts.
Parameters:  texts (array of length (D,)) – The training corpus represented as an array of subarrays, where each subarray corresponds to the tokenized words of a single document.
 tokens (array of length (V,)) – The set of unique tokens in the documents in texts.
 n_gibbs (int) – The number of steps to run the collapsed Gibbs sampler during training. Default is 2000.
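
The inputs described above can be prepared along the following lines. This is a sketch only: the example sentences are made up, whitespace splitting stands in for a real tokenizer, and the integer encoding at the end is an assumption in case the sampler indexes words by ID rather than by string (the description above does not say which form is expected).

```python
import numpy as np

# Made-up corpus for illustration.
raw_docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats make good pets",
]

# texts: one subarray of tokenized words per document (length D).
texts = [doc.split() for doc in raw_docs]

# tokens: the unique tokens across all documents (length V), sorted.
tokens = np.unique(np.concatenate(texts))

D = len(texts)                      # number of documents
N = sum(len(doc) for doc in texts)  # total number of words
V = len(tokens)                     # number of unique word tokens

# Optional (assumption): encode each word as its index into `tokens`.
word_to_id = {word: idx for idx, word in enumerate(tokens)}
texts_as_ids = [np.array([word_to_id[w] for w in doc]) for doc in texts]
```

With these in hand, a call of the form SmoothedLDA(T=2).train(texts, tokens, n_gibbs=2000) matches the signature documented above; a smaller n_gibbs may be preferable for a quick trial run.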
Returns: