SmoothedLDA
class numpy_ml.lda.SmoothedLDA(T, **kwargs)
Bases: object
A smoothed LDA model trained using collapsed Gibbs sampling. Generates posterior mean estimates of the model parameters phi and theta.
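Collapsed Gibbs sampling integrates out phi and theta and repeatedly resamples only the topic assignment z_i of each word. The posterior means are then read off the final counts. The standard symmetric-prior formulation is sketched here under several assumptions: every entry of alpha equals a scalar alpha and every entry of beta equals a scalar beta, and C^{WT} (V × T word-topic counts) and C^{DT} (D × T document-topic counts) are notation introduced here, not attributes of the class. The subscript -i means "counts that exclude word i". Whether SmoothedLDA uses exactly this form is an assumption.

```latex
\begin{aligned}
p(z_i = t \mid \mathbf{z}_{-i}, \mathbf{w}) &\propto
  \frac{C^{WT}_{w_i t,-i} + \beta}{\sum_{w=1}^{V} C^{WT}_{w t,-i} + V\beta}
  \cdot
  \frac{C^{DT}_{d_i t,-i} + \alpha}{\sum_{t'=1}^{T} C^{DT}_{d_i t',-i} + T\alpha} \\
\hat{\phi}_{w t} &= \frac{C^{WT}_{w t} + \beta}{\sum_{w'=1}^{V} C^{WT}_{w' t} + V\beta},
\qquad
\hat{\theta}_{d t} = \frac{C^{DT}_{d t} + \alpha}{\sum_{t'=1}^{T} C^{DT}_{d t'} + T\alpha}
\end{aligned}
```

The second denominator of the update does not depend on t, so implementations commonly drop it and normalize over topics instead.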
Parameters: T (int) – Number of topics
Variables:
- D (int) – Number of documents
- N (int) – Total number of words across all documents
- V (int) – Number of unique word tokens across all documents
- phi (ndarray of shape (V, T)) – The word-topic distribution
- theta (ndarray of shape (D, T)) – The document-topic distribution
- alpha (ndarray of shape (1, T)) – Parameter for the Dirichlet prior on the document-topic distribution
- beta (ndarray of shape (V, T)) – Parameter for the Dirichlet prior on the topic-word distribution
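Given the final word-topic and document-topic counts from a Gibbs run, the posterior mean estimates of phi and theta are smoothed, normalized counts. The following is a minimal sketch assuming symmetric scalar priors. The helper posterior_means and the count-matrix names C_wt (V, T) and C_dt (D, T) are hypothetical, not part of numpy_ml. In this sketch phi has one row per vocabulary word, the same row dimension as beta.

```python
import numpy as np


def posterior_means(C_wt, C_dt, alpha, beta):
    """Posterior mean estimates of phi and theta (hypothetical helper).

    C_wt  : (V, T) array of word-topic counts from the Gibbs sampler
    C_dt  : (D, T) array of document-topic counts
    alpha : symmetric Dirichlet prior on the document-topic distribution
    beta  : symmetric Dirichlet prior on the topic-word distribution
    """
    V, T = C_wt.shape
    # phi[w, t] = p(word w | topic t); every column sums to 1
    phi = (C_wt + beta) / (C_wt.sum(axis=0, keepdims=True) + V * beta)
    # theta[d, t] = p(topic t | document d); every row sums to 1
    theta = (C_dt + alpha) / (C_dt.sum(axis=1, keepdims=True) + T * alpha)
    return phi, theta


# Toy counts for V=3 words, D=2 documents and T=2 topics
C_wt = np.array([[3, 0], [1, 2], [0, 4]])
C_dt = np.array([[4, 1], [0, 5]])
phi, theta = posterior_means(C_wt, C_dt, alpha=0.5, beta=0.01)
print(theta[0, 0])  # (4 + 0.5) / (5 + 2 * 0.5) = 0.75
```

Because both priors are strictly positive, every word and every topic keeps nonzero probability even when its count is zero. This is the "smoothing" in smoothed LDA.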
train(texts, tokens, n_gibbs=2000)
Trains a topic model on the documents in texts.
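The training procedure can be illustrated with a small, self-contained collapsed Gibbs sampler in plain NumPy. This is a sketch, not numpy_ml's implementation. The function name collapsed_gibbs_lda, the seed argument, the defaults alpha=0.1 and beta=0.01, treating one of the n_gibbs steps as one full sweep over all words, and the choice of return values are all assumptions. The inputs follow the shapes described for texts (one list of tokens per document) and tokens (the V unique tokens).

```python
import numpy as np


def collapsed_gibbs_lda(texts, tokens, T, n_gibbs=2000, alpha=0.1, beta=0.01, seed=0):
    """Illustrative collapsed Gibbs sampler for LDA (hypothetical, not numpy_ml code).

    texts  : list of D documents, each a list of word tokens
    tokens : list of the V unique tokens that appear in texts
    Returns word-topic counts C_wt (V, T), document-topic counts C_dt (D, T)
    and the topic assignment z of every word after the last sweep.
    """
    rng = np.random.default_rng(seed)
    word_id = {tok: v for v, tok in enumerate(tokens)}
    V, D = len(tokens), len(texts)

    # Flatten the corpus into parallel arrays of word ids and document ids
    w = np.array([word_id[tok] for doc in texts for tok in doc], dtype=int)
    d = np.array([j for j, doc in enumerate(texts) for _ in doc], dtype=int)
    N = len(w)

    # Assign every word a random initial topic, then tally the counts
    z = rng.integers(T, size=N)
    C_wt = np.zeros((V, T), dtype=int)
    C_dt = np.zeros((D, T), dtype=int)
    np.add.at(C_wt, (w, z), 1)
    np.add.at(C_dt, (d, z), 1)
    C_t = C_wt.sum(axis=0)  # number of words currently assigned to each topic

    for _ in range(n_gibbs):  # assumption: one step = one sweep over all N words
        for i in range(N):
            wi, di, ti = w[i], d[i], z[i]
            # Remove word i's current assignment from all counts ("-i" counts)
            C_wt[wi, ti] -= 1
            C_dt[di, ti] -= 1
            C_t[ti] -= 1
            # Full conditional p(z_i = t | z_-i, w) up to a constant; the
            # document-length denominator does not depend on t, so it is dropped
            p = (C_wt[wi] + beta) / (C_t + V * beta) * (C_dt[di] + alpha)
            ti = rng.choice(T, p=p / p.sum())
            # Record the new topic and add word i back into the counts
            z[i] = ti
            C_wt[wi, ti] += 1
            C_dt[di, ti] += 1
            C_t[ti] += 1
    return C_wt, C_dt, z


# Usage on a toy corpus of three documents
texts = [["apple", "banana", "apple"], ["stock", "market", "stock"], ["apple", "market"]]
tokens = sorted({tok for doc in texts for tok in doc})
C_wt, C_dt, z = collapsed_gibbs_lda(texts, tokens, T=2, n_gibbs=50)
```

Keeping the running topic totals in C_t avoids re-summing a column of C_wt for every word. The returned counts can then be turned into posterior mean estimates of phi and theta.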
Parameters:
- texts (array of length (D,)) – The training corpus represented as an array of subarrays, where each subarray corresponds to the tokenized words of a single document.
- tokens (array of length (V,)) – The set of unique tokens in the documents in texts.
- n_gibbs (int) – The number of steps to run the collapsed Gibbs sampler during training. Default is 2000.
Returns: