# MultinomialHMM

class numpy_ml.hmm.MultinomialHMM(A=None, B=None, pi=None, eps=None)

A simple hidden Markov model with multinomial emission distribution.

**Parameters:**

- **A** (*ndarray of shape (N, N) or None*) – The transition matrix between latent states in the HMM. Index i, j gives the probability of transitioning from latent state i to latent state j. Default is None.
- **B** (*ndarray of shape (N, V) or None*) – The emission matrix. Entry i, j gives the probability of latent state i emitting an observation of type j. Default is None.
- **pi** (*ndarray of shape (N,) or None*) – The prior probability of each latent state. If None, use a uniform prior over states. Default is None.
- **eps** (*float or None*) – Epsilon value to avoid $$\log(0)$$ errors. If None, defaults to the machine epsilon. Default is None.

**Attributes:**

- **A** (*ndarray of shape (N, N)*) – The transition matrix between latent states in the HMM. Index i, j gives the probability of transitioning from latent state i to latent state j.
- **B** (*ndarray of shape (N, V)*) – The emission matrix. Entry i, j gives the probability of latent state i emitting an observation of type j.
- **N** (*int*) – The number of unique latent states.
- **V** (*int*) – The number of unique observation types.
- **O** (*ndarray of shape (I, T)*) – The collection of observed training sequences.
- **I** (*int*) – The number of sequences in O.
- **T** (*int*) – The number of observations in each sequence in O.
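For concreteness, a small hand-built parameterization satisfying the shape and normalization constraints above (the matrix values here are hypothetical, chosen only for illustration):

```python
import numpy as np

N, V = 2, 3  # two latent states, three observation types

# Hypothetical transition matrix: A[i, j] = P(state j at t+1 | state i at t)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Hypothetical emission matrix: B[i, j] = P(observation j | state i)
B = np.array([[0.5, 0.4, 0.1],
              [0.1, 0.3, 0.6]])

# Uniform prior over latent states (the default behavior when pi=None)
pi = np.full(N, 1.0 / N)

# Each row of A and B, and pi itself, must be a valid probability distribution
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```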
generate(n_steps, latent_state_types, obs_types)

Sample a sequence from the HMM.

**Parameters:**

- **n_steps** (*int*) – The length of the generated sequence.
- **latent_state_types** (*ndarray of shape (N,)*) – A collection of labels for the latent states.
- **obs_types** (*ndarray of shape (V,)*) – A collection of labels for the observations.

**Returns:**

- **states** (*ndarray of shape (n_steps,)*) – The sampled latent states.
- **emissions** (*ndarray of shape (n_steps,)*) – The sampled emissions.
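Sampling from an HMM is ancestral: draw a latent path from pi and A, then draw each emission from the corresponding row of B. A minimal standalone sketch of this procedure, using integer state/observation labels (the function name and signature are illustrative, not the library's API):

```python
import numpy as np

def generate(A, B, pi, n_steps, rng=None):
    """Ancestral sampling from an HMM: draw a latent path, then emissions."""
    rng = np.random.default_rng(rng)
    N, V = B.shape
    states = np.empty(n_steps, dtype=int)
    emissions = np.empty(n_steps, dtype=int)

    # Draw the initial state from the prior, then its emission
    states[0] = rng.choice(N, p=pi)
    emissions[0] = rng.choice(V, p=B[states[0]])

    # Each subsequent state depends only on the previous state (Markov property)
    for t in range(1, n_steps):
        states[t] = rng.choice(N, p=A[states[t - 1]])
        emissions[t] = rng.choice(V, p=B[states[t]])
    return states, emissions
```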
log_likelihood(O)

Given the HMM parameterized by $$(A, B, \pi)$$ and an observation sequence O, compute the marginal likelihood of the observations: $$P(O \mid A,B,\pi)$$, summing over latent states.

Notes

The log likelihood is computed efficiently via DP using the forward algorithm, which produces a 2D trellis, forward (sometimes referred to as alpha in the literature), where entry i, j represents the probability under the HMM of being in latent state i after seeing the first j observations:

$\mathtt{forward[i,j]} = P(o_1, \ldots, o_j, q_j=i \mid A, B, \pi)$

Here $$q_j = i$$ indicates that the hidden state at time j is of type i.

The DP step is:

$\begin{split}\mathtt{forward[i,j]} &= \sum_{s'=1}^N \mathtt{forward[s',j-1]} \cdot \mathtt{A[s',i]} \cdot \mathtt{B[i,o_j]} \\ &= \sum_{s'=1}^N P(o_1, \ldots, o_{j-1}, q_{j-1}=s' \mid A, B, \pi) P(q_j=i \mid q_{j-1}=s') P(o_j \mid q_j=i)\end{split}$

In words, forward[i,j] is the weighted sum of the values computed on the previous timestep. The weight on each previous state value is the product of the probability of transitioning from that state to state i and the probability of emitting observation j in state i.
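The recursion above can be sketched directly in NumPy (a standalone illustration under the definitions in this section, not the library's internal code):

```python
import numpy as np

def log_likelihood(A, B, pi, O):
    """Forward algorithm: log P(O | A, B, pi) via the forward (alpha) trellis."""
    N = A.shape[0]
    T = len(O)
    forward = np.zeros((N, T))

    # Base case: probability of starting in state i and emitting o_1
    forward[:, 0] = pi * B[:, O[0]]

    # DP step: weighted sum over the previous timestep's states
    for j in range(1, T):
        for i in range(N):
            forward[i, j] = np.sum(forward[:, j - 1] * A[:, i]) * B[i, O[j]]

    # Marginal likelihood: sum over the final latent state
    return np.log(forward[:, -1].sum())
```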

**Parameters:**

- **O** (*ndarray of shape (1, T)*) – A single set of observations.

**Returns:**

- **likelihood** (*float*) – The likelihood of the observations O under the HMM.
decode(O)

Given the HMM parameterized by $$(A, B, \pi)$$ and an observation sequence $$O = o_1, \ldots, o_T$$, compute the most probable sequence of latent states, $$Q = q_1, \ldots, q_T$$.

Notes

HMM decoding is done efficiently via DP using the Viterbi algorithm, which produces a 2D trellis, viterbi, where entry i, j represents the probability under the HMM of being in state i at time j after having passed through the most probable state sequence $$q_1,\ldots,q_{j-1}$$:

$\mathtt{viterbi[i,j]} = \max_{q_1, \ldots, q_{j-1}} P(o_1, \ldots, o_j, q_1, \ldots, q_{j-1}, q_j=i \mid A, B, \pi)$

Here $$q_j = i$$ indicates that the hidden state at time j is of type i, and $$\max_{q_1,\ldots,q_{j-1}}$$ represents the maximum over all possible latent state sequences for the first j-1 observations.

The DP step is:

$\begin{split}\mathtt{viterbi[i,j]} &= \max_{s'=1}^N \mathtt{viterbi[s',j-1]} \cdot \mathtt{A[s',i]} \cdot \mathtt{B[i,o_j]} \\ &= \max_{s'=1}^N \left[ \max_{q_1, \ldots, q_{j-2}} P(o_1, \ldots, o_{j-1}, q_1, \ldots, q_{j-2}, q_{j-1}=s' \mid A, B, \pi) \right] P(q_j=i \mid q_{j-1}=s') P(o_j \mid q_j=i)\end{split}$

In words, viterbi[i,j] is the maximum over the weighted values computed on the previous timestep. The weight on each previous state value is the product of the probability of transitioning from that state to state i and the probability of emitting observation $$o_j$$ in state i.

To compute the most probable state sequence we maintain a second trellis, back_pointer, whose i, j entry contains the value of the latent state at timestep j-1 that is most likely to lead to latent state i at timestep j.

When we have completed the viterbi and back_pointer trellises for all T timesteps/observations, we greedily trace backwards through the back_pointer trellis to construct the most probable path for the full sequence of observations.
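The viterbi/back_pointer construction and the backwards trace can be sketched as follows (a standalone illustration of the steps above, not the library's code):

```python
import numpy as np

def decode(A, B, pi, O):
    """Viterbi decoding: most probable latent state path for observations O."""
    N = A.shape[0]
    T = len(O)
    viterbi = np.zeros((N, T))
    back_pointer = np.zeros((N, T), dtype=int)

    # Base case: probability of starting in state i and emitting o_1
    viterbi[:, 0] = pi * B[:, O[0]]

    # DP step: keep the best previous state and remember it in back_pointer
    for j in range(1, T):
        for i in range(N):
            scores = viterbi[:, j - 1] * A[:, i] * B[i, O[j]]
            back_pointer[i, j] = np.argmax(scores)
            viterbi[i, j] = scores[back_pointer[i, j]]

    # Trace the back-pointers from the best final state
    best_path = [int(np.argmax(viterbi[:, -1]))]
    for j in range(T - 1, 0, -1):
        best_path.append(int(back_pointer[best_path[-1], j]))
    best_path.reverse()
    return best_path, float(viterbi[best_path[-1], -1])
```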

**Parameters:**

- **O** (*ndarray of shape (T,)*) – An observation sequence of length T.

**Returns:**

- **best_path** (*list of length T*) – The most probable sequence of latent states for observations O.
- **best_path_prob** (*float*) – The probability of the latent state sequence in best_path under the HMM.
fit(O, latent_state_types, observation_types, pi=None, tol=1e-05, verbose=False)

Given an observation sequence O and the set of possible latent states, learn the MLE HMM parameters A and B.

Notes

Model fitting is done iteratively using the Baum-Welch/Forward-Backward algorithm, a special case of the EM algorithm.

We begin with an initial estimate for the transition (A) and emission (B) matrices and then use these to derive better and better estimates by computing the forward probability for an observation and then dividing that probability mass among all the paths that contributed to it.
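One E-step/M-step update for a single observation sequence can be sketched as follows (a standalone illustration with pi held fixed, not the library's implementation; the helper name is hypothetical):

```python
import numpy as np

def baum_welch_step(A, B, pi, O):
    """One Baum-Welch (EM) update of A and B for a single sequence O."""
    N, V = B.shape
    O = np.asarray(O)
    T = len(O)

    # Forward pass: fwd[i, t] = P(o_1..o_t, q_t = i)
    fwd = np.zeros((N, T))
    fwd[:, 0] = pi * B[:, O[0]]
    for t in range(1, T):
        fwd[:, t] = (fwd[:, t - 1] @ A) * B[:, O[t]]

    # Backward pass: bwd[i, t] = P(o_{t+1}..o_T | q_t = i)
    bwd = np.ones((N, T))
    for t in range(T - 2, -1, -1):
        bwd[:, t] = A @ (B[:, O[t + 1]] * bwd[:, t + 1])

    likelihood = fwd[:, -1].sum()

    # E-step: gamma[i, t] = P(q_t = i | O); xi[i, j, t] = P(q_t=i, q_{t+1}=j | O)
    gamma = fwd * bwd / likelihood
    xi = np.zeros((N, N, T - 1))
    for t in range(T - 1):
        xi[:, :, t] = (fwd[:, t, None] * A
                       * (B[:, O[t + 1]] * bwd[:, t + 1])[None, :]) / likelihood

    # M-step: re-estimate A and B by normalizing the expected counts
    A_new = xi.sum(axis=2) / gamma[:, :-1].sum(axis=1, keepdims=True)
    B_new = np.zeros((N, V))
    for k in range(V):
        B_new[:, k] = gamma[:, O == k].sum(axis=1)
    B_new /= gamma.sum(axis=1, keepdims=True)
    return A_new, B_new, np.log(likelihood)
```

Iterating this update (and, in the full algorithm, also re-estimating pi and pooling counts across the I training sequences) is what drives the log likelihood up until the `tol` stopping criterion is met.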

**Parameters:**

- **O** (*ndarray of shape (I, T)*) – The set of I training observations, each of length T.
- **latent_state_types** (*list of length N*) – The collection of valid latent states.
- **observation_types** (*list of length V*) – The collection of valid observation states.
- **pi** (*ndarray of shape (N,)*) – The prior probability of each latent state. If None, assume each latent state is equally likely a priori. Default is None.
- **tol** (*float*) – The tolerance value. If the difference in log likelihood between two epochs is less than this value, terminate training. Default is 1e-5.
- **verbose** (*bool*) – Print training stats after each epoch. Default is False.

**Returns:**

- **A** (*ndarray of shape (N, N)*) – The estimated transition matrix.
- **B** (*ndarray of shape (N, V)*) – The estimated emission matrix.
- **pi** (*ndarray of shape (N,)*) – The estimated prior probabilities of each latent state.