Background:
As we saw in Part-of-Speech Tagging, for a sequence of words there are many possible state paths (e.g. tag sequences) that could have generated it, each with some probability. Ideally, we find the state path that most likely generated the words.
What are HMMs:
A Markov Process is the assumption that the next state depends only on the current state - not the full history; i.e. $P(s_t \mid s_1, s_2, \dots, s_{t-1}) = P(s_t \mid s_{t-1})$
Therefore, given:
- Sequential observations (e.g. words, sounds etc)
- Latent/hidden underlying states that caused those observations (e.g. if observation is item of clothing worn, then underlying state may be the weather.)
- The hidden states follow a Markov Process: The assumption that the next state depends only on the current state and not the full history
HMMs are models that see a sequence of things (words, sounds etc.) and try to infer the hidden underlying states (e.g. POS tags), using probability over time.
Viterbi: How tf do we get the most likely sequence of hidden states?
$$v_t(j) = \max_i \left[\, v_{t-1}(i) \cdot a_{ij} \,\right] \cdot b_j(o_t)$$
- $v_{t-1}(i)$: best score up to the previous time step in state $i$
- $a_{ij}$: transition probability from $i$ to the current state $j$
- $b_j(o_t)$: emission probability of the observation at time $t$ from state $j$
- The max selects the most likely path into state $j$ at time $t$
- The starting step doesn’t include a transition probability, just the start probability × emission: $v_1(j) = \pi_j \cdot b_j(o_1)$
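A minimal Python sketch of one recurrence step, using the toy tagging numbers from the example below (the dict layout and variable names are my own choice, not part of the notes):

```python
# One Viterbi step: for each current state j, pick the best previous state i.
# v_prev holds the best score for each state at the previous time step.
v_prev = {"Noun": 0.54, "Verb": 0.04}
trans = {("Noun", "Noun"): 0.1, ("Noun", "Verb"): 0.9,
         ("Verb", "Noun"): 0.5, ("Verb", "Verb"): 0.5}
emit = {("Noun", "bark"): 0.2, ("Verb", "bark"): 0.8}

v_curr, backptr = {}, {}
for j in ["Noun", "Verb"]:
    # max over previous states i of v_prev[i] * a_ij, then multiply by b_j(o_t)
    best_i = max(v_prev, key=lambda i: v_prev[i] * trans[(i, j)])
    v_curr[j] = v_prev[best_i] * trans[(best_i, j)] * emit[(j, "bark")]
    backptr[j] = best_i  # remember which previous state won, for backtracking
```

The `backptr` dict is what later lets us recover the path, not just the score.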
Given:
- Observations $o_1, o_2, \dots, o_T$
- States $s_1, s_2, \dots, s_N$
- Emissions $b_j(o_t)$ (how likely an observation is for a given state)
- Transitions $a_{ij}$ (how likely one state follows another)
- Start Probabilities $\pi_j$
Setup:
- The sentence “Dogs bark”, which we observe as
["Dogs", "bark"]
- Our hidden states are:
Noun (N), Verb (V)
- Emission Probabilities:
Noun: “Dogs” = 0.9, “bark” = 0.2
Verb: “Dogs” = 0.1, “bark” = 0.8
- I.e. “Dogs” is very likely to be a noun
- Transition Probabilities $a_{ij}$:
Noun -> Noun = 0.1
Noun -> Verb = 0.9
Verb -> Noun = 0.5
Verb -> Verb = 0.5
- Probability of Starting:
Noun = 0.6
Verb = 0.4
Viterbi Table Setup:
Word 1:
Word | Noun (N) | Verb (V) |
---|---|---|
“Dogs” | v[Noun] = start[Noun] × emission[Noun]["Dogs"] = 0.6 × 0.9 = 0.54 | v[Verb] = start[Verb] × emission[Verb]["Dogs"] = 0.4 × 0.1 = 0.04 |
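As a quick sanity check on the arithmetic above, a minimal Python sketch of the initialization step (same numbers as this example):

```python
# Initialization: v_1(j) = start[j] * emission[j][first word]; no transition yet.
start = {"Noun": 0.6, "Verb": 0.4}
emission = {"Noun": {"Dogs": 0.9}, "Verb": {"Dogs": 0.1}}

v1 = {state: start[state] * emission[state]["Dogs"] for state in start}
# v1 ≈ {"Noun": 0.54, "Verb": 0.04}
```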
Word 2:
Now, we compute the probabilities of “bark”, given it came from either noun or verb.
Assuming “bark” is a noun (i.e. v["bark"][Noun]):
Noun → Noun: 0.54 × 0.1 × 0.2 = 0.0108
Verb → Noun: 0.04 × 0.5 × 0.2 = 0.004
Since 0.0108 is larger, we can assume that, given “bark” is a noun, it’s most likely that “Dogs” is also a noun.
Assuming “bark” is a verb (i.e. v["bark"][Verb]):
Noun → Verb: 0.54 × 0.9 × 0.8 = 0.3888
Verb → Verb: 0.04 × 0.5 × 0.8 = 0.016
Since 0.3888 is larger, we can assume that, given “bark” is a verb, it’s most likely that “Dogs” is also a noun.
Updating the table:
Word | Noun (N) | Verb (V) | Backpointer |
---|---|---|---|
“Dogs” | 0.54 | 0.04 | start |
“bark” | 0.0108 | 0.3888 | Noun |
Once we’ve hit the end, we backtrack:
Since the highest end score was 0.3888 (in the Verb column), we backtrack from that state to the most likely previous state that gave us that score (its backpointer: Noun). We repeat that until we get back to the start. The back-tracked path, Noun → Verb, is thus the most likely path: “Dogs” is a noun and “bark” is a verb.
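Putting the whole walkthrough together, a minimal sketch of Viterbi with backtracking (function shape and variable names are mine; the numbers are the ones from this example):

```python
def viterbi(obs, states, start, trans, emit):
    """Return (best_path, best_score) for a sequence of observations."""
    # Initialization: start probability * emission, no transition.
    v = [{s: start[s] * emit[s][obs[0]] for s in states}]
    backptr = [{}]
    for t in range(1, len(obs)):
        v.append({})
        backptr.append({})
        for j in states:
            # Best previous state i, maximising v[t-1][i] * trans[i][j]
            best_i = max(states, key=lambda i: v[t - 1][i] * trans[i][j])
            v[t][j] = v[t - 1][best_i] * trans[best_i][j] * emit[j][obs[t]]
            backptr[t][j] = best_i
    # Backtrack from the highest-scoring final state.
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(backptr[t][path[-1]])
    path.reverse()
    return path, v[-1][last]

states = ["Noun", "Verb"]
start = {"Noun": 0.6, "Verb": 0.4}
trans = {"Noun": {"Noun": 0.1, "Verb": 0.9},
         "Verb": {"Noun": 0.5, "Verb": 0.5}}
emit = {"Noun": {"Dogs": 0.9, "bark": 0.2},
        "Verb": {"Dogs": 0.1, "bark": 0.8}}

path, score = viterbi(["Dogs", "bark"], states, start, trans, emit)
# path == ["Noun", "Verb"], score ≈ 0.3888 — same as the table above
```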
Where tf did the probabilities come from?
Take a labelled corpus, and simply count and divide (like in normal Probability, duh): $a_{ij} = \frac{\text{count}(s_i \to s_j)}{\text{count}(s_i)}$, $b_j(w) = \frac{\text{count}(s_j,\, w)}{\text{count}(s_j)}$, and $\pi_j$ from how often each tag starts a sentence.
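A sketch of the counting, assuming a tiny hand-labelled corpus (the corpus here is made up for illustration):

```python
from collections import Counter

# Tiny made-up labelled corpus: each sentence is a list of (word, tag) pairs.
corpus = [
    [("Dogs", "Noun"), ("bark", "Verb")],
    [("Cats", "Noun"), ("sleep", "Verb")],
    [("Dogs", "Noun"), ("chase", "Verb"), ("cats", "Noun")],
]

tag_counts = Counter()
emit_counts = Counter()   # keyed by (tag, word)
trans_counts = Counter()  # keyed by (prev_tag, tag)

for sent in corpus:
    prev = None
    for word, tag in sent:
        tag_counts[tag] += 1
        emit_counts[(tag, word)] += 1
        if prev is not None:
            trans_counts[(prev, tag)] += 1
        prev = tag

# Count and divide: P(word | tag) and P(tag | prev_tag).
# (Simplification: the transition denominator counts all occurrences of the
# previous tag, including sentence-final ones.)
emission = {k: c / tag_counts[k[0]] for k, c in emit_counts.items()}
transition = {k: c / tag_counts[k[0]] for k, c in trans_counts.items()}
```

In practice you'd also smooth these counts, since any unseen (tag, word) pair would otherwise get probability zero.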
Problems with Viterbi
Let’s say, by pure chance, the best-scoring path goes through 2+ words that are incredibly uncommon in the training data. After that point, the scores are so small that the model’s behaviour becomes incredibly unpredictable. The same is also true in language models: whenever you feed one solidGoldMagikarp (distinctly random, non-human text), the model begins to freak out / act randomly.