Part-of-Speech Tagging

What?

It’s about taking a sentence and assigning each word to a grammatical category.
E.g.:

The     → Determiner (DET)
quick   → Adjective (ADJ)
brown   → Adjective (ADJ)
fox     → Noun (NOUN)

Approaches To POS Tagging:

Rule-Based Tagging: Uses hand-written rules and dictionaries to assign the tags.
Probabilistic Tagging: Uses Hidden Markov Models (HMMs), probabilistic finite-state machines, etc.
Deep Learning Tagging: Uses RNNs, LSTMs, etc. to predict the tags.
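To make the rule-based approach concrete, here is a minimal sketch. The lexicon, the suffix rules, and the name `rule_based_tag` are all assumptions made up for illustration; a real rule-based tagger would use a far larger dictionary and rule set.

```python
# Hypothetical mini-lexicon (assumption): word -> tag.
LEXICON = {"the": "DET", "quick": "ADJ", "brown": "ADJ", "fox": "NOUN"}

def rule_based_tag(words):
    """Tag each word via dictionary lookup, falling back to crude suffix rules."""
    tags = []
    for word in words:
        lower = word.lower()
        if lower in LEXICON:
            tags.append(LEXICON[lower])      # dictionary hit
        elif lower.endswith("ly"):
            tags.append("ADV")               # rule: "-ly" often marks an adverb
        elif lower.endswith(("ing", "ed")):
            tags.append("VERB")              # rule: common verb endings
        else:
            tags.append("NOUN")              # fallback guess for unknown words
    return list(zip(words, tags))

tagged = rule_based_tag("The quick brown fox".split())
```

The fallback rules are deliberately naive (they would mis-tag words like "family"), which hints at why the probabilistic and deep learning approaches exist.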

Note on Probabilistic FSM:

This is a really cool way of thinking about it. You ideally want to fine-tune the transition probabilities: given that you’re in one state, how likely you are to move to each other state.
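One simple way to get such state-to-state probabilities is to count transitions in already-tagged text. The sketch below assumes that setup; the tag sequences and the name `transition_probs` are invented for illustration and are not from any real corpus.

```python
from collections import Counter, defaultdict

# Made-up tag sequences (assumption), one list per tagged sentence.
TAGGED_SENTENCES = [
    ["DET", "ADJ", "NOUN"],
    ["DET", "NOUN", "VERB"],
    ["DET", "ADJ", "ADJ", "NOUN"],
]

def transition_probs(sequences):
    """Estimate P(next tag | current tag) as relative frequencies."""
    counts = defaultdict(Counter)
    for tags in sequences:
        # Count each adjacent (current, next) pair of tags.
        for current, nxt in zip(tags, tags[1:]):
            counts[current][nxt] += 1
    probs = {}
    for current, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[current] = {nxt: c / total for nxt, c in next_counts.items()}
    return probs

TRANSITIONS = transition_probs(TAGGED_SENTENCES)
```

Each row of `TRANSITIONS` sums to 1, so it can be read as "given that I'm in this state, where do I go next?".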

There’s a dataset where each word has a probability for each possible category. If we take the category assigned to every word in our sentence and multiply their probabilities, the result is the probability of that tagging of the sentence.
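The multiplication can be sketched in a few lines. The probabilities in `WORD_TAG_PROBS` are hypothetical values chosen for illustration, and `tagging_probability` is just a name for this sketch. Treating each word independently like this is a simplification.

```python
import math

# Hypothetical P(tag | word) values (assumption), not taken from a real dataset.
WORD_TAG_PROBS = {
    ("the", "DET"): 0.99,
    ("quick", "ADJ"): 0.8,
    ("brown", "ADJ"): 0.7,
    ("fox", "NOUN"): 0.9,
}

def tagging_probability(tagged_words, probs):
    """Multiply the per-word tag probabilities; unseen pairs count as 0."""
    return math.prod(probs.get((word.lower(), tag), 0.0) for word, tag in tagged_words)

sentence = [("The", "DET"), ("quick", "ADJ"), ("brown", "ADJ"), ("fox", "NOUN")]
score = tagging_probability(sentence, WORD_TAG_PROBS)
```

Because the score is a product, a single unlikely (or unseen) word-tag pair drags the whole sentence's score down. In practice, summing log-probabilities is a common way to avoid underflow on long sentences.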

The Parts of Speech:

Open Class Words: Content words
- Nouns, verbs, adjectives, adverbs
- Content-bearing: refer to features of the world
- They’re “open” as there’s no limit to these words (new ones are constantly added: “email”!)
Closed Class Words: Function words (“Syntactic Glue”)
- Pronouns, prepositions, connectives, etc.
- Limited in number
- They tie the concepts of a sentence together.

Challenges:

Ambiguity: “Water the plants” vs “the water on the plants”.
- Is “water” a verb or a noun?
Sparse Data: We’ve not seen every word in every context before.
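Both challenges show up in a tiny sketch that picks a tag for a word from the previous tag. All the numbers below are made up for illustration, and `best_tag` is a hypothetical helper, not a full tagger.

```python
# Made-up numbers (assumption), not estimated from any corpus.
TRANSITION = {  # P(tag | previous tag)
    ("START", "VERB"): 0.3,
    ("START", "NOUN"): 0.2,
    ("DET", "NOUN"): 0.6,
    ("DET", "VERB"): 0.01,
}
EMISSION = {  # P(word | tag)
    ("water", "NOUN"): 0.002,
    ("water", "VERB"): 0.002,
}

def best_tag(word, prev_tag, candidates=("NOUN", "VERB")):
    """Score each candidate tag as transition * emission and return the best."""
    scores = {
        tag: TRANSITION.get((prev_tag, tag), 0.0) * EMISSION.get((word, tag), 0.0)
        for tag in candidates
    }
    return max(scores, key=scores.get), scores

imperative_tag, _ = best_tag("water", "START")  # "Water the plants"
after_det_tag, _ = best_tag("water", "DET")     # "the water on the plants"
_, unseen_scores = best_tag("email", "DET")     # word missing from EMISSION
```

With these numbers, context alone resolves the ambiguity: "water" comes out as a verb at the start of the sentence and as a noun after a determiner. The word "email" never appears in `EMISSION`, so every candidate tag scores 0, which is the sparse data problem in miniature.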
