Accuracy
- Simply: Out of all the predictions we made, what proportion were correct?
- Formula: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, where TP/TN/FP/FN are true positives, true negatives, false positives, and false negatives.
- Drawbacks: Misleading when the data is imbalanced; always predicting the majority class can still score high accuracy (see the sketch below).
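A minimal sketch in plain Python, with hypothetical `y_true`/`y_pred` labels, showing how accuracy breaks down on imbalanced data:

```python
# Hypothetical imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100  # a "model" that always predicts negative

# Accuracy = correct predictions / all predictions
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

print(accuracy)  # 0.95 -- looks strong, yet the model never finds a positive
```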
Precision
- Simply: Out of all the times we said "positive", how many of those actually were positive?
- Formula: $\text{Precision} = \frac{TP}{TP + FP}$
- Intuition: Of all the times you cried wolf (predicted positive), how often was there actually a wolf? High precision means fewer false alarms.
Recall
- Simply: Out of all the data points that are actually positive, how many did we predict to be positive?
- Formula: $\text{Recall} = \frac{TP}{TP + FN}$
- Intuition: Of all the times there actually was a wolf, how many did you correctly catch? Higher recall means you missed the wolf less often.
F1 Score
- Combines precision and recall into a single score: their harmonic mean.
- Formula: $F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$
- If either Precision or Recall is very low, then F1 will be low, since the harmonic mean is dominated by the smaller value (see the sketch below).
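A small sketch in plain Python, using hypothetical TP/FP/FN counts, that ties the three formulas together:

```python
# Hypothetical confusion-matrix counts.
TP, FP, FN = 8, 2, 4  # hits, false alarms, misses

precision = TP / (TP + FP)                          # 8/10 = 0.80
recall = TP / (TP + FN)                             # 8/12 ≈ 0.67
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean ≈ 0.73

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```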
Perplexity
- Simply: How confused is the model when predicting the next word? Lower is better.
- Formula: $\text{Perplexity} = e^{\mathcal{L}}$, where $\mathcal{L}$ is the cross-entropy loss of the language model.
- Intuition: If a model assigns high probability to correct sequences, it has low perplexity (less confused). If it spreads probability evenly across words, it has high perplexity (very unsure).
- Example: A model with perplexity = 10 is as uncertain as if it had 10 choices per word on average.
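A minimal sketch in plain Python, with hypothetical per-token probabilities, computing perplexity as the exponential of the cross-entropy:

```python
import math

# Hypothetical probabilities the model assigned to each correct next token.
token_probs = [0.2, 0.1, 0.4, 0.25]

# Cross-entropy = average negative log-probability of the correct tokens.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)

# Perplexity = e raised to the cross-entropy loss.
perplexity = math.exp(cross_entropy)

print(perplexity)  # ≈ 4.7: on average, as unsure as choosing among ~4.7 words
```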