Accuracy

  • Simply: Out of all the predictions we made, what proportion were correct?
  • Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN), where TP/TN/FP/FN are true positives, true negatives, false positives, and false negatives.
  • Drawbacks: Misleading when the data is imbalanced; a model that always predicts the majority class can still score high (see the sketch below).
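
To make the drawback concrete, here is a minimal Python sketch (the labels and predictions are made up for illustration) where a model that always predicts the negative class still reaches 90% accuracy:

```python
# Toy ground-truth labels and predictions (1 = positive, 0 = negative).
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # always predicts "negative"

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.9 -- high, despite never finding the one positive
```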

Precision

  • Simply: Out of all the times we said ā€œpositiveā€, how many actually were positive?
  • Formula: Precision = TP / (TP + FP)
  • Intuition: Of all the times you cried wolf (predicted positive), how often was there actually a wolf? High precision means fewer false alarms (see the sketch below).
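
A minimal sketch of precision computed from raw counts; the toy labels and predictions are made up for illustration:

```python
# Toy labels and predictions (1 = wolf, 0 = no wolf).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false alarms
precision = tp / (tp + fp)
print(precision)  # 3 / (3 + 2) = 0.6
```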

Recall

  • Simply: Out of all the data points that are actually positive, how many did we predict to be positive?
  • Formula: Recall = TP / (TP + FN)
  • Intuition: Of all the times there actually was a wolf, how many did you correctly cry wolf? Higher recall means you missed the wolf less often (see the sketch below).
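
The same toy data as in the precision sketch, now scored for recall:

```python
# Same toy labels and predictions as in the precision sketch.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 1, 1, 0]

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # wolves caught
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # wolves missed
recall = tp / (tp + fn)
print(recall)  # 3 / (3 + 1) = 0.75
```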

F1 Score

  • Combines precision and recall into a single number: their harmonic mean.
  • Formula: F1 = 2 Ɨ (Precision Ɨ Recall) / (Precision + Recall)
  • Because it is a harmonic mean, if either precision or recall is very low, F1 will be low; a high F1 requires both to be reasonably high (see the sketch below).
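
A small sketch showing the harmonic-mean behavior; the precision and recall values fed in are arbitrary:

```python
def f1(precision: float, recall: float) -> float:
    # Harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.6, 0.75))  # ~0.667 -- sits between the two, closer to the lower
print(f1(1.0, 0.05))  # ~0.095 -- one very low input drags F1 down
```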

Perplexity

  • Simply: On average, how confused is the model when predicting the next word? Lower is better.
  • Formula: Perplexity = e^H, where H is the cross-entropy loss of the language model.
  • Intuition: If a model assigns high probability to correct sequences, it has low perplexity (less confused). If it spreads probability evenly across words, it has high perplexity (very unsure).
  • Example: A model with perplexity = 10 is as uncertain as if it were choosing uniformly among 10 words at each step (see the sketch below).
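
A minimal sketch, assuming natural-log cross-entropy: a model that assigns probability 0.1 to every correct token (uniform over 10 choices) comes out at exactly perplexity 10. The probabilities are made up for illustration:

```python
import math

# Probability the model assigned to each correct next token.
token_probs = [0.1, 0.1, 0.1, 0.1]  # uniform over 10 choices each step

# Cross-entropy: average negative log-probability of the correct tokens.
cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(cross_entropy)
print(perplexity)  # 10.0 -- "as uncertain as 10 choices per word"
```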