Accuracy

  • Simply: Out of all predictions we made, what proportion were correct?
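  • Formula: $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, where TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives.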
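  • Drawbacks: Misleading when the data is imbalanced. For example, if 95% of the examples are negative, a model that always predicts "negative" scores 95% accuracy while never finding a single positive.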
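
A minimal Python sketch of the formula above, assuming binary labels encoded as 0 (negative) and 1 (positive) in plain lists. The function name `accuracy` and the toy data are hypothetical. It reproduces the imbalanced-data drawback: a "model" that always predicts the majority class still scores high.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical imbalanced data: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A useless "model" that always predicts negative.
y_pred = [0] * 100

print(accuracy(y_true, y_pred))  # 0.95, yet every positive is missed
```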

Precision

  • Simply: Out of all the times we said "positive", how many of those were actually positive?
  • Formula: $\text{Precision} = \frac{TP}{TP + FP}$
  • Intuition: Of all the times you cried wolf (predicted positive), how often was there actually a wolf? High precision means few false alarms.
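
A small Python sketch of precision, assuming 0/1 labels in plain lists with 1 as the positive class. The function name `precision` and the toy data are hypothetical, and returning 0.0 when there are no positive predictions is an assumed convention (the formula itself is undefined in that case).

```python
def precision(y_true, y_pred, positive=1):
    """TP / (TP + FP): of the predicted positives, how many were real?"""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumption: no positive predictions -> define precision as 0
    return tp / (tp + fp)

# We cried wolf 4 times; there really was a wolf on 3 of those occasions.
y_true = [1, 1, 1, 0, 1, 0]
y_pred = [1, 1, 1, 1, 0, 0]
print(precision(y_true, y_pred))  # 0.75
```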

Recall

  • Simply: Out of all the data points that are actually positive, how many did we predict to be positive?
  • Formula: $\text{Recall} = \frac{TP}{TP + FN}$
  • Intuition: Of all the times there actually was a wolf, how many times did you cry wolf? Higher recall means you missed the wolf less often.
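
A matching Python sketch for recall, again assuming 0/1 labels with 1 as the positive class. The function name `recall` and the data are hypothetical, and returning 0.0 when there are no actual positives is an assumed convention.

```python
def recall(y_true, y_pred, positive=1):
    """TP / (TP + FN): of the actual positives, how many did we catch?"""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        return 0.0  # assumption: no actual positives -> define recall as 0
    return tp / (tp + fn)

# There was a wolf 5 times; we cried wolf on only 3 of them.
y_true = [1, 1, 1, 0, 1, 1]
y_pred = [1, 1, 1, 1, 0, 0]
print(recall(y_true, y_pred))  # 0.6
```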

F1 Score

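  • Combines precision and recall into a single score: their harmonic mean.
  • Formula: $F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$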
  • If either Precision or Recall is very low, then F1 will be low.
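
A short Python sketch of the harmonic-mean formula, taking precision and recall as already-computed numbers. The function name `f1_score` here is a hypothetical stand-in, and returning 0.0 when both inputs are 0 is an assumed convention.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumption: both are 0 -> define F1 as 0
    return 2 * precision * recall / (precision + recall)

print(round(f1_score(0.9, 0.9), 3))  # 0.9
print(round(f1_score(0.9, 0.1), 3))  # 0.18, dragged down by the low recall
```

The arithmetic mean of 0.9 and 0.1 would be 0.5; the harmonic mean stays close to the smaller value, which is why a single weak metric pulls F1 down.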

Perplexity

  • Simply: How confused is the model when predicting the next word? Lower is better.
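  • Formula: $\text{Perplexity} = e^{\mathcal{L}}$
    where $\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} \ln p(w_i \mid w_1, \dots, w_{i-1})$ is the Cross-Entropy Loss of the language model, averaged over the $N$ tokens of the sequence.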
  • Intuition: If a model assigns high probability to correct sequences, it has low perplexity (less confused). If it spreads probability evenly across words, it has high perplexity (very unsure).
  • Example: A model with perplexity = 10 is, on average, as uncertain as if it were choosing uniformly among 10 equally likely words at each step.
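
A minimal Python sketch of perplexity as the exponential of the average cross-entropy, assuming natural logarithms and assuming we already have the probability the model assigned to each correct next token. The function name `perplexity` and the probability lists are hypothetical.

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-probability of the correct tokens."""
    # Average per-token cross-entropy (natural log).
    cross_entropy = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(cross_entropy)

# A model spreading probability evenly over 10 words gives each correct word p = 0.1.
print(round(perplexity([0.1] * 5), 6))  # 10.0
# A confident model assigning high probability to the correct words.
print(perplexity([0.9, 0.8, 0.95, 0.85]))  # well below 10, close to 1
```

The first call reproduces the "10 choices per word" reading from the example above: uniform probability 1/10 on every correct token gives a perplexity of exactly 10 (up to floating-point rounding).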