What?

Unlike Linear Regression, which predicts real values, Logistic Regression is all about predicting binary response variables. Under the hood, it works with odds and log-odds.

The Core Idea:

👉 Logistic regression doesn’t actually “regress” like linear regression. Instead, it models $P(y \mid x)$; essentially:

Given some input features $x$, what is the probability of class $y$?

  • Essentially, each feature contributes a “vote” for each class.
  • The model adds up the weighted votes using the dot product $w_c \cdot x$ (the part inside the main brackets).
  • Then, the Softmax Function (the exponentiation and normalisation) ensures that the votes get converted into probabilities.
  • The class with the highest probability wins.

Mathematically:

Choose the class that has the highest probability according to

$$\hat{y} = \arg\max_{c} P(y = c \mid x) = \arg\max_{c} \frac{\exp(w_c \cdot x + b_c)}{Z}$$

where the normalisation constant

$$Z = \sum_{c'} \exp(w_{c'} \cdot x + b_{c'})$$

  • Inside the brackets is just a dot product: $w_c \cdot x = \sum_i w_{c,i}\, x_i$. Essentially, this is the weight each feature brings to class $c$.
  • $Z$ does not depend on $c$.
  • So, we will end up choosing the class $c$ for which $w_c \cdot x + b_c$ is highest.
  • Softmax function: exponentiation of the scores $w_c \cdot x + b_c$, followed by normalisation to turn them into a probability distribution (see the sketch below).
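
A minimal NumPy sketch of this prediction rule. The weight matrix `W` (one row per class), bias vector `b`, and input `x` are made-up toy values for illustration, not fitted parameters:

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability; this cancels out in
    # the normalisation, so the resulting probabilities are unchanged.
    exp_scores = np.exp(scores - np.max(scores))
    return exp_scores / exp_scores.sum()

def predict(W, b, x):
    # One score per class: w_c . x + b_c (the dot-product "votes").
    scores = W @ x + b
    probs = softmax(scores)          # exponentiate, then normalise by Z
    return np.argmax(probs), probs   # class with the highest probability wins

# Toy example: 3 classes, 2 features (all values made up).
W = np.array([[ 1.0, -0.5],
              [ 0.2,  0.8],
              [-1.0,  0.3]])
b = np.array([0.1, 0.0, -0.2])
x = np.array([2.0, 1.0])

label, probs = predict(W, b, x)
print(label, probs)
```

For binary logistic regression this reduces to a single weight vector and the sigmoid function, as in the worked example below.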

Predicting By Hand:

Take the following example:

  • $x_1$: Number of occurrences of the phrase “world-beating”
  • $x_2$: Number of occurrences of the phrase “confidence interval”
  • $x_3$: Number of occurrences of the phrase “bootstrap”
  • $y$: Whether the paper was rejected (1) or sent out for review (0).
  • Question: Suppose a paper contains the phrase “world-beating” 5 times, and 0 occurrences of “confidence interval” or “bootstrap”. What is the predicted probability of rejection?
Solution:

To calculate the predicted probability of rejection for a paper with 5 occurrences of “world-beating” and 0 occurrences of “confidence interval” and “bootstrap”, we use the logistic regression coefficients and the following formula:

Log-odds $= w_1 x_1 + w_2 x_2 + w_3 x_3 + b$

Here, $x_1$ is the number of occurrences of “world-beating”, $x_2$ is the number of occurrences of “confidence interval”, and $x_3$ is the number of occurrences of “bootstrap”. Given the coefficients $w_1$, $w_2$, $w_3$, the bias $b$, and the occurrences $x_1 = 5$, $x_2 = 0$, and $x_3 = 0$, the log-odds are:

Log-odds $= w_1 \cdot 5 + w_2 \cdot 0 + w_3 \cdot 0 + b$
Log-odds $= 5 w_1 + b$

To convert the log-odds $z$ to a probability, we use the logistic function:

$$P(\text{rejection}) = \sigma(z) = \frac{1}{1 + e^{-z}}$$

Plugging in the log-odds we calculated:

$$P(\text{rejection}) = \frac{1}{1 + e^{-(5 w_1 + b)}}$$
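
To make the arithmetic concrete, here is the same calculation in Python. The values for `w1`, `w2`, `w3`, and `b` below are placeholders chosen purely for illustration (the fitted coefficients from the original exercise are not reproduced here); substitute your model's actual coefficients:

```python
import math

# Hypothetical coefficients for illustration only.
w1, w2, w3, b = 0.5, -1.0, -1.0, -2.0

x1, x2, x3 = 5, 0, 0  # 5 x "world-beating", 0 x the other two phrases

log_odds = w1 * x1 + w2 * x2 + w3 * x3 + b  # = 5*w1 + b
p_reject = 1 / (1 + math.exp(-log_odds))    # logistic (sigmoid) function

print(log_odds, p_reject)  # 0.5 -> ~0.62 with these made-up coefficients
```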

Training Logistic Regression:

Like any machine learning model, training follows this loop:

  1. Iterate over a subset of your data.
  2. Predict the classes / outputs for that subset.
  3. Use a loss function (Binary Cross-Entropy, BCE, in this case) to measure how incorrect you were: $L_{\text{BCE}} = -\big[y \log \hat{y} + (1 - y) \log(1 - \hat{y})\big]$ (a sketch of BCE follows this list).
  4. Calculate which direction you should take to get more correct.
  5. Take steps in the correct direction.
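
A minimal sketch of the BCE loss from step 3, assuming NumPy; the labels and predicted probabilities are made up for illustration:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-12):
    # Binary cross-entropy, averaged over the batch. eps guards against
    # log(0) when a prediction saturates at exactly 0 or 1.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1, 0])
y_pred = np.array([0.9, 0.2, 0.6, 0.4])  # made-up predicted probabilities
print(bce_loss(y_true, y_pred))          # ~0.34: small, since most are right
```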
Step 4: Calculate Which Direction You Should Take:

You calculate the direction of the steepest loss decrease. To do this, we’ll use Gradient Descent! We’ll use the following gradients (a full training sketch follows the list):

  • Gradient of the loss function w.r.t. the weights: $\frac{\partial L}{\partial w} = (\hat{y} - y)\, x$
  • Gradient of the loss w.r.t. the bias: $\frac{\partial L}{\partial b} = \hat{y} - y$
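
Putting steps 1-5 together, here is a minimal full-batch gradient-descent sketch, assuming NumPy; the toy dataset, learning rate, and epoch count are arbitrary choices for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def train(X, y, lr=0.1, epochs=1000):
    # X: (n_samples, n_features), y: (n_samples,) of 0/1 labels.
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        y_hat = sigmoid(X @ w + b)  # step 2: predict probabilities
        error = y_hat - y           # prediction error (y_hat - y)
        grad_w = X.T @ error / n    # dL/dw = (y_hat - y) * x, batch-averaged
        grad_b = error.mean()       # dL/db = (y_hat - y), batch-averaged
        w -= lr * grad_w            # step 5: step against the gradient
        b -= lr * grad_b
    return w, b

# Toy data: label 1 when the (single) feature is large.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
w, b = train(X, y)
print(w, b, sigmoid(X @ w + b))  # probabilities increase with the feature
```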