What?
Unlike linear regression, which predicts real-valued outputs, logistic regression predicts binary (or categorical) response variables. It does this by modelling odds and log-odds, which are then converted into probabilities.
The Core Idea:
👉 Logistic regression doesn’t actually “regress” like linear regression. Instead, it models $P(y \mid x)$; essentially:
Given some input features $x$, what is the probability of class $y$?
- Essentially, each feature contributes a “vote” for different classes.
- The model adds up the weighted votes using the dot product $w_y \cdot x$ (the part in the main brackets).
- Then, the softmax function (the $\exp(\cdot)/Z$) ensures that the votes get converted into probabilities.
- The class with the highest probability wins.
Mathematically:
Choose the class that has the highest probability according to
$\hat{y} = \arg\max_y P(y \mid x) = \arg\max_y \frac{\exp(w_y \cdot x + b_y)}{Z}$
where the normalisation constant $Z = \sum_{y'} \exp(w_{y'} \cdot x + b_{y'})$.
- Inside the brackets is just a dot product: $w_y \cdot x = \sum_i w_{y,i}\, x_i$. Essentially, this is the weight each feature brings to the class.
- $Z$ does not depend on $y$.
- So, we will end up choosing the class $y$ for which $w_y \cdot x + b_y$ is highest.
- Softmax function: exponentiation of scores, $\exp(w_y \cdot x + b_y)$, followed by normalisation to turn them into a distribution.
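The prediction rule above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function and argument names are my own:

```python
import math

def softmax_predict(x, W, b):
    """Score each class with a dot product, then softmax the scores.

    x: feature vector; W: one weight vector per class; b: one bias per class.
    """
    # Weighted "votes" for each class: w_y . x + b_y
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) + b_y
              for w, b_y in zip(W, b)]
    # Softmax: exponentiate, then normalise by Z so the probabilities sum to 1
    exp_scores = [math.exp(s) for s in scores]
    Z = sum(exp_scores)
    probs = [e / Z for e in exp_scores]
    # The class with the highest probability wins
    return probs.index(max(probs)), probs
```

Note that dividing by $Z$ never changes which class has the highest score, which is why $\arg\max$ over the raw scores gives the same answer as $\arg\max$ over the probabilities.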
Predicting By Hand:
Take the following example:
- $x_1$: Number of occurrences of the phrase “world-beating”
- $x_2$: Number of occurrences of the phrase “confidence interval”
- $x_3$: Number of occurrences of the phrase “bootstrap”
- $y$: Whether the paper was rejected (1) or sent out for review (0).
- Question: Suppose a paper contains the phrase “world-beating” 5 times, and 0 occurrences of “confidence interval” or “bootstrap”. What is the predicted probability of rejection?
Solution:
To calculate the predicted probability of rejection for a paper with 5 occurrences of “world-beating” and 0 occurrences of “confidence interval” and “bootstrap”, we use the logistic regression coefficients and the following formula:
Log-odds $= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
Here, $x_1$ is the number of occurrences of “world-beating”, $x_2$ is the number of occurrences of “confidence interval”, and $x_3$ is the number of occurrences of “bootstrap”. Given the coefficients $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$, and the occurrences $x_1 = 5$, $x_2 = 0$, and $x_3 = 0$, the log-odds are:
Log-odds $= \beta_0 + \beta_1 \cdot 5 + \beta_2 \cdot 0 + \beta_3 \cdot 0$
Log-odds $= \beta_0 + 5\beta_1$
To convert the log-odds to a probability, we use the logistic function:
$p = \dfrac{1}{1 + e^{-\text{log-odds}}}$
Plugging in the log-odds we calculated:
$p = \dfrac{1}{1 + e^{-(\beta_0 + 5\beta_1)}}$
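The same calculation can be done numerically. The coefficient values below are placeholders, since the fitted values from the original example are not shown in these notes:

```python
import math

# Hypothetical coefficients -- placeholders for illustration only;
# the original example's fitted values are not given in these notes.
b0, b1, b2, b3 = -2.0, 0.8, -0.5, -0.5

# 5 occurrences of "world-beating", 0 of "confidence interval" or "bootstrap"
x1, x2, x3 = 5, 0, 0

log_odds = b0 + b1 * x1 + b2 * x2 + b3 * x3  # = -2.0 + 0.8 * 5 = 2.0
p_reject = 1 / (1 + math.exp(-log_odds))     # logistic function, ≈ 0.881
```

With these made-up coefficients, the predicted probability of rejection is about 0.881.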
Training Logistic Regression:
Like any Machine Learning Model:
1. Iterate over a subset of your data.
2. Predict the classes / outputs of the model on that subset.
3. Use a loss function (binary cross-entropy, BCE, in this case) to see how incorrect you were.
4. Calculate which direction you should step to become more correct.
5. Take steps in that direction.
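The BCE loss mentioned above can be sketched as follows (a minimal illustration; the function name and the `eps` clamp are my own choices):

```python
import math

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy, averaged over a batch of predictions."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

Confident correct predictions give a loss near zero, while confident wrong predictions are punished heavily.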
4. Calculate Which Direction You Should Take:
You calculate the gradient of the loss, which points in the direction of steepest loss increase, and step in the opposite direction. To do this, we’ll use gradient descent, with the following gradients:
- Gradient of the loss w.r.t. the weights: $\frac{\partial L}{\partial w} = (\hat{y} - y)\,x$
- Gradient of the loss w.r.t. the bias: $\frac{\partial L}{\partial b} = \hat{y} - y$
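Putting the pieces together, here is a minimal full-batch gradient descent step for binary logistic regression. The function names, learning rate, and toy data are illustrative assumptions, not from the notes:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def train_step(X, y, w, b, lr=0.1):
    """One full-batch gradient descent step for binary logistic regression.

    Uses the gradients above: dL/dw = (y_hat - y) * x and dL/db = (y_hat - y),
    averaged over the batch. lr is an assumed learning rate.
    """
    n = len(X)
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y_true in zip(X, y):
        y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        err = y_hat - y_true
        for i, xi in enumerate(x):
            grad_w[i] += err * xi / n
        grad_b += err / n
    # Step against the gradient: the direction of steepest loss decrease
    w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
    b = b - lr * grad_b
    return w, b
```

Repeating `train_step` many times drives the weights toward values that separate the two classes.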