Logistic Regression
Discover how logistic regression turns any number into a probability — and makes yes/no decisions with surprising elegance.
Imagine your email inbox. Every second, new messages arrive — and your email provider must decide: spam or not spam? It can't just eyeball each one. It needs a fast, reliable rule that outputs a clear yes or no.
This is a classification problem — predicting which category something belongs to. Logistic regression is one of the oldest and most trusted tools for exactly this job. Despite its confusing name (it says "regression" but it does classification), it's everywhere: detecting fraudulent credit card transactions, predicting whether a patient has a disease, deciding whether a review is positive or negative.
#Why Not Just Use Linear Regression?
You might wonder: we already have linear regression for making predictions — why not use that?
Linear regression predicts a continuous number: house price, temperature, salary. But for classification, we need a probability — a number between 0 and 1 — where 0 means "definitely not spam" and 1 means "definitely spam".
Linear regression can output 2.7, or -0.5, or 1000. Those are meaningless as probabilities. We need something that squashes any number into the 0–1 range. That's where the sigmoid function comes in.
The Volume Knob Analogy
Think of the sigmoid function like a special volume knob. No matter how hard you crank it — to the left (negative infinity) or to the right (positive infinity) — the output never goes below 0 or above 1. It always settles smoothly somewhere in between. Turn it a little to the right: maybe 0.6. Turn it way to the right: approaches 0.99, but never quite 1.
#The Sigmoid Function — Heart of Logistic Regression
The sigmoid function takes any real number z and outputs a value between 0 and 1:
`` sigmoid(z) = 1 / (1 + e^(-z)) ``
Let's unpack that in plain English:
- e is Euler's number, roughly 2.718 (a mathematical constant that shows up constantly in nature and math).
- e^(-z) means "e raised to the power of negative z".
- When z is a large positive number, e^(-z) is nearly zero, so the whole thing approaches 1 / (1 + 0) = 1.
- When z is a large negative number, e^(-z) becomes huge, so the output approaches 1 / (very large number) ≈ 0.
- When z = 0, the output is exactly 1 / (1 + 1) = 0.5, perfectly on the fence.
This S-shaped curve is the sigmoid's signature. It maps the entire number line into a tidy probability.
```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Try a few values
for z in [-5, -1, 0, 1, 5]:
    print(f"sigmoid({z:3}) = {sigmoid(z):.4f}")
```

#From Raw Score to Prediction
So how does logistic regression actually make a prediction?
Step 1 — Compute a raw score. Just like linear regression, we take our input features (say, the number of exclamation marks and the presence of the word "FREE" in an email), multiply each by a learned weight, and add them up:
`` z = w1 * feature1 + w2 * feature2 + bias ``
Step 2 — Squeeze through sigmoid. Pass that raw score z through sigmoid(z) to get a probability p between 0 and 1.
Step 3 — Apply the decision boundary. If p >= 0.5, predict class 1 (spam). If p < 0.5, predict class 0 (not spam).
The model learns the weights during training so that spam emails produce high z values (→ high probability → predicted spam) and normal emails produce low z values.
The Decision Boundary
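The decision boundary is the dividing line at probability = 0.5. Anything above is class 1, anything below is class 0, and a probability of exactly 0.5 counts as class 1 under the p >= 0.5 rule. Since sigmoid(0) = 0.5, the boundary occurs exactly where the raw score z = 0. Because z is a linear combination of the features, that boundary is a straight line through your data (a flat plane or hyperplane once there are more than two features), and the model labels each side.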
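To make the boundary concrete, here is a minimal sketch for the two-feature case. The weights are made up for illustration (they do not come from a trained model), and `raw_score` and `boundary_x2` are hypothetical helper names. Setting the raw score to zero, w1 * x1 + w2 * x2 + bias = 0, and solving for x2 gives the line that separates the two classes.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Illustrative weights, assumed for this sketch only
W1, W2, BIAS = 2.0, 1.0, -4.0

def raw_score(x1, x2):
    return W1 * x1 + W2 * x2 + BIAS

def boundary_x2(x1):
    """Return the x2 value at which the raw score is exactly 0 for a given x1."""
    return (-BIAS - W1 * x1) / W2

for x1 in [0.0, 1.0, 2.0]:
    x2 = boundary_x2(x1)
    on_line = sigmoid(raw_score(x1, x2))       # a point on the boundary
    above = sigmoid(raw_score(x1, x2 + 1.0))   # one unit above the line
    below = sigmoid(raw_score(x1, x2 - 1.0))   # one unit below the line
    print(f"x1={x1}: boundary x2={x2}, p on line={on_line:.2f}, "
          f"above={above:.2f}, below={below:.2f}")
```

Every point on the line scores a probability of 0.5. Stepping to one side pushes the probability above 0.5, and stepping to the other side pushes it below.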
#A Mini Example From Scratch
Let's pretend we've already trained a spam detector. It learned these weights:
- w1 = 0.8 (weight for number of exclamation marks)
- w2 = 1.5 (weight for whether "FREE" appears: 1 = yes, 0 = no)
- bias = -1.0
Now let's classify two emails.
```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(exclamations, has_free, w1=0.8, w2=1.5, bias=-1.0):
    z = w1 * exclamations + w2 * has_free + bias
    prob = sigmoid(z)
    label = 1 if prob >= 0.5 else 0
    return prob, label

# Email A: 3 exclamation marks, contains FREE
prob_a, label_a = predict(exclamations=3, has_free=1)
print(f"Email A -> prob={prob_a:.3f}, prediction={'SPAM' if label_a else 'NOT SPAM'}")

# Email B: 0 exclamation marks, no FREE
prob_b, label_b = predict(exclamations=0, has_free=0)
print(f"Email B -> prob={prob_b:.3f}, prediction={'SPAM' if label_b else 'NOT SPAM'}")
```

#How the Model Learns the Weights
We glossed over one thing: where do the weights come from? During training, the model sees thousands of labeled examples (spam/not spam). It starts with random weights, makes predictions, checks how wrong it is using a loss function (called log-loss or binary cross-entropy), and nudges the weights in the direction that reduces the error. This iterative nudging is called gradient descent.
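The training loop can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not how a production library does it: the six-email dataset is invented, the `train` function and its `learning_rate` and `epochs` values are arbitrary choices, and the weights start at zero rather than at random values so that every run gives the same result. The update uses the fact that the gradient of log-loss with respect to the raw score z is simply (predicted probability - true label).

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Made-up training set: (exclamation marks, has_free) -> 1 = spam, 0 = not spam
X = [(0, 0), (1, 0), (0, 1), (3, 1), (4, 0), (5, 1)]
y = [0, 0, 0, 1, 1, 1]

def train(X, y, learning_rate=0.1, epochs=5000):
    # Assumption: start at zero instead of random values so every run matches
    w1, w2, bias = 0.0, 0.0, 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w1 = grad_w2 = grad_bias = 0.0
        for (x1, x2), target in zip(X, y):
            p = sigmoid(w1 * x1 + w2 * x2 + bias)
            error = p - target          # log-loss gradient with respect to z
            grad_w1 += error * x1
            grad_w2 += error * x2
            grad_bias += error
        # Nudge each parameter against its average gradient
        w1 -= learning_rate * grad_w1 / n
        w2 -= learning_rate * grad_w2 / n
        bias -= learning_rate * grad_bias / n
    return w1, w2, bias

w1, w2, bias = train(X, y)
print(f"w1={w1:.2f}, w2={w2:.2f}, bias={bias:.2f}")
for (x1, x2), target in zip(X, y):
    p = sigmoid(w1 * x1 + w2 * x2 + bias)
    print(f"features=({x1}, {x2}) label={target} predicted prob={p:.3f}")
```

On this toy data the spam examples end up with probabilities above 0.5 and the normal emails below it. Real datasets are rarely this cleanly separable.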
The loss function is specifically designed to heavily penalize confident wrong answers: if the model says 99% spam for a normal email, the penalty is huge. This discourages overconfidence and pushes the predicted probabilities toward being well calibrated.
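A small calculation shows how steeply the penalty grows. The sketch below applies the standard binary cross-entropy formula to a single example. The function name `log_loss` and the probabilities tried are chosen here for illustration.

```python
import math

def log_loss(y_true, p):
    """Binary cross-entropy for one example.

    y_true is the true label (0 or 1); p is the predicted probability of class 1.
    """
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# A normal email (true label 0) scored with increasing spam confidence
for p in [0.01, 0.5, 0.9, 0.99]:
    print(f"predicted spam prob={p:.2f} -> loss={log_loss(0, p):.3f}")
```

A confident correct answer (0.01) costs about 0.01, sitting on the fence (0.5) costs about 0.69, and a confident wrong answer (0.99) costs about 4.6.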
In practice, you'd use scikit-learn's LogisticRegression class which handles all this automatically. But now you know what's happening underneath.
"Logistic Regression" is a Classifier, Not a Regressor
The name is historically misleading. The word "regression" refers to the internal math: the model fits a linear combination of features, just as linear regression does. What it produces is a probability, which is then thresholded into a class label, so in practice it is used as a classification algorithm. Don't let the name fool you into thinking it predicts continuous values like house prices.
#Real-World Uses
Logistic regression is surprisingly powerful for a simple algorithm:
- Spam detection — is this email spam or not?
- Medical diagnosis — does this patient have diabetes? (based on blood sugar, age, BMI)
- Credit risk — will this loan applicant default?
- Sentiment analysis — is this review positive or negative?
- Click prediction — will a user click this ad?
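It's often the first algorithm practitioners try on a new binary classification problem: it's fast to train, easy to interpret (a weight that is large in magnitude means the feature strongly moves the prediction, provided the features are on comparable scales), and gives probabilities that are usually reasonably well calibrated, not just labels.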
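As a rough illustration of that interpretability, the sketch below reuses the weights from the earlier spam example and bumps one feature at a time. The `spam_probability` helper and the baseline email are hypothetical, introduced only for this demonstration.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Weights reused from the earlier spam example
weights = {"exclamations": 0.8, "has_free": 1.5}
bias = -1.0

def spam_probability(features):
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)

# Hypothetical baseline email: one exclamation mark, no "FREE"
baseline = {"exclamations": 1, "has_free": 0}
print(f"baseline: {spam_probability(baseline):.3f}")

# Raise each feature by one unit and see how far the probability moves
for name in weights:
    bumped = dict(baseline)
    bumped[name] += 1
    print(f"+1 {name}: {spam_probability(bumped):.3f}")
```

Adding "FREE" (weight 1.5) moves the probability further than adding one more exclamation mark (weight 0.8). The comparison is only fair because both features change by one unit here; with features on very different scales, raw weights are not directly comparable.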
When to Reach for Logistic Regression
Use logistic regression when:
- You need a probability, not just a class label.
- You want to understand which features matter (interpretability).
- You have a binary outcome (yes/no, spam/not spam, 0/1).
- Your dataset is not huge and you want fast training.
If your decision boundary is highly non-linear (e.g., images, complex patterns), you'll likely need more powerful models like neural networks — but logistic regression remains a rock-solid baseline.
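A classic toy case of a non-linear boundary is XOR-style data, where the label is 1 only when exactly one of two features is on. The sketch below is a brute-force illustration, not a training procedure: it tries a coarse grid of weights (the grid range is an arbitrary choice) and counts how many of the four points a linear score can get right. Thresholding z at 0 is equivalent to thresholding sigmoid(z) at 0.5, so the sigmoid is left out.

```python
from itertools import product

# XOR-style toy data: the label is 1 when exactly one feature is on
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def count_correct(w1, w2, bias):
    """Count the points a linear score classifies correctly (z >= 0 means class 1)."""
    correct = 0
    for (x1, x2), label in zip(points, labels):
        z = w1 * x1 + w2 * x2 + bias
        predicted = 1 if z >= 0 else 0
        correct += predicted == label
    return correct

# Coarse grid of candidate values from -3 to 3 in steps of 0.5
grid = [i / 2 for i in range(-6, 7)]
best = max(count_correct(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(f"best linear score gets {best} of {len(points)} points right")
```

No setting of the weights gets all four points right, because no single straight line separates the two classes. This is the kind of pattern where a more flexible model, or extra engineered features, is needed.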
You pass z = 0 into the sigmoid function. What is the output, and what prediction does logistic regression make?
You now understand the core idea behind logistic regression: compute a linear score, squeeze it through the sigmoid into a probability, and classify based on whether that probability clears the 0.5 threshold. Simple, elegant, and still widely used decades after its invention.
Key takeaways
- Logistic regression is a *classification* algorithm despite its name — it predicts categories (spam/not spam), not continuous numbers.
- The sigmoid function transforms any real number into a probability between 0 and 1, giving the model its characteristic S-shaped output.
- The decision boundary sits at probability 0.5 (equivalently, raw score z = 0): above → class 1, below → class 0.
- The model learns feature weights during training via gradient descent, minimizing a log-loss that punishes confident wrong predictions.
- When you need interpretable probabilities for a binary outcome, logistic regression is often the best first algorithm to try.
The sigmoid squashes any number into a probability from 0 to 1. Predict class 1 when it crosses the 0.5 threshold.
This is the lesson's trained spam detector (w1=0.8 for exclamation marks, w2=1.5 for the word FREE, bias=-1.0). We classify an email with 1 exclamation mark that contains FREE. What does it print?
```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(exclamations, has_free, w1=0.8, w2=1.5, bias=-1.0):
    z = w1 * exclamations + w2 * has_free + bias
    prob = sigmoid(z)
    label = 1 if prob >= 0.5 else 0
    return prob, label

prob, label = predict(exclamations=1, has_free=1)
print(f"prob={prob:.3f}, spam={label == 1}")
```

This code has a bug. What's wrong?
```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(z))

# Expect sigmoid(0)=0.5 and a LARGE positive z to give ~1
print(round(sigmoid(0), 4))
print(round(sigmoid(5), 4))
```

Fill in the exponent so this correctly implements the sigmoid function from the lesson: sigmoid(z) = 1 / (1 + e^(-z)).
```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp())

print(round(sigmoid(0), 4))  # expect 0.5
```
Put these lines in the correct order to make one logistic-regression prediction, following the lesson's recipe: compute a raw score, squeeze it through sigmoid, then apply the 0.5 decision boundary.
```python
label = 1 if prob >= 0.5 else 0
prob = sigmoid(z)
print(f"prob={prob:.3f}, class={label}")
z = w1 * feature1 + w2 * feature2 + bias
```
Complete the logistic_predict function below. Given a list of feature values and a corresponding list of weights, compute the raw score z (sum of each feature multiplied by its weight, plus the bias), pass it through the sigmoid, and return both the probability and the predicted class label (1 if probability >= 0.5, else 0).
Test it with:
- features=[2.0, 1.0], weights=[0.5, -1.0], bias=0.0 → should predict class 1 (z = 0, so the probability is exactly 0.5, which the >= 0.5 rule assigns to class 1)
- features=[3.0, 0.0], weights=[1.2, -0.5], bias=-0.5 → should predict class 1