Supervised Learning · Beginner · ⏱ 8 min · 04 / 13

Linear Regression

Learn how machines draw the best-fit line through data to predict numbers — from house prices to tomorrow's temperature.

Imagine you're a real-estate agent. You know a 1,000 sq ft home sold for $200k, a 1,500 sq ft home sold for $280k, and a 2,000 sq ft home sold for $350k. A client asks: "What should I list my 1,750 sq ft home for?"

Your instinct — drawing a mental trend line through past observations and reading off a new prediction — is exactly what linear regression does, but with mathematical precision. It's one of the most widely used algorithms in all of machine learning.

#The Line: y = w * x + b

Linear regression finds the straight line that best summarizes the relationship between an input (x) and an output (y):

`` y = w * x + b ``

`w` (weight / slope) — how steeply the line rises. "For every extra 100 sq ft, add $14k."
`b` (bias / intercept) — the baseline value when x = 0.

During training, the algorithm figures out the right w and b from your data. At prediction time, you just plug in a new x and get y.
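As a minimal sketch of the prediction step, assume illustrative values w = 14 and b = 60 (the candidate line tried later in this lesson), with size in hundreds of square feet and price in thousands of dollars. The function name `predict` is hypothetical, not part of any library.

```python
def predict(x, w, b):
    """Return the line's output y = w * x + b for a single input x."""
    return w * x + b

# Illustrative values: size in 100s of sq ft, price in $1000s
w, b = 14, 60
price = predict(17.5, w, b)   # a 1,750 sq ft home
print(f"Predicted price: ${price:.0f}k")
```

With these assumed values the 1,750 sq ft home comes out at $305k. A different w or b would give a different answer, which is why training matters.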

Think of it like a Sliding Ruler

Picture holding a ruler over a scatter plot of dots. You can tilt it (change w) and slide it up or down (change b). Linear regression finds the exact tilt and position where the ruler is closest to all the dots simultaneously — not just a few.

#What Makes a Line 'Best'? Residuals and MSE

For any line we draw, most points won't sit exactly on it. The gap between an actual value and a prediction is called a residual:

`` residual = actual_y - predicted_y ``

To measure how well our line fits all points, we square each residual (making all errors positive and penalizing big mistakes more) then average them — this is Mean Squared Error (MSE):

`` MSE = average of (actual - predicted)^2 ``

A lower MSE means a better fit, and the best line is the one that minimizes MSE. Squaring matters because without it, positive and negative errors would cancel out: a prediction that's +10 off and one that's -10 off would average to zero error, which sounds perfect but clearly isn't.
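The cancellation problem can be checked with two made-up residuals. This is only a sketch with hypothetical numbers, comparing the plain average of the residuals with the average of their squares.

```python
# Two hypothetical residuals: one prediction 10 too low, one 10 too high
residuals = [10, -10]

mean_error = sum(residuals) / len(residuals)             # signs cancel
mse = sum(r ** 2 for r in residuals) / len(residuals)    # squares cannot cancel

print(f"Mean of raw residuals: {mean_error}")
print(f"Mean squared error:    {mse}")
```

The raw average comes out to 0.0 even though both predictions miss by 10, while the MSE of 100.0 reflects the misses.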

#Computing Predictions and MSE From Scratch

Computing predictions and MSE for a candidate line — no libraries, just plain Python.

# Tiny dataset: house size (100s sqft) vs price ($1000s)
xs = [10, 15, 20, 25, 30]
ys = [200, 280, 350, 410, 480]

# Try a candidate line: price = 14 * size + 60
w, b = 14, 60

predictions = [w * x + b for x in xs]
residuals   = [a - p for a, p in zip(ys, predictions)]
squared_err = [r ** 2 for r in residuals]
mse         = sum(squared_err) / len(squared_err)

for i in range(len(xs)):
    print(f"size={xs[i]}, actual={ys[i]}, pred={predictions[i]}, residual={residuals[i]}")
print(f"\nMSE for w={w}, b={b}: {mse:.1f}")

#Finding the Optimal Line

There is a direct closed-form formula. The slope is `` w = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2) `` and the intercept is `` b = y_mean - w * x_mean ``. The slope formula captures 'how much does y rise when x rises?', and together the two formulas give the optimal w and b directly, with no trial and error.

xs = [10, 15, 20, 25, 30]
ys = [200, 280, 350, 410, 480]
n  = len(xs)

x_mean = sum(xs) / n
y_mean = sum(ys) / n

numerator   = sum((xs[i] - x_mean) * (ys[i] - y_mean) for i in range(n))
denominator = sum((xs[i] - x_mean) ** 2 for i in range(n))

w = numerator / denominator
b = y_mean - w * x_mean

print(f"Best slope  w = {w:.2f}")
print(f"Best intercept b = {b:.2f}")
print(f"Predict size=17.5: ${w * 17.5 + b:.1f}k")
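One way to check that the closed-form line really is better is to compare its MSE with that of the earlier candidate line (w = 14, b = 60). The sketch below reuses the lesson's dataset. The helper name `mse` and the variables `w_best` and `b_best` are illustrative choices.

```python
xs = [10, 15, 20, 25, 30]
ys = [200, 280, 350, 410, 480]

def mse(w, b):
    """Mean squared error of the line y = w * x + b on (xs, ys)."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Closed-form slope and intercept, same formulas as in the lesson
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
w_best = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
          / sum((x - x_mean) ** 2 for x in xs))
b_best = y_mean - w_best * x_mean

print(f"Candidate line   (w=14, b=60): MSE = {mse(14, 60):.1f}")
print(f"Closed-form line (w={w_best:.2f}, b={b_best:.2f}): MSE = {mse(w_best, b_best):.1f}")
```

On this dataset the candidate line scores an MSE of 40.0 and the closed-form line scores 22.0. Nudging w or b away from the closed-form values in either direction should only raise the error.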

Tip

Try the Interactive Visualizer

The visualizer on this page lets you drag the slope and intercept sliders yourself. Watch the residual lines (shown in red) shrink and the MSE counter drop in real time as you approach the best-fit line. Try to beat the algorithm!

Common mistake

Gotcha: Linear Regression Can't Capture Curves

Linear regression always draws a straight line. If your data curves (e.g., tree growth that accelerates then plateaus), the line will be a poor fit no matter what w and b you pick. Also, avoid extrapolating far outside your training data: the line keeps going forever mathematically, but real relationships don't always follow suit. Plot your data before assuming a line is appropriate.
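To see both problems in numbers, here is a small sketch that fits a straight line to hypothetical curved data (y = x squared, made up for illustration) using the closed-form formulas, then extrapolates past the training range.

```python
# Hypothetical curved data: y = x squared
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

# Best-fit straight line via the closed-form formulas
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
w = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
     / sum((x - x_mean) ** 2 for x in xs))
b = y_mean - w * x_mean

residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
print(f"w = {w}, b = {b}")
print("Residuals:", residuals)   # positive at both ends, negative in the middle

# Extrapolating well past the training range (x = 10)
print(f"Line predicts {w * 10 + b}, true value is {10 ** 2}")
```

The residuals form a U-shaped pattern (2, -1, -2, -1, 2) instead of scattering randomly, which is a typical sign that a straight line is the wrong shape. At x = 10 the line predicts 53 while the true value is 100.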

Quick check

A linear regression model is trained and finds w=5 and b=10. What does it predict for x=8?

In the real world, linear regression is prized for being fast, transparent, and often surprisingly effective. You can read the slope directly — "every extra year of experience adds $5,000 to salary" — no black box needed.

Libraries like scikit-learn handle multi-feature data, normalization, and large datasets in just a few lines:

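```python
from sklearn.linear_model import LinearRegression

# scikit-learn expects a 2-D feature array: one row per house, one column per feature
X_train = [[10], [15], [20], [25], [30]]   # size in 100s of sq ft
y_train = [200, 280, 350, 410, 480]        # price in $1000s

model = LinearRegression()
model.fit(X_train, y_train)        # finds best w and b for you
print(model.predict([[17.5]]))
```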

But now you know exactly what .fit() does under the hood: it finds the w and b that minimize MSE — just like the code you ran above.

Key takeaways

Linear regression fits a straight line y = w*x + b through training data to predict a continuous number.
The 'best' line is the one that minimizes Mean Squared Error (MSE) — the average of all squared residuals.
Squaring errors ensures big mistakes are penalized more, and positive and negative errors don't cancel out.
Once w and b are learned, prediction is instant: just one multiply and one add.
Linear regression only works well when the true relationship is roughly linear — always plot your data first.

Try it yourself · Fit the line

Gradient descent rotates the line to the best fit — watch the error shrink.

Watch the line rotate and shift to minimize the squared distance (dashed lines) to every point. That's gradient descent finding the best-fit line.
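For readers curious what gradient descent does numerically, here is a rough sketch on the lesson's house dataset. It is not the code behind the visualizer: the learning rate, the epoch count, and the function name `gradient_descent` are assumptions chosen for illustration. Each step nudges w and b a little in the direction that lowers MSE.

```python
def gradient_descent(xs, ys, lr=0.002, epochs=50_000):
    """Fit y = w * x + b by repeatedly stepping downhill on MSE."""
    w, b = 0.0, 0.0          # start from a flat line through the origin
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]       # predicted - actual
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n  # d(MSE)/dw
        grad_b = 2 * sum(errors) / n                             # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [10, 15, 20, 25, 30]
ys = [200, 280, 350, 410, 480]
w, b = gradient_descent(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")   # should land close to the closed-form answer
```

The result should be close to the closed-form values for this dataset (w = 13.80, b = 68.00). The learning rate is kept small because the x values are fairly large, and a noticeably bigger step makes the updates overshoot and diverge. The intercept converges slowly, which is why this sketch assumes so many epochs.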


Practice challenges

Predict the output #1

This model has already learned its slope and intercept. What does it print for the new input?

w, b = 14, 60
x = 25
print(w * x + b)

Fill in the blank #2

Complete the Mean Squared Error calculation. We square each residual, then average them. Fill in the operator that squares each error.

residuals   = [a - p for a, p in zip(ys, preds)]
squared_err = [r ___ 2 for r in residuals]
mse         = sum(squared_err) / len(squared_err)

Reorder the lines #3

Put these lines in the correct order to compute the MSE for a candidate line, following the lesson's from-scratch approach.

mse       = sum(sq_errors) / len(sq_errors)

sq_errors = [r ** 2 for r in residuals]

residuals = [a - p for a, p in zip(ys, preds)]

preds     = [w * x + b for x in xs]

print(mse)

Fix the bug #4

This code is meant to predict a price with a trained line, but it gives the wrong answer. What's wrong?

w, b = 14, 60
x = 20
prediction = w + x * b
print(prediction)

Your turn

Practice exercise

You have a small dataset of study hours and exam scores. Your tasks:

1. Complete predict_all to return a list of w*x + b predictions for each x.
2. Complete mse to compute the mean squared error between actuals and predictions.
3. Try changing w and b to get a lower MSE and see if you can beat the starter values!

Try it live — edit the code and hit Run to execute real Python:

hours  = [1, 2, 3, 4, 5]
scores = [52, 60, 70, 78, 88]

def predict_all(xs, w, b):
    # TODO: return [w * x + b for each x in xs]
    pass

def mse(actuals, predictions):
    # TODO: return mean of (actual - predicted)^2
    pass

w, b = 9, 43
preds = predict_all(hours, w, b)
print("Predictions:", preds)
print("MSE:", mse(scores, preds))