Learning Machines · Intermediate · ⏱ 10 min · 10 / 14

The Perceptron

Build the original artificial neuron from scratch and watch it learn to separate two classes by nudging its weights after every mistake.

In 1958, psychologist Frank Rosenblatt wired together a machine about the size of a room and called it the Perceptron. He claimed it could learn. The press went wild — headlines declared that machines would soon think like humans.

The device was inspired by a biological neuron: it receives signals from many neighbors, adds them up, and fires if the total is strong enough. Rosenblatt asked: what if we could teach a mathematical version of that cell to classify things? His answer became the bedrock of every neural network that followed.

Think of it like

The Voting Jury

Picture a jury deciding guilty (1) or innocent (0). Each juror casts a vote, but not all votes are equal — the forensics expert counts triple, a confused juror barely counts at all. A bias juror always nudges the verdict in one direction, shifting the threshold.

The perceptron works the same way. Inputs are votes. Weights are how much each vote counts. The bias is that opinionated juror. The step function is the verdict: if the weighted total crosses zero, output 1; otherwise output 0.

#Anatomy of a Perceptron

A perceptron multiplies each input x by a weight w, sums everything up, then adds a bias that shifts the threshold independently of the inputs. This total is the net input: net = (x1*w1) + (x2*w2) + ... + bias. A step function converts it to a crisp decision: 1 if net >= 0, else 0.

Weights and bias start as random guesses. Learning means adjusting them until every training example is classified correctly.

The whole perceptron in two functions. The bias shifts the decision boundary — without it the boundary would always pass through the origin.

def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

# Two inputs: tumor size and patient age (0=low, 1=high)
weights = [0.5, 0.5]
bias = -0.6
print(predict([1, 1], weights, bias))  # large + old  -> 1
print(predict([0, 0], weights, bias))  # small + young -> 0
print(predict([1, 0], weights, bias))  # large + young -> ?
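
To see why the bias matters, consider the input [0, 0]: every weighted term vanishes, so the net input is the bias alone. The sketch below uses arbitrary weight values chosen only for illustration. With a zero bias the output at the origin is stuck at 1 whatever the weights are, while a negative bias moves the boundary away from the origin.

```python
def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

origin = [0, 0]

# Arbitrary illustrative weights: none of them can change the output at the origin.
for weights in ([0.5, 0.5], [-3.0, 2.0], [10.0, -10.0]):
    print(weights, "bias=0 ->", predict(origin, weights, 0.0))   # always 1

# A negative bias shifts the boundary off the origin, so [0, 0] can land in class 0.
print("bias=-0.6 ->", predict(origin, [0.5, 0.5], -0.6))          # 0
```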

#The Learning Rule: Nudging Weights Toward Correct

After each prediction, if the perceptron was wrong we nudge every weight in the direction that would have produced the correct answer. The nudge size is the learning rate (lr), typically a small number like 0.1.

``error = actual_label - predicted_label``
``w_new = w_old + lr * error * input_value``
``bias_new = bias_old + lr * error``

Error +1 (predicted 0, should be 1): each weight moves up by lr * input and the bias rises by lr, raising the net input next time.
Error -1 (predicted 1, should be 0): each weight moves down by lr * input and the bias drops by lr.
Error 0 (correct): nothing changes.
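
A single update can be traced by hand. The sketch below wraps the rule in a hypothetical helper named update; the starting weights and bias are assumed values picked to make the arithmetic easy, not numbers from the lesson.

```python
def update(weights, bias, inputs, label, lr=0.1):
    """Apply one perceptron update and return (new_weights, new_bias, error)."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    pred = 1 if net >= 0 else 0
    err = label - pred
    # When err is 0 these expressions leave the weights and bias unchanged.
    new_weights = [w + lr * err * x for w, x in zip(weights, inputs)]
    new_bias = bias + lr * err
    return new_weights, new_bias, err

# Illustrative starting values (assumed for this example).
weights, bias = [0.2, -0.4], -0.1

# net = 0.2 - 0.4 - 0.1 = -0.3, so the prediction is 0 while the label is 1.
new_w, new_b, err = update(weights, bias, inputs=[1, 1], label=1)
print(err)                             # 1
print([round(w, 2) for w in new_w])    # [0.3, -0.3]
print(round(new_b, 2))                 # 0.0
```

Both weights moved up by lr because both inputs were 1; an input of 0 would have left its weight untouched.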

One full pass over all examples is an epoch. Repeat until zero errors. The Perceptron Convergence Theorem guarantees this will eventually succeed — if the classes are linearly separable.

AND is linearly separable, so the perceptron is guaranteed to converge; how many epochs that takes depends on the initial weights and the learning rate. With two inputs, the learned weights define a straight line that divides the plane; the visualizer on this page shows it rotating into place.

import random

def train(data, labels, lr=0.1, epochs=20):
    random.seed(42)
    weights = [random.uniform(-0.5, 0.5) for _ in data[0]]
    bias = random.uniform(-0.5, 0.5)
    for epoch in range(epochs):
        errors = 0
        for inputs, label in zip(data, labels):
            net  = sum(x * w for x, w in zip(inputs, weights)) + bias
            pred = 1 if net >= 0 else 0
            err  = label - pred
            if err != 0:
                errors += 1
                weights = [w + lr * err * x for w, x in zip(weights, inputs)]
                bias += lr * err
        if errors == 0:
            print(f"Converged at epoch {epoch + 1}!")
            break
    return weights, bias

# AND gate: output 1 only when BOTH inputs are 1
data   = [[0,0],[0,1],[1,0],[1,1]]
labels = [  0,    0,    0,    1 ]
w, b = train(data, labels)
print(f"Weights: {[round(x,3) for x in w]}, Bias: {round(b,3)}")
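
With two inputs, the decision boundary is the set of points where the net input is exactly zero: w1*x1 + w2*x2 + bias = 0. The sketch below solves that equation for x2. The weights are hand-picked values that happen to implement AND (an assumption for illustration, not the output of train above), and boundary_x2 is a hypothetical helper name.

```python
def boundary_x2(x1, weights, bias):
    """Return the x2 that lies on the line w1*x1 + w2*x2 + bias = 0 (requires w2 != 0)."""
    w1, w2 = weights
    return -(w1 * x1 + bias) / w2

# Hand-picked weights that implement AND (illustrative, not trained values).
weights, bias = [0.2, 0.1], -0.25

for x1 in (0.0, 0.5, 1.0):
    print(f"x1={x1}: boundary at x2={boundary_x2(x1, weights, bias):.2f}")

# Because w2 is positive here, points on or above the line are class 1.
for point in ([0, 0], [0, 1], [1, 0], [1, 1]):
    net = sum(x * w for x, w in zip(point, weights)) + bias
    print(point, "->", 1 if net >= 0 else 0)
```

Only [1, 1] sits above the line, which is exactly the AND gate.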

#The Famous Failure: XOR

In 1969, Minsky and Papert proved that a single perceptron cannot learn XOR. Plot the four points: the 1s sit in opposite corners, so no straight line can separate them from the 0s. XOR is not linearly separable. The result is widely credited with contributing to an AI winter for neural networks; the eventual fix was stacking multiple layers into multilayer perceptrons (MLPs) trained with backpropagation.

import random

def train(data, labels, lr=0.1, epochs=100):
    random.seed(7)
    w = [random.uniform(-0.5, 0.5) for _ in data[0]]
    b = random.uniform(-0.5, 0.5)
    for epoch in range(epochs):
        errs = 0
        for inp, lbl in zip(data, labels):
            net  = sum(x * wt for x, wt in zip(inp, w)) + b
            pred = 1 if net >= 0 else 0
            e    = lbl - pred
            if e != 0:
                errs += 1
                w = [wt + lr * e * x for wt, x in zip(w, inp)]
                b += lr * e
        if errs == 0:
            return w, b, epoch + 1
    return w, b, None

data   = [[0,0],[0,1],[1,0],[1,1]]
labels = [  0,    1,    1,    0 ]  # XOR: 1 only when inputs differ
w, b, conv = train(data, labels)
print("Converged at epoch:", conv)
for inp, lbl in zip(data, labels):
    net  = sum(x * wt for x, wt in zip(inp, w)) + b
    pred = 1 if net >= 0 else 0
    print(f"  {inp} -> {pred}  (want {lbl})  {'OK' if pred==lbl else 'WRONG'}")
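
To illustrate why stacking layers helps, here is a minimal sketch of a two-layer arrangement that computes XOR. The weights are set by hand rather than learned (no backpropagation is involved), and the decomposition into OR, NAND, and AND units is one possible choice among many.

```python
def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

def xor_two_layer(inputs):
    # Hidden layer: two perceptrons with hand-set (not learned) weights.
    h_or = predict(inputs, [1, 1], -0.5)      # fires if at least one input is 1
    h_nand = predict(inputs, [-1, -1], 1.5)   # fires unless both inputs are 1
    # Output layer: AND of the two hidden outputs.
    return predict([h_or, h_nand], [1, 1], -1.5)

for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(inputs, "->", xor_two_layer(inputs))   # 0, 1, 1, 0
```

Each hidden unit draws its own straight line; the output unit combines the two half-planes into a region that no single line could carve out.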

Common mistake

Misconception: More Epochs Will Fix It

It's tempting to think "just run more epochs." For linearly separable data, the Perceptron Convergence Theorem guarantees that works. But for data that is not linearly separable — like XOR — the weights will flip back and forth forever. More epochs do nothing. The architecture must change, not the training time.
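
One way to check this empirically is to record the number of mistakes in every epoch. The sketch below is a variation on the lesson's train loop; it starts from zero weights so the run is deterministic, and errors_per_epoch is a hypothetical helper name.

```python
def errors_per_epoch(data, labels, lr=0.1, epochs=1000):
    """Train from zero weights and record how many mistakes each epoch makes."""
    weights, bias = [0.0] * len(data[0]), 0.0
    history = []
    for _ in range(epochs):
        errors = 0
        for inputs, label in zip(data, labels):
            net = sum(x * w for x, w in zip(inputs, weights)) + bias
            pred = 1 if net >= 0 else 0
            err = label - pred
            if err != 0:
                errors += 1
                weights = [w + lr * err * x for w, x in zip(weights, inputs)]
                bias += lr * err
        history.append(errors)
    return history

data = [[0, 0], [0, 1], [1, 0], [1, 1]]
xor_labels = [0, 1, 1, 0]

history = errors_per_epoch(data, xor_labels)
print("fewest errors in any epoch:", min(history))
print("errors in the last 5 epochs:", history[-5:])
```

On XOR the minimum never reaches zero: an error-free epoch would mean the current weights separate the four points with a straight line, which is impossible.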

Quick check

A perceptron is trained on a dataset. After 1000 epochs it still makes mistakes. What is the most likely reason?

#From One Neuron to Deep Learning

The perceptron's DNA is in every modern AI system:

- Logistic regression is a perceptron with a smooth sigmoid instead of a hard step.
- A neuron in a deep network is a perceptron (weighted sum plus bias) with a nonlinear activation such as ReLU in place of the step.
- Backpropagation generalizes the same weight-update idea to many layers using calculus.
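
The difference between the hard step and the sigmoid is easiest to see side by side. This is a small sketch with net-input values chosen for illustration; step and sigmoid are helper names introduced here, not functions from the lesson.

```python
import math

def step(net):
    """Perceptron activation: a hard 0/1 decision."""
    return 1 if net >= 0 else 0

def sigmoid(net):
    """Logistic activation: a smooth value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net))

# Illustrative net inputs on both sides of the threshold.
for net in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(f"net={net:+.1f}  step={step(net)}  sigmoid={sigmoid(net):.3f}")
```

The step jumps from 0 to 1 at the threshold, while the sigmoid passes through 0.5 there and changes gradually, which is what makes gradient-based training possible.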

In practice you'd use sklearn.linear_model.Perceptron for simple tasks, or stack millions of perceptron-like neurons in PyTorch or TensorFlow. But every time you do, you're standing on Rosenblatt's 1958 idea — a single artificial neuron that learns from its mistakes.

Key takeaways

A perceptron computes a weighted sum of inputs plus a bias, then applies a step function to output 0 or 1.
The learning rule: when a prediction is wrong, nudge each weight by (learning_rate * error * input) and update the bias the same way.
Geometrically, a perceptron draws a straight decision boundary; learning rotates and shifts that line to separate two classes.
The perceptron is guaranteed to converge only when the data is linearly separable, so a single perceptron can never learn XOR.
Modern deep learning stacks perceptron-like neurons and trains them with backpropagation, a generalization of the same core idea.

Try it yourself · Learning a boundary

Each pass, the perceptron nudges its line toward any point it gets wrong. Watch it rotate until it cleanly separates the two classes.

Practice challenges


Predict the output #1

Using the perceptron's predict function from the lesson, what does this print?

def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

weights = [0.5, 0.5]
bias = -0.6
print(predict([1, 1], weights, bias))
print(predict([1, 0], weights, bias))

Fill in the blank #2

Complete the perceptron learning rule from the lesson. When a prediction is wrong, we compute the error and nudge each weight. Fill in the error expression and the weight-update operator.

err = label ___ pred
if err != 0:
    weights = [w ___ lr * err * x for w, x in zip(weights, inputs)]
    bias += lr * err

Reorder the lines #3

Put the body of the perceptron's per-example training step into the correct order, matching the train() loop from the lesson.

pred = 1 if net >= 0 else 0                                 # 2. step function -> 0 or 1

err  = label - pred                                        # 3. how wrong were we?

weights = [w + lr * err * x for w, x in zip(weights, inputs)]  # 4. nudge each weight

net  = sum(x * w for x, w in zip(inputs, weights)) + bias   # 1. weighted sum plus bias

bias += lr * err                                           # 5. nudge the bias too

Fix the bug #4

This code has a bug — what's wrong?

def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= 0 else 0

Your turn

Practice exercise

Implement the OR gate using the perceptron learning rule.

The OR gate outputs 1 if at least one input is 1:

- [0, 0] -> 0
- [0, 1] -> 1
- [1, 0] -> 1
- [1, 1] -> 1

Your tasks:

1. Start with weights = [0.0, 0.0] and bias = 0.0.
2. Use a learning rate of 0.1.
3. Run up to 20 epochs. After each epoch, print the epoch number, current weights (rounded to 2 decimal places), and bias (rounded to 2 decimal places).
4. Stop early and print 'Converged!' if an epoch ends with zero errors.
5. After training, print the prediction for each of the four inputs.

Starter code (solution.py):

data   = [[0,0],[0,1],[1,0],[1,1]]
labels = [  0,    1,    1,    1 ]

weights = [0.0, 0.0]
bias    = 0.0
lr      = 0.1

for epoch in range(20):
    errors = 0
    for inputs, label in zip(data, labels):
        # TODO: compute net input (weighted sum + bias)
        # TODO: apply step function -> prediction
        # TODO: compute error = label - prediction
        # TODO: if error != 0, update weights and bias
        pass
    # TODO: print epoch, rounded weights, rounded bias
    # TODO: if zero errors, print 'Converged!' and break

# TODO: print final prediction for each input