The Perceptron
Build the original artificial neuron from scratch and watch it learn to separate two classes by nudging its weights after every mistake.
In 1958, psychologist Frank Rosenblatt wired together a machine about the size of a room and called it the Perceptron. He claimed it could learn. The press went wild — headlines declared that machines would soon think like humans.
The device was inspired by a biological neuron: it receives signals from many neighbors, adds them up, and fires if the total is strong enough. Rosenblatt asked: what if we could teach a mathematical version of that cell to classify things? His answer became the bedrock of every neural network that followed.
The Voting Jury
Picture a jury deciding guilty (1) or innocent (0). Each juror casts a vote, but not all votes are equal — the forensics expert counts triple, a confused juror barely counts at all. A bias juror always nudges the verdict in one direction, shifting the threshold.
The perceptron works the same way. Inputs are votes. Weights are how much each vote counts. The bias is that opinionated juror. The step function is the verdict: if the weighted total, bias included, is zero or higher, output 1; otherwise output 0.
#Anatomy of a Perceptron
A perceptron multiplies each input x by a weight w, sums everything up, then adds a bias that shifts the threshold independently of the inputs. This total is the net input: net = (x1*w1) + (x2*w2) + ... + bias. A step function converts it to a crisp decision: 1 if net >= 0, else 0.
Weights and bias start as random guesses. Learning means adjusting them until every training example is classified correctly.
def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

# Two inputs: tumor size and patient age (0=low, 1=high)
weights = [0.5, 0.5]
bias = -0.6
print(predict([1, 1], weights, bias))  # large + old -> 1
print(predict([0, 0], weights, bias))  # small + young -> 0
print(predict([1, 0], weights, bias))  # large + young -> ?

#The Learning Rule: Nudging Weights Toward Correct
After each prediction, if the perceptron was wrong we nudge every weight in the direction that would have produced the correct answer. The nudge size is the learning rate (lr), typically a small number like 0.1.
error = actual_label - predicted_label
w_new = w_old + lr * error * input_value
bias_new = bias_old + lr * error
- Error +1 (predicted 0, should be 1): weights grow, raising the net sum next time.
- Error -1 (predicted 1, should be 0): weights shrink.
- Error 0 (correct): nothing changes.
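To make the rule concrete, here is a minimal sketch that applies it once. The `update` helper and the choice of label are assumptions made for illustration; the starting weights and bias are borrowed from the earlier `predict` example.

```python
def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

def update(inputs, label, weights, bias, lr=0.1):
    """Hypothetical helper: apply the learning rule once, return (weights, bias)."""
    err = label - predict(inputs, weights, bias)  # +1, -1, or 0
    new_weights = [w + lr * err * x for w, x in zip(weights, inputs)]
    new_bias = bias + lr * err
    return new_weights, new_bias

weights, bias = [0.5, 0.5], -0.6
inputs, label = [1, 0], 1  # assume the correct answer for this example is 1

print(predict(inputs, weights, bias))  # net = 0.5 - 0.6 = -0.1, so 0 (wrong)
weights, bias = update(inputs, label, weights, bias)
print([round(w, 2) for w in weights], round(bias, 2))
print(predict(inputs, weights, bias))  # net = 0.6 - 0.5 = 0.1, so 1
```

Only the first weight moves, because the second input is 0 and `lr * err * 0` contributes nothing. The bias moves regardless of the inputs.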
One full pass over all examples is an epoch. Repeat until zero errors. The Perceptron Convergence Theorem guarantees this will eventually succeed — if the classes are linearly separable.
import random

def train(data, labels, lr=0.1, epochs=20):
    random.seed(42)
    weights = [random.uniform(-0.5, 0.5) for _ in data[0]]
    bias = random.uniform(-0.5, 0.5)
    for epoch in range(epochs):
        errors = 0
        for inputs, label in zip(data, labels):
            net = sum(x * w for x, w in zip(inputs, weights)) + bias
            pred = 1 if net >= 0 else 0
            err = label - pred
            if err != 0:
                errors += 1
                weights = [w + lr * err * x for w, x in zip(weights, inputs)]
                bias += lr * err
        if errors == 0:
            print(f"Converged at epoch {epoch + 1}!")
            break
    return weights, bias

# AND gate: output 1 only when BOTH inputs are 1
data = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 0, 0, 1]
w, b = train(data, labels)
print(f"Weights: {[round(x, 3) for x in w]}, Bias: {round(b, 3)}")

#The Famous Failure: XOR
import random

def train(data, labels, lr=0.1, epochs=100):
    random.seed(7)
    w = [random.uniform(-0.5, 0.5) for _ in data[0]]
    b = random.uniform(-0.5, 0.5)
    for epoch in range(epochs):
        errs = 0
        for inp, lbl in zip(data, labels):
            net = sum(x * wt for x, wt in zip(inp, w)) + b
            pred = 1 if net >= 0 else 0
            e = lbl - pred
            if e != 0:
                errs += 1
                w = [wt + lr * e * x for wt, x in zip(w, inp)]
                b += lr * e
        if errs == 0:
            return w, b, epoch + 1
    return w, b, None

data = [[0, 0], [0, 1], [1, 0], [1, 1]]
labels = [0, 1, 1, 0]  # XOR: 1 only when inputs differ
w, b, conv = train(data, labels)
print("Converged at epoch:", conv)
for inp, lbl in zip(data, labels):
    net = sum(x * wt for x, wt in zip(inp, w)) + b
    pred = 1 if net >= 0 else 0
    print(f" {inp} -> {pred} (want {lbl}) {'OK' if pred == lbl else 'WRONG'}")

Misconception: More Epochs Will Fix It
It's tempting to think "just run more epochs." For linearly separable data, the Perceptron Convergence Theorem guarantees that works. But for data that is not linearly separable — like XOR — the weights will flip back and forth forever. More epochs do nothing. The architecture must change, not the training time.
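Why is XOR out of reach? A single perceptron would need bias < 0 (for [0, 0]), w1 + bias >= 0 and w2 + bias >= 0 (for the two mixed inputs), and w1 + w2 + bias < 0 (for [1, 1]). Adding the two middle conditions gives w1 + w2 + 2*bias >= 0, so w1 + w2 + bias >= -bias > 0, which contradicts the last one. The sketch below illustrates the same point by brute force. The `count_solutions` helper and the search grid are arbitrary choices for illustration, and a grid search only illustrates the argument; it does not prove it.

```python
from itertools import product

def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

def count_solutions(data, labels, grid):
    """Count (w1, w2, bias) triples on the grid that classify every example."""
    hits = 0
    for w1, w2, b in product(grid, repeat=3):
        if all(predict(x, [w1, w2], b) == y for x, y in zip(data, labels)):
            hits += 1
    return hits

data = [[0, 0], [0, 1], [1, 0], [1, 1]]
grid = [i / 4 for i in range(-8, 9)]  # -2.0 to 2.0 in steps of 0.25 (assumed range)

print("AND solutions:", count_solutions(data, [0, 0, 0, 1], grid))  # many
print("XOR solutions:", count_solutions(data, [0, 1, 1, 0], grid))  # 0
```

Many settings solve AND, so the learning rule only has to land on one of them. For XOR there is nothing to land on, however long training runs.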
A perceptron is trained on a dataset. After 1000 epochs it still makes mistakes. What is the most likely reason?
#From One Neuron to Deep Learning
The perceptron's DNA is in every modern AI system:
- Logistic regression is a perceptron with a smooth sigmoid instead of a hard step.
- A neuron in a deep network is a perceptron (weighted sum plus bias) with the hard step replaced by a nonlinear activation such as ReLU.
- Backpropagation generalizes the same weight-update idea to many layers using calculus.
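The family resemblance is easy to see in code: the weighted sum stays the same and only the final function changes. This is a minimal sketch with helper names (`net_input`, `step`, `sigmoid`, `relu`) chosen for illustration, reusing the weights from the earlier `predict` example.

```python
import math

def net_input(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def step(net):
    return 1 if net >= 0 else 0             # the perceptron's hard decision

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))     # value between 0 and 1, as in logistic regression

def relu(net):
    return max(0.0, net)                    # passes positive values, clips negatives to 0

weights, bias = [0.5, 0.5], -0.6
for inputs in ([0, 0], [1, 0], [1, 1]):
    net = net_input(inputs, weights, bias)
    print(inputs, round(net, 2), step(net), round(sigmoid(net), 3), round(relu(net), 2))
```

The sigmoid output is above 0.5 exactly when the net input is positive, so thresholding it at 0.5 gives the same decision as the step function, apart from the tie at exactly zero.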
In practice you'd use sklearn.linear_model.Perceptron for simple tasks, or stack millions of perceptron-like neurons in PyTorch or TensorFlow. But every time you do, you're standing on Rosenblatt's 1958 idea — a single artificial neuron that learns from its mistakes.
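Stacking can be seen in miniature without any framework. The sketch below wires three perceptrons into two layers that compute XOR. The weights are hand-picked for illustration, not learned, and `xor_network` is a hypothetical name; finding such weights automatically is the job of backpropagation.

```python
def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

def xor_network(inputs):
    # Hidden layer: two perceptrons with hand-set (assumed) parameters.
    h_or = predict(inputs, [1, 1], -0.5)     # fires if at least one input is 1
    h_nand = predict(inputs, [-1, -1], 1.5)  # fires unless both inputs are 1
    # Output layer: AND of the two hidden units.
    return predict([h_or, h_nand], [1, 1], -1.5)

for inputs in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(inputs, "->", xor_network(inputs))
```

Each hidden unit draws one straight line. The output unit combines the two, carving out the band between them, which is a region no single line can describe.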
Key takeaways
- A perceptron computes a weighted sum of inputs plus a bias, then applies a step function to output 0 or 1.
- The learning rule: when a prediction is wrong, nudge each weight by (learning_rate * error * input) and update the bias the same way.
- Geometrically, a perceptron draws a straight decision boundary; learning rotates and shifts that line to separate two classes.
- The perceptron is guaranteed to converge only when the data is linearly separable, so a single perceptron can never solve XOR.
- Modern deep learning stacks perceptron-like neurons and trains them with backpropagation, a generalization of the same core idea.
Each pass, the perceptron nudges its line toward any point it gets wrong. Watch it rotate until it cleanly separates the two classes.
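With two inputs, the line in question is the set of points where the net input is exactly zero: w1*x1 + w2*x2 + bias = 0. A minimal sketch for reading that line off a trained perceptron follows. `boundary_x2` is a hypothetical helper and assumes the second weight is nonzero.

```python
def boundary_x2(x1, weights, bias):
    """x2 on the decision line for a given x1 (assumes weights[1] != 0)."""
    w1, w2 = weights
    return -(w1 * x1 + bias) / w2

weights, bias = [0.5, 0.5], -0.6
for x1 in (0.0, 0.5, 1.0):
    x2 = boundary_x2(x1, weights, bias)
    net = weights[0] * x1 + weights[1] * x2 + bias  # approximately 0 on the line
    print(x1, round(x2, 2), round(net, 6))
```

Changing a weight tilts this line and changing the bias slides it, which is the rotation and shift the learning rule produces one mistake at a time.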
Using the perceptron's predict function from the lesson, what does this print?
def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if net >= 0 else 0

weights = [0.5, 0.5]
bias = -0.6
print(predict([1, 1], weights, bias))
print(predict([1, 0], weights, bias))

Complete the perceptron learning rule from the lesson. When a prediction is wrong, we compute the error and nudge each weight. Fill in the error expression and the weight-update operator.
err = label ____ pred
if err != 0:
    weights = [w ____ lr * err * x for w, x in zip(weights, inputs)]
    bias += lr * err
Put the body of the perceptron's per-example training step into the correct order, matching the train() loop from the lesson.
pred = 1 if net >= 0 else 0 # 2. step function -> 0 or 1
err = label - pred # 3. how wrong were we?
weights = [w + lr * err * x for w, x in zip(weights, inputs)] # 4. nudge each weight
net = sum(x * w for x, w in zip(inputs, weights)) + bias # 1. weighted sum plus bias
bias += lr * err # 5. nudge the bias too
This code has a bug — what's wrong?
def predict(inputs, weights, bias):
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= 0 else 0

Implement the OR gate using the perceptron learning rule.
The OR gate outputs 1 if at least one input is 1:
- [0, 0] -> 0
- [0, 1] -> 1
- [1, 0] -> 1
- [1, 1] -> 1

Your tasks:
1. Start with weights = [0.0, 0.0] and bias = 0.0.
2. Use a learning rate of 0.1.
3. Run up to 20 epochs. After each epoch, print the epoch number, current weights (rounded to 2 decimal places), and bias (rounded to 2 decimal places).
4. Stop early and print 'Converged!' if an epoch ends with zero errors.
5. After training, print the prediction for each of the four inputs.