Learning Machines · Intermediate · ⏱ 11 min · 09 / 14

Neural Networks & Deep Learning

Peel back the curtain on neural networks: see exactly how a single artificial neuron fires, why stacking layers unlocks superhuman pattern recognition, and what 'deep' really means.

In 2012, a neural network looked at 1.2 million photos and learned to tell cats from cars, with no human writing a single rule. By 2024, similar networks were writing essays, generating images, and in some studies flagging tumours as accurately as radiologists. How does software made of simple arithmetic become so powerful?

The answer starts surprisingly small: one fake neuron, doing one tiny computation. Build enough of them, wire them together in layers, and something remarkable emerges.

Think of it like

Your Brain, Simplified to One Cell

A real neuron receives electrical signals from many neighbours. It adds them up. If the total is strong enough, it fires — sending its own signal onward. If it's too weak, it stays quiet.

An artificial neuron is a loose imitation of this: it takes several numbers as inputs, weights each one by importance, sums them up, then decides how strongly to pass a signal forward.

#Inside One Artificial Neuron

A single artificial neuron does three things in order:

Multiply each input by its weight. A weight says how much to trust that input. A weight of 2.0 means "this input matters a lot"; a negative weight means "this pushes against firing".
Add a bias. The bias is a baseline offset — it lets the neuron fire even when all inputs are zero, or stay quiet when they're high.
Pass the sum through an activation function. This is the gate. It squashes the raw number into a useful range and — crucially — introduces non-linearity, which lets networks learn complex, curvy patterns instead of just straight lines.

Formula: output = activation(w1·x1 + w2·x2 + … + bias)
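
To see why the activation in step three matters, here is a minimal sketch. The weights and the helper names `linear`, `collapsed`, and `stacked_with_relu` are made up for illustration and are not part of the lesson's code. Without an activation, two chained neurons reduce algebraically to a single weighted sum, so the extra layer adds nothing. With ReLU in between, the output bends where the first neuron's sum crosses zero.

```python
def relu(x):
    return max(0.0, x)

def linear(x, w, b):
    # Weighted sum for a single input: w*x + b
    return w * x + b

# Hypothetical weights for two single-input neurons chained together.
w1, b1 = 2.0, -1.0
w2, b2 = -3.0, 0.5

def stacked_no_activation(x):
    # Two "layers" with no activation in between.
    return linear(linear(x, w1, b1), w2, b2)

def collapsed(x):
    # Same function as one neuron: w2*(w1*x + b1) + b2 = (w2*w1)*x + (w2*b1 + b2)
    return (w2 * w1) * x + (w2 * b1 + b2)

def stacked_with_relu(x):
    # ReLU between the layers prevents the collapse above.
    return linear(relu(linear(x, w1, b1)), w2, b2)

for x in [-1.0, 0.0, 0.5, 1.0, 2.0]:
    print(x, stacked_no_activation(x), collapsed(x), stacked_with_relu(x))
```

The second and third columns match for every input, while the last column stays flat for small inputs and only starts to fall once the first neuron's sum becomes positive.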

ReLU simply passes positive values through and blocks negatives — brutal simplicity that powers modern AI.

import math

def relu(x):    return max(0, x)
def sigmoid(x): return 1 / (1 + math.exp(-x))

def neuron(inputs, weights, bias, activation=relu):
    raw = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(raw)

# A neuron asking: "is this image bright AND edgy?"
output = neuron(
    inputs=[0.8, 0.6],          # brightness, edge_strength
    weights=[1.5, 2.0],         # edges weighted higher
    bias=-1.0,
    activation=relu
)
print(f"Neuron output: {output:.3f}")
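
The example above can also be traced by hand: 1.5 × 0.8 + 2.0 × 0.6 − 1.0 = 1.2 + 1.2 − 1.0 = 1.4, and ReLU passes that positive value through unchanged. The sketch below splits the same computation into the three steps and compares ReLU with sigmoid on the same raw sum. The helper name `trace_neuron` is hypothetical.

```python
import math

def relu(x):
    return max(0, x)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def trace_neuron(inputs, weights, bias):
    # Step 1: multiply each input by its weight.
    products = [w * x for w, x in zip(weights, inputs)]
    # Step 2: sum the products and add the bias.
    raw = sum(products) + bias
    # Step 3: apply an activation (both shown for comparison).
    return products, raw, relu(raw), sigmoid(raw)

products, raw, relu_out, sigmoid_out = trace_neuron(
    inputs=[0.8, 0.6], weights=[1.5, 2.0], bias=-1.0
)
print("products:", [round(p, 3) for p in products])
print(f"raw sum:  {raw:.3f}")
print(f"relu:     {relu_out:.3f}")
print(f"sigmoid:  {sigmoid_out:.3f}")
```

ReLU leaves the raw sum at 1.4, while sigmoid squashes it to roughly 0.80, which is why sigmoid outputs are often read as a confidence between 0 and 1.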

#Stacking Neurons into Layers

One neuron answers one question. Real problems need many questions answered at once — so we run many neurons in parallel, all reading the same inputs. Their outputs become the inputs to the next row of neurons, and so on.

This gives us the classic structure:

- Input layer: raw data (pixel values, sensor readings…)
- Hidden layers: intermediate detectors for progressively complex patterns
- Output layer: the final answer (probabilities for each class)

Every connection carries its own weight. Training means finding the right values for all of them.

A forward pass through two hidden layers — the core of every neural network inference call.

def relu(x): return max(0, x)

def layer(inputs, weight_matrix, biases):
    outputs = []
    for weights, bias in zip(weight_matrix, biases):
        raw = sum(w * x for w, x in zip(weights, inputs)) + bias
        outputs.append(relu(raw))
    return outputs

inputs = [0.8, 0.6]

W1 = [[1.5, -0.5], [0.2, 2.0], [-1.0, 1.2]]
b1 = [0.0, -0.5, 0.1]
h1 = layer(inputs, W1, b1)
print("Layer 1:", [round(v, 3) for v in h1])

W2 = [[0.8, 0.3, -0.2], [0.1, -0.6, 1.5]]
b2 = [0.1, 0.0]
h2 = layer(h1, W2, b2)
print("Layer 2:", [round(v, 3) for v in h2])
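
Since training has to find a value for every weight and bias, it helps to know how many there are. The sketch below counts them for the layer sizes used above (2 inputs, then 3 neurons, then 2 neurons). `count_parameters` is a hypothetical helper, and the second call uses made-up layer sizes purely to show how quickly the count grows.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists how many values each layer holds,
    starting with the input layer, e.g. [2, 3, 2].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out   # one weight per connection
        biases = n_out           # one bias per neuron
        total += weights + biases
    return total

print(count_parameters([2, 3, 2]))       # 2*3 + 3 + 3*2 + 2 = 17
print(count_parameters([784, 128, 10]))  # hypothetical larger network: 101770
```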

#Why 'Deep' Changes Everything

A network with many hidden layers is called deep, hence Deep Learning. Depth enables hierarchical feature learning. When a network is trained on face photos, successive layers tend to discover features at increasing levels of abstraction, roughly like this:

Layer 1 — horizontal edges, colour blobs
Layer 2 — corners, curves, small textures
Layer 3 — eyes, noses, mouths
Layer 4+ — whole faces, expressions, identity

Nobody programmed those concepts in. The network invented them because they were useful. The same hierarchy appears in language (characters → words → sentences → meaning) and audio (frequencies → phonemes → words → speech).

Three things made this practical around 2012: more data (the internet), more compute (GPUs running parallel matrix math), and better tricks (ReLU and dropout, joined by batch normalisation in 2015). The interactive visualiser on this page lights up each layer in sequence so you can watch signal flow from input to output.

#The Full Forward Pass

Softmax converts raw output scores into probabilities that sum to 100% — the final step in most classifiers.

import math

def relu(x): return max(0, x)
def softmax(vals):
    exps = [math.exp(v) for v in vals]
    s = sum(exps)
    return [e/s for e in exps]

def forward(inputs, layers):
    signal = inputs
    for i, (W, b) in enumerate(layers):
        raw = [sum(w*x for w,x in zip(row, signal)) + bias
               for row, bias in zip(W, b)]
        signal = softmax(raw) if i == len(layers)-1 else [relu(v) for v in raw]
    return signal

layers = [
    ([[1.2,-0.4],[0.5,1.8],[-0.3,0.9]], [0.0,-0.2,0.1]),
    ([[1.0,0.2,-0.5],[-0.3,0.8,1.1]],  [0.1, 0.0])
]
probs = forward([0.9, 0.4], layers)
print(f"Class A: {probs[0]:.1%}")
print(f"Class B: {probs[1]:.1%}")
print(f"Prediction: {'A' if probs[0] > probs[1] else 'B'}")
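
One practical wrinkle with softmax: `math.exp` raises an error for very large scores. A common workaround is to subtract the largest score before exponentiating, which leaves the probabilities unchanged. The sketch below is an addition to the lesson's version, and `stable_softmax` is a hypothetical name.

```python
import math

def stable_softmax(vals):
    # Subtracting the max does not change the result mathematically,
    # but it keeps every exponent <= 0, so math.exp cannot overflow.
    m = max(vals)
    exps = [math.exp(v - m) for v in vals]
    total = sum(exps)
    return [e / total for e in exps]

probs = stable_softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])
print(round(sum(probs), 6))

# Scores this large would raise OverflowError in a plain math.exp(v) version.
print(stable_softmax([1000.0, 999.0]))
```

The probabilities still sum to 1, and the largest score still receives the largest share.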

Common mistake

A Forward Pass Isn't Learning

The forward pass only uses the weights the network already has. It doesn't update them. Learning happens in the backward pass: the error is measured, backpropagation works out how much each weight in every layer contributed to it, and an optimiser nudges the weights to reduce it. In PyTorch or TensorFlow, one training step = one forward pass + one backward pass + one weight update. Everything in this lesson covers the forward direction only.
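
For contrast, here is a minimal sketch of what one training step could look like for a single neuron with no activation and a squared-error loss. The learning rate, data point, and starting parameters are invented for illustration, the gradients are written out by hand, and real frameworks compute them automatically.

```python
def forward(x, w, b):
    # Forward pass: reads the current parameters, changes nothing.
    return w * x + b

def training_step(x, target, w, b, lr=0.05):
    # 1. Forward pass and error measurement.
    pred = forward(x, w, b)
    loss = (pred - target) ** 2
    # 2. Backward pass: gradients of the loss with respect to w and b.
    grad_w = 2 * (pred - target) * x
    grad_b = 2 * (pred - target)
    # 3. Update: nudge each parameter against its gradient.
    return w - lr * grad_w, b - lr * grad_b, loss

w, b = 0.5, 0.0        # hypothetical starting parameters
x, target = 2.0, 3.0   # one made-up training example

loss_before = (forward(x, w, b) - target) ** 2
w, b, _ = training_step(x, target, w, b)
loss_after = (forward(x, w, b) - target) ** 2
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

Calling `forward` any number of times leaves `w` and `b` untouched. Only `training_step` changes them, and the loss on this example drops as a result.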

In practice you'd use torch.nn.Linear or tf.keras.layers.Dense rather than writing weight loops by hand — but understanding the pure-Python version means you'll never be mystified by what those layers actually do.

Quick check

A neuron receives inputs [0.5, 1.0] with weights [2.0, −1.5] and a bias of 0.5. It uses ReLU. What is its output?

Key takeaways

A neuron computes a weighted sum of its inputs plus a bias, then applies an activation function like ReLU to decide how strongly to fire.
Stacking neurons in layers lets a network answer many sub-questions at once; each layer's outputs become the next layer's inputs.
Depth enables hierarchical feature learning — early layers detect edges, later layers detect objects — without any human feature engineering.
The forward pass is pure arithmetic flowing input-to-output; actual learning (backpropagation) is a separate step that adjusts the weights.
Data scale, GPU compute, and better training tricks are why deep learning went from a curiosity to the engine behind modern AI.

Try it yourself · Forward pass

Watch activations flow layer by layer to a prediction.

A forward pass: values flow left→right. Each neuron adds up its weighted inputs, applies an activation, and passes the result on — until the output layer makes a prediction.

Practice challenges

Test yourself · earn XP

Predict the output · #1

This single artificial neuron computes a weighted sum plus bias, then applies ReLU. What does it print?

def relu(x): return max(0, x)

def neuron(inputs, weights, bias):
    raw = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(raw)

output = neuron(
    inputs=[1.0, 2.0],
    weights=[0.5, -1.0],
    bias=1.5
)
print(f"Neuron output: {output:.1f}")

Fix the bug · #2

This code has a bug — what's wrong? The neuron should follow the lesson's formula output = activation(w1*x1 + w2*x2 + ... + bias).

def relu(x): return max(0, x)

def neuron(inputs, weights, bias):
    # weighted sum of the inputs, then add the bias
    raw = sum(w + x for w, x in zip(weights, inputs)) + bias
    return relu(raw)

Fill in the blank · #3

Complete the two activation functions from the lesson: ReLU (blocks negatives) and sigmoid. Fill in the missing pieces.

import math

def relu(x):
    return max(____, x)

def sigmoid(x):
    return 1 / (1 + math.exp(____))

Reorder the lines · #4

Put these lines in the correct order to run a full forward pass through a 2-layer classifier, matching the structure taught in the lesson (hidden layer with ReLU, output layer with softmax).

probs = softmax(raw_out)                             # 4. squash scores into probabilities

h1 = [relu(sum(w*x for w,x in zip(row, inputs)) + b) for row, b in zip(W1, b1)]  # 2. hidden layer, ReLU

raw_out = [sum(w*x for w,x in zip(row, h1)) + b for row, b in zip(W2, b2)]       # 3. output layer, raw scores

inputs = [0.9, 0.4]                                  # 1. raw input layer

print('Prediction:', 'A' if probs[0] > probs[1] else 'B')  # 5. pick the higher class

Your turn

Practice exercise

Build a function predict(inputs, threshold) that runs a forward pass through the hardcoded two-layer network below and returns 'spam' if the output neuron fires above threshold, or 'not spam' otherwise. Test it with at least three different input vectors and observe which combinations of features tip the network into spam territory.

Try it live — edit the code and hit Run to execute real Python:

solution.py · editable

import math

# Tiny email-classifier network
# Inputs: [word_count_normalised, exclamation_count_normalised]
W1 = [[2.0, 0.5], [0.1, 3.0], [1.5, -0.5]]
b1 = [-0.8, -1.0, 0.2]

W2 = [[1.2, 2.5, -0.3]]
b2 = [-0.5]

def relu(x):
    return max(0, x)

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def predict(inputs, threshold=0.5):
    # TODO:
    # 1. Run layer 1: for each (row, bias) in zip(W1, b1),
    #    compute relu(dot(row, inputs) + bias). Collect into h1.
    # 2. Run layer 2: same with W2[0] and b2[0], but use sigmoid.
    # 3. Return 'spam' if output > threshold, else 'not spam'.
    pass

print(predict([0.2, 0.1]))   # short email, few !
print(predict([0.9, 0.8]))   # long email, lots of !
print(predict([0.5, 0.9]))   # medium email, many !
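
The TODO refers to `dot(row, inputs)`, which the starter code does not define. One possible helper is sketched below, offered as a hint and not as the reference solution. It assumes both lists have the same length.

```python
def dot(weights, inputs):
    # Sum of element-wise products; zip stops at the shorter list,
    # so mismatched lengths would be truncated silently.
    return sum(w * x for w, x in zip(weights, inputs))

# First row of W1 applied to the first test email: 2.0*0.2 + 0.5*0.1
print(dot([2.0, 0.5], [0.2, 0.1]))
```

With this in place, each step in the TODO becomes a single expression per neuron: `dot(...)` plus the bias, wrapped in the activation.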