Neural Networks & Deep Learning
Peel back the curtain on neural networks: see exactly how a single artificial neuron fires, why stacking layers unlocks superhuman pattern recognition, and what 'deep' really means.
In 2012, a neural network looked at 1.2 million photos and learned to tell cats from cars, with no human writing a single rule. By 2024, similar networks were writing essays, generating images, and, in some studies, matching or beating radiologists at spotting tumours. How does software made of simple arithmetic become so powerful?
The answer starts surprisingly small: one fake neuron, doing one tiny computation. Build enough of them, wire them together in layers, and something remarkable emerges.
Your Brain, Simplified to One Cell
A real neuron receives electrical signals from many neighbours. It adds them up. If the total is strong enough, it fires — sending its own signal onward. If it's too weak, it stays quiet.
An artificial neuron loosely mimics this: it takes several numbers as inputs, weights each one by importance, sums them up, then decides whether and how strongly to pass a signal forward.
Inside One Artificial Neuron
A single artificial neuron does three things in order:
- Multiply each input by its weight. A weight says how much to trust that input. A weight of 2.0 means "this input matters a lot"; a negative weight means "this pushes against firing".
- Add a bias. The bias is a baseline offset — it lets the neuron fire even when all inputs are zero, or stay quiet when they're high.
- Pass the sum through an activation function. This is the gate. It squashes the raw number into a useful range and — crucially — introduces non-linearity, which lets networks learn complex, curvy patterns instead of just straight lines.
Formula: output = activation(w1·x1 + w2·x2 + … + bias)
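The activation step is easiest to see in isolation. The sketch below feeds the same raw sums through ReLU and sigmoid; the three raw values are hypothetical, chosen only to show a negative, a zero, and a positive case.

```python
import math

def relu(x):
    # Passes positive values through unchanged and blocks negatives.
    return max(0.0, x)

def sigmoid(x):
    # Squashes any real number into the open interval (0, 1).
    return 1 / (1 + math.exp(-x))

# Hypothetical raw sums, chosen for illustration only.
for raw in [-2.0, 0.0, 2.0]:
    print(f"raw={raw:+.1f}  relu={relu(raw):.3f}  sigmoid={sigmoid(raw):.3f}")
```

ReLU leaves positive sums unbounded, while sigmoid always lands between 0 and 1, so the two gates suit different jobs.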
import math

def relu(x): return max(0, x)
def sigmoid(x): return 1 / (1 + math.exp(-x))

def neuron(inputs, weights, bias, activation=relu):
    # Weighted sum of the inputs plus the bias, passed through the activation
    raw = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(raw)

# A neuron asking: "is this image bright AND edgy?"
output = neuron(
    inputs=[0.8, 0.6],   # brightness, edge_strength
    weights=[1.5, 2.0],  # edges weighted higher
    bias=-1.0,
    activation=relu
)
print(f"Neuron output: {output:.3f}")

Stacking Neurons into Layers
One neuron answers one question. Real problems need many questions answered at once — so we run many neurons in parallel, all reading the same inputs. Their outputs become the inputs to the next row of neurons, and so on.
This gives us the classic structure:
- Input layer — raw data (pixel values, sensor readings…)
- Hidden layers — intermediate detectors for progressively complex patterns
- Output layer — the final answer (probabilities for each class)
Every connection carries its own weight. Training means finding the right values for all of them.
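To get a feel for how many values training has to find, the count can be sketched as below. It assumes every layer is fully connected (each neuron reads every output of the previous layer); the function name `count_parameters` and the example layer sizes are hypothetical.

```python
def count_parameters(layer_sizes):
    # layer_sizes lists the number of values per layer, input first.
    # Each fully connected layer needs n_in * n_out weights plus n_out biases.
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# 2 inputs -> 3 hidden neurons -> 2 output neurons
print(count_parameters([2, 3, 2]))       # 17
# A hypothetical 784-pixel image classifier with one hidden layer
print(count_parameters([784, 128, 10]))  # 101770
```

Even a small image classifier has on the order of a hundred thousand adjustable numbers, which is why they are learned rather than set by hand.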
def relu(x): return max(0, x)

def layer(inputs, weight_matrix, biases):
    # Each row of weight_matrix holds the weights of one neuron in this layer
    outputs = []
    for weights, bias in zip(weight_matrix, biases):
        raw = sum(w * x for w, x in zip(weights, inputs)) + bias
        outputs.append(relu(raw))
    return outputs

inputs = [0.8, 0.6]

# First layer: 3 neurons, each reading both inputs
W1 = [[1.5, -0.5], [0.2, 2.0], [-1.0, 1.2]]
b1 = [0.0, -0.5, 0.1]
h1 = layer(inputs, W1, b1)
print("Layer 1:", [round(v, 3) for v in h1])

# Second layer: 2 neurons, each reading the 3 outputs of the first layer
W2 = [[0.8, 0.3, -0.2], [0.1, -0.6, 1.5]]
b2 = [0.1, 0.0]
h2 = layer(h1, W2, b2)
print("Layer 2:", [round(v, 3) for v in h2])

Why 'Deep' Changes Everything
A network with many hidden layers is called deep, hence Deep Learning. Depth enables hierarchical feature learning. When a network is trained on face photos, successive layers tend to pick up progressively more abstract features, roughly along these lines:
- Layer 1 — horizontal edges, colour blobs
- Layer 2 — corners, curves, small textures
- Layer 3 — eyes, noses, mouths
- Layer 4+ — whole faces, expressions, identity
Nobody programmed those concepts in. The network invented them because they were useful. The same hierarchy appears in language (characters → words → sentences → meaning) and audio (frequencies → phonemes → words → speech).
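Depth only pays off because each layer applies a non-linear activation. The sketch below shows what happens without one: two stacked layers of plain weighted sums give exactly the same output as a single merged layer, so the extra layer adds nothing. The helper names `linear_layer` and `merge_linear_layers` and the weight values are illustrative assumptions.

```python
def linear_layer(inputs, weight_matrix, biases):
    # Weighted sums plus biases, with NO activation function.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weight_matrix, biases)]

def merge_linear_layers(W1, b1, W2, b2):
    # W2·(W1·x + b1) + b2 equals (W2·W1)·x + (W2·b1 + b2),
    # so two activation-free layers fold into one.
    hidden = len(W1)
    n_in = len(W1[0])
    W = [[sum(row2[k] * W1[k][j] for k in range(hidden)) for j in range(n_in)]
         for row2 in W2]
    b = [sum(row2[k] * b1[k] for k in range(hidden)) + bias2
         for row2, bias2 in zip(W2, b2)]
    return W, b

# Illustrative weights: 2 inputs -> 3 hidden -> 2 outputs
W1 = [[1.5, -0.5], [0.2, 2.0], [-1.0, 1.2]]
b1 = [0.0, -0.5, 0.1]
W2 = [[0.8, 0.3, -0.2], [0.1, -0.6, 1.5]]
b2 = [0.1, 0.0]
inputs = [0.8, 0.6]

stacked = linear_layer(linear_layer(inputs, W1, b1), W2, b2)
W, b = merge_linear_layers(W1, b1, W2, b2)
merged = linear_layer(inputs, W, b)

print("Two linear layers:", [round(v, 6) for v in stacked])
print("One merged layer: ", [round(v, 6) for v in merged])
```

Putting ReLU between the layers breaks this equivalence, which is what lets later layers build on earlier ones instead of repeating them.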
Three things made this practical around 2012: more data (the internet), more compute (GPUs running parallel matrix math), and better tricks (ReLU and dropout, followed a few years later by batch normalisation). The interactive visualiser on this page lights up each layer in sequence so you can watch signal flow from input to output.
The Full Forward Pass
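The forward pass in this section ends with softmax, which turns raw output scores into probabilities that sum to 1. Here is a minimal sketch of that step on its own. It subtracts the largest score before exponentiating, a common trick to avoid overflow on large inputs; that trick and the name `stable_softmax` are assumptions of this sketch, not the version the lesson's code uses.

```python
import math

def stable_softmax(vals):
    # Shifting every score by the maximum leaves the result unchanged
    # but keeps math.exp from overflowing on large inputs.
    m = max(vals)
    exps = [math.exp(v - m) for v in vals]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 0.5, -1.0]  # hypothetical raw output scores
probs = stable_softmax(scores)
print([round(p, 3) for p in probs])
print("Sum:", round(sum(probs), 6))
```

The largest score always receives the largest probability, so picking the most probable class is the same as picking the highest raw score.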
import math

def relu(x): return max(0, x)

def softmax(vals):
    # Turn raw scores into probabilities that sum to 1
    exps = [math.exp(v) for v in vals]
    s = sum(exps)
    return [e/s for e in exps]

def forward(inputs, layers):
    signal = inputs
    for i, (W, b) in enumerate(layers):
        raw = [sum(w*x for w,x in zip(row, signal)) + bias
               for row, bias in zip(W, b)]
        # Softmax on the last layer, ReLU on every layer before it
        signal = softmax(raw) if i == len(layers)-1 else [relu(v) for v in raw]
    return signal

layers = [
    ([[1.2,-0.4],[0.5,1.8],[-0.3,0.9]], [0.0,-0.2,0.1]),
    ([[1.0,0.2,-0.5],[-0.3,0.8,1.1]], [0.1, 0.0])
]
probs = forward([0.9, 0.4], layers)
print(f"Class A: {probs[0]:.1%}")
print(f"Class B: {probs[1]:.1%}")
print(f"Prediction: {'A' if probs[0] > probs[1] else 'B'}")

A Forward Pass Isn't Learning
The forward pass only uses the weights the network already has. It doesn't update them. Learning happens during training: the error at the output is measured, backpropagation carries its gradient backward through every layer, and each weight is nudged in the direction that reduces the error. In PyTorch or TensorFlow, one training step = one forward pass + one backward pass + one weight update. Everything in this lesson covers the forward direction only.
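For orientation only, a single training step for one ReLU neuron can be sketched as follows. The squared-error loss, the learning rate of 0.1, the target value, and the names `training_step` and `relu_grad` are all assumptions made for this illustration; they are not part of the lesson's code.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Slope of ReLU: 1 where the neuron is active, 0 where it is blocked.
    return 1.0 if x > 0 else 0.0

def training_step(inputs, weights, bias, target, lr=0.1):
    # Forward pass: same arithmetic as the neuron in this lesson.
    raw = sum(w * x for w, x in zip(weights, inputs)) + bias
    pred = relu(raw)
    loss = (pred - target) ** 2  # assumed squared-error loss

    # Backward pass: chain rule from the loss back to each weight.
    d_raw = 2 * (pred - target) * relu_grad(raw)
    grad_w = [d_raw * x for x in inputs]
    grad_b = d_raw

    # Update: nudge each parameter against its gradient.
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    new_bias = bias - lr * grad_b
    return new_weights, new_bias, loss

inputs, target = [0.8, 0.6], 1.0
weights, bias = [1.5, 2.0], -1.0

weights, bias, loss_before = training_step(inputs, weights, bias, target)
_, _, loss_after = training_step(inputs, weights, bias, target)
print(f"Loss before update: {loss_before:.4f}")
print(f"Loss after update:  {loss_after:.4f}")
```

The second call reports a smaller loss than the first, which is the whole point of the update: the same forward arithmetic, run with slightly better weights.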
In practice you'd use torch.nn.Linear or tf.keras.layers.Dense rather than writing weight loops by hand — but understanding the pure-Python version means you'll never be mystified by what those layers actually do.
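As a rough picture of what such a layer object bundles together, here is a pure-Python stand-in. The class name `DenseSketch` and its constructor arguments are hypothetical; this is not the API of PyTorch or Keras, only a sketch of the idea that a layer holds a weight matrix and biases and applies them when called.

```python
def relu(x):
    return max(0.0, x)

class DenseSketch:
    """Hypothetical stand-in for a fully connected layer (not a library API)."""

    def __init__(self, weight_matrix, biases, activation=None):
        self.weight_matrix = weight_matrix  # one row of weights per neuron
        self.biases = biases                # one bias per neuron
        self.activation = activation        # optional function applied per output

    def __call__(self, inputs):
        raw = [sum(w * x for w, x in zip(row, inputs)) + b
               for row, b in zip(self.weight_matrix, self.biases)]
        if self.activation is None:
            return raw
        return [self.activation(v) for v in raw]

# Illustrative weights: 2 inputs -> 3 hidden (ReLU) -> 2 raw output scores
hidden = DenseSketch([[1.5, -0.5], [0.2, 2.0], [-1.0, 1.2]], [0.0, -0.5, 0.1], relu)
output = DenseSketch([[0.8, 0.3, -0.2], [0.1, -0.6, 1.5]], [0.1, 0.0])

scores = output(hidden([0.8, 0.6]))
print([round(s, 3) for s in scores])
```

Real library layers add what this sketch leaves out, such as random weight initialisation, batched inputs, and automatic gradient tracking.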
A neuron receives inputs [0.5, 1.0] with weights [2.0, −1.5] and a bias of 0.5. It uses ReLU. What is its output?
Key takeaways
- A neuron computes a weighted sum of its inputs plus a bias, then applies an activation function like ReLU to decide how strongly to fire.
- Stacking neurons in layers lets a network answer many sub-questions at once; each layer's outputs become the next layer's inputs.
- Depth enables hierarchical feature learning — early layers detect edges, later layers detect objects — without any human feature engineering.
- The forward pass is pure arithmetic flowing input-to-output; actual learning (backpropagation) is a separate step that adjusts the weights.
- Data scale, GPU compute, and better training tricks are why deep learning went from a curiosity to the engine behind modern AI.
A forward pass: values flow left→right. Each neuron adds up its weighted inputs, applies an activation, and passes the result on — until the output layer makes a prediction.
This single artificial neuron computes a weighted sum plus bias, then applies ReLU. What does it print?
def relu(x): return max(0, x)

def neuron(inputs, weights, bias):
    raw = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(raw)

output = neuron(
    inputs=[1.0, 2.0],
    weights=[0.5, -1.0],
    bias=1.5
)
print(f"Neuron output: {output:.1f}")

This code has a bug — what's wrong? The neuron should follow the lesson's formula output = activation(w1*x1 + w2*x2 + ... + bias).
def relu(x): return max(0, x)

def neuron(inputs, weights, bias):
    # weighted sum of the inputs, then add the bias
    raw = sum(w + x for w, x in zip(weights, inputs)) + bias
    return relu(raw)

Complete the two activation functions from the lesson: ReLU (blocks negatives) and sigmoid. Fill in the missing pieces.
import math

def relu(x): return max(____, x)
def sigmoid(x): return 1 / (1 + math.exp(____))
Put these lines in the correct order to run a full forward pass through a 2-layer classifier, matching the structure taught in the lesson (hidden layer with ReLU, output layer with softmax).
probs = softmax(raw_out) # 4. squash scores into probabilities
h1 = [relu(sum(w*x for w,x in zip(row, inputs)) + b) for row, b in zip(W1, b1)] # 2. hidden layer, ReLU
raw_out = [sum(w*x for w,x in zip(row, h1)) + b for row, b in zip(W2, b2)] # 3. output layer, raw scores
inputs = [0.9, 0.4] # 1. raw input layer
print('Prediction:', 'A' if probs[0] > probs[1] else 'B') # 5. pick the higher class

Build a function predict(inputs, threshold) that runs a forward pass through the hardcoded two-layer network below and returns 'spam' if the output neuron fires above threshold, or 'not spam' otherwise. Test it with at least three different input vectors and observe which combinations of features tip the network into spam territory.