Neural Networks · Intermediate · 10 min · 12 / 13

Neural Networks: The Basics

Peek inside the artificial neuron — the tiny building block that, when stacked into layers, lets machines recognize faces, translate languages, and beat world champions at chess.

Your brain right now is doing something remarkable: billions of neurons are firing in coordinated waves, letting you read these words, feel the chair beneath you, and think about lunch — all at once. Machine learning researchers looked at this biological marvel and asked: what if we built a simplified version in software?

The result is the artificial neural network — the technology powering image recognition, voice assistants, language translation, and almost every recent AI breakthrough. The beautiful part: the core idea fits in a single line of arithmetic. Let's build one from scratch.

#The Building Block: One Artificial Neuron

A single artificial neuron does three things:

  1. Receives inputs — numbers representing features (e.g., pixel brightness, temperature, word frequency).
  2. Weights and sums — multiplies each input by a learned weight, adds them up, then adds one extra number called the bias.
  3. Activates — passes the sum through an activation function that decides how strongly the neuron "fires".

In math: output = activation(w1*x1 + w2*x2 + ... + wn*xn + bias)

Weights control how much each input matters: a large positive weight means "strong yes signal", a large negative weight means "strong no signal", and a weight near zero means "mostly ignore this". The bias shifts the neuron's default threshold so it can fire even when all inputs are zero, or stay quiet when they're large. Both weights and bias start random and are learned from data.
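
To make the role of the bias concrete, here is a small sketch. The bias values -2, 0, and 2 and the weights are arbitrary choices for illustration, not values from the lesson. With every input at zero the weighted sum vanishes, so the output depends on the bias alone.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    # Weighted sum of the inputs, plus the bias, passed through a sigmoid
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)

zero_inputs = [0.0, 0.0]
weights = [0.8, -0.5]  # irrelevant here: they are multiplied by zero

for bias in (-2.0, 0.0, 2.0):  # illustrative bias values
    out = neuron(zero_inputs, weights, bias)
    print(f"bias = {bias:+.1f} -> output = {out:.3f}")
```

A negative bias keeps the neuron quiet by default (about 0.119), a zero bias leaves it undecided (0.500), and a positive bias makes it fire by default (about 0.881).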

Think of it like

The Neuron as a Voting Panel

Imagine three judges scoring a restaurant. Judge 1 (food quality) gets weight 0.6 — their vote matters most. Judge 2 (price) gets 0.3. Judge 3 (distance) gets 0.1. Each gives a score 0–10.

The neuron multiplies and sums: 0.6*9 + 0.3*6 + 0.1*2 = 7.4. The bias is a mood nudge, say +0.1, which brings the total to 7.5. The activation function then decides if 7.5 is strong enough to recommend the restaurant.

Weights = how much each input matters. Bias = the default baseline. The network learns both.

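The code below runs the same panel with the scores rescaled from 0–10 to 0–1, so the weighted sum plus bias is 0.84. ReLU passes the value through unchanged (it's positive). Sigmoid squashes it to about 0.70: a 'yes', though not an emphatic one.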
import math

def relu(x):    return max(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))

# One neuron: weighted sum + bias + activation
inputs  = [0.9, 0.6, 0.2]   # food quality, price, distance
weights = [0.6, 0.3, 0.1]
bias    = 0.1

raw = sum(x * w for x, w in zip(inputs, weights)) + bias
print(f"Weighted sum + bias : {raw:.3f}")
print(f"After ReLU          : {relu(raw):.3f}")
print(f"After Sigmoid       : {sigmoid(raw):.3f}")

#Activation Functions and Layers

Without an activation function, stacking layers would be pointless — any number of linear transformations collapses into one. Activation functions break this by introducing non-linearity:

  • ReLU, max(0, x): outputs the input if positive, zero otherwise. Fast to compute and the usual default for hidden layers.
  • Sigmoid: squashes any number to (0, 1). Great for an output neuron predicting a probability.
  • Tanh: squashes to (−1, 1), often used in recurrent networks.
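
A quick way to get a feel for the three functions is to run the same inputs through each. This is a minimal sketch; the sample inputs are arbitrary, and `math.tanh` is the standard-library hyperbolic tangent.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

samples = [-3.0, -1.0, 0.0, 1.0, 3.0]  # arbitrary test inputs

print(f"{'x':>5} {'ReLU':>7} {'Sigmoid':>8} {'Tanh':>7}")
for x in samples:
    print(f"{x:>5.1f} {relu(x):>7.3f} {sigmoid(x):>8.3f} {math.tanh(x):>7.3f}")
```

ReLU zeroes out the negative inputs, sigmoid keeps every output strictly between 0 and 1, and tanh keeps every output strictly between −1 and 1 with 0 mapped to 0.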

Now stack neurons side by side to form a layer, and stack layers in sequence to form a network. Three kinds of layers:

  • Input layer: receives raw data, one node per feature. No computation — just passes values forward.
  • Hidden layer(s): each neuron takes all outputs from the previous layer, applies weights + bias + activation. This is where intermediate patterns are learned.
  • Output layer: produces the final prediction. One sigmoid neuron for yes/no; one neuron per class for multi-class problems.
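
Because each hidden or output neuron takes all outputs from the previous layer, the number of learned values grows with the product of adjacent layer sizes. Here is a rough sketch of that count, using a hypothetical helper named `count_parameters` and a 2 → 3 → 1 network as the example shape.

```python
def count_parameters(layer_sizes):
    """Count weights and biases in a fully connected feedforward network.

    layer_sizes lists the node count per layer, input layer first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = n_in * n_out  # one weight per (input, neuron) pair
        biases = n_out          # one bias per neuron
        total += weights + biases
    return total

# 2 inputs -> 3 hidden neurons -> 1 output neuron
print(count_parameters([2, 3, 1]))  # 13 = (2*3 + 3) + (3*1 + 1)
```

The input layer contributes nothing to the count, which matches the point that it performs no computation.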

Data flows in one direction — input → hidden → output — making this a feedforward network. The interactive visualizer on this page lights up each neuron as activations ripple forward.

#The Forward Pass: Turning Inputs into a Prediction

The full forward pass: inputs ripple through 3 hidden ReLU neurons, then a sigmoid output neuron produces a probability.
import math

def relu(x):    return max(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))

# 2 inputs -> 3 hidden neurons -> 1 output
# Each neuron stored as (weights, bias)
hidden = [([0.5, -0.4], 0.1), ([0.2, 0.8], -0.3), ([-0.6, 0.3], 0.5)]
output = [([0.7, 0.4, -0.5], 0.2)]

def forward(inputs):
    # Hidden layer (ReLU)
    hidden_out = [relu(sum(x*w for x,w in zip(inputs,ws)) + b) for ws,b in hidden]
    # Output layer (sigmoid -> probability)
    ws, b = output[0]
    return sigmoid(sum(x*w for x,w in zip(hidden_out,ws)) + b)

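pred = forward([1.0, 0.5])
label = "Yes" if pred >= 0.5 else "No"
# Confidence in the chosen label: pred for "Yes", 1 - pred for "No"
confidence = pred if pred >= 0.5 else 1.0 - pred
print(f"Prediction: {pred:.4f}  ->  {label} ({confidence:.1%} confidence)")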
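
To see what happens inside `forward`, the sketch below recomputes the same pass step by step with the weights listed above and prints each intermediate value. The helper name `weighted_sum` is introduced here for illustration only.

```python
import math

def relu(x):    return max(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + math.exp(-x))

def weighted_sum(inputs, weights, bias):
    return sum(x * w for x, w in zip(inputs, weights)) + bias

hidden = [([0.5, -0.4], 0.1), ([0.2, 0.8], -0.3), ([-0.6, 0.3], 0.5)]
output_ws, output_b = [0.7, 0.4, -0.5], 0.2
inputs = [1.0, 0.5]

# Hidden layer: one weighted sum and one ReLU per neuron
hidden_out = []
for i, (ws, b) in enumerate(hidden, start=1):
    z = weighted_sum(inputs, ws, b)
    a = relu(z)
    hidden_out.append(a)
    print(f"hidden {i}: sum = {z:.3f}, ReLU = {a:.3f}")

# Output neuron: weighted sum of the hidden activations, then sigmoid
z_out = weighted_sum(hidden_out, output_ws, output_b)
pred = sigmoid(z_out)
print(f"output  : sum = {z_out:.3f}, sigmoid = {pred:.4f}")
```

The hidden activations come out as 0.4, 0.3, and 0.05, the output sum as 0.575, and the prediction as roughly 0.64.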
Common mistake

More Layers Doesn't Always Mean Better

It's tempting to think deeper = smarter. But without enough data, extra layers invite overfitting: the network memorizes training examples and falls apart on new ones.

Very deep networks also historically suffered from the vanishing gradient problem: error signals shrank toward zero as they flowed backward through many layers, so early layers stopped learning. Modern fixes include ReLU activations, residual (skip) connections, and batch normalization. Rule of thumb: start simple. A 1–2 hidden-layer network solves a surprising number of real problems.
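
A rough numerical illustration of why gradients vanish, assuming sigmoid activations in every layer: the sigmoid's derivative never exceeds 0.25, and backpropagation multiplies in one such factor per layer. The sketch ignores the weight terms that also enter the product, so it shows only the contribution of the activation derivatives.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x == 0

# Largest possible product of activation derivatives across n sigmoid layers
for n_layers in (1, 5, 10, 20):
    factor = sigmoid_derivative(0.0) ** n_layers
    print(f"{n_layers:>2} layers: at most {factor:.2e}")
```

Even in this most favorable case, ten layers scale the signal by less than one millionth. ReLU's derivative is 1 for positive inputs, which is one reason it eases the problem.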

#How Weights Are Learned: Backprop + Gradient Descent

So far our weights are made up. In practice the network learns them through a three-step cycle repeated thousands of times:

  1. Forward pass: feed an example through the network to get a prediction.
  2. Backpropagation: compare the prediction to the true label, compute the loss, then walk backwards through every layer computing gradients — how much did each weight contribute to the error?
  3. Gradient descent update: nudge every weight a tiny step in the direction that reduces the loss (weight -= learning_rate * gradient).
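
The steps above can be sketched for a single sigmoid neuron. Everything specific here is an assumption chosen for illustration: the toy dataset (logical OR), the binary cross-entropy loss, the learning rate of 0.5, and the 2,000 epochs. With only one neuron, backpropagation reduces to a single application of the chain rule, and for sigmoid plus cross-entropy the gradient of the loss with respect to the weighted sum simplifies to (prediction - label).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy dataset (assumed for illustration): logical OR
data = [([0.0, 0.0], 0), ([0.0, 1.0], 1), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

random.seed(0)  # reproducible "random" starting weights
weights = [random.uniform(-0.5, 0.5) for _ in range(2)]
bias = 0.0
learning_rate = 0.5  # assumed step size

for epoch in range(2000):
    for inputs, label in data:
        # 1. Forward pass
        pred = sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)
        # 2. Backward pass: gradient of the loss w.r.t. the weighted sum
        error = pred - label
        # 3. Gradient descent update: weight -= learning_rate * gradient
        weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
        bias -= learning_rate * error

for inputs, label in data:
    pred = sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)
    print(f"{inputs} -> {pred:.3f} (label {label})")
```

In a multi-layer network the same update rule applies to every weight; the only extra work is propagating the error term backwards through each layer to obtain the gradients.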

Libraries like PyTorch, TensorFlow, and scikit-learn's MLPClassifier compute the backward pass automatically. But the underlying idea is the same gradient descent loop from the previous lesson — applied to every weight in every layer simultaneously. Once you grasp the forward pass conceptually, these libraries feel much less like magic.

Tip

Why Non-Linearity Enables Complex Learning

Each neuron is a detector for one simple pattern. The hidden layer combines many detectors. The next layer combines combinations of combinations. After a few layers a network can detect faces, sentences, or fraud — patterns no single rule could capture. Non-linear activation is the secret ingredient: without it, every layer would collapse into one linear transformation, no matter how deep.
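
The collapse claim can be checked numerically. In this sketch, two layers with no activation are applied to an input, and the result is compared with a single layer whose weights are the matrix product of the two. The weight values are arbitrary, and the helper names `matvec` and `matmul` are made up for the demonstration.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

# Two purely linear layers (illustrative weights): 2 inputs -> 3 -> 1
W1, b1 = [[0.5, -0.4], [0.2, 0.8], [-0.6, 0.3]], [0.1, -0.3, 0.5]
W2, b2 = [[0.7, 0.4, -0.5]], [0.2]
x = [1.0, 0.5]

# Layer by layer, with no activation in between
h = [v + b for v, b in zip(matvec(W1, x), b1)]
two_layers = [v + b for v, b in zip(matvec(W2, h), b2)]

# One equivalent layer: W = W2 @ W1, b = W2 @ b1 + b2
W = matmul(W2, W1)
b = [v + c for v, c in zip(matvec(W2, b1), b2)]
one_layer = [v + c for v, c in zip(matvec(W, x), b)]

print(two_layers, one_layer)  # equal up to floating-point rounding
```

Inserting a ReLU between the two layers breaks this equivalence, which is what lets depth add expressive power.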

Quick check

In a neural network, what is the purpose of the bias term added to each neuron?

Key takeaways

  • A neuron computes a weighted sum of its inputs plus a bias, then passes the result through an activation function — that's the entire core operation.
  • Weights control how much each input matters; bias shifts the neuron's default firing threshold. Both are learned from data via gradient descent.
  • Activation functions like ReLU introduce non-linearity, which is what allows stacked layers to learn complex, curved decision boundaries.
  • The forward pass is the sequential flow of data through input → hidden → output layers, producing a prediction.
  • Weights are learned by running the forward pass, measuring the error, and using backpropagation + gradient descent to nudge every weight in the direction that reduces the loss.
Try it yourself · Forward pass
Watch activations flow layer by layer to a prediction.
input → hidden → hidden → output

A forward pass: values flow left→right. Each neuron adds up its weighted inputs, applies an activation, and passes the result on — until the output layer makes a prediction.
Practice challenges
Predict the output #1

A single neuron computes its weighted sum plus bias, then passes it through ReLU. What does this print?

def relu(x):
    return max(0.0, x)

inputs  = [2.0, 3.0]
weights = [0.5, -1.0]
bias    = 1.0

raw = sum(x * w for x, w in zip(inputs, weights)) + bias
print(relu(raw))
Fix the bug #2

This code is meant to compute a neuron's output as activation(weighted sum + bias), but the result is wrong. What's the bug?

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

inputs  = [1.0, 0.5]
weights = [0.8, -0.5]
bias    = 0.1

raw = sum(x * w for x, w in zip(inputs, weights))
print(sigmoid(raw))
Fill in the blank #3

Complete the neuron so it applies the sigmoid activation to the weighted sum plus bias. Fill in the missing function name.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias):
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return ____(total)
Reorder the lines #4

Put these lines in the correct order to define and run a forward pass through one hidden layer (ReLU) and one sigmoid output neuron.

  1. hidden = [([0.5, -0.4], 0.1), ([0.2, 0.8], -0.3)]
  2. raw = sum(h*w for h,w in zip(hidden_out, output_ws)) + output_b
  3. print(sigmoid(raw))
  4. hidden_out = [relu(sum(x*w for x,w in zip([1.0, 0.5], ws)) + b) for ws, b in hidden]
  5. output_ws, output_b = [0.7, 0.4], 0.2
Your turn
Practice exercise

Build a single-neuron classifier from scratch.

You have three training examples, each with two inputs and a label (0 or 1). Complete the neuron(inputs, weights, bias) function so it:

  1. Computes the weighted sum of inputs.
  2. Adds the bias.
  3. Applies the sigmoid activation and returns the result.

Then loop through the examples, print each prediction (4 decimal places), and print whether round(pred) matches the label.

Use weights = [0.8, -0.5] and bias = 0.1.
