L2 Regularisation:
We add a penalty term to the loss function that discourages large weights.
New Loss: $E_{\text{total}} = E_{\text{training}} + \lambda \cdot E_{\text{weights}}$
L2 Penalty: $E_W = \frac{1}{2}\sum_i w_i^2$ (half the sum of the squared weights).
The optimiser now tries to minimise both the training error and the size of the weights, so the weights are nudged towards smaller (smoother) values at every update step.
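A minimal sketch of the idea, assuming a plain NumPy setup where `w`, `data_loss`, `data_grad`, and `lam` are hypothetical names for the weight vector, the training error, its gradient, and λ:

```python
import numpy as np

def l2_regularised_loss_and_grad(w, data_loss, data_grad, lam):
    """Add an L2 penalty to an existing loss and gradient.

    w         : weight vector (np.ndarray)
    data_loss : scalar training error E_training
    data_grad : gradient of E_training w.r.t. w
    lam       : regularisation strength lambda
    """
    penalty = 0.5 * lam * np.sum(w ** 2)   # lambda * E_weights
    total_loss = data_loss + penalty       # E_total = E_training + lambda * E_weights
    total_grad = data_grad + lam * w       # derivative of the penalty is lambda * w
    return total_loss, total_grad

# Hypothetical usage inside a gradient-descent step:
# loss, grad = l2_regularised_loss_and_grad(w, E_train, dE_train_dw, lam=1e-3)
# w -= learning_rate * grad   # the extra lambda*w term shrinks the weights a little each step
```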
L1 Regularisation:
Similar to L2, but the penalty is different.
L1 Penalty: $E_W = \sum_i |w_i|$ (the sum of the absolute values of the weights).
Key Effect: L1 encourages sparsity. It has a strong tendency to push many weights to be exactly zero. This effectively “turns off” unhelpful features, making it useful for feature selection.
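The same NumPy sketch as above, with the L1 penalty swapped in (function and variable names are again illustrative assumptions):

```python
import numpy as np

def l1_regularised_loss_and_grad(w, data_loss, data_grad, lam):
    """Add an L1 penalty (lambda * sum(|w|)) to an existing loss and gradient."""
    penalty = lam * np.sum(np.abs(w))
    total_loss = data_loss + penalty
    # The subgradient of |w| is sign(w): a constant-size push towards zero
    # regardless of how small a weight already is, which is what drives
    # many weights to exactly zero (sparsity).
    total_grad = data_grad + lam * np.sign(w)
    return total_loss, total_grad
```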
Dropout:
In Training: For each mini-batch, randomly “drop” a fraction of the nodes (set their outputs to zero). The forward and backward passes are run on this thinned network.
In Testing: Use the full network, scaling the activations by the keep probability to account for the extra contributions.
Intuition: It’s like training an ensemble of many thinned networks and averaging their opinions at test time. Because no node can rely on specific other nodes always being present, each weight has to pull its own weight and become a useful feature detector in its own right.
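A minimal sketch of the classic formulation described above (mask at training time, scale at test time), using NumPy; `dropout_forward` and its arguments are illustrative names, not a library API:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, drop_prob, training):
    """Apply dropout to one layer's activations.

    Training: zero out each node independently with probability drop_prob,
              giving a thinned network for this mini-batch.
    Testing : keep every node but scale by the keep probability, so the
              expected input to the next layer matches what it saw in training.
    """
    keep_prob = 1.0 - drop_prob
    if training:
        mask = rng.random(activations.shape) < keep_prob
        return activations * mask
    return activations * keep_prob

# Hypothetical usage for a hidden layer h:
# h = np.maximum(0, x @ W1 + b1)
# h = dropout_forward(h, drop_prob=0.5, training=True)
```

Note that most modern libraries use the equivalent “inverted dropout” variant, which divides by the keep probability at training time so that no scaling is needed at test time.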
Data Augmentation:
Literally just make synthetic data by applying realistic transformations (flips, crops, small rotations, noise) to your existing examples; the label stays the same, so the model sees more variety for free.
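A small illustrative sketch, assuming images stored as H × W × C NumPy arrays (`augment_image` is a made-up helper, not a library function):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment_image(img):
    """Return a randomly transformed copy of an image (H x W x C array).

    The transformations are chosen to be 'realistic' for natural images,
    so the original label still applies to the augmented copy.
    """
    out = img.copy()
    if rng.random() < 0.5:              # random horizontal flip
        out = out[:, ::-1, :]
    shift = rng.integers(-2, 3)         # small horizontal shift
    out = np.roll(out, shift, axis=1)   # wrap-around shift; a crude stand-in for pad-and-crop
    return out

# Hypothetical usage: grow the training set with transformed copies.
# augmented = np.stack([augment_image(x) for x in train_images])
```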