Intuition

All hard problems can essentially be boiled down to functions (well, almost all of them). These could be:

By training a neural network, we’re simply undergoing a process of finding that specific function. For the first one, that could be easy. But for the others? How do we do it?

What:

“Neural Network” refers to an entire category of AI models loosely inspired by the brain. The callout above is a pretty good intuition (credit Pierre Mackenzie!).

Components of a Neural Network:

The Neuron:

A fundamental building block of an NN is the Neuron. Like neurons in the human brain, they take inputs and, based on the strength of their connections, fire a signal of a specific strength forward. (The strengths of those connections are learned / honed from experience.)

When you put a whole collection of them together (i.e. a network, or a brain), emergent properties arise (e.g. the ability to walk).

The Neuron, Dissected:

  • The neuron has a bunch of inputs (x₁, x₂, …, xₙ).
  • Each input has a corresponding connection (i.e. a weight, wᵢ) connecting it to the neuron.
  • Each neuron also has a bias number, b. (This helps ensure the model can learn something of value, even when the inputs are 0.)
  • The “Linear Function” is literally just computing the weighted sum of the inputs (and the bias!).
  • The linear function (z) is calculated by: z = w₁x₁ + w₂x₂ + … + wₙxₙ + b.
  • The “Activation Function” (e.g. ReLU, Sigmoid, Softmax etc.) is what allows us to learn non-linearity. If we didn’t have one, stacking linear layers would just collapse into a single linear function, so our decision boundaries could never have curves. (A minimal code sketch of one neuron follows this list.)
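
To make that concrete, here’s a minimal sketch of a single neuron in Python/NumPy. The inputs, weights, and bias are made-up numbers, and ReLU is just one choice of activation:

```python
import numpy as np

def neuron(x, w, b):
    """A single neuron: weighted sum of inputs plus bias, then an activation."""
    z = np.dot(w, x) + b       # the "linear function": w1*x1 + ... + wn*xn + b
    return np.maximum(0.0, z)  # the "activation function" (here, ReLU)

# Example with 3 inputs and made-up weights/bias:
x = np.array([1.0, 2.0, -1.0])
w = np.array([0.5, -0.3, 0.8])
b = 0.1
print(neuron(x, w, b))  # relu(0.5 - 0.6 - 0.8 + 0.1) = relu(-0.8) = 0.0
```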

Fun Fact!

A neural network with a single linear layer (and no activation function) is called the Perceptron. When it was first invented (~late 1950s), it was considered the absolute state of the art. Unfortunately, perceptrons sucked at learning basically anything non-trivial (famously, a single-layer perceptron can’t even learn XOR).

It’s only more recently that we’ve had the compute and data to scale up to many more layers. Doing this (Deep Learning) is what has allowed us to truly get value from them.

How Do They Learn?

Take the following neural network: it’s got 784 inputs, 2 hidden layers, and 10 outputs.
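
Here’s a minimal sketch of that network in PyTorch. The 784 inputs suggest a flattened 28×28 image (MNIST-style digits, with 10 output classes); the hidden-layer sizes (128 and 64) are assumptions, since the text doesn’t specify them:

```python
import torch
import torch.nn as nn

# 784 inputs -> 2 hidden layers -> 10 outputs
model = nn.Sequential(
    nn.Linear(784, 128),  # inputs -> hidden layer 1
    nn.ReLU(),
    nn.Linear(128, 64),   # hidden layer 1 -> hidden layer 2
    nn.ReLU(),
    nn.Linear(64, 10),    # hidden layer 2 -> 10 output scores
)
print(model(torch.randn(1, 784)).shape)  # torch.Size([1, 10])
```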

Training, at a super high level:

  1. Choose the hyperparameters (architecture and optimiser)
  2. Initialise the model (randomly)
  3. Optimise the model with Gradient Descent (a code sketch of the loop follows this list)
    (a) Sample an input–label pair from the data
    (b) Perform a forward pass to obtain a prediction
    (c) Calculate the loss (e.g. Cross-Entropy Loss for classification) between the prediction and the label
    (d) Back-propagate to get the gradient of the loss wrt parameters
    (e) Update the parameters: nudge each weight a little against its gradient, w ← w − η · ∂L/∂w (where η is the learning rate)
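
Here’s a minimal sketch of that loop in PyTorch. The model, learning rate, and data are all illustrative assumptions (the inputs and labels are random stand-ins, not a real dataset):

```python
import torch
import torch.nn as nn

# 1-2. Choose the architecture + optimiser; PyTorch initialises weights randomly.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# 3. Gradient descent.
for step in range(1000):
    x = torch.randn(32, 784)          # (a) sample a batch of inputs...
    y = torch.randint(0, 10, (32,))   #     ...and their labels (fake here)
    prediction = model(x)             # (b) forward pass
    loss = loss_fn(prediction, y)     # (c) loss between prediction and label
    optimiser.zero_grad()
    loss.backward()                   # (d) back-propagate: gradient of loss wrt parameters
    optimiser.step()                  # (e) parameter update
```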

Types of Neural Networks:

  1. Feedforward Neural Networks (FNNs):

    • The simplest type, where information flows in one direction (from input to output).
    • Multilayer Perceptrons (MLPs) are a type of feedforward neural network with one or more hidden layers.
    • MLPs are fully connected, meaning every node in one layer is connected to every node in the next layer.
  2. Convolutional Neural Networks (CNNs):

    • Specialised for tasks like image recognition and processing.
    • They use convolutional layers that apply filters to detect features (like edges, shapes, and patterns).
  3. Recurrent Neural Networks (RNNs):

    • Designed for sequence-based data like time series or language models.
    • RNNs have connections that form directed cycles, enabling them to have “memory” of previous inputs.
  4. Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs):

    • These are more advanced forms used for generative tasks (e.g., creating new images, enhancing photos).
  5. Transformers:

    • A more recent and powerful architecture for Natural Language Processing (NLP) and computer vision.
    • The Transformer architecture underlies many state-of-the-art models (e.g., GPT and BERT).
    • It’s phenomenal, thanks to attention. (Quick code sketches of these layer types follow this list.)
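
To tie the names to something concrete, here’s the core PyTorch layer behind each family (VAEs and GANs are whole training setups rather than single layers, so they’re omitted). All the sizes are made-up, illustrative assumptions:

```python
import torch.nn as nn

mlp_layer = nn.Linear(784, 128)                      # FNN/MLP: fully connected layer
conv_layer = nn.Conv2d(3, 16, kernel_size=3)         # CNN: learned filters slid over an image
rnn_layer = nn.RNN(input_size=300, hidden_size=128)  # RNN: carries a hidden state across a sequence
attention = nn.MultiheadAttention(embed_dim=512, num_heads=8)  # Transformer: attention
```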