Multilayer perceptrons (MLPs), or feedforward neural networks, are a class of neural networks composed of multiple layers of perceptrons (small processing units) connected in a feedforward manner. They are widely used for supervised learning. With one or more hidden layers between the input and output layers, they can model complex relationships between inputs and outputs.
They are popular for pattern recognition tasks such as computer vision and NLP. Their ability to model complex nonlinear relationships using multiple layers of simple neuron-like processing units makes them useful for real-world pattern modeling across several domains.
What are perceptrons?
A perceptron is an artificial neuron that represents the simplest possible neural network unit. It operates on numeric inputs, computing a weighted sum of the inputs, similar to a linear regression model.
The illustration above depicts a perceptron with three input nodes $x_1$, $x_2$, $x_3$, each associated with weights $w_1$, $w_2$, $w_3$ respectively. These weight values determine the relative influence that each input has on the output $y$. Let's see how the perceptron computes this output.
The output $y$, either 0 or 1, is determined by whether this weighted sum exceeds a pre-defined threshold value $\theta$:

$$
y =
\begin{cases}
1 & \text{if } \sum_i w_i x_i \geq \theta \\
0 & \text{otherwise}
\end{cases}
$$
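As a minimal sketch of this computation (the inputs, weights, and threshold below are illustrative values, not tied to any particular figure), the perceptron rule can be written in a few lines of NumPy:

```python
import numpy as np

def perceptron(x, w, threshold):
    """Return 1 if the weighted sum of inputs reaches the threshold, else 0."""
    weighted_sum = np.dot(w, x)
    return 1 if weighted_sum >= threshold else 0

# Illustrative values: three inputs with their associated weights
x = np.array([1.0, 0.0, 1.0])
w = np.array([0.6, 0.4, 0.2])
print(perceptron(x, w, threshold=0.5))  # -> 1, since 0.6*1 + 0.4*0 + 0.2*1 = 0.8 >= 0.5
```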
The most minimalistic perceptron consists of a single input layer and an output node. The goal of a feed-forward network is to approximate some function $f^*$ by learning the parameters that best define the input-output relation

$$y = f(x; \theta),$$

where $x$ is the input, $y$ is the output, and $\theta$ denotes the parameters to be learned.
Increased complexity: the multilayer perceptron network
In contrast to a single perceptron, MLPs are characterized by a layered architecture consisting of an input layer, one or more hidden layers, and an output layer.
Before proceeding further, it's important to build an intuition for the role the number of hidden layers plays in the network.
When no hidden layers are present, the MLP reduces to a linear regression model: it can only capture linear relationships between the input and output variables. As hidden layers are added, the network gains the capacity to capture nonlinear patterns, enabling it to model increasingly complex relationships between variables. Thus, the degree of nonlinearity in the data determines the complexity of the network required, with more hidden layers facilitating the modeling of higher-order nonlinear relationships.
The hidden units combine their inputs into meaningful features via a set of learned weights, and apply nonlinear activations (such as ReLU or tanh) to transform the input representation into one in which classes that are not linearly separable become separable.
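For concreteness, here is a small NumPy sketch of a few common activation functions of the kind mentioned above:

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def tanh(z):
    """Hyperbolic tangent, squashes values into (-1, 1)."""
    return np.tanh(z)

def sigmoid(z):
    """Logistic sigmoid, squashes values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(z), tanh(z), sigmoid(z), sep="\n")
```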
Training a feed-forward network involves iteratively adjusting the parameters to achieve the best possible approximation of the target function. A key result here is the universal approximation theorem, which states that a multilayer perceptron with a single hidden layer containing a finite number of neurons can approximate any continuous function (on a bounded input domain) to arbitrary accuracy. In simple terms, it tells us that neural networks are powerful function approximators.
Computational graphs
Computational graphs are a way of expressing a mathematical expression as a directed graph. Nodes correspond to variables (inputs, intermediate values, and outputs) or to mathematical operations, while edges represent the flow of data between them. The nodes are ordered so that their outputs can be computed one after the other, and the direction of the edges indicates the sequence of operations.
Let's understand this better with an example. The goal is to represent $c = a + b$. In this example, the two variable circles ($a$ and $b$) connect with arrows to a single output circle ($c$), showing that $c$ is calculated by adding $a$ and $b$.
More complex expressions are represented the same way, by chaining multiple operation nodes together.
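A computational graph can be sketched in plain Python as nodes that know their inputs and how to compute their value. The tiny `Node` and `Add` classes below are hypothetical helpers for illustration, not a real library API:

```python
class Node:
    """A leaf node holding a plain value (an input variable)."""
    def __init__(self, value):
        self.value = value

    def evaluate(self):
        return self.value

class Add(Node):
    """An operation node: its value is the sum of its two input nodes."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self):
        # Evaluate the inputs first, then combine them: this ordering is the forward pass.
        return self.left.evaluate() + self.right.evaluate()

a, b = Node(2.0), Node(3.0)
c = Add(a, b)          # the graph: a -> (+) <- b, with output c
print(c.evaluate())    # -> 5.0
```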
Forward propagation in MLP
Forward propagation refers to the sequential calculation of outputs from the input layer, through the hidden layers, to the output layer. During the forward pass, each neuron receives the outputs of the previous layer, computes their dot product with its connection weights, adds its bias, and applies the activation function to produce its own output. This output then serves as input to the next layer, propagating information forward through the network. Let's see this mathematically.
The network parameters are the weights and the biases (the parameters are the values that are learned during training):
$w^l_{jk}$ — weight from the $k^{\text{th}}$ neuron in the $(l-1)^{\text{th}}$ layer to the $j^{\text{th}}$ neuron in the $l^{\text{th}}$ layer.
$b^l_j$ — bias of the $j^{\text{th}}$ neuron in the $l^{\text{th}}$ layer.
Here, $a^l_j$ denotes the activation or output of the $j^{\text{th}}$ neuron in the $l^{\text{th}}$ layer. The activation $a^l_j$ of the $j^{\text{th}}$ neuron in the $l^{\text{th}}$ layer is related to the activations in the $(l-1)^{\text{th}}$ layer by the equation

$$a^l_j = \sigma\left(\sum_k w^l_{jk}\, a^{l-1}_k + b^l_j\right),$$

where the sum is over all neurons $k$ in layer $(l-1)$ and $\sigma$ denotes the activation function.
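Translated directly from the per-neuron equation above, a forward step for one layer can be sketched as an explicit loop. This is a minimal sketch, with the sigmoid assumed as the activation function:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward_per_neuron(a_prev, W, b):
    """Compute a^l_j = sigma(sum_k w^l_jk * a^(l-1)_k + b^l_j) for every neuron j."""
    num_neurons = W.shape[0]
    a = np.zeros(num_neurons)
    for j in range(num_neurons):
        z_j = np.dot(W[j, :], a_prev) + b[j]   # weighted input of neuron j
        a[j] = sigmoid(z_j)                    # activation of neuron j
    return a
```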
Compact vectorized form
Let

$$a^l = \sigma\!\left(W^l a^{l-1} + b^l\right) = \sigma(z^l),$$

where $z^l = W^l a^{l-1} + b^l$ is the vector of weighted inputs to all the neurons in layer $l$, $W^l$ is the weight matrix of layer $l$, and $b^l$ is its bias vector.
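In this vectorized form, each layer becomes a single matrix-vector product. The sketch below chains it over all layers of a network whose parameters are given as lists of weight matrices and bias vectors (assumed shapes: each `W` is `(n_l, n_(l-1))`, each `b` is `(n_l,)`, with sigmoid assumed as the activation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Propagate x through every layer: z^l = W^l a^(l-1) + b^l, a^l = sigma(z^l)."""
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b      # weighted inputs of the whole layer at once
        a = sigmoid(z)     # activations of the whole layer
    return a
```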
An example of forward propagation
Let's take a step-by-step look into forward propagation with the help of a small example. Consider the MLP below.
We first write down the weight matrices ($W^1$ for the hidden layer and $W^2$ for the output layer) and the input vector $x$; their numeric values are read off the figure above. Along with these, the target outputs (the ground truth) are given. Our goal is to compute the hidden-layer activations and the predictions at the output layer.

Let's start by computing the net input to the hidden layer (layer 1) and passing it through the activation function:

$$z^1 = W^1 x + b^1, \qquad a^1 = \sigma(z^1).$$

Similarly, at the output layer (layer 2):

$$z^2 = W^2 a^1 + b^2, \qquad a^2 = \sigma(z^2).$$

Summarizing the above computations: recall that $a^l_j$ is the output of the $j^{\text{th}}$ neuron in the $l^{\text{th}}$ layer, so $a^1_1$ and $a^1_2$ are the outputs of the hidden layer, while the components of $a^2$ are the activations of the last layer, i.e. the network's predictions. This computation represents one full forward propagation. In the next step, the error is computed and backward propagation takes place.
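The same steps can be reproduced numerically with the kind of vectorized forward pass sketched earlier. The weights, biases, and inputs below are made-up illustrative values (the actual numbers belong to the figure), assuming a 2-2-2 architecture with sigmoid activations:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values only -- not the ones from the figure.
x  = np.array([0.05, 0.10])                 # input vector
W1 = np.array([[0.15, 0.20],
               [0.25, 0.30]])               # hidden-layer weights
b1 = np.array([0.35, 0.35])                 # hidden-layer biases
W2 = np.array([[0.40, 0.45],
               [0.50, 0.55]])               # output-layer weights
b2 = np.array([0.60, 0.60])                 # output-layer biases

z1 = W1 @ x + b1        # net input to the hidden layer
a1 = sigmoid(z1)        # hidden-layer activations a^1_1, a^1_2
z2 = W2 @ a1 + b2       # net input to the output layer
a2 = sigmoid(z2)        # output-layer activations (the predictions)

print("hidden activations:", a1)
print("predictions:       ", a2)
```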
Conclusion
You should now be familiar with the following:
A perceptron is the simplest artificial neural network unit; it makes binary predictions by comparing the weighted sum of its numeric inputs to a threshold, loosely mimicking a biological neuron.
MLPs are inspired by biological neural networks where neurons communicate signals through synaptic connections that are strengthened by learning.
MLPs exhibit a hierarchical architecture with input, hidden, and output layers, allowing them to model intricate patterns and relationships within data.
Computational graphs allow simple functions to be combined to form quite complex models in neural networks.
Forward propagation in an MLP involves the sequential calculation of outputs from the input layer to the output layer via the hidden layers, where each neuron computes a weighted sum of its inputs, adds a bias, and applies an activation function.