As you learned from the introductory topic, a neural network is a type of machine learning model that is made up of layers of interconnected "neurons" that process information and make decisions. They are used for a wide variety of tasks such as image recognition, speech recognition, and natural language processing. In this topic, you'll learn about some essential parts of neural network architecture, how those parts are built, and how they interact with each other.
The main idea
The idea of a neural network can be compared to how neurons in a living nervous system work (though it is not a complete analogy, since we'll simplify some things, and the brain is a much more complex structure). In our brain, we have numerous neurons connected to each other. Thanks to these connections, each neuron is able to combine all the information it receives, process it, and then pass it on. After a series of such transformations, our brain can, for example, identify whether the image in front of us shows a cat or a dog.
In a neural network, neurons have pretty much the same functionality. They also get data, process it, and pass it on. The neurons of a neural network are interconnected in a special way, making up the so-called architecture of the neural network.
In the simplest case, we have several "layers" of neurons forming the network, which you'll see in some pictures later on.
These layers can be conceptually divided into three groups:

- The input layer (each neural network has exactly one);
- The output layer (exactly one here as well);
- Hidden layers (their number can vary drastically, and we usually have more than one).
In this topic, we’ll consider the simplest architecture, namely, the fully-connected neural network. The definition of this type of neural network is quite straightforward: every neuron in one layer is connected to every neuron in the next layer. That can be illustrated with the following graph:
Now let’s dive into the details of how those layers are built and why we need them.
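Before we dive in, here is a minimal NumPy sketch of how the connections of a fully-connected network can be represented: every pair of consecutive layers is linked by a full weight matrix. The layer sizes below are arbitrary illustrative choices, not part of any real architecture.

```python
import numpy as np

# Arbitrary illustrative sizes: 4 input, 5 hidden, and 3 output neurons.
layer_sizes = [4, 5, 3]

# In a fully-connected network, every neuron in one layer is linked to every
# neuron in the next, so each pair of consecutive layers is described by a
# full weight matrix of shape (neurons_in, neurons_out).
rng = np.random.default_rng(seed=42)
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for w in weights:
    print(w.shape)  # (4, 5), then (5, 3): one weight per pair of neurons
```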
How neurons work
Before we move further, it's time for us to bring some maths in here. There are two transformations that each neuron after the input layer must undergo. Let's imagine that each neuron in our fully-connected neural network is a kind of computational pipeline. We start with the input. Each neuron gets the output of each neuron from the previous layer. If the previous layer has, for example, 16 neurons that each output a single number, the input of each neuron in the next layer will be a vector with 16 elements.

The first transformation is a linear combination of all input values. This means that we multiply each input by a weight and sum up all the results. That gives us the following formula:

$$z = \sum_{i=1}^{n} w_i x_i + b$$

Here $x_i$ is the value of the $i$-th neuron of the previous layer, while $w_i$ is the weight of the link between the $i$-th neuron and the current one. As usual, when it comes to linear combinations, we also include the bias term $b$.
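As a quick sanity check, here is a minimal sketch of this linear combination for a single neuron; the input values, weights, and bias are made-up illustrative numbers.

```python
import numpy as np

# Made-up example: a neuron receiving 4 inputs from the previous layer.
x = np.array([0.5, -1.0, 2.0, 0.1])  # outputs of the previous layer
w = np.array([0.2, 0.8, -0.5, 1.5])  # one weight per incoming connection
b = 0.3                              # bias term

# z = w_1*x_1 + w_2*x_2 + ... + w_n*x_n + b
z = np.dot(w, x) + b
print(z)  # approximately -1.25
```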
That was the first stage of our pipeline. However, to model complex (non-linear) relationships, a linear combination is not enough. That's why the next crucial stage, the one that adds non-linearity, is an activation function. There are quite a lot of them. Let's get back to our comparison with the neurons in the brain, which carry out a similar set of transformations: they sum up all the incoming signal potentials, and if the total potential is greater than a certain threshold, the signal is passed on; otherwise, it fades out. An activation function works in a similar way. Let's look at a classic example of an activation function in computational neural networks: the sigmoid function, defined as

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
Here we have a similar situation: if the output of the linear combination is less than roughly -5, the sigmoid maps it to a value very close to 0. If it's greater than roughly 5, the result becomes very close to 1. With the evolution of deep learning, more training-efficient activation functions have been introduced. We'll discuss them in detail in the upcoming topics.
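To see these numbers for yourself, here is a minimal sketch of the sigmoid activation in NumPy; the sample inputs are arbitrary.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the (0, 1) range."""
    return 1.0 / (1.0 + np.exp(-z))

# Values well below -5 map close to 0; values well above 5 map close to 1.
print(sigmoid(np.array([-10.0, -5.0, 0.0, 5.0, 10.0])))
# [4.54e-05  6.69e-03  5.00e-01  9.93e-01  1.00e+00] (approximately)
```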
For now, let's get back to the layers.
Input layer
In order to continue moving further, we first need to make things a bit more "ML-ish". So, we will first determine a dataset we are going to work with. Traditionally, MNIST is a good choice: it consists of images of handwritten digits from 0 to 9 with a resolution of 28×28 pixels. Our task is to predict which digit each image shows.
At the input layer, each neuron is responsible for a single feature: the brightness of an individual pixel. For example, a white pixel has the maximal value (for MNIST that is 255), while a black one has a value of zero. These neurons form the so-called input layer of a neural network. Since there is a neuron for each pixel, we would have 28 × 28 = 784 neurons at the input layer for a 28×28-pixel image with one channel, that is, for black-and-white images. Note that if we were working with an RGB image, it would include 3 channels, so we'd multiply that number by 3. No transformations are done at this layer. The only function of the neurons in the input layer is to get the pixel values and pass them forward to the hidden or, in the simplest case, to the output layer.
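As a hedged illustration, here is how a 28×28 grayscale image could be flattened into the 784 input values; random noise stands in for a real MNIST sample.

```python
import numpy as np

# Random noise standing in for a real 28x28 MNIST sample (values in 0..255).
rng = np.random.default_rng(seed=0)
image = rng.integers(0, 256, size=(28, 28))

# The input layer does no computation: each pixel becomes one input value.
input_layer = image.reshape(-1)  # flatten 28x28 into 784 features
print(input_layer.shape)         # (784,)
```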
Output layer
Now we need to predict the values. Since so far we only have input neurons and nothing else, let's make another layer of neurons called the output layer. Each neuron in this layer is responsible for a specific class we want to predict. In our example, we'll have 10 neurons in the output layer: neuron 0 for "0", neuron 1 for "1", and so on. The idea is that the higher the output value of a neuron in the output layer, the higher the model's confidence that the picture it got at the input layer represents the number that neuron stands for. For example, if the neuron representing the number 1 has a value of 0.1 while the neuron for the number 2 has a value of 1, the network is quite sure that the picture shows a "2" rather than a "1".
For each input, the two transformations introduced above are carried out, and that is how we obtain the values at the output layer.
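To make the "highest value wins" idea concrete before the full feedforward walkthrough, here is a minimal sketch; the output values below are made up for illustration.

```python
import numpy as np

# Hypothetical output values of the 10 output neurons for one image.
outputs = np.array([0.02, 0.10, 1.00, 0.05, 0.01,
                    0.03, 0.02, 0.20, 0.04, 0.08])

# The predicted digit is the index of the neuron with the largest value.
prediction = int(np.argmax(outputs))
print(prediction)  # 2: the network is most confident the image shows a "2"
```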
This explanation might seem a little bit tricky before you actually see the numbers and the so-called feedforward, which is the way we actually get a prediction from a model. For now, you just need to understand that:
- We have one neuron in the output layer for each possible answer.
- The higher the value of a neuron, the higher the network's confidence that the answer this neuron represents is the correct prediction for the input.
In the image below you can see a simple neural network consisting just of an input and an output layer.
Hidden layers
Hidden layers are where the real power of a neural network lies. These are the layers that make our model more expressive, allowing it to capture more complex patterns in the data.
As we've already mentioned above, hidden layers follow exactly the same maths as the output layer does: an activation function is applied to a linear combination of the input. The only difference is their goal: while the output layer corresponds to the actual predictions we want to make, the hidden layers simply "preprocess" the data and pass it on to the next layer.
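Putting all the pieces together, here is a minimal sketch of a forward pass through one hidden layer and the output layer. It assumes the sigmoid activation from above; the hidden-layer size and the random weights are illustrative placeholders (a real network would learn its weights during training).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=1)

# Architecture: 784 input features -> 32 hidden neurons -> 10 output neurons.
# The hidden size of 32 is an arbitrary illustrative choice.
W1, b1 = rng.normal(size=(784, 32)), np.zeros(32)
W2, b2 = rng.normal(size=(32, 10)), np.zeros(10)

x = rng.random(784)  # stand-in for a flattened, scaled input image

# Each layer after the input applies the same two transformations:
# a linear combination followed by an activation function.
hidden = sigmoid(x @ W1 + b1)
output = sigmoid(hidden @ W2 + b2)

print(int(np.argmax(output)))  # index of the most confident output neuron
```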
Conclusion
In this topic, you've learned the basics of what the simplest neural networks, fully-connected networks, look like. The key thing to keep in mind is that there are three types of layers:
- Input layer
- Output layer
- Hidden layers
Both input and output layers are unique, while there can be multiple hidden layers. The input layer gets the features and passes these to hidden layers. The hidden layers process the data and pass it to the output layer where predictions are made.