Padding is one of the techniques in CNNs that helps to control the size of the outputs after performing a convolution.
In this topic, we will see why padding is used and take a look at some common padding methods.
The setup for padding
An important aspect of any CNN is the ability to pad the input, which enlarges its spatial extent. Without padding, each layer shrinks the output width by one pixel less than the filter's width. Padding the input lets us control the filter width and the output size independently. Without zero padding, we are forced to choose between shrinking the spatial extent of the network rapidly and using small filters, and both options limit the network's expressive power.
In the most extreme case, no padding is used at all, and the convolution filter is only allowed to visit positions where it lies entirely inside the image. Every output pixel is then a function of the same number of input pixels, which makes the behavior of the output pixels more consistent. However, the output shrinks with every layer: given an input image of width $m$ and a filter of width $k$, the output has width $m - k + 1$. This shrinkage can be dramatic when large filters are used. Because the shrinkage is greater than zero, it limits the number of convolutional layers the network can contain. As layers are added, the spatial dimension of the network eventually drops to $1 \times 1$, at which point additional layers can no longer meaningfully be considered convolutional. This can be understood intuitively by considering the following synthetic example:
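A minimal sketch of this shrinkage in plain Python, assuming a 1D input (a single image row) and a stride of 1. As in most CNN libraries, the "convolution" is implemented as cross-correlation (the kernel is not flipped). The helper names `conv1d_valid` and `widths_through_layers` are hypothetical, chosen for illustration.

```python
def conv1d_valid(signal, kernel):
    """Cross-correlate `signal` with `kernel` without padding ("valid" mode).

    The kernel is only placed where it fits entirely inside the signal,
    so the output has len(signal) - len(kernel) + 1 entries.
    """
    m, k = len(signal), len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(m - k + 1)
    ]


def widths_through_layers(m, k):
    """Track the width of an input of width m through stacked valid
    convolutions with kernel width k, until no further layer fits."""
    if k < 2:
        raise ValueError("kernel width must be at least 2 for the width to shrink")
    widths = [m]
    while widths[-1] >= k:  # a valid convolution needs width >= k
        widths.append(widths[-1] - k + 1)
    return widths


print(conv1d_valid([1, 2, 3, 4, 5, 6, 7, 8], [1, 1, 1]))  # [6, 9, 12, 15, 18, 21]
print(widths_through_layers(9, 3))                          # [9, 7, 5, 3, 1]
```

With a width-3 filter, a 9-pixel row is reduced to a single pixel after only four layers, and no further valid convolution is possible.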
To avoid this issue, it is common to add extra values around the edges of the input.
A note on same and valid padding
Typically, the same padding mode is used to make the output feature map the same spatial size as the input. The padding parameter $p$ is chosen from the filter width so that the input and output sizes are equal; for a stride of 1 and an odd filter width $k$, this gives $p = (k - 1)/2$ pixels on each side. Because the convolution then does not change the architectural options available to the next layer, the network can contain as many convolutional layers as the hardware can support. However, input pixels near the border contribute to fewer output pixels than those near the center, so edge pixels can end up underrepresented in the model.
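The underrepresentation of edge pixels can be made concrete by counting, for each input pixel, how many output pixels it feeds into. The sketch below assumes a 1D input, stride 1, zero padding, and an odd kernel width; `same_padding` and `input_usage_counts` are hypothetical helper names.

```python
def same_padding(k):
    """Padding per side that keeps the width unchanged at stride 1.

    Assumes an odd kernel width k, so that (k - 1) / 2 is a whole number.
    """
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel width")
    return (k - 1) // 2


def input_usage_counts(m, k):
    """For each of the m input pixels, count how many output pixels it
    contributes to under stride-1 same (zero) padding with kernel width k."""
    p = same_padding(k)
    counts = [0] * m
    for out in range(m):  # same padding produces m outputs
        # Output `out` covers original indices out - p .. out - p + k - 1.
        for i in range(out - p, out - p + k):
            if 0 <= i < m:  # indices outside 0..m-1 are padding, not image pixels
                counts[i] += 1
    return counts


print(input_usage_counts(7, 5))  # [3, 4, 5, 5, 5, 4, 3]
```

Interior pixels are used by all five output positions that overlap them, while each corner pixel of the row is used by only three.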
A convolution uses valid padding when $p = 0$ (no padding). Same padding is generally the preferred mode in CNNs.
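The two modes can be compared with the standard output-width formula for a stride of 1, $m + 2p - k + 1$, which reduces to $m - k + 1$ for valid padding ($p = 0$) and to $m$ for same padding ($p = (k - 1)/2$, odd $k$). The helper names below are hypothetical, and the sketch only covers stride 1.

```python
def output_width(m, k, p):
    """Width of a stride-1 convolution output with p pixels of padding per side."""
    return m + 2 * p - k + 1


def width_after_layers(m, k, p, layers):
    """Apply `layers` identical stride-1 convolutions and return the final width."""
    for _ in range(layers):
        m = output_width(m, k, p)
    return m


m, k = 28, 3
print(width_after_layers(m, k, p=0, layers=5))             # valid: 18
print(width_after_layers(m, k, p=(k - 1) // 2, layers=5))  # same: 28
```

After five width-3 layers, valid padding has removed 10 pixels from a 28-pixel input, while same padding has kept the width fixed.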
Padding values: constant, reflection, circular, replication
One of the most common settings is constant padding with the value 0, known as zero padding, which appends zeros around the edges of the input.
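A minimal sketch of constant padding for a 2D image stored as a list of lists, assuming `p` rows and columns are added on every side. The function name `constant_pad_2d` is hypothetical; with the default `value=0` it performs zero padding.

```python
def constant_pad_2d(image, p, value=0):
    """Surround a 2D list-of-lists `image` with `p` rows/columns of `value`."""
    width = len(image[0])
    padded_width = width + 2 * p
    top = [[value] * padded_width for _ in range(p)]
    middle = [[value] * p + list(row) + [value] * p for row in image]
    bottom = [[value] * padded_width for _ in range(p)]
    return top + middle + bottom


for row in constant_pad_2d([[1, 2], [3, 4]], 1):
    print(row)
# [0, 0, 0, 0]
# [0, 1, 2, 0]
# [0, 3, 4, 0]
# [0, 0, 0, 0]
```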
Zero padding comes with several problems. One is padding bias: the artificial zeros make the border region look systematically different from the interior of the image, which can bias the features computed near the edges. Zero padding also requires extra computation, because it increases the size of the input matrix. Another concern is accuracy. In theory, the padded zeros carry no information, yet they still take part in the feature calculations during convolution, which can introduce discrepancies into the feature maps. Furthermore, if the image has distinctive content at its border, the abrupt transition to zero can trigger strong artifacts after the convolution. Finally, there is the question of how much padding to use: there is currently no universally established rule for choosing the padding size.
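The border artifact can be reproduced with a toy 1D example, assuming a constant signal (which has no real edges) and a simple difference filter `[-1, 0, 1]` that responds to changes in intensity. The helper names are hypothetical, and the convolution is implemented as cross-correlation, as in CNN libraries.

```python
def zero_pad_1d(signal, p):
    """Add p zeros to each end of a 1D list."""
    return [0] * p + list(signal) + [0] * p


def conv1d_valid(signal, kernel):
    """Stride-1 cross-correlation without further padding."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


flat = [5, 5, 5, 5]       # constant signal: no edges anywhere
edge_kernel = [-1, 0, 1]  # difference filter that responds to edges
print(conv1d_valid(zero_pad_1d(flat, 1), edge_kernel))  # [5, 0, 0, -5]
```

The interior responses are zero, as expected, but the two end positions report strong "edges" that exist only because of the jump from 5 to the padded zeros.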
Reflection (or mirror) padding fills the border by mirroring the values of the pixels next to the edges of the input, as if a mirror were placed along each border.
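A minimal sketch of reflection padding in plain Python. It assumes the variant in which the edge pixel itself is not repeated, which is how NumPy's `np.pad(..., mode="reflect")` behaves (NumPy's `mode="symmetric"` repeats the edge pixel instead). The 2D version simply pads the rows and then the columns. Both function names are hypothetical.

```python
def reflect_pad_1d(signal, p):
    """Mirror `signal` across its end points without repeating the edge value.

    Requires 0 <= p < len(signal), since the mirror image must come from
    pixels that actually exist.
    """
    n = len(signal)
    if not 0 <= p < n:
        raise ValueError("reflection padding needs 0 <= p < len(signal)")
    left = [signal[i] for i in range(p, 0, -1)]   # signal[p], ..., signal[1]
    right = [signal[n - 2 - i] for i in range(p)]  # signal[n-2], ..., signal[n-1-p]
    return left + list(signal) + right


def reflect_pad_2d(image, p):
    """Reflect-pad a 2D list-of-lists: first left/right, then top/bottom."""
    rows = [reflect_pad_1d(row, p) for row in image]
    columns = [reflect_pad_1d(list(col), p) for col in zip(*rows)]
    return [list(r) for r in zip(*columns)]  # transpose back to rows


print(reflect_pad_1d([1, 2, 3, 4], 2))  # [3, 2, 1, 2, 3, 4, 3, 2]
```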
Circular padding extends the input by wrapping it around: the padding above the top edge is taken from the bottom of the image, the padding to the left is taken from the right edge, and vice versa. The top and bottom boundaries, and likewise the left and right boundaries, are thus joined seamlessly into a continuous loop, hence the name "circular".
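Circular padding helps the network learn features without a bias towards the center of the image. Because it only reuses existing pixel values, the padding introduces no new values into the image, which might lead to a performance improvement, particularly for data whose opposite edges genuinely connect, such as 360° panoramas.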
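A minimal sketch of circular padding for a 2D list-of-lists, assuming `p` pixels on every side. Wrapping is done with modulo indexing, so each padded position reads from the opposite side of the image; this matches the behavior of NumPy's `np.pad(..., mode="wrap")`. The function name `circular_pad_2d` is hypothetical.

```python
def circular_pad_2d(image, p):
    """Wrap `image` around both axes by p pixels on every side."""
    h, w = len(image), len(image[0])
    return [
        # (r - p) % h maps padded row r back into 0..h-1, wrapping around;
        # the same is done for columns.
        [image[(r - p) % h][(c - p) % w] for c in range(w + 2 * p)]
        for r in range(h + 2 * p)
    ]


for row in circular_pad_2d([[1, 2, 3], [4, 5, 6], [7, 8, 9]], 1):
    print(row)
# [9, 7, 8, 9, 7]
# [3, 1, 2, 3, 1]
# [6, 4, 5, 6, 4]
# [9, 7, 8, 9, 7]
# [3, 1, 2, 3, 1]
```

The first padded row is a copy of the last image row, and the first padded column is a copy of the last image column.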
In replication padding, the input is padded by repeating the value of the nearest border pixel. This is thought to produce more realistic results for some types of images and to reduce artifacts at the border of the output.
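A minimal sketch of replication padding for a 2D list-of-lists, assuming `p` pixels on every side. Each padded position copies the nearest border pixel, implemented by clamping the row and column indices into range; this matches NumPy's `np.pad(..., mode="edge")`. The function name `replicate_pad_2d` is hypothetical.

```python
def replicate_pad_2d(image, p):
    """Pad `image` by repeating its nearest border pixel, p pixels per side."""
    h, w = len(image), len(image[0])

    def clamp(i, n):
        # Force index i into the valid range 0..n-1.
        return min(max(i, 0), n - 1)

    return [
        [image[clamp(r - p, h)][clamp(c - p, w)] for c in range(w + 2 * p)]
        for r in range(h + 2 * p)
    ]


print(replicate_pad_2d([[1, 2], [3, 4]], 1))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]

# On a constant row, replication leaves no jump at the border, so a
# difference filter [-1, 0, 1] responds with zeros everywhere.
row = replicate_pad_2d([[5, 5, 5, 5]], 1)[1]  # middle row: [5, 5, 5, 5, 5, 5]
print([row[i + 2] - row[i] for i in range(len(row) - 2)])  # [0, 0, 0, 0]
```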
Conclusion
You are now familiar with the idea of and motivation behind padding, some of its most popular variants, and their limitations.