In deep learning, data is stored and fed into models as multi-dimensional arrays known as tensors. With tensors, one can train models on large datasets and make predictions based on the learned patterns. When working with complex models, it is essential to have a reliable, performant tool that handles tensors efficiently.
PyTorch is one of the most popular frameworks for deep learning research and development in the fields of computer vision and natural language processing. In this topic, we will review the main components of the library.
Installing PyTorch
The PyTorch installation depends on the system configuration. PyTorch can run on both the CPU and the GPU, with the latter being the preferred choice for accelerated training of neural networks. The GPU installation of PyTorch requires a compatible version of CUDA. You can find installation instructions for your specific requirements on the official documentation page.
For this topic, running the following command will be sufficient:
pip3 install torch
Alternatively, you can run the code snippets in this topic on Google Colab, where PyTorch comes preinstalled. You can verify the installed version as follows:
import torch
print(torch.__version__)
# 2.0.1+cu118
There are multiple ways to enable CUDA operations:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
obj = obj.to(device)  # equivalent to obj = obj.to("cuda") when CUDA is available
where obj is a PyTorch object, such as a Tensor or a network model. Alternatively, you can call the .cuda() method on the object. You can also make CUDA the default for all tensors:
if torch.cuda.is_available():
    torch.set_default_tensor_type('torch.cuda.FloatTensor')
The three components of PyTorch
PyTorch can be viewed as a three-component framework:
- Tensor library;
- General deep learning library;
- Automatic differentiation engine.
The dynamic computational graph is the core processing structure for PyTorch operations: a directed acyclic graph that describes the data flow between operations. It can be pictured as nodes linked by arrows, where each node denotes an operation performed on the data that flows along the edges. The graph is defined at runtime, which allows it to be modified at each iteration.
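To make this concrete, here is a minimal sketch (the variable names are our own) showing that the graph is rebuilt every time the computation runs, so ordinary Python control flow can change its structure between iterations:
import torch

x = torch.tensor([2.0], requires_grad=True)

for step in range(2):
    # The graph is constructed on the fly as operations execute,
    # so it can differ from one iteration to the next
    if step % 2 == 0:
        y = x[0] ** 2      # graph: x -> pow -> y
    else:
        y = 3 * x[0] + 1   # graph: x -> mul -> add -> y
    y.backward()
    print(x.grad)          # tensor([4.]) on the first pass, tensor([3.]) on the second
    x.grad.zero_()         # gradients accumulate, so reset them between iterations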
Below is an example graph generated with the pytorchviz library for the computations of the single-layer perceptron, where each node represents a PyTorch operation:
Here, we mainly deal with the specific operations related to the forward and backward pass of the single-layer perceptron.
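A graph like the one above can be produced with just a few lines of code. Here is a minimal sketch, assuming the pytorchviz package is installed (pip install torchviz) along with the Graphviz system package:
import torch
from torchviz import make_dot

# A single-layer perceptron: one linear layer followed by a sigmoid
model = torch.nn.Sequential(torch.nn.Linear(3, 1), torch.nn.Sigmoid())
x = torch.randn(1, 3)
y = model(x)

# Render the computational graph of the forward pass to slp_graph.pdf
make_dot(y, params=dict(model.named_parameters())).render("slp_graph")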
The Tensor object
A Tensor is the fundamental data structure in PyTorch, similar to NumPy's ndarray, but with two additional features that make it suitable for deep learning:
- GPU support, which accelerates numeric computations;
- Automatic differentiation, the process of computing gradients that is crucial for training deep learning models (NumPy's ndarray does not store gradients, while the Tensor does).
The Tensor object can be instantiated as
a = torch.tensor(1.)
a.shape # torch.Size([])
In the code above, we have created a Tensor object, a tensor with rank 0, also known as a scalar. Similarly, we can create a vector (a one-dimensional tensor), a 2D matrix (a two-dimensional tensor), and any other tensor with arbitrary dimensionality (which is usually just called a tensor):
b = torch.tensor([1., 2., 3.])
b.shape # torch.Size([3]) - vector
c = torch.tensor([[1., 2.], [1., 2.]])
c.shape # torch.Size([2, 2]) - a 2D matrix
We can also convert between NumPy's ndarray and the Tensor object:
import torch
import numpy as np
np_array = np.array([0, 1, 2, 3, 4])
# Convert a NumPy array to a torch.Tensor
tensor_a = torch.from_numpy(np_array)  # shares memory with np_array
tensor_b = torch.Tensor(np_array)      # copies the data and casts it to float32
# Convert torch.Tensor to a NumPy array
tensor = torch.tensor([1, 2, 3, 4, 5])
np_a = tensor.numpy()
Here, we assume that the Tensor was created on the CPU for the sake of simplicity, but tensors created on the GPU can also be converted to NumPy arrays. A Tensor can only hold values of a single data type; it supports the standard numeric types (integers and floats), complex numbers, and booleans. To work with text data in PyTorch, the text has to be converted to a numeric representation first.
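For example, a tensor that lives on the GPU must be moved to the CPU before calling .numpy(), since NumPy arrays always reside in host memory. A minimal sketch, assuming a CUDA device is available:
if torch.cuda.is_available():
    gpu_tensor = torch.tensor([1., 2., 3.], device="cuda")
    # Calling gpu_tensor.numpy() directly would raise an error;
    # move the tensor to the CPU first
    np_from_gpu = gpu_tensor.cpu().numpy()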
The main PyTorch functionality
Below, let's take a quick look at the main PyTorch modules and their intended usage:
| Module | Description |
|---|---|
| torch.nn | The main module for building and training neural networks; provides classes for the layers |
| torch.autograd | Provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions |
| torch.optim | Provides various optimization algorithms commonly used to train neural networks (such as SGD, RMSprop, and Adam) |
| torch.utils.data | Contains dataset utilities for creating and managing data conveniently |
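To give a feel for how these modules fit together, here is a minimal sketch of a typical training setup; the model, data, and hyperparameters below are made up purely for illustration:
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# torch.utils.data: wrap some toy data in a Dataset and a DataLoader
X, y = torch.randn(100, 4), torch.randn(100, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=10)

# torch.nn: define a small model and a loss function
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()

# torch.optim: pick an optimization algorithm
optimizer = optim.SGD(model.parameters(), lr=0.01)

for batch_X, batch_y in loader:
    optimizer.zero_grad()
    loss = loss_fn(model(batch_X), batch_y)
    loss.backward()   # torch.autograd computes the gradients
    optimizer.step()  # update the model parameters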
PyTorch comes with a lot of features that significantly simplify the process of building neural networks:
- PyTorch natively supports multi-threading and distributed processing, making the training of large-scale systems much more efficient;
- It has a clear API and complete documentation;
- Provides automatic differentiation, eliminating the need for manual backpropagation;
- PyTorch 2.0 introduced the torch.compile function, which is backward compatible and accelerates both the training and the inference of models (see the sketch after this list).
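Using torch.compile amounts to a single wrapper call. A minimal sketch, assuming PyTorch 2.0 or later (the model here is arbitrary):
import torch

model = torch.nn.Linear(10, 1)
# torch.compile returns an optimized version of the model;
# it is called exactly like the original
compiled_model = torch.compile(model)
out = compiled_model(torch.randn(2, 10))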
Automatic differentiation with torch.autograd
The primary aim of torch.autograd is to automate the computation of backward passes in neural networks, which is what backpropagation performs during training. Each tensor in PyTorch has an associated flag, requires_grad; when it is set to True, operations on the tensor are recorded in a function history, which enables the automatic calculation of gradients.
Let's see how to automatically compute the gradient of a simple function, y = 3x², at x = 1:
# Set the device to CUDA if it is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Define the tensor with requires_grad = True to enable .backward() later on
# and move it to the available device
x = torch.tensor([1.], requires_grad=True, device=device)
# Define the function
y = 3*x[0]**2
# Compute the gradients
y.backward()
print(f"dy/dx: {x.grad[0]}")
# dy/dx: 6.0
Conclusion
As a result, you are now familiar with the following:
- PyTorch can be viewed as a tensor library, an automatic differentiation engine, and a general-purpose deep learning library;
- The main PyTorch modules and their basic functionality;
- The central model of computation in PyTorch is the dynamic computational graph, which represents the data flow from the inputs to the outputs together with the corresponding tensor operations;
- The differences between NumPy's ndarray and PyTorch's Tensor objects.