This topic serves as an introduction to the pickle module and serialization. This module can serialize and deserialize various objects in Python.
Serialization
Serialization is the process of converting data structures or object states into a representation that preserves their state and hierarchy for later use. For instance, we can store serialized data in a file or in a database. We can then recover the original structure through deserialization, in the same environment or in another one. This can save us a lot of time, as we don't need to reconstruct the structure manually.
Serialization is a general term for processes that have different implementations. If you are familiar with JSON or XML, you can serialize data with their aid. Each format has its own features: some are human-readable, while others support only a binary representation. Different formats serve different purposes and are meant for different data types.
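As a quick illustration of a human-readable format, here is a minimal sketch that uses the json module from the standard library. The dictionary is made-up sample data, not something taken from this topic:

```python
import json

# Made-up sample data, used only for illustration.
instrument = {"name": "Synthesizer", "keys": 61, "in_stock": True}

# Serialize to a JSON string that a person can read.
text = json.dumps(instrument)
print(text)  # {"name": "Synthesizer", "keys": 61, "in_stock": true}

# Deserialize the string back into a Python dictionary.
restored = json.loads(text)
print(restored == instrument)  # True
```

Note that JSON covers only a handful of basic types (dictionaries, lists, strings, numbers, booleans, and null), which is one reason other formats exist.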
Pickling is a serialization process designed specifically for Python objects. The name of the technique alludes to a method of preserving food that is popular in many cultures. We can preserve Python objects with the pickle module: they are turned into a stream of bytes for later use, and that is the core idea of the procedure. In the context of pickle, pickling is a synonym for serialization, while unpickling means deserialization. We can pickle almost anything in Python, including functions and classes. We can even preserve trained machine learning models. How do you like that?
Pickling
Now, let's consider a couple of examples. pickle is part of the standard library, so you can load it with the usual import statement:
import pickle
We mentioned that pickle can work with many Python data structures. Let's create a simple list. It contains musical instruments that could be found in a store:
music_instruments = ['Acoustic piano', 'Electric piano', 'Synthesizer']
Now, we will create a pickled representation of our object. The module provides two functions for this:
- dump() writes a pickled representation of an object to a file. The function takes two mandatory arguments: the object and an output file. We can also pass an optional argument with the protocol value (currently, from 0 to 5). It determines which Python versions can read the data: higher protocol values require more recent Python versions. Protocols 0-2 are older options that are also suitable for Python 2. 3 was the default protocol for Python 3.0-3.7. 4 was introduced in Python 3.4 and became the default protocol starting with Python 3.8. The latest protocol, 5, was introduced with Python 3.8 and is supported only by Python 3.8 or higher. So, once again, if you set the protocol value to 4, you can open your file in Python 3.4 or any later version. If it is 5, use only Python 3.8 or higher. This value can be passed as an integer. Alternatively, to select the highest protocol version available automatically, you can pass the pickle.HIGHEST_PROTOCOL value, as in the example below. Note that our output file is binary, so the mode is 'wb':
  with open('pickled_instruments', 'wb') as file:
      pickle.dump(music_instruments, file, pickle.HIGHEST_PROTOCOL)
- dumps() is another function; it simply returns a pickled representation of the object as a bytes object. Take a look at the following example: we pass the list to the function and then print the result. The output shown here was produced with protocol 3; with a newer default protocol, the bytes will differ:
  pickled_instruments = pickle.dumps(music_instruments)
  print(pickled_instruments)
  # b'\x80\x03]q\x00(X\x0e\x00\x00\x00Acoustic pianoq\x01X\x0e\x00\x00\x00Electric pianoq\x02X\x0b\x00
  # \x00\x00Synthesizerq\x03e.'
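To make the protocol argument more tangible, here is a small sketch that pickles the same list with every protocol the running interpreter supports. The sizes dictionary and the header variable are names introduced only for this example, and pickle.DEFAULT_PROTOCOL is the module attribute that holds the current default:

```python
import pickle

music_instruments = ['Acoustic piano', 'Electric piano', 'Synthesizer']

# Pickle the same list with every protocol this interpreter supports.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(music_instruments, protocol=protocol)
    sizes[protocol] = len(data)
    # Whichever protocol wrote the data, loads() detects it on its own.
    assert pickle.loads(data) == music_instruments

print(sizes)  # byte length per protocol; exact numbers depend on the interpreter

# Data written with protocol 2 or higher starts with the byte 0x80
# followed by the protocol number.
header = pickle.dumps(music_instruments, protocol=4)[:2]
print(header)  # b'\x80\x04'

print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```

The last line prints the default and the highest protocol of your interpreter, so the result depends on the Python version you run it with.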
Not every object can be pickled: one example is lambda functions. If you try to serialize them, the operation will produce a PicklingError.
Unpickling
Now, we will turn to unpickling. We will try to recover our data to a Python object hierarchy. Like serialization, deserialization can be performed using two functions from the pickle module:
- load() reads the pickled data from a file. Note that we can't specify the protocol version manually; it is detected automatically. As an example, we will transform the data from the pickled_instruments file back into a Python object. We need to specify the 'rb' mode, as our file is binary:
  with open('pickled_instruments', 'rb') as pickled_file:
      unpickled_data = pickle.load(pickled_file)
  print(unpickled_data)        # ['Acoustic piano', 'Electric piano', 'Synthesizer']
  print(type(unpickled_data))  # <class 'list'>
  As you can see, we recovered a Python list, so no changes here.
- loads() reads a bytes object and returns the unpickled object. In the example below, we pass the pickled_instruments variable to the function:
  unpickled_bytes = pickle.loads(pickled_instruments)
  print(unpickled_bytes)        # ['Acoustic piano', 'Electric piano', 'Synthesizer']
  print(type(unpickled_bytes))  # <class 'list'>
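The round trip described above can also be checked on nested data. The sketch below is only an illustration: the store dictionary and its values are made up for this example. It shows that unpickling produces an equal object, but a new one rather than the original:

```python
import pickle

# Hypothetical nested data, made up for this example.
store = {
    'instruments': ['Acoustic piano', 'Electric piano', 'Synthesizer'],
    'prices': {'Synthesizer': 499.99},
    'open': True,
}

# Serialize to bytes and immediately deserialize again.
restored = pickle.loads(pickle.dumps(store))

print(restored == store)  # True: same contents and hierarchy
print(restored is store)  # False: a separate object was created
print(restored['instruments'] is store['instruments'])  # False: nested objects are copies too
```

In other words, changing the restored object later does not affect the original one.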
If the data cannot be unpickled, for example because it is corrupted, an UnpicklingError will be raised.
Pickling features
Since you have learned about the main functionality of pickle, we can outline other features of the module:
- pickle is Python-specific, so it doesn't provide compatibility with other programming languages. If compatibility is important, it may be a good idea to look at other forms of serialization.
- pickle is not secure: unpickling can execute arbitrary code, so you should never load pickled data from untrusted sources.
- The pickled data may be incompatible between Python versions. Above, we mentioned six protocol versions and the Python versions corresponding to them. Keep them in mind when you create your pickled objects or load data from an external source.
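The security point deserves a short demonstration. The following sketch uses a hypothetical class, Sneaky, whose __reduce__ method tells pickle which callable to invoke during unpickling. The callable here is the harmless built-in sorted(), but the same mechanism would let a crafted payload call something dangerous:

```python
import pickle


class Sneaky:
    """Hypothetical class that controls how it is rebuilt on unpickling."""

    def __reduce__(self):
        # Instruct pickle: "to rebuild this object, call sorted([3, 1, 2])".
        return (sorted, ([3, 1, 2],))


payload = pickle.dumps(Sneaky())

# loads() calls sorted() and returns its result instead of a Sneaky instance.
result = pickle.loads(payload)
print(result)  # [1, 2, 3]
```

The loading side never asked for sorted() to run; the pickled bytes decided that. This is why pickled data should be loaded only when you trust its origin.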
Summary
So far, we have learned the basics of the pickle module. Let's sum up the main points:
- Pickling is a form of serialization that allows us to convert Python objects to a series of bytes for later use.
- pickle has two functions for serialization: dump() writes the pickled representation to a file, and dumps() returns the pickled representation as a bytes object.
- pickle has two functions for deserialization, load() and loads(): the former retrieves the object hierarchy from a file, and the latter needs only a bytes object to do it.
Interested? Take a look at the Pickle documentation. It also contains brief comparisons with other serialization methods.
Now, let's put your knowledge into practice!