Archives are useful in everyday work. They store files in compressed form, which saves space and makes transferring them more convenient. There are several archive formats, but the most widespread is .zip. Archives are easy to use even if you rarely come across them. In programming, however, we often handle large amounts of data and want to automate routine tasks. Fortunately, when those tasks involve archives, the Python standard library has the right module: zipfile. In this topic, we will learn how to use it.
Extracting data
The most common operation on archives is, without a doubt, extraction. However, the contents of an archive you're going to work with may be unknown to you. In this case, you would probably want to learn a little about it before extracting anything.
Let's say we have an archive cat_forms.zip that we want to extract. First, we need to import the module, open the archive, and get general information. The main class for working with archives is called ZipFile. We will use it to manipulate our archive:
import zipfile
archive = zipfile.ZipFile('cat_forms.zip', 'r')
archive.printdir()
# File Name Modified Size (in bytes)
# floppa.txt 2021-11-15 17:10:46 12
# bob.txt 2021-11-15 17:11:08 15
# garfield.txt 2021-11-15 17:11:30 26
When creating the ZipFile object, we provided the path to the file and the mode. There are several modes you can use; the table below lists the options and their descriptions:
| Mode | Description |
|---|---|
| 'r' | Read an existing file |
| 'w' | Write a new file (overwrites the file if it already exists) |
| 'a' | Append to an existing file |
| 'x' | Create and write a new file (fails if the file already exists) |
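The sketch below illustrates how the four modes behave. It is a minimal example under stated assumptions: the temporary directory and the names modes_demo.zip, first.txt, and second.txt are hypothetical and not part of the cat_forms.zip example. It also uses two things not covered above: writestr(), which writes a string directly into the archive, and the with statement, which closes the archive automatically.

```python
import os
import tempfile
import zipfile

# Hypothetical scratch location; all names here are for illustration only.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'modes_demo.zip')

# 'w' creates the archive (or overwrites an existing one).
with zipfile.ZipFile(path, 'w') as archive:
    archive.writestr('first.txt', 'written in w mode')

# 'a' keeps the existing members and adds new ones.
with zipfile.ZipFile(path, 'a') as archive:
    archive.writestr('second.txt', 'added in a mode')

# 'r' only reads; the archive now holds both members.
with zipfile.ZipFile(path, 'r') as archive:
    names = archive.namelist()
print(names)

# 'x' refuses to open an archive that already exists.
try:
    with zipfile.ZipFile(path, 'x') as archive:
        pass
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True
print(exclusive_failed)
```

Opening the archive in 'w' mode a second time instead of 'a' would have discarded first.txt, so 'a' is usually the safer choice when an archive may already contain data.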
If you only need a list of file names from the archive, use the namelist() method:
archive.namelist()
# ['floppa.txt', 'bob.txt', 'garfield.txt']
Okay, at this point we know that our archive contains three .txt files. Now we have two options: extract a particular file (or files), or extract all of them. Here's how we can do it:
# extract a specific file into the specified directory
archive.extract('floppa.txt', 'extracted_cats')
# 'C:\\Users\\Username\\Desktop\\extracted_cats\\floppa.txt'
# extract all files into the current directory
# (a different path may be provided as well)
archive.extractall()
As you can see, you can extract specific data or all of it at once and even choose the destination. Also, if you have doubts about the nature of the file you're about to access, there is a way to check whether it's a .zip file or not:
zipfile.is_zipfile('cat_forms.zip') # True
Reading files
We've learned how to extract files, but what if you would also like to see what's inside? No problem, you can open them like any other file:
floppa = archive.open('floppa.txt')
print(floppa.read()) # b'Hello! I am Floppa, a big Russian cat!'
archive.close() # don't forget to close the archive when you're done!
Note that the file is accessed as a binary file-like object, so read() returns bytes. To get a regular string, you can do the following:
archive = zipfile.ZipFile('cat_forms.zip', 'r') # reopen: the archive was closed above
floppa = archive.open('floppa.txt')
string = floppa.read().decode('ascii')
print(string) # Hello! I am Floppa, a big Russian cat!
archive.close()
The decode() method takes bytes and converts them into a regular Python string. To do that, it needs to know the character encoding, that is, the scheme used to represent text as bytes. In this case, the encoding is ASCII.
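The reading steps can also be written with the with statement, so that neither the archive nor the opened member has to be closed by hand. The following is a minimal sketch, not the only way to do it. To keep it runnable on its own, it first builds a small stand-in for cat_forms.zip in a temporary directory with writestr(); it also assumes UTF-8 as the encoding (ASCII text decodes identically under UTF-8).

```python
import os
import tempfile
import zipfile

# Build a small stand-in archive so the sketch runs on its own
# (the real cat_forms.zip is assumed to look similar).
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'cat_forms.zip')
with zipfile.ZipFile(path, 'w') as archive:
    archive.writestr('floppa.txt', 'Hello! I am Floppa, a big Russian cat!')

# Both the archive and the member are closed automatically on exit.
with zipfile.ZipFile(path, 'r') as archive:
    with archive.open('floppa.txt') as member:
        raw = member.read()          # bytes
    text = raw.decode('utf-8')       # str; UTF-8 is an assumption here

    # read() on the archive is a shortcut for open() followed by read().
    same_raw = archive.read('floppa.txt')

print(type(raw).__name__, type(text).__name__)
print(text)
```

If the text in an archive was saved with a different encoding, pass that encoding name to decode() instead.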
Creating archives
Now that we know how to work with existing archives, it's time to figure out how to create archives ourselves. Here's a way to create a new archive:
new_archive = zipfile.ZipFile('C:/Users/Username/Desktop/more_cat_forms.zip', 'w')
The write() method adds files that already exist on disk, and any folders inside the archive come from the paths of those files. You need to specify the full path to the file when adding it:
new_archive = zipfile.ZipFile('C:/Users/Username/Desktop/more_cat_forms.zip', 'w')
new_archive.write('C:/Users/Username/Downloads/grumpy.txt')
new_archive.printdir()
# File Name Modified Size
# Users/Username/Downloads/grumpy.txt 2021-11-16 14:28:04 12
new_archive.close()
Alternatively, you can pass a relative path if the file is in the current working directory:
new_archive = zipfile.ZipFile('C:/Users/Username/Desktop/more_cat_forms.zip', 'w')
new_archive.write('grumpy.txt')
new_archive.printdir()
# File Name Modified Size
# grumpy.txt 2021-11-16 14:28:04 12
new_archive.close()
The path you provide when writing a file to the archive is reproduced inside it. In the first case, the file was in another directory, so we had to provide a full path for Python to find it. That path (without the drive letter) was also copied into the archive, so the file ended up nested in the same folder structure. This also means that if you want a folder inside the archive with some data in it, a straightforward way is to create that folder on your machine first and then add its files to the archive:
new_archive = zipfile.ZipFile('C:/Users/Username/Desktop/more_cat_forms.zip', 'w')
new_archive.write('grumpy.txt')
new_archive.write('other_files/felix.txt')
new_archive.write('other_files/pistachio.txt')
new_archive.printdir()
# File Name Modified Size
# grumpy.txt 2021-11-16 14:28:04 12
# other_files/felix.txt 2021-11-16 14:30:31 15
# other_files/pistachio.txt 2021-11-16 14:30:31 26
new_archive.close()
To make the task easier, take a look at the os module from the Python standard library. It helps you work with paths and traverse folders, so you can, for example, add files to archives in a loop instead of one by one.
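As a rough sketch of that idea, the loop below walks a folder with os.walk() and adds every file it finds. The folder layout (cats, other_files) and the temporary directory are hypothetical and created only so the example runs on its own. The sketch also relies on the optional arcname argument of write(), which sets the path stored inside the archive, so the stored path can differ from the full path on disk.

```python
import os
import tempfile
import zipfile

# Hypothetical folder layout created just for this sketch.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, 'cats')
os.makedirs(os.path.join(source, 'other_files'))
for relative in ('grumpy.txt', os.path.join('other_files', 'felix.txt')):
    with open(os.path.join(source, relative), 'w') as f:
        f.write('meow')

archive_path = os.path.join(workdir, 'more_cat_forms.zip')
with zipfile.ZipFile(archive_path, 'w') as new_archive:
    # os.walk yields each directory under source with its file names.
    for folder, _subfolders, filenames in os.walk(source):
        for filename in filenames:
            full_path = os.path.join(folder, filename)
            # arcname sets the path stored inside the archive;
            # here it is the path relative to the source folder.
            arcname = os.path.relpath(full_path, source)
            new_archive.write(full_path, arcname=arcname)
    stored = sorted(new_archive.namelist())

print(stored)
```

Storing paths relative to the source folder keeps the machine-specific part of the path out of the archive, which is usually what you want when sharing it.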
Conclusion
Summing up, we've learned how to work with archives using the zipfile module from the Python standard library. To be precise, we've learned:
How to extract data;
How to read files inside an archive;
How to create archives and put data inside.
Now, you're ready to use zipfile!