10 minutes read

It is not a secret that a lot of people use Excel. It is rather user-friendly and can store a lot of data. Unfortunately, sometimes it is difficult to process hundreds of data entries manually. In this case, we can use the openpyxl library in Python. You can create and modify Excel files with it and in this topic, you will learn the basics of this library. There are also other tools for working with Excel tables, you can read more about them on this special page.

First steps

You can use pip to install openpyxl:

pip install openpyxl

We are ready to work with this library. Let's discuss the process of creating and saving a new Excel file. First of all, we need to import the Workbook class. It is a container with other parts of our document. Then, we need to create a new workbook and save it in the book variable. By default, the library creates an empty Excel file with one worksheet. We can use book.active to link it. It creates a kind of a working environment for further use.

from openpyxl import Workbook


book = Workbook()
new_sheet = book.active

By the way, the new_sheet variable is of a specific type.

print(type(new_sheet))
# <class 'openpyxl.worksheet.worksheet.Worksheet'>

After getting the link to the sheet, we can start to fill it with data. Let's consider a small example. Imagine you finished reading a vet leaflet and decided to write down all the animals that were mentioned in the leaflet. So, we define each cell (A1, A2...) of the sheet in square brackets, write down the name of the animal and then save the file.

new_sheet['A1'] = 'cat'
new_sheet['A2'] = 'dog'
new_sheet['A3'] = 'tiger'
new_sheet['A4'] = 'hamster'

book.save("New_file.xlsx")

There's one more way to enter a value — we can specify the row and the column explicitly.

new_sheet.cell(row=1, column=1).value = 'cat'

Now, the file looks as follows:

Excel file screenshot

As you can see, the information is stored in one sheet.

Reading files

Now, we will take a look at how we can read an existing file. You have planned summer and winter holidays for all your friends. Summer holidays are outlined in the first sheet, and winter holidays are in the second one. The Excel file includes the following information: name, surname, city, departure date, days of stay.

Holiday file reading

First of all, we need to use the openpyxl.load_workbook() function to open our file. After that, we need to specify the sheet to deal with. Summer holidays is the name of the corresponding sheet.

import openpyxl


workbook = openpyxl.load_workbook('Flights.xlsx')
sheet = workbook['Summer holidays']

If you forgot the name of a sheet, you can use the following command to open it. The following statement allows us to open the Summer holidays sheet:

all_sheets_names = workbook.sheetnames
sheet = workbook[all_sheets_names[0]]

Workbook.sheetnames can save all sheet names of an Excel file to a list. After that, we can easily access them by using the corresponding index.

Now, let's print the name and surname of the first friend and the city. We can access them with the value attribute.

print(sheet['A1'].value)  # Tom
print(sheet['B1'].value)  # Black
print(sheet['C1'].value)  # London

If you need to print some information from the second sheet, you can access it in the same way by specifying its name:

sheet = workbook['Winter holidays']

print(sheet['A3'].value)  # Alex
print(sheet['B3'].value)  # Brown
print(sheet['C3'].value)  # Warsaw

And don't forget that you can use another notation for switching to another sheet. In this case, we need to specify the index 1:

all_sheets_names = workbook.sheetnames
sheet = workbook[all_sheets_names[1]]

Writing to a file

Let's continue working with our file. We want to plan holidays for our friends for the upcoming spring. Let's talk about Tom. He knows the city that he is eager to visit, but he isn't sure about the date. We can create a new sheet and write down information about him:

workbook.create_sheet(title='Spring holidays')
write_sheet = workbook['Spring holidays']

write_sheet['A1'] = 'Tom'
write_sheet['B1'] = 'Black'
write_sheet['C1'] = 'The Hague'
write_sheet['D1'] = 'We will discuss later'
write_sheet['E1'] = 5

workbook.save("Flights.xlsx")

Workbook.create_sheet() is used for adding a new sheet. We can specify the name in the parentheses. After that, we can open the sheet and write information to it. Don't forget to save the file!

Code result row

Horray! The record is added successfully.

Filtering entries

The examples above are quite easy to handle. But what if an Excel file contains thousands of entries, and we need to show only some of them? In this case, we can use if-statements and for-loops.

Let's analyze a file that stores the information on sold goods — economy sector, country, number of sold goods, discounts, and so on.

Filtering excel sheet

If it is too hard to count rows and columns manually, you can use sheet.max_row and sheet.max_column to find out how many rows and columns are in the file:

workbook = openpyxl.load_workbook('sample-xls-file-for-testing.xlsx')

sheet = workbook['Sheet1']

rows = sheet.max_row
cols = sheet.max_column
print(rows)  # 701
print(cols)  # 16

If we want to print all entries, we can do it like this:

for row in range(1, rows + 1):
    string = ''
    for column in range(1, cols + 1):
        value = sheet.cell(row=row, column=column).value
        string = string + str(value) + ', '
    string = string[:-2]  # we remove the last appended comma and space
    print(string)

# Segment, Country, Product, Discount Band, Units Sold ... 
# Government, Canada, Carretera, None, 1618.5 ... 
# Government, Germany, Carretera, None, 1321 ...
# ...

We can define the starting point for counting columns and rows with range(). We may want to create a list of fetched values with the help of sheet.cell(). There's also a way to join the values separated by a comma in one string.

Let's take a step further and print only those entries that contain Germany:

for row in range(1, rows + 1):
    string = ''
    for column in range(1, cols + 1):
        value = sheet.cell(row=row, column=column).value
        string = string + str(value) + ', '
    string = string[:-2]
    check = sheet.cell(row=row, column=2)
    check = str(check.value)
    if check == 'Germany':
        print(string)

# Government, Germany, Carretera, None, 1321 ... 
# Midmarket, Germany, Carretera, None, 888 ...
# Government, Germany, Carretera, None, 1513 ...
# ...

We create the check variable where we can store the name of a city. We know that it is the second column. First, we iterate over each row and address the second column. Second, we get the value from a certain cell and transform it into a string representation. If our variable is Germany, then we should print the entry.

You can do the same with other values in your files.

Summary

So far, we have learned:

  • how to create an Excel file with the help of openpyxl;
  • how to read and write files using this library;
  • how to specify certain conditions if an Excel file contains a lot of entries.

If you are interested in studying more features of this library, feel free to have a look at the Official Documentation. But for now, let's review everything we have learned.

69 learners liked this piece of theory. 0 didn't like it. What about you?
Report a typo