Computer scienceData scienceInstrumentsPandasData preprocessing with pandas

Sorting Data in Pandas

9 minutes read

In this topic, we'll cover advanced operations for sorting a Pandas Dataframe: sorting the rows by any column, by index values, and sorting the columns by their names.

As always, we import pandas and create a DataFrame from a dictionary:

import pandas as pd
goods = {
    "type": ['PC', 'monitor', 'printer', 'notebook', 'PC'], 
    "price": [450, 150, 175, 600, 500],
    "quantity": [5, 2, 10, 15, 5],
    "vendor_code": ['135', '960', '004', '420', '114'], 
}
df = pd.DataFrame(goods)
df.head()

Output:

       type  price  quantity vendor_code
0        PC    450         5         135
1   monitor    150         2         960
2   printer    175        10         004
3  notebook    600        15         420
4        PC    500         5         114

This is a sample from storage data of an electronics retail company.

Row sorting

There are two main approaches: sort a DataFrame by row indices or sort it by columns. Let's start with the former one.

1. Sorting by index

Our sample DataFrame has the basic ordinal index and rows are already sorted (by default). So at first, we change the index with the method that you've learned in previous topics – set_index():

df.set_index('vendor_code', inplace=True)
df.head()

Output:

                 type  price  quantity
vendor_code                           
135                PC    450         5
960           monitor    150         2
004           printer    175        10
420          notebook    600        15
114                PC    500         5

The vendor_code column became the index column. Now let's use the Pandas sort_index() method to sort all rows according to their index labels:

df.sort_index()

Output:

                 type  price  quantity
vendor_code                           
004           printer    175        10
114                PC    500         5
135                PC    450         5
420          notebook    600        15
960           monitor    150         2

Notice that the rows have been sorted lexicographically because of the string index. This method has the inplace argument, and as we didn't change its default state (False), the result is just being output. This means the row order of the DataFrame df remains the same. The sort_index() method can be very handy if your data has DateTime indices.

2. Sorting by a single column

Another way to sort the DataFrame/Series is to use the sort_values() method. Let's take our initial example and sort it by price:

df.sort_values('price')

Output:

       type  price  quantity vendor_code
1   monitor    150         2         960
2   printer    175        10         004
0        PC    450         5         135
4        PC    500         5         114
3  notebook    600        15         420

As you can see, the table is now sorted by the price column ascending. This is the default behavior of sort_values().

Let's add some details. Like in many other methods, first of all, we get instant output because we didn't set the inplace argument to True. Secondly, note how we pass the name of the column for sorting — just a string with its name. If we need to sort by several columns, we should pass them as a Python list in the corresponding order. Lastly, the order of sorting is set by the argument ascending, which is True by default, as you can see. Now we'll play around with those settings.

Suppose we want to look at our largest supplies. In such a case, we have to sort our data by the quantity column in descending order:

df.sort_values('quantity', ascending=False)

Output:

       type  price  quantity vendor_code
3  notebook    600        15         420
2   printer    175        10         004
0        PC    450         5         135
4        PC    500         5         114
1   monitor    150         2         960

Ok, it's now sorted by the quantity and top-down.

3. Sorting by multiple columns

But what if we also need to sort by the price, so more expensive units will be displayed first? Add the second sorting column:

df.sort_values(['quantity', 'price'], ascending=False)

Output:

       type  price  quantity vendor_code
3  notebook    600        15         420
2   printer    175        10         004
4        PC    500         5         114
0        PC    450         5         135
1   monitor    150         2         960

Here the DataFrame is sorted by quantity first and by price second, and both sortings are descending.

Let's look at the final example, where we set different sorting orders for chosen columns. This is the way to sort by the quantity ascending and for price descending:

df.sort_values(['quantity', 'price'], ascending=[True, False])

Output:

       type  price  quantity vendor_code
1   monitor    150         2         960
4        PC    500         5         114
0        PC    450         5         135
2   printer    175        10         004
3  notebook    600        15         420

We got the desired result by setting the ascending argument to a list of boolean values in the same order as the columns mentioned in the first argument.

Column sorting

Sometimes, in your data-rich future, you may encounter a table with a large number of unfamiliar columns. It would be very handy if something could arrange them in alphabetical order. And, of course, Pandas has this capability:

df = df.reindex(sorted(df.columns), axis='columns')
df.head()

Output:

   price  quantity      type vendor_code
0    450         5        PC         135
1    150         2   monitor         960
2    175        10   printer         004
3    600        15  notebook         420
4    500         5        PC         114

What has just happened here? reindex() is a Pandas method that sets new labels along the selected axis. We pass the list of column labels presorted with the built-in Python function sorted() as the first argument to achieve the desired result.

Summary

Let's conclude what you've learned:

how to sort DataFrame by index with sort_index()
how to sort DataFrame by any column with sort_values()
how to set columns in alphabetical order

23 learners liked this piece of theory. 1 didn't like it. What about you?

Report a typo

Sorting Data in Pandas

Row sorting

Column sorting

Summary

Related topics