In this topic, we'll cover advanced operations for sorting a Pandas Dataframe: sorting the rows by any column, by index values, and sorting the columns by their names.
As always, we import pandas and create a DataFrame from a dictionary:
import pandas as pd
goods = {
"type": ['PC', 'monitor', 'printer', 'notebook', 'PC'],
"price": [450, 150, 175, 600, 500],
"quantity": [5, 2, 10, 15, 5],
"vendor_code": ['135', '960', '004', '420', '114'],
}
df = pd.DataFrame(goods)
df.head()
Output:
type price quantity vendor_code
0 PC 450 5 135
1 monitor 150 2 960
2 printer 175 10 004
3 notebook 600 15 420
4 PC 500 5 114
This is a sample from storage data of an electronics retail company.
Row sorting
There are two main approaches: sort a DataFrame by row indices or sort it by columns. Let's start with the former one.
1. Sorting by index
Our sample DataFrame has the basic ordinal index and rows are already sorted (by default). So at first, we change the index with the method that you've learned in previous topics – set_index():
df.set_index('vendor_code', inplace=True)
df.head()
Output:
type price quantity
vendor_code
135 PC 450 5
960 monitor 150 2
004 printer 175 10
420 notebook 600 15
114 PC 500 5
The vendor_code column became the index column. Now let's use the Pandas sort_index() method to sort all rows according to their index labels:
df.sort_index()
Output:
type price quantity
vendor_code
004 printer 175 10
114 PC 500 5
135 PC 450 5
420 notebook 600 15
960 monitor 150 2
Notice that the rows have been sorted lexicographically because of the string index. This method has the inplace argument, and as we didn't change its default state (False), the result is just being output. This means the row order of the DataFrame df remains the same. The sort_index() method can be very handy if your data has DateTime indices.
2. Sorting by a single column
Another way to sort the DataFrame/Series is to use the sort_values() method. Let's take our initial example and sort it by price:
df.sort_values('price')
Output:
type price quantity vendor_code
1 monitor 150 2 960
2 printer 175 10 004
0 PC 450 5 135
4 PC 500 5 114
3 notebook 600 15 420
As you can see, the table is now sorted by the price column ascending. This is the default behavior of sort_values().
Let's add some details. Like in many other methods, first of all, we get instant output because we didn't set the inplace argument to True. Secondly, note how we pass the name of the column for sorting — just a string with its name. If we need to sort by several columns, we should pass them as a Python list in the corresponding order. Lastly, the order of sorting is set by the argument ascending, which is True by default, as you can see. Now we'll play around with those settings.
Suppose we want to look at our largest supplies. In such a case, we have to sort our data by the quantity column in descending order:
df.sort_values('quantity', ascending=False)
Output:
type price quantity vendor_code
3 notebook 600 15 420
2 printer 175 10 004
0 PC 450 5 135
4 PC 500 5 114
1 monitor 150 2 960
Ok, it's now sorted by the quantity and top-down.
3. Sorting by multiple columns
But what if we also need to sort by the price, so more expensive units will be displayed first? Add the second sorting column:
df.sort_values(['quantity', 'price'], ascending=False)
Output:
type price quantity vendor_code
3 notebook 600 15 420
2 printer 175 10 004
4 PC 500 5 114
0 PC 450 5 135
1 monitor 150 2 960
Here the DataFrame is sorted by quantity first and by price second, and both sortings are descending.
Let's look at the final example, where we set different sorting orders for chosen columns. This is the way to sort by the quantity ascending and for price descending:
df.sort_values(['quantity', 'price'], ascending=[True, False])
Output:
type price quantity vendor_code
1 monitor 150 2 960
4 PC 500 5 114
0 PC 450 5 135
2 printer 175 10 004
3 notebook 600 15 420
We got the desired result by setting the ascending argument to a list of boolean values in the same order as the columns mentioned in the first argument.
Column sorting
Sometimes, in your data-rich future, you may encounter a table with a large number of unfamiliar columns. It would be very handy if something could arrange them in alphabetical order. And, of course, Pandas has this capability:
df = df.reindex(sorted(df.columns), axis='columns')
df.head()
Output:
price quantity type vendor_code
0 450 5 PC 135
1 150 2 monitor 960
2 175 10 printer 004
3 600 15 notebook 420
4 500 5 PC 114
What has just happened here? reindex() is a Pandas method that sets new labels along the selected axis. We pass the list of column labels presorted with the built-in Python function sorted() as the first argument to achieve the desired result.
Summary
Let's conclude what you've learned:
- how to sort DataFrame by index with
sort_index() - how to sort DataFrame by any column with
sort_values() - how to set columns in alphabetical order