
.loc & .iloc


Sometimes, we may want to access a piece of information stored in a particular row or a column instead of working with the whole DataFrame. The good news is that pandas has just the right solution for it. It is called indexing, and we can select a particular subset of a DataFrame or a Series to work with it.

.loc

Before we start, let's import pandas (abbreviated as pd) and create a DataFrame from a dictionary:

import pandas as pd

people = {
    "first_name": ["Michael", "Michael", "Jane", "John"],
    "last_name": ["Jackson", "Jordan", "Doe", "Doe"],
    "email": ["[email protected]", "[email protected]",
              "[email protected]", "[email protected]"],
    "birthday": ["29.09.1958", "17.02.1963", "15.03.1978", "12.05.1979"],
    "height": [1.75, 1.98, 1.64, 1.8]
}
df = pd.DataFrame(people)
df.head()

Here is the output:

  first_name last_name               email    birthday  height
0    Michael   Jackson  [email protected]  29.09.1958    1.75
1    Michael    Jordan   [email protected]  17.02.1963    1.98
2       Jane       Doe   [email protected]  15.03.1978    1.64
3       John       Doe   [email protected]  12.05.1979    1.80

pandas provides two indexers for selecting a subset of rows and columns: .loc and .iloc. The first one can be read as "locator" and is label-based. .iloc can be read as "integer locator" and is integer position-based. Note that neither of them is a method: they are Python properties that return an indexer object, and that's why they are used with square brackets rather than parentheses. Their core syntax is similar:

 .loc[<row selection>, <optional column selection>]
.iloc[<row selection>, <optional column selection>]

Let's start with .loc. It can handle integer-based indexes as labels, but for clarity, we will create and name a text index:

df.index = ['first', 'second', 'third', 'fourth']
df.index.name = 'index'
df.head()

Output:

       first_name last_name               email    birthday  height
index                                                              
first     Michael   Jackson  [email protected]  29.09.1958    1.75
second    Michael    Jordan   [email protected]  17.02.1963    1.98
third        Jane       Doe   [email protected]  15.03.1978    1.64
fourth       John       Doe   [email protected]  12.05.1979    1.80

.loc can take:

  • a single row label;

  • a list of row labels;

  • a slice of row labels;

  • a result of conditional statements (a boolean array).

We could also pass columns as the second argument in a similar manner: a single label, a list, or a slice.
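As a minimal sketch of the column argument, the block below builds a reduced copy of the DataFrame (the email column is left out here for brevity, which is an assumption of this example) and selects columns by a single label, a list, and a slice. The variable names heights, names, and tail_cols are illustrative only.

```python
import pandas as pd

# Reduced version of the article's DataFrame; the email column is omitted for brevity.
df = pd.DataFrame(
    {
        "first_name": ["Michael", "Michael", "Jane", "John"],
        "last_name": ["Jackson", "Jordan", "Doe", "Doe"],
        "height": [1.75, 1.98, 1.64, 1.80],
    },
    index=["first", "second", "third", "fourth"],
)

# A bare colon selects every row; the second argument picks the columns.
heights = df.loc[:, "height"]                    # single label -> Series
names = df.loc[:, ["first_name", "last_name"]]   # list of labels -> DataFrame
tail_cols = df.loc[:, "last_name":"height"]      # label slice, end label included

print(heights)
print(names)
print(tail_cols)
```

A single column label gives back a Series, while a list or a slice keeps the result as a DataFrame.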

If we pass a single row label, pandas will return that row as a Series:

df.loc['third']

Output:

first_name                 Jane
last_name                   Doe
email         [email protected]
birthday             15.03.1978
height                     1.64
Name: third, dtype: object

You can also select a single cell:

df.loc['third', 'last_name']

Output:

'Doe'

As you can see, this returns the cell value itself. In this case, it is a Python string (str).
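To make the return types easier to compare, here is a small sketch on a reduced two-column copy of the data (an assumption of this example). It shows that the shape of the selector decides what comes back: a Series, a DataFrame, or a single value.

```python
import pandas as pd

# Reduced copy of the article's data, using the same text index.
df = pd.DataFrame(
    {
        "last_name": ["Jackson", "Jordan", "Doe", "Doe"],
        "height": [1.75, 1.98, 1.64, 1.80],
    },
    index=["first", "second", "third", "fourth"],
)

row_as_series = df.loc["third"]       # single label -> Series
row_as_frame = df.loc[["third"]]      # one-element list -> DataFrame with one row
cell = df.loc["third", "last_name"]   # row label + column label -> single value

print(type(row_as_series).__name__)
print(type(row_as_frame).__name__)
print(cell)
```

Wrapping a single label in a list is a handy way to keep the result two-dimensional when later code expects a DataFrame.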

To pass a list of labels, we need to do the following:

df.loc[['first','fourth']]

We get the rows labeled first and fourth:

       first_name last_name               email    birthday  height
index                                                              
first     Michael   Jackson  [email protected]  29.09.1958    1.75
fourth       John       Doe   [email protected]  12.05.1979    1.80

Let's add a list of column labels:

df.loc[['first','fourth'], ['last_name', 'birthday']]

Output:

       last_name    birthday
index                       
first    Jackson  29.09.1958
fourth       Doe  12.05.1979

Note that the first list inside the loc square brackets defines the row selection while the second list defines the column selection.

Here comes a slice of row labels:

df.loc['second':'fourth']

Output (notice how both the beginning ('second') and the end ('fourth') of the slice are included, unlike the usual Python slicing behavior that excludes the end itself):

       first_name last_name              email    birthday  height
index                                                             
second    Michael    Jordan  [email protected]  17.02.1963    1.98
third        Jane       Doe  [email protected]  15.03.1978    1.64
fourth       John       Doe  [email protected]  12.05.1979    1.80
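The difference from ordinary Python slicing can be checked side by side. The sketch below uses a one-column copy of the data (an assumption made to keep it short) and slices the same labels once as a plain list and once through .loc.

```python
import pandas as pd

labels = ["first", "second", "third", "fourth"]
df = pd.DataFrame({"first_name": ["Michael", "Michael", "Jane", "John"]}, index=labels)

list_slice = labels[1:3]                   # plain Python list: stop position excluded
label_slice = df.loc["second":"fourth"]    # .loc: end label included

print(list_slice)                # ['second', 'third']
print(list(label_slice.index))   # ['second', 'third', 'fourth']
```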

We can also select rows with a condition and combine it with a column slice:

df.loc[df.birthday == '12.05.1979', 'last_name':'birthday':2]

Output:

       last_name    birthday
index                       
fourth       Doe  12.05.1979

The first argument selects the rows where the birthday column equals 12.05.1979. The second argument takes the columns from last_name to birthday with a step of 2. That is, it takes every second column, starting from the first one selected.

Feel free to choose any combination of single values, lists, and slices.
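As one possible combination, the sketch below joins two conditions for the rows and passes a list for the columns. It runs on a reduced copy of the data without the email column (an assumption of this example), and the name tall_michaels is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first_name": ["Michael", "Michael", "Jane", "John"],
        "last_name": ["Jackson", "Jordan", "Doe", "Doe"],
        "height": [1.75, 1.98, 1.64, 1.80],
    },
    index=["first", "second", "third", "fourth"],
)

# Each condition needs its own parentheses; & means "and", | means "or".
is_michael = df["first_name"] == "Michael"
is_tall = df["height"] > 1.8

tall_michaels = df.loc[is_michael & is_tall, ["last_name", "height"]]
print(tall_michaels)
```

Only the row labeled second satisfies both conditions, so the result is a one-row DataFrame.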

.iloc

Now, let's move on to .iloc. The core syntax is the same, but this one works with integer positions, and a boolean Series produced by a condition cannot be passed to it directly. So, let's switch back to the initial DataFrame by resetting and dropping the label index, since we don't need it anymore:

df.reset_index(drop=True, inplace=True)
df.head()

Output:

  first_name last_name               email    birthday  height
0    Michael   Jackson  [email protected]  29.09.1958    1.75
1    Michael    Jordan   [email protected]  17.02.1963    1.98
2       Jane       Doe   [email protected]  15.03.1978    1.64
3       John       Doe   [email protected]  12.05.1979    1.80

First, let's select the value in the first row and the first column:

df.iloc[0, 0]

This returns the top-left cell:

'Michael'

We can also select four inner cells:

df.iloc[[1, 2], [1, 2]]

Output:

  last_name              email
1    Jordan  [email protected]
2       Doe  [email protected]

Don't forget about the step! To take every k-th row from position x up to, but not including, position y, use the following syntax: df.iloc[x:y:k, :]. For example, we can list every second row (starting from zero) with this line of code:

df.iloc[::2, :]

Output:

  first_name last_name               email    birthday  height
0    Michael   Jackson  [email protected]  29.09.1958    1.75
2       Jane       Doe   [email protected]  15.03.1978    1.64

Awesome, isn't it? This technique looks simple if you're already familiar with Python lists.
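Since .iloc follows the same rules as list slicing, explicit start and stop values and negative positions work as well. Here is a short sketch on a reduced copy of the data (no email or birthday columns, an assumption of this example):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first_name": ["Michael", "Michael", "Jane", "John"],
        "last_name": ["Jackson", "Jordan", "Doe", "Doe"],
        "height": [1.75, 1.98, 1.64, 1.80],
    }
)

odd_rows = df.iloc[1:4:2, :]      # positions 1 and 3; stop position 4 is excluded
last_row = df.iloc[-1]            # negative positions count from the end
first_two_cols = df.iloc[:, :2]   # all rows, columns at positions 0 and 1

print(odd_rows)
print(last_row)
print(first_two_cols)
```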

Note that .iloc takes an integer position, not a label. If the index labels are not consecutive numbers starting from zero, .iloc still counts rows by their position. So if we have a DataFrame with a non-sequential index like this:

    a  b
10  1  4
0   2  5
20  3  6

df.iloc[0] will still select the first row (with an index of 10):

a    1
b    4
Name: 10, dtype: int64

And df.loc[0] will select the second row (with an index of 0):

a    2
b    5
Name: 0, dtype: int64
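The small DataFrame from this example can be rebuilt as follows to try both indexers. The constructor call is a sketch of one way to get the same table; the text above does not show how it was created.

```python
import pandas as pd

# Same values as in the table above, with a non-sequential integer index.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[10, 0, 20])

by_position = df.iloc[0]   # first row by position, its label is 10
by_label = df.loc[0]       # the row whose label is 0, second by position

print(by_position)
print(by_label)
```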

Use .loc and .iloc when you want to change a part of a DataFrame.

To sum up, let's look at the main differences between .loc and .iloc in one table:

                           .loc          .iloc
Conditional row selection  Yes           No
Takes rows as              Index names   Index integer position
Takes columns as           Column names  Column integer position
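The "Conditional row selection" row can be tried out directly. In the sketch below, .loc accepts the boolean Series produced by a condition, while .iloc refuses it. As a workaround, the Series can be converted to a plain boolean array with to_numpy(). The two-column data is a reduced copy of the article's DataFrame, and the exact exception type is left open on purpose.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first_name": ["Michael", "Michael", "Jane", "John"],
        "height": [1.75, 1.98, 1.64, 1.80],
    }
)

mask = df["height"] > 1.7    # boolean Series that carries index labels

by_label = df.loc[mask]      # .loc accepts the Series directly

try:
    df.iloc[mask]            # .iloc rejects a boolean Series
except (NotImplementedError, ValueError) as err:
    print(f"{type(err).__name__}: {err}")

# A plain boolean array has no labels, so .iloc can apply it by position.
by_position = df.iloc[mask.to_numpy()]
print(by_label.equals(by_position))
```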

Modifying a DataFrame with .loc & .iloc

Both indexers are not only a convenient way to select a part of a DataFrame, but also help modify a part of a DataFrame with just one line of code. Let's imagine a situation: to save personal data on a server, users must send you a Data Processing Agreement (DPA). Suppose you didn't get the DPA from Jane & John Doe. Let's update our data:

df.iloc[2:, 2:5] = "no DPA"

Here is what we'll get:

  first_name last_name               email    birthday  height
0    Michael   Jackson  [email protected]  29.09.1958    1.75
1    Michael    Jordan   [email protected]  17.02.1963    1.98
2       Jane       Doe              no DPA      no DPA  no DPA
3       John       Doe              no DPA      no DPA  no DPA
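The same kind of update can be written with .loc, choosing the rows by a condition instead of by position. This is a sketch on a reduced copy of the data without the email column. It also assumes that, depending on the pandas version, writing a string into a float column such as height may trigger a dtype warning or an error, so the column is converted to the object dtype first.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "first_name": ["Michael", "Michael", "Jane", "John"],
        "last_name": ["Jackson", "Jordan", "Doe", "Doe"],
        "birthday": ["29.09.1958", "17.02.1963", "15.03.1978", "12.05.1979"],
        "height": [1.75, 1.98, 1.64, 1.80],
    }
)

# Assumption: make the float column accept arbitrary objects before storing text in it.
df["height"] = df["height"].astype(object)

# Rows are chosen by a condition, columns by a label slice (end label included).
df.loc[df["last_name"] == "Doe", "birthday":"height"] = "no DPA"
print(df)
```

Selecting by condition keeps the update correct even if the rows are later reordered, which a fixed position slice like 2: would not guarantee.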

Conclusion

Now you know how to select subsets based on their integer position with .iloc and based on labels with .loc. Of course, the list of useful selection tools goes on, and you will learn about them in due time. In some cases, it will be easier to use .loc with a condition; in others, basic dot-syntax selection will be enough. Feel free to experiment!

Read more on this topic in Exploring Pandas Library for Python on Hyperskill Blog.
