
Modifying a DataFrame


Previously, we discussed how to access columns and rows in a pandas DataFrame. Now, let's see how to modify a DataFrame's structure. In this topic, you'll learn how to add new rows and columns and how to delete existing ones.

Adding columns

First, you need to import pandas and create a DataFrame from a dictionary:

import pandas as pd

pets = {
    'species': ['cat', 'dog', 'parrot', 'cockroach'], 
    'pet_name': ['Dr. Mittens Lamar', 'Diesel', 'Peach', 'Richard'], 
    'legs': [4, 4, 2, 6],
    'wings': [0, 0, 2, 4],
    'homeless': ['no', 'no', 'no', 'yes']
}
df = pd.DataFrame(pets)
df.head()

Here is the output:

+----+-----------+-------------------+--------+---------+------------+
|    | species   | pet_name          |   legs |   wings | homeless   |
|----+-----------+-------------------+--------+---------+------------|
|  0 | cat       | Dr. Mittens Lamar |      4 |       0 | no         |
|  1 | dog       | Diesel            |      4 |       0 | no         |
|  2 | parrot    | Peach             |      2 |       2 | no         |
|  3 | cockroach | Richard           |      6 |       4 | yes        |
+----+-----------+-------------------+--------+---------+------------+

In pandas, you can create new columns on the fly: index the DataFrame with a new column name and assign the values to it:

df['mood'] = ['sleepy', 'happy', 'thinking', 'excited']
df.head()

Output:

+----+-----------+-------------------+--------+---------+------------+----------+
|    | species   | pet_name          |   legs |   wings | homeless   | mood     |
|----+-----------+-------------------+--------+---------+------------+----------|
|  0 | cat       | Dr. Mittens Lamar |      4 |       0 | no         | sleepy   |
|  1 | dog       | Diesel            |      4 |       0 | no         | happy    |
|  2 | parrot    | Peach             |      2 |       2 | no         | thinking |
|  3 | cockroach | Richard           |      6 |       4 | yes        | excited  |
+----+-----------+-------------------+--------+---------+------------+----------+

When you pass a list, the number of values must match the number of rows in the DataFrame.
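To see both sides of this rule, here is a minimal sketch on a trimmed-down copy of the pets data (only two of its columns, to keep it short). A list of the wrong length is rejected with a ValueError, while a single scalar value is broadcast to every row. The is_pet column is a hypothetical name chosen only for illustration.

```python
import pandas as pd

# A trimmed-down version of the pets DataFrame
df = pd.DataFrame({
    'species': ['cat', 'dog', 'parrot', 'cockroach'],
    'legs': [4, 4, 2, 6],
})

# Three values for four rows: pandas refuses the assignment
mismatch_rejected = False
try:
    df['mood'] = ['sleepy', 'happy', 'thinking']
except ValueError as err:
    mismatch_rejected = True
    print('ValueError:', err)

# A single scalar is repeated for every row (hypothetical column name)
df['is_pet'] = True
print(df)
```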

You can also derive a new column from an existing one. Suppose you want to create a pairs_of_legs column from the legs column: take legs and divide it by 2:

df['pairs_of_legs'] = df['legs'] / 2
df.head()

Here is the output:

+----+-----------+-------------------+--------+---------+------------+----------+-----------------+
|    | species   | pet_name          |   legs |   wings | homeless   | mood     |   pairs_of_legs |
|----+-----------+-------------------+--------+---------+------------+----------+-----------------|
|  0 | cat       | Dr. Mittens Lamar |      4 |       0 | no         | sleepy   |             2.0 |
|  1 | dog       | Diesel            |      4 |       0 | no         | happy    |             2.0 |
|  2 | parrot    | Peach             |      2 |       2 | no         | thinking |             1.0 |
|  3 | cockroach | Richard           |      6 |       4 | yes        | excited  |             3.0 |
+----+-----------+-------------------+--------+---------+------------+----------+-----------------+

As a result, pairs_of_legs is a float column: the / operator performs true division, which returns floats even when every value divides evenly. Other arithmetic operations work the same way. You can also use string operations, such as concatenation. Let's create a new column called description from mood and species:

df['description'] = df['mood'] + ' ' + df['species']
df.head()

Output:

+----+-----------+-------------------+--------+---------+------------+----------+-----------------+-------------------+
|    | species   | pet_name          |   legs |   wings | homeless   | mood     |   pairs_of_legs | description       |
|----+-----------+-------------------+--------+---------+------------+----------+-----------------+-------------------|
|  0 | cat       | Dr. Mittens Lamar |      4 |       0 | no         | sleepy   |             2.0 | sleepy cat        |
|  1 | dog       | Diesel            |      4 |       0 | no         | happy    |             2.0 | happy dog         |
|  2 | parrot    | Peach             |      2 |       2 | no         | thinking |             1.0 | thinking parrot   |
|  3 | cockroach | Richard           |      6 |       4 | yes        | excited  |             3.0 | excited cockroach |
+----+-----------+-------------------+--------+---------+------------+----------+-----------------+-------------------+
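A quick way to check which types the derived columns end up with is the dtypes attribute. The sketch below, again on a trimmed-down copy of the data, contrasts true division (/) with floor division (//), which keeps whole numbers as integers. It also shows that concatenating a string column with a numeric one needs an explicit .astype(str) conversion first. The pairs_int and summary columns are hypothetical names used only for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'species': ['cat', 'dog', 'parrot', 'cockroach'],
    'legs': [4, 4, 2, 6],
})

# True division always produces floats, even for whole results
df['pairs_of_legs'] = df['legs'] / 2
# Floor division keeps the integer type
df['pairs_int'] = df['legs'] // 2
print(df.dtypes)

# Adding an int column to a string column would raise a TypeError,
# so convert the numbers to strings before concatenating
df['summary'] = df['species'] + ' with ' + df['legs'].astype(str) + ' legs'
print(df['summary'].tolist())
# ['cat with 4 legs', 'dog with 4 legs', 'parrot with 2 legs', 'cockroach with 6 legs']
```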

Adding rows

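If you need to add a row, use the pd.concat() function. Its first parameter must be an iterable, for example, a list, of the DataFrames or Series to join. Another useful parameter is ignore_index, which is False by default, so the rows keep their original index labels. If you set it to True, the result is reindexed from 0 to n - 1, where n is the new number of rows. pd.concat() doesn't modify the DataFrames you pass; it returns a new DataFrame with the row added at the end, so we assign the result back to df. Since pd.concat() joins DataFrames, we first turn the dictionary into a one-row DataFrame with pd.DataFrame.from_records():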

new_row = {'pet_name': 'Turtle', 
            'species': 'turtle',
            'legs': 4,
            'wings': 0,
            'homeless': 'no',
            'mood': 'skeptical',
            'pairs_of_legs': 2,
            'description': 'skeptical turtle'}
df = pd.concat([df, pd.DataFrame.from_records([new_row])], ignore_index=True)
df.head()

Here is the output:

+----+-----------+-------------------+--------+---------+------------+-----------+-----------------+-------------------+
|    | species   | pet_name          |   legs |   wings | homeless   | mood      |   pairs_of_legs | description       |
|----+-----------+-------------------+--------+---------+------------+-----------+-----------------+-------------------|
|  0 | cat       | Dr. Mittens Lamar |      4 |       0 | no         | sleepy    |             2.0 | sleepy cat        |
|  1 | dog       | Diesel            |      4 |       0 | no         | happy     |             2.0 | happy dog         |
|  2 | parrot    | Peach             |      2 |       2 | no         | thinking  |             1.0 | thinking parrot   |
|  3 | cockroach | Richard           |      6 |       4 | yes        | excited   |             3.0 | excited cockroach |
|  4 | turtle    | Turtle            |      4 |       0 | no         | skeptical |             2.0 | skeptical turtle  |
+----+-----------+-------------------+--------+---------+------------+-----------+-----------------+-------------------+

Note that the new row got 4 as its index label. Also, the integer 2 we passed for pairs_of_legs was stored as 2.0, because the column already has a float type.
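The effect of ignore_index is easiest to see by comparing both settings. In this minimal sketch on a two-row example (not the full pets table), the default keeps the original labels, so label 0 appears twice, while ignore_index=True renumbers the rows. The last line confirms that pd.concat() leaves its inputs unchanged.

```python
import pandas as pd

df = pd.DataFrame({'pet_name': ['Diesel', 'Peach'], 'legs': [4, 2]})
new_row = pd.DataFrame.from_records([{'pet_name': 'Turtle', 'legs': 4}])

# Default (ignore_index=False): the index labels are copied as they are
kept = pd.concat([df, new_row])
print(kept.index.tolist())        # [0, 1, 0]

# ignore_index=True: the result gets a fresh 0..n-1 index
renumbered = pd.concat([df, new_row], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# concat returns a new DataFrame; the original still has two rows
print(len(df))                    # 2
```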

Deleting columns

To delete columns, use the DataFrame.drop() method. Since we now have the description column, we don't need species and mood anymore, so let's get rid of them! We call .drop() and pass these labels to the columns argument. Like pd.concat(), this method returns a new DataFrame by default; to change df itself, set inplace to True:

df.drop(columns=['species', 'mood'], inplace=True)
df.head()

Here is the output:

+----+-------------------+--------+---------+------------+-----------------+-------------------+
|    | pet_name          |   legs |   wings | homeless   |   pairs_of_legs | description       |
|----+-------------------+--------+---------+------------+-----------------+-------------------|
|  0 | Dr. Mittens Lamar |      4 |       0 | no         |             2.0 | sleepy cat        |
|  1 | Diesel            |      4 |       0 | no         |             2.0 | happy dog         |
|  2 | Peach             |      2 |       2 | no         |             1.0 | thinking parrot   |
|  3 | Richard           |      6 |       4 | yes        |             3.0 | excited cockroach |
|  4 | Turtle            |      4 |       0 | no         |             2.0 | skeptical turtle  |
+----+-------------------+--------+---------+------------+-----------------+-------------------+

Since we want to delete several columns, we can pass their labels as a list.
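Instead of inplace=True, you can assign the DataFrame that .drop() returns, which leaves the original untouched. The sketch below, on a small two-row example, shows that pattern, the equivalent call with axis=1, and that a single column label can be passed as a plain string instead of a list. The variable names slim and slim_too are just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'species': ['cat', 'dog'],
    'pet_name': ['Dr. Mittens Lamar', 'Diesel'],
    'mood': ['sleepy', 'happy'],
})

# Without inplace, drop() returns a new DataFrame and leaves df untouched
slim = df.drop(columns=['species', 'mood'])
print(slim.columns.tolist())  # ['pet_name']
print(df.columns.tolist())    # ['species', 'pet_name', 'mood']

# Equivalent call: pass the labels and axis=1 (or axis='columns')
slim_too = df.drop(['species', 'mood'], axis=1)
print(slim.equals(slim_too))  # True

# A single column can be given as a plain string
print(df.drop(columns='mood').columns.tolist())  # ['species', 'pet_name']
```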

Deleting rows

If you want to delete rows, follow the same rules but use the index argument:

df.drop(index=3, inplace=True)
df.head()

Here is the output:

+----+-------------------+--------+---------+------------+-----------------+------------------+
|    | pet_name          |   legs |   wings | homeless   |   pairs_of_legs | description      |
|----+-------------------+--------+---------+------------+-----------------+------------------|
|  0 | Dr. Mittens Lamar |      4 |       0 | no         |             2.0 | sleepy cat       |
|  1 | Diesel            |      4 |       0 | no         |             2.0 | happy dog        |
|  2 | Peach             |      2 |       2 | no         |             1.0 | thinking parrot  |
|  4 | Turtle            |      4 |       0 | no         |             2.0 | skeptical turtle |
+----+-------------------+--------+---------+------------+-----------------+------------------+

There are a few things to mention. We passed an integer row label, 3. Because df has an ordinary integer index starting at 0, this label happens to match the row's position, but .drop() always works with labels, not positions. After the deletion, the index is no longer sequential: it skips 3. To renumber the rows, use .reset_index(drop=True, inplace=True); drop=True discards the old index instead of adding it to the DataFrame as a new column. In practice, the most popular way to delete rows is to filter the DataFrame with a condition and assign the selection (which leaves out the rows you don't need) to df or to another DataFrame variable. We will master the art of selection in the topics to come.
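Here is a minimal sketch of both follow-ups on a reduced version of the pets table: renumbering the index after .drop() with .reset_index(drop=True), and a preview of deleting rows by filtering. The condition (keep pets with at least 4 legs) is an arbitrary example, and the boolean selection syntax is covered in more detail in later topics.

```python
import pandas as pd

df = pd.DataFrame({
    'pet_name': ['Dr. Mittens Lamar', 'Diesel', 'Peach', 'Richard', 'Turtle'],
    'legs': [4, 4, 2, 6, 4],
})

# Dropping label 3 leaves a gap in the index
df.drop(index=3, inplace=True)
print(df.index.tolist())  # [0, 1, 2, 4]

# drop=True discards the old labels instead of inserting them as a column
df.reset_index(drop=True, inplace=True)
print(df.index.tolist())  # [0, 1, 2, 3]

# Filtering: keep only the rows that meet a condition, dropping the rest
four_plus = df[df['legs'] >= 4]
print(four_plus['pet_name'].tolist())  # ['Dr. Mittens Lamar', 'Diesel', 'Turtle']
```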

Conclusion

In this topic, you've learned:

How to easily create columns and use pd.concat() to add rows
How to delete rows and columns with .drop()
