We have already discussed what type of data can be stored in a DataFrame and how it can be created. Now, let's learn how we can modify an existing DataFrame. In this topic, we'll discuss some basic operations, such as renaming, rearranging columns, or changing the index.
Accessing DataFrame axes
First, we need to import pandas and create a DataFrame from a dictionary:
import pandas as pd
pets = {
'species': ['cat', 'dog', 'parrot', 'cockroach'],
'name': ['Dr. Mittens Lamar', 'Diesel', 'Peach', 'Richard'],
'legs': [4, 4, 2, 6],
'wings': [0, 0, 2, 4],
'looking_for_home': ['no', 'no', 'no', 'yes']
}
df = pd.DataFrame(pets)
df.head()
Here is the output:
+----+-----------+-------------------+--------+---------+--------------------+
| | species | name | legs | wings | looking_for_home |
|----+-----------+-------------------+--------+---------+--------------------|
| 0 | cat | Dr. Mittens Lamar | 4 | 0 | no |
| 1 | dog | Diesel | 4 | 0 | no |
| 2 | parrot | Peach | 2 | 2 | no |
| 3 | cockroach | Richard | 6 | 4 | yes |
+----+-----------+-------------------+--------+---------+--------------------+
We can change indexes both in DataFrames and Series. Indexes can employ different data types such as strings, Datetime objects, float numbers, boolean values, and others.
You can see the row index in the first column on the left. Column names (labels) are in the header. Another way to describe indexing is axis labeling. You can see two axes in our data frame, vertical (rows) — axis 0 and horizontal (columns) — axis 1. Let's take a look at the axes of our DataFrame by accessing the df.axes attribute.
This is what we'll get:
[RangeIndex(start=0, stop=4, step=1),
Index(['species', 'name', 'legs', 'wings', 'looking_for_home'], dtype='object')]
The first object in the list is the index for rows, and the second is the index for columns.
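As a minimal sketch of the same idea, the two axes can be unpacked into separate variables. The small two-column frame and the names row_axis and column_axis are made up for illustration only:

```python
import pandas as pd

# A tiny stand-in frame (illustrative data, not the full pets table)
df = pd.DataFrame({'species': ['cat', 'dog'], 'legs': [4, 4]})

# df.axes is a list of two index objects: rows first (axis 0), columns second (axis 1)
row_axis, column_axis = df.axes

print(row_axis)
print(column_axis)
```

Unpacking makes it explicit which object belongs to which axis.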
The default way of indexing data containing n rows is by using an integer range 0, 1, 2, 3,..., n−1. This index reflects the positions of the elements. As you can see above, our DataFrame uses only this type of row indexing (integer range): the first row has the 0 index, the last row has the index of 3.
Let's check the output of the df.info() method:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 species 4 non-null object
1 name 4 non-null object
2 legs 4 non-null int64
3 wings 4 non-null int64
4 looking_for_home 4 non-null object
dtypes: int64(2), object(3)
memory usage: 288.0+ bytes
As you can see, the first line describes the object class (DataFrame), and the second shows the type of the row index. Then comes a list of columns with their positional indexes, column labels, Non-Null Count (the number of non-empty values in each column), and Dtype (the data type, which pandas detects automatically: int64 for the numeric columns here and object for the text columns).
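If you only need the data types, the dtypes attribute lists them per column. The sketch below uses a small made-up frame. The exact type names you see may differ between pandas versions and platforms (for example, text columns are usually reported as object, but newer releases may show a dedicated string type), so the code checks the kind of type rather than its exact name:

```python
import pandas as pd

# A tiny stand-in frame (illustrative data)
df = pd.DataFrame({'species': ['cat', 'dog'], 'legs': [4, 4]})

# One dtype per column, detected automatically from the values
print(df.dtypes)

# Version-independent checks on the detected types
legs_is_integer = pd.api.types.is_integer_dtype(df['legs'])
species_is_numeric = pd.api.types.is_numeric_dtype(df['species'])
print(legs_is_integer, species_is_numeric)
```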
The row index object is stored in df.index. Accessing this attribute shows the current index:
RangeIndex(start=0, stop=4, step=1)
Since there are no row labels, the attribute will return an integer range. You can achieve the same result by using df.axes[0].
Tip: .info() also gives you positional indexes. In addition to positional indexing, it sometimes helps to use custom labels.
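To illustrate custom labels, here is a small sketch with a string index on a Series and a datetime index on a DataFrame. The variable names and values (ages, visits, the dates) are invented for this example:

```python
import pandas as pd

# String labels as the index of a Series
ages = pd.Series([3, 5, 2], index=['cat', 'dog', 'parrot'])

# Datetime labels as the index of a DataFrame
dates = pd.to_datetime(['2024-01-01', '2024-01-02', '2024-01-03'])
visits = pd.DataFrame({'visitors': [10, 12, 7]}, index=dates)

print(ages.index)
print(visits.index)

# Values can now be looked up by label instead of by position
print(ages['dog'])                                           # 5
print(visits.loc[pd.Timestamp('2024-01-02'), 'visitors'])    # 12
```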
To see the column labels of a DataFrame, use df.columns:
Index(['species', 'name', 'legs', 'wings', 'looking_for_home'], dtype='object')
Setting, changing, and resetting an index
One way to change column names is to assign a new value to the columns attribute. The new value should have the same length as the number of columns.
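The length requirement is easy to check. In this sketch (with a made-up two-column frame), assigning a single label to a frame with two columns raises a ValueError, and the original labels stay in place:

```python
import pandas as pd

# A tiny stand-in frame with two columns (illustrative data)
df = pd.DataFrame({'species': ['cat', 'dog'], 'legs': [4, 4]})

error_message = None
try:
    df.columns = ['only_one_label']   # two columns, but only one new label
except ValueError as error:
    error_message = str(error)
    print('Could not rename:', error_message)

# The failed assignment leaves the column labels unchanged
print(list(df.columns))
```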
Let's rename all the columns by assigning a list of new labels to the columns attribute:
df.columns = ['col', 'col2', 'col3', 'col4', 'col5']
df.head()
Here is the output:
+----+-----------+-------------------+--------+--------+--------+
| | col | col2 | col3 | col4 | col5 |
|----+-----------+-------------------+--------+--------+--------|
| 0 | cat | Dr. Mittens Lamar | 4 | 0 | no |
| 1 | dog | Diesel | 4 | 0 | no |
| 2 | parrot | Peach | 2 | 2 | no |
| 3 | cockroach | Richard | 6 | 4 | yes |
+----+-----------+-------------------+--------+--------+--------+
As you can see, the columns now have different names. We can change the row labels in the same way, by assigning a new list of labels to the index attribute:
df.index = ['row', 'row2', 'row3', 'row4']
df.head()
This is what the table will look like:
+------+-----------+-------------------+--------+--------+--------+
| | col | col2 | col3 | col4 | col5 |
|------+-----------+-------------------+--------+--------+--------|
| row | cat | Dr. Mittens Lamar | 4 | 0 | no |
| row2 | dog | Diesel | 4 | 0 | no |
| row3 | parrot | Peach | 2 | 2 | no |
| row4 | cockroach | Richard | 6 | 4 | yes |
+------+-----------+-------------------+--------+--------+--------+
You can also use any column as an index. Let's index our data by name. We can do it with the set_index() method.
Most pandas methods do not change the existing DataFrame but instead return a new DataFrame object. So we can either assign the new DataFrame object to our df variable or use the optional argument inplace=True (although using inplace=True is generally not recommended).
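The difference is easy to see on a small made-up frame. In this sketch, the name indexed is just an illustrative variable:

```python
import pandas as pd

# A tiny stand-in frame (illustrative data)
df = pd.DataFrame({'name': ['Diesel', 'Peach'], 'species': ['dog', 'parrot']})

# set_index() returns a new DataFrame and leaves df untouched
indexed = df.set_index('name')
print(list(df.columns))        # ['name', 'species']
print(list(indexed.columns))   # ['species']

# To keep the result under the same name, reassign it
df = df.set_index('name')
print(df.index.name)           # name
```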
Let's return to our original DataFrame, with its original column names and default index, and set the name column as its index. Note that the DataFrame will no longer display the integer range:
df.set_index('name', inplace=True) # is equivalent to df = df.set_index('name')
df.head()
Here is the output:
+-------------------+-----------+--------+---------+--------------------+
| | species | legs | wings | looking_for_home |
|-------------------+-----------+--------+---------+--------------------|
| name | | | | |
|-------------------+-----------+--------+---------+--------------------|
| Dr. Mittens Lamar | cat | 4 | 0 | no |
| Diesel | dog | 4 | 0 | no |
| Peach | parrot | 2 | 2 | no |
| Richard | cockroach | 6 | 4 | yes |
+-------------------+-----------+--------+---------+--------------------+
The rows are now indexed by the name column.
If we look at the index attribute now using df.index, we can see that it changed from range to the list of names:
Index(['Dr. Mittens Lamar', 'Diesel', 'Peach', 'Richard'], dtype='object', name='name')
Tip: Only DataFrames have the .set_index() method.
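For a Series, one alternative is to assign new labels directly to its index attribute. The sketch below uses a made-up Series called legs:

```python
import pandas as pd

legs = pd.Series([4, 4, 2])

# A Series has no set_index() method
print(hasattr(legs, 'set_index'))   # False

# Labels can still be replaced by assigning to the index attribute
legs.index = ['cat', 'dog', 'parrot']
print(legs['parrot'])               # 2
```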
We can set the index back to the default (integer range) by using reset_index(). As mentioned above, pass inplace=True to modify the DataFrame in place:
df.reset_index(inplace=True)
Output:
+----+-------------------+-----------+--------+---------+--------------------+
| | name | species | legs | wings | looking_for_home |
|----+-------------------+-----------+--------+---------+--------------------|
| 0 | Dr. Mittens Lamar | cat | 4 | 0 | no |
| 1 | Diesel | dog | 4 | 0 | no |
| 2 | Peach | parrot | 2 | 2 | no |
| 3 | Richard | cockroach | 6 | 4 | yes |
+----+-------------------+-----------+--------+---------+--------------------+
Once we have reset the index, the old index becomes an ordinary column again, and name is now the first column. If you want to reset the index and discard the old one instead of keeping it as a column, pass drop=True.
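Here is a short sketch of both behaviors side by side, using a made-up two-row frame indexed by name. The variables kept and dropped are illustrative:

```python
import pandas as pd

# A tiny stand-in frame indexed by pet name (illustrative data)
df = pd.DataFrame(
    {'species': ['cat', 'dog']},
    index=pd.Index(['Dr. Mittens Lamar', 'Diesel'], name='name'),
)

kept = df.reset_index()              # old index comes back as a column
dropped = df.reset_index(drop=True)  # old index is discarded

print(list(kept.columns))      # ['name', 'species']
print(list(dropped.columns))   # ['species']
print(list(dropped.index))     # [0, 1]
```

Note that with drop=True the names are lost, so use it only when the old index carries no information you need.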
Renaming columns
You can also use the .rename() method to rename your columns. Just pass a dictionary with old column names as keys and new column names as values:
df.rename(columns={'name': 'pet_name', 'looking_for_home': 'homeless'}, inplace=True)
df.head()
Here is the output:
+----+-------------------+-----------+--------+---------+------------+
| | pet_name | species | legs | wings | homeless |
|----+-------------------+-----------+--------+---------+------------|
| 0 | Dr. Mittens Lamar | cat | 4 | 0 | no |
| 1 | Diesel | dog | 4 | 0 | no |
| 2 | Peach | parrot | 2 | 2 | no |
| 3 | Richard | cockroach | 6 | 4 | yes |
+----+-------------------+-----------+--------+---------+------------+
As you can see, it's all very convenient. We don't need to mention every column if we want to rename only some of them. You can also use .rename() to change row labels: just pass the index={...} argument instead of columns={...}.
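As a minimal sketch of renaming row labels, the example below renames two of the three rows in a made-up frame. The labels 'first' and 'last' are arbitrary:

```python
import pandas as pd

# A tiny stand-in frame with the default integer index (illustrative data)
df = pd.DataFrame({'species': ['cat', 'dog', 'parrot']})

# Only the labels listed in the dictionary are renamed
renamed = df.rename(index={0: 'first', 2: 'last'})

print(list(renamed.index))   # ['first', 1, 'last']
print(list(df.index))        # [0, 1, 2] (the original is unchanged)
```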
Conclusion
In this topic, you've learned:
- About pandas DataFrame axes and how to display them using .axes, .index, and .columns;
- About index types: integer and label-based;
- How to set, change, and reset an index with .set_index() and .reset_index();
- How to rename columns with .rename().
Most of our examples used ordinal numbers as indexes. However, someday you will run into data that requires label indexing. Take your time, and make sure that the method you choose does what you want instead of ruining everything with one typo.