Computer scienceProgramming languagesPythonAdditional instrumentsAlgorithms and data structures

Index() and in under the hood

5 minutes read

You've worked with a variety of collections including tuples, lists, and dictionaries. In this topic, we'll see the performance of the in operator and the index() method on several collections and analyze their algorithms under the hood.

In with ordered sequences

The in operator determines whether an element in the sequence exists; if so, it returns a boolean value. As you can imagine, the not in operator returns the opposite result:

tuple_months = ("January", "February", "March", "April")
print("July" in tuple_months)  # False
print("October" not in tuple_months)  # True

list_months = ["July", "August", "September", "October"]
print("December" in list_months)  # False
print("October" not in list_months)  # False

sentence = "It was the scarcity that fueled his creativity."
print("i" in sentence)  # True
print("scarcity" not in sentence)  # False

You can use the in operator with lists, strings, and tuples. The operator performs a linear search on all of them. It starts at the beginning and checks all the items in the sequence. What we can draw from this is that the time complexity of in is $O(n)$ .

In with dictionaries

This operator also works with dictionaries. As you remember, dictionaries store key-value pairs. If you know the key, you can access the value instantly. Imagine you are doing a word search in the dictionary. You're curious about the definition of the term "harmony". First, you go to the page with the letter "h" to get the word's meaning. So, you have instant access to the information because you know where to look for it. As a result, the time complexity is $O(1)$ .

numbers = {1: "one", 2: "two", 3: "three", 4: "four"}

print(2 in numbers)  # True
print("three" in numbers.values())  # True

As we see in the example above, it outputs True for two operations. The first process checks whether the key exists. So, its time complexity is $O(1)$ . In the second operation, we perform the membership control process on the values. The number_dict.values() method returns a list of values. The time complexity is $O(n)$ .

In with sets

You can also use in with sets. As it uses the hash table structure, the search process is very fast. It has a high impact on the search operation with a list. It doesn't go through all elements. Since each value has a key (hash) information, the search is instant. Its time complexity is $O(1)$ . Let's look at an example.

Imagine you have an e-book that contains around 300000 words. You want to get rid of certain forbidden words. We will write a program to perform the clean-up. We will refer to the in operator with lists and sets. Observe the differences:

import time

# We stored text and banned words in lists.
corpus_words = [...]  # 287_968 words
forbidden_words = [...]  # 18_683 words

start = time.time()

safe_words = [word for word in corpus_words if word not in forbidden_words]

end = time.time()
print(end - start)  # 0.22110567077

With list comprehension, if a word is not in the forbidden words, we have put it in the safe_words. We have utilized the time module to figure out how long it took us to complete this task. We see that the program fulfills the function in almost 0.2 seconds. The running time of the program is not good enough. What would be the benefit if we use a set in this operation?

start = time.time()

s_corpus_words = set(corpus_words)
s_forbidden_words = set(forbidden_words)
safe_words = [word for word in s_corpus_words if word not in s_forbidden_words]

end = time.time()
print(end - start)  # 0.02106636719

As we can see in the code, the in operation is 10.5 times faster than the other. The reason for this is that we perform the check operation after returning our lists to the sets. In this process, repetitive words are deleted when converting lists into sets, and the data in the list is saved to the set with the hash information. Since the hash information of the word is known, the search process takes place instantly.

In with for loops

We use the in operator in almost every for loop. For uses in and : to complete the syntax. In a for loop, the in operator is used with iterable objects. It enumerates the elements and enables us to use them:

grades = {"Anna": 9.5, "Bob": 8.0, "Claire": 7.5, "Dan": 6.0, "Emma": 9.0}

for name, grade in grades.items():
    if grade == min(grades.values()):
        print(name)  # Dan

In the code snippet above, it iterates over the dictionary items and prints the student with the lowest grade.

The index() method

The index() method is another essential function. It returns the initial index of the element. Since it scans all the items, the time complexity is $O(n)$ . In some ways, it is similar to the in operator. They can work on a string, list, and tuple. The in operator returns True, while the index() method returns its position. The index() method throws ValueError if it cannot find the element, and the in operator returns False.

tuple_names = ("Hannah", "Alex", "Jack", "Jill", "Hannah")

pos_Alex = tuple_names.index("Alex")
pos_second_Hannah = tuple_names.index("Hannah", pos_Alex + 1, len(tuple_names))

print(pos_Alex)  # 1
print(pos_second_Hannah)  # 4

The index() method can optionally take start and end parameters. If they are set, we perform the search process in a range.

Summary

To sum up, in this topic, you've learned:

How the in operator works on different data types;
The purpose of the in operator;
Similarities and differences between the index() method and in operator;
How to use the index() method.

Now let's move on to the practice.

7 learners liked this piece of theory. 0 didn't like it. What about you?

Report a typo