Pattern matching is a language feature that lets developers check data against patterns and extract information based on its shape and content. It is widely used in functional and declarative programming languages. Python introduced it in version 3.10 as structural pattern matching, a significant step forward for the language's expressiveness and readability.
In this topic, you will learn the syntax of pattern matching, how to create complex patterns, and some core concepts and use cases of this feature.
Syntax
A superficially similar construct, the switch statement, exists in languages such as C++ and Java, so if you've worked with those languages, the syntax may look familiar. Python's match statement goes further, though: it can match the structure of data, not just compare single values. To use pattern matching in Python 3.10 or later, you need two keywords: match and case. Let's look at a simple example to better understand this concept:
```python
def http_handler(response):
    match response:
        case 200:
            print('OK')
        case 400:
            print('Bad request')
        case 404:
            print('Not found')
        case 500:
            print('Internal server error')
        case _:
            print('Unknown error')

http_handler(200)  # OK
http_handler(404)  # Not found
http_handler(500)  # Internal server error
http_handler(610)  # Unknown error
```
As you can see, we've declared a function http_handler that receives an HTTP status code and prints the matching message. The underscore (_) is a wildcard pattern: it matches any value that none of the preceding cases matched, much like default in a switch statement.
Patterns
Now that we've grasped the basic syntax, let's dive deeper into pattern matching. In the previous example, we matched the subject against integer literals. However, the patterns that follow the case keyword can be much more complex. They can include literals, names that capture values, sequences, dictionaries, class instances, and combinations of these. For instance, if you want several values to match the same case, you can combine them with the OR pattern (|):
```python
def http_handler(response):
    match response:
        case 200 | 201:
            print('OK')
        case 400:
            print('Bad request')
        case 404:
            print('Not found')
        case 500:
            print('Internal server error')
        case _:
            print('Unknown error')

http_handler(200)  # OK
http_handler(201)  # OK
http_handler(404)  # Not found
http_handler(500)  # Internal server error
http_handler(610)  # Unknown error
```
Now, the message "OK" prints for both "200" and "201" HTTP responses.
You can also match lists and bind variables inside a case. Imagine a program that reads a line of user input and runs the corresponding command. To parse such input, you would usually write a chain of if/elif statements. Let's see how you could accomplish the same with pattern matching:
```python
def parse_input():
    user_input = input()
    match user_input.split():
        case ['greet']:
            print('Hello!')
        case ['say', word]:
            print(f'I am saying {word}')
        case ['execute', *args]:
            for arg in args:
                some_function(arg)  # an arbitrary function you might have defined above
        case _:
            print(f'Unknown command: {user_input}')
```
Let's take a closer look at the code. The split() method returns a list, so every case uses a sequence pattern written with square brackets. In the second case, word is a capture pattern. Any two-element list whose first element is "say" matches this case, and its second element is bound to word. The third case is similar, except that the list can have any length of at least one, as long as its first element is "execute". The starred name *args collects all remaining elements (possibly none) into a list, which we then loop through, calling some_function for each of them. You could also write these patterns with parentheses instead of square brackets. Both forms match any sequence, such as a list or a tuple.
You can combine sequence patterns with OR patterns (|) and add a guard for more precise matching. A guard is an if condition that is evaluated after the pattern matches:
```python
import os

def parse_command(cmd):
    match cmd.split():
        case ['ls', ('-la' | '-a' | '-l')]:
            print('Matches the "ls" command with one of the specified arguments')
        case ['cd', path] if os.path.exists(path):
            print('Change directory if the specified path exists')
        case _:
            print(f'Invalid command: {cmd}')
```
In this example, we parse a command and check whether it meets the requirements. The first case lists the accepted options for the "ls" command. The second case matches only if the list has exactly two elements, the first one is "cd", and the guard confirms that the second element is an existing path in your file system. The function behaves as follows:
```python
parse_command('ls -la')              # matches the first case
parse_command('ls -a')               # matches the first case
parse_command('ls -o')               # matches the last case, prints "Invalid command: ls -o"
parse_command('ls')                  # matches the last case, prints "Invalid command: ls"
parse_command('cd venv')             # matches the second case if the "venv" directory exists
parse_command('cd non_existent_dir') # matches the last case,
                                     # prints "Invalid command: cd non_existent_dir"
```
Pattern matching also supports dictionaries through mapping patterns:
```python
my_dict = {'city': 'Paris', 'country': 'France'}  # sample data to match against

match my_dict:
    case {'city': 'Paris'}:
        print('This block matches any dictionary which has the pair "city" - "Paris"')
    case {'city': 'London', 'country': 'Great Britain'}:
        print('Similar to the previous case, but the dictionary must contain both pairs')
    case {'language': lang, 'country': country, 'capital': capital}:
        print('This block matches any dictionary with the keys '
              '"language", "country", and "capital", binding their values to variables')
    case _:
        print('As usual, this matches anything!')
```
As you can see, pattern matching lets you write noticeably cleaner and more readable code, and the examples above barely scratch the surface. You can even match instances of your own classes with class patterns:
```python
from dataclasses import dataclass

@dataclass
class Student:
    name: str
    age: int
    grade: float

@dataclass
class Teacher:
    name: str
    subject: str

def match_person(person):
    match person:
        case Student('John', 17, 4.3):
            print('This case matches a student with those exact properties')
        case Student('Emily'):
            print('Matches any student named "Emily"')
        case Teacher('Mr. Smith', subject):
            print(f'Matches any teacher with that name. The subject is stored: {subject}')
        case _:
            print('Unknown argument')
```
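We've declared two classes: Student and Teacher. The @dataclass decorator is convenient here because it automatically generates the __match_args__ attribute, which lets patterns such as Student('John', 17, 4.3) refer to fields by position. A regular class can be matched too, but positional sub-patterns then require you to define __match_args__ yourself. Keyword sub-patterns such as Student(name='Emily') work with any class. Data classes are mainly intended to hold data, although they can have methods like any other class. We can now write patterns that describe the values stored in the fields, using syntax similar to the list and dictionary patterns discussed above. Our function behaves like this: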
```python
match_person(Student('John', 17, 4.3))            # matches the first case
match_person(Student('Emily', 12, 2.5))           # matches the second case
match_person(Teacher('Mr. Smith', 'geography'))   # matches the third case
match_person(Teacher('Mr. Smith', 'literature'))  # matches the third case
match_person(Teacher('Mrs. Holmes', 'math'))      # matches the last case
```
Matching logic
The match statement tries each pattern in order, runs the block of the first case that matches, and skips the rest. So, if you declare two identical patterns, only the first one can ever match:
```python
def print_cmd(cmd):
    match cmd:
        case 'greet':
            print('Hi!')
        case 'greet':  # never reached: the previous case already matches 'greet'
            print('Hello!')
        case _:
            print('Bye!')

print_cmd('greet')  # prints 'Hi!'
```
For the same reason, an irrefutable pattern such as case _ may appear only as the last case. Because the wildcard matches any value, every clause after it would be unreachable, so Python rejects such code with a SyntaxError.
Pattern matching works especially well with data classes, and not only because it makes code cleaner. Class patterns are also lightweight. Writing Student('Emily') in a case does not create a new Student instance. Instead, the case checks that the subject is an instance of the corresponding class and that its attributes have the required values.
Compatibility and limitations
However effective pattern matching is, keep in mind that it is not backward compatible. The match statement is available only in Python 3.10 and later, and code that uses it fails with a SyntaxError on earlier versions. If you want to use the match statement, make sure your code will run on Python 3.10 or newer.
Pattern matching can also have a performance cost. Cases are tried one after another, so some match statements can become noticeably slow in performance-critical code. This applies to statements with many cases, to patterns that are expensive to check (for example, deeply nested ones or guards that call slow functions), and to very large subjects. For typical code, the impact should be negligible. If you suspect a problem, profile your code to find the actual bottlenecks, and then consider simplifying the patterns or optimizing your data structures. Furthermore, overly complex patterns can hurt readability, as in the following example:
```python
# An overly complex and unreadable pattern
case [['key1', 1, 2, 3], 'key2', ['value']] | ['val', 1, 2, 3] | [{'key1': 'value1'}]:
```
Just by looking at this line, it's hard to tell what it is supposed to do. Even if you remember its purpose when you write it, you will likely forget it soon, and other developers seeing it for the first time will have an even harder time. Therefore, avoid constructing such patterns.
Conclusion
Structural pattern matching, introduced in Python 3.10, is a transformative addition to the language. It offers a new way to work with data structures, simplifies code, and improves readability. Because it can concisely match and extract data from complex structures, Python developers can now tackle a broader range of tasks more effectively, from data analysis to software development. So can you, now that you've learned:
The syntax of pattern matching;
How to create sophisticated patterns to work effectively with complex data;
How pattern matching can enhance the parsing process;
How to use data classes inside patterns.