Glob module

In this topic, we introduce the glob module. It finds files and directories on your computer whose names match a specific pattern. It is a simple yet handy module if you spend a lot of time working with files.

This topic covers the three methods of the glob module: two of them search for files, and the third one prepares search patterns. Once you master this topic, you will be fully equipped to use the glob module! One more thing: this module is a part of the Python Standard Library, so you don't need to install it from an external source.

Patterns

The glob module lets you use wildcards to search for files and directories whose names follow particular patterns. The rules for these patterns are the ones used by the Unix shell. They resemble regular expressions but are much simpler:

Wildcard	Meaning
`*`	matches 0 or more characters
`?`	matches a single character
`[0-9]`	matches one character from a range (any digit from `0` to `9` in this case)
`[abc]`	matches only one character from the sequence (either `a`, `b`, or `c` in this example)
`[!abc]`	matches any character that is not in the sequence (any character that is not `a`, `b` or `c`)

All other characters are literals that match themselves. Don't worry if you are a bit lost right now. We will show examples in the following sections.
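To try the wildcards from the table without touching the disk, here is a small sketch. It relies on fnmatch, the standard-library module that glob uses for name matching; the file names are made up for the illustration, and fnmatchcase is chosen only so that the result is the same on every operating system.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for this illustration.
names = ["a.txt", "b.txt", "d.txt", "1.txt", "ab.txt", "notes.md"]


def matching(pattern):
    """Return the names that match the pattern, in their original order."""
    return [name for name in names if fnmatchcase(name, pattern)]


star = matching("*.txt")         # any name that ends with .txt
question = matching("?.txt")     # exactly one character before .txt
digit = matching("[0-9].txt")    # one digit before .txt
either = matching("[abc].txt")   # a, b, or c before .txt
negated = matching("[!abc].txt") # one character that is not a, b, or c

print(star)      # ['a.txt', 'b.txt', 'd.txt', '1.txt', 'ab.txt']
print(question)  # ['a.txt', 'b.txt', 'd.txt', '1.txt']
print(digit)     # ['1.txt']
print(either)    # ['a.txt', 'b.txt']
print(negated)   # ['d.txt', '1.txt']
```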

Searching for one file

As we have already mentioned, the glob module is quite straightforward. It has only three methods: glob, iglob, and escape. We start with the first one, as it is the most commonly used. Before we begin, don't forget to import the glob module into your program.

The syntax of the glob() method is glob.glob(pathname, *, recursive=False).

It returns a list of paths that match the pathname (a pattern where you can use wildcards); the results come in arbitrary order. The recursive flag is False by default, which means that ** behaves like a plain * and matches names within a single directory level. If you set it to True, the pattern ** matches any files and zero or more levels of directories and subdirectories.

The * between the pathname and the recursive flag is not a wildcard: in a Python signature, it marks every parameter after it as keyword-only. In other words, you have to write glob.glob('my_dir\\**', recursive=True); the call glob.glob('my_dir\\**', True) raises a TypeError.
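A minimal sketch of how the recursive flag changes the result. The directory layout is hypothetical and is created in a temporary directory, so the example does not depend on what is on your disk.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: my_dir/a.txt and my_dir/sub/b.txt
    sub = os.path.join(root, "my_dir", "sub")
    os.makedirs(sub)
    open(os.path.join(root, "my_dir", "a.txt"), "w").close()
    open(os.path.join(sub, "b.txt"), "w").close()

    # glob.escape() protects the temporary path itself from being read as a pattern.
    pattern = os.path.join(glob.escape(root), "my_dir", "**", "*.txt")

    # Default (recursive=False): ** acts like *, so exactly one directory level.
    flat = sorted(os.path.basename(p) for p in glob.glob(pattern))

    # recursive=True: ** matches zero or more directory levels.
    deep = sorted(os.path.basename(p) for p in glob.glob(pattern, recursive=True))

    # The flag is keyword-only, so passing it positionally fails.
    try:
        glob.glob(pattern, True)
    except TypeError:
        positional_rejected = True
    else:
        positional_rejected = False

print(flat)                 # ['b.txt']
print(deep)                 # ['a.txt', 'b.txt']
print(positional_rejected)  # True
```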

The pathname can be either a path to an existing file on your computer (absolute or relative) or a pattern.

As you remember, an absolute path starts from the root of the file system, like C:\\Users\\User\\my_dir\\image.gif. A relative path starts from the current directory: if your current directory is User, the same file is my_dir\\image.gif. (The backslashes are doubled because that is how they are written in an ordinary Python string literal.)

Both ways can help you with finding a file on your machine.
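Here is a short sketch of both ways, assuming a hypothetical my_dir/image.gif file that we create in a temporary directory. Note that glob() returns a list even when the pathname contains no wildcards, and that the result keeps the form of the pathname: absolute in, absolute out; relative in, relative out.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical file used for the demonstration.
    os.mkdir(os.path.join(root, "my_dir"))
    target = os.path.join(root, "my_dir", "image.gif")
    open(target, "w").close()

    # Absolute path: a one-element list if the file exists.
    absolute_hits = glob.glob(glob.escape(target))

    # Relative path: resolved against the current working directory.
    previous_cwd = os.getcwd()
    os.chdir(root)
    try:
        relative_hits = glob.glob(os.path.join("my_dir", "image.gif"))
    finally:
        os.chdir(previous_cwd)  # restore before the temporary directory is removed

print(absolute_hits == [target])  # True
print(relative_hits == [os.path.join("my_dir", "image.gif")])  # True
```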

Searching for multiple files

More often, you will want to find multiple files that match a certain pattern. Let's look at some examples. Say you want to find all jpg files in a directory. First, you need to write a pattern. It would look like this: my_dir\\*.jpg. Remember that * matches any number of any characters. On Windows, matching is case-insensitive, so a file with the .JPG extension is found as well. After this, you pass the pattern to the glob.glob() method, and that's it:

glob.glob('my_dir\\*.jpg')  # returns: ['my_dir\\1.jpg', 'my_dir\\image.JPG']

Now, let's try to find all files whose names consist of a single character. We don't know the possible extensions of these files. Here is how we can do the trick:

glob.glob('my_dir\\?.*')   # returns: ['my_dir\\1.jpg', 'my_dir\\a.txt']

? matches one character; . matches a dot; * stands for any number of symbols. Since it comes after the dot, it also indicates the extension of our files.

What do we do if we need all filenames in one directory? There's a simple solution – use the asterisk!

glob.glob('my_dir\\*')

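It returns a list of all files and subdirectories in the my_dir directory, except those whose names start with a dot: such names are matched only by patterns that also start with a dot.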

If no files or directories match your search, it returns an empty list.
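The searches from this section can be reproduced with the sketch below. The directory contents are hypothetical and live in a temporary directory; the names are lowercase on purpose, so the output does not depend on whether the file system is case-sensitive. The results are sorted because glob() returns them in arbitrary order.

```python
import glob
import os
import tempfile


def search(root, pattern):
    """Return the sorted base names in root that match the pattern."""
    paths = glob.glob(os.path.join(glob.escape(root), pattern))
    return sorted(os.path.basename(p) for p in paths)


with tempfile.TemporaryDirectory() as root:
    # Hypothetical contents: three files, one hidden file, one subdirectory.
    for name in ["1.jpg", "photo.jpg", "a.txt", ".hidden"]:
        open(os.path.join(root, name), "w").close()
    os.mkdir(os.path.join(root, "sub"))

    jpg_files = search(root, "*.jpg")
    one_char_names = search(root, "?.*")
    everything = search(root, "*")   # skips names that start with a dot
    png_files = search(root, "*.png")

print(jpg_files)       # ['1.jpg', 'photo.jpg']
print(one_char_names)  # ['1.jpg', 'a.txt']
print(everything)      # ['1.jpg', 'a.txt', 'photo.jpg', 'sub']
print(png_files)       # []
```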

Iterable glob

glob.iglob() returns an iterator that yields the same values as glob(); the only difference is that it doesn't store them all in memory at once. As with any iterator, it can come in handy when there are many matches or memory is limited. Here's what the call looks like: glob.iglob(pathname, *, recursive=False).

The pathname is written in the same way as in the glob() method, and the recursive flag also works the same.

Let's say we want to find all files with a three-character name where the first two characters are digits and the third one is any character except 0. To do that, we need square brackets and an exclamation mark:

generator = glob.iglob('my_dir\\[0-9][0-9][!0].*')

for item in generator:
    print(item)

[0-9] represents a range: it matches any single digit from 0 to 9. [!0] means any character that is not 0; . is literally a dot; * is any number of characters after the dot, in other words, the extension.

Don't forget that ranges can include letters, too. For example, [p-s] means any letter from the p-q-r-s range.

This search can return the following files: 12a.txt, 345.jpg, 00j.csv but not 120.txt.
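To check this claim, here is a sketch that creates exactly these four files in a temporary directory (an assumption made for the demonstration) and runs the same pattern through glob.iglob(). The iterator has to be consumed while the files still exist, so we do it inside the with block.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for name in ["12a.txt", "345.jpg", "00j.csv", "120.txt"]:
        open(os.path.join(root, name), "w").close()

    pattern = os.path.join(glob.escape(root), "[0-9][0-9][!0].*")
    matches = glob.iglob(pattern)

    # iglob() hands back an iterator, not a list.
    is_list = isinstance(matches, list)

    # Paths are produced one at a time as we loop over the iterator.
    found = sorted(os.path.basename(path) for path in matches)

print(is_list)  # False
print(found)    # ['00j.csv', '12a.txt', '345.jpg']
```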

Since Python 3.10, glob() and iglob() also accept the root_dir and dir_fd parameters, which allow you to specify the root directory for searching. With root_dir, a relative pattern is resolved against that directory instead of the current one.

Escaping

The last method, glob.escape(), escapes the special characters *, ?, and [. In other words, * would no longer mean "any number of characters"; it would mean only an asterisk. A question mark would be a literal question mark. The brackets would be mere brackets. It is useful when these characters are in the filename you would like to find.

The syntax is even simpler than in the previous methods: glob.escape(pathname). Note that there is no recursive flag. This is explained by the fact that glob.escape() doesn't search for anything. It returns a string — a pathname with escaped characters that you can then pass to glob.glob().

Let's say we need to find a subdirectory with the [dir] name. glob.glob('my_dir\\[dir]') will not work in this case. Remember that [] is a special symbol, and, in this case, your query will return a subdirectory called d, i, or r, if there's one. So, how do we find our [dir]? That's where we use the escape method:

glob.escape('my_dir\\[dir]')  # returns: 'my_dir\\[[]dir]'
# now we pass the result to the glob() method:
glob.glob('my_dir\\[[]dir]')  # returns: ['my_dir\\[dir]']

First, we get a string with the escaped characters we needed for the search. Then we perform the actual search and get a list with results. In the escaped string [[]dir], the sequence [[] is a character set that contains only the left bracket, so [ is no longer treated as a special symbol. The rest of the string, dir], is written as-is: the right bracket is special only when it closes a set opened by a left bracket.
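The difference between the escaped and unescaped search can be seen in the following sketch. It assumes a hypothetical my_dir that contains three subdirectories named [dir], d, and i, all created in a temporary directory.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    my_dir = os.path.join(root, "my_dir")
    for name in ["[dir]", "d", "i"]:
        os.makedirs(os.path.join(my_dir, name))

    base = glob.escape(my_dir)  # the parent path is always taken literally

    # Unescaped: [dir] is a character set, so it matches d and i (no r here).
    unescaped = sorted(
        os.path.basename(p) for p in glob.glob(os.path.join(base, "[dir]"))
    )

    # Escaped: the brackets are literal, so only the [dir] directory matches.
    escaped_name = glob.escape("[dir]")
    escaped = [
        os.path.basename(p) for p in glob.glob(os.path.join(base, escaped_name))
    ]

print(escaped_name)  # [[]dir]
print(unescaped)     # ['d', 'i']
print(escaped)       # ['[dir]']
```

Escaping only the part of the path that comes from outside (a user-supplied name, for example) and adding wildcards afterwards keeps the two roles separate.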

Conclusion

In this topic, we have learned:

to write patterns for file searching with the help of special characters — *, ?, [ - ], [], [!];
to search for files and directories using the patterns and the glob.glob() method;
to create generators to yield filenames with glob.iglob();
to escape special characters with glob.escape().

Now let's move on to practice!