
When designing regular expression patterns, you can use lookaround assertions to make a pattern match only when it is followed or preceded by another pattern. Lookaround assertions are written in parentheses, but the text they check is not included in the match and no characters are consumed. That's why they are also called zero-width assertions. We will look at them in more detail in the following sections.

Positive lookaheads

The first type of assertion that we are going to discuss is the positive lookahead. It is written as (?=pattern).
A pattern with a positive lookahead matches only if the text immediately to the right of the current position matches the pattern inside the lookahead. For example, JetBrains (?=Academy) matches JetBrains and the space after it only if Academy comes next; Academy itself is not part of the match. The snippet below shows how this works:

import re

pattern = r'JetBrains (?=Academy)'
string_1 = 'JetBrains Academy'
string_2 = 'JetBrains Company'

result_1 = re.match(pattern, string_1)  # match: 'JetBrains '
result_2 = re.match(pattern, string_2)  # None (no match)
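
To see what "zero-width" means in practice, here is a minimal sketch that inspects the match produced by the same pattern. The replacement string 'Hyperskill ' is only an illustrative choice, not part of the original example.

```python
import re

match = re.match(r'JetBrains (?=Academy)', 'JetBrains Academy')

# The lookahead checks for 'Academy' but does not consume it,
# so the match ends right after the space.
print(repr(match.group()))  # 'JetBrains '
print(match.span())         # (0, 10)

# Only the consumed part is replaced; 'Academy' is left untouched.
replaced = re.sub(r'JetBrains (?=Academy)', 'Hyperskill ', 'JetBrains Academy')
print(replaced)  # Hyperskill Academy
```

Because the lookahead text stays outside the match, it remains available to whatever comes next in the pattern or in the string.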

Negative lookaheads

A negative lookahead takes the form (?!pattern). It does the exact opposite: the match succeeds only if the pattern in parentheses does not follow the current position. In our example, JetBrains (?!Academy) matches only if JetBrains and the space after it are not followed by Academy. Compare the results below:

import re

pattern = r'JetBrains (?!Academy)'
string_1 = 'JetBrains Academy'
string_2 = 'JetBrains Company'

result_1 = re.match(pattern, string_1)  # None (no match)
result_2 = re.match(pattern, string_2)  # match: 'JetBrains '
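
A negative lookahead is often combined with an ordinary pattern that consumes the following text. The sketch below, which uses a made-up sample string, collects every JetBrains name except the one followed by Academy.

```python
import re

text = 'JetBrains Academy, JetBrains Company, JetBrains Space'

# (?!Academy) rejects the first occurrence; \w+ then consumes the next word.
companies = re.findall(r'JetBrains (?!Academy)\w+', text)
print(companies)  # ['JetBrains Company', 'JetBrains Space']
```

Note that the lookahead only vetoes a position; the word after JetBrains ends up in the result because \w+ matches it, not because of the assertion.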

Positive lookbehinds

A positive lookbehind is written as (?<=pattern). It mirrors the positive lookahead: the match succeeds only if the specified pattern immediately precedes the current position. In our example, (?<=JetBrains )Academy matches Academy only when JetBrains and a space come right before it, and only Academy is returned. Consider the following snippet:

import re

pattern = r'(?<=JetBrains )Academy'
string = 'JetBrains Academy'

result = re.search(pattern, string)
print(result.group())  # Academy
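
A common use of the positive lookbehind is extracting a value that follows a fixed prefix without including the prefix in the result. The sketch below assumes a hypothetical sentence with dollar amounts; the text and variable names are for illustration only.

```python
import re

text = 'Coffee costs $4 and the pass costs $25, valid for 30 days'

# Match digits only when a literal dollar sign comes right before them.
prices = re.findall(r'(?<=\$)\d+', text)
print(prices)  # ['4', '25']
```

The number 30 is skipped because it is preceded by a space rather than a dollar sign, and the dollar signs themselves never appear in the output.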

There are two crucial things to account for if you opt for positive lookbehinds:

  • A positive lookbehind pattern must have a fixed width. In other words, you can use patterns like JetBrains or Jet|Bra (alternatives of the same length), but you cannot use \w+, JetBrains{1,}, or JetBrains.*, as the length of the text they match can vary. In Python's re module, variable-width lookbehinds raise an error:

import re

pattern = r'(?<=JetBrains.*)Academy'
string = 'JetBrains Academy'

# The error is raised when the pattern is compiled, before any matching happens
result = re.search(pattern, string)  # re.error: look-behind requires fixed-width pattern

  • A pattern that starts with a positive lookbehind cannot match at the very beginning of a string, because there is no text before that position for the lookbehind to check. Since match() tries the pattern only at position 0, it returns None for such patterns. Use search() instead, which scans the whole string:

import re

result_1 = re.match(r'(?<=JetBrains )Academy', 'JetBrains Academy')   # None
result_2 = re.search(r'(?<=JetBrains )Academy', 'JetBrains Academy')  # match: 'Academy'
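
The fixed-width rule can be checked directly, and there is a simple way around it. The sketch below catches the compilation error and then swaps the lookbehind for a different technique, a capturing group, which has no width restriction. The variable names are illustrative.

```python
import re

# Compiling a variable-width lookbehind fails before any matching happens.
error_message = None
try:
    re.compile(r'(?<=JetBrains.*)Academy')
except re.error as exc:
    error_message = str(exc)
print(error_message)  # the message mentions the fixed-width requirement

# Alternative (not a lookbehind): match the prefix normally
# and capture only the part you need.
match = re.search(r'JetBrains.*?(Academy)', 'JetBrains Academy')
print(match.group(1))  # Academy
```

Unlike a lookbehind, the capturing-group version consumes the prefix, so match.group() returns the whole 'JetBrains Academy' and only group(1) holds the part of interest.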

Negative lookbehinds

The last assertion that we are going to look at is the negative lookbehind, written as (?<!pattern). A negative lookbehind succeeds if the text immediately before the current position does not match the given pattern:

import re

pattern = r'(?<!JetBrains )Academy'
string_1 = 'JetBrains Academy'
string_2 = 'Hyperskill Academy'

re.search(pattern, string_1)  # None
re.search(pattern, string_2)  # match: 'Academy'

Like the positive lookbehind, the negative lookbehind accepts only fixed-width patterns. Unlike the positive one, it does succeed at the very beginning of a string, because nothing precedes that position. Still, match() tries the pattern only at position 0, so use search() to find matches elsewhere in the string.
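
It may help to see which positions a negative lookbehind accepts in a longer text. The sketch below uses a made-up string with three occurrences of Academy and records where each match starts.

```python
import re

pattern = re.compile(r'(?<!JetBrains )Academy')
text = 'Academy Awards, JetBrains Academy, Hyperskill Academy'

# Collect the start index of every accepted occurrence.
starts = [m.start() for m in pattern.finditer(text)]
print(starts)  # [0, 46]
```

The occurrence at index 0 is accepted because no text precedes it, the one after 'JetBrains ' is rejected, and the one after 'Hyperskill ' is accepted.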

Conclusion

In this topic, we have covered simple but useful regex tools called lookaround assertions. Let’s recap:

  • Positive lookahead (?=pattern) provides a match if the text is followed by the specified pattern;

  • Negative lookahead (?!pattern) provides a match if the text is not followed by the specified pattern;

  • Positive lookbehind (?<=pattern) provides a match if the text is preceded by the pattern;

  • Negative lookbehind (?<!pattern) provides a match if the text is not preceded by the pattern;

  • match() tries a pattern only at the beginning of a string, where a positive lookbehind can never succeed, so prefer search() over match() when a pattern starts with a lookbehind;

  • Lookbehind patterns must have a fixed width.
