In regex, quantifiers are special characters that define how many times the preceding character (or character class) may occur in a match. A quantifier can follow either a regular character or a special one. Quantifiers are among the most important features of the regex language, since they allow a single pattern to match strings of different lengths.
The list of quantifiers
Here is a list of quantifiers to be remembered:
- + matches one or more instances of the preceding character;
- * matches zero or more instances of the preceding character;
- {n} matches exactly n instances of the preceding character;
- {n,m} matches at least n but not more than m instances of the preceding character;
- {n,} matches at least n instances of the preceding character;
- {0,m} matches no more than m instances of the preceding character.
Note that there is also another quantifier, ?, which makes the preceding character optional. It is short for {0,1}. We will not consider this quantifier here because you should already know it.
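The shorthand quantifiers can be read as abbreviations of the brace forms: + behaves like {1,}, * like {0,}, and ? like {0,1}. The following is a minimal sketch that checks this on a few sample strings. The helper name sameResults and the sample inputs are made up for illustration and are not part of any library:

```kotlin
// Hypothetical helper: returns true if both patterns give the same
// matches() result for every sample string.
fun sameResults(first: String, second: String, samples: List<String>): Boolean {
    val a = first.toRegex()
    val b = second.toRegex()
    return samples.all { a.matches(it) == b.matches(it) }
}

fun main() {
    val samples = listOf("c", "ca", "caa", "caaa", "cb")
    println(sameResults("ca+", "ca{1,}", samples))  // true
    println(sameResults("ca*", "ca{0,}", samples))  // true
    println(sameResults("ca?", "ca{0,1}", samples)) // true
}
```

Agreement on a handful of samples is only a sanity check, not a proof, but it shows that each shorthand and its brace form accept and reject the same strings here.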
The plus quantifier
Below you can see how we use the plus character, which matches one or more occurrences of the preceding character:
val regex = "ca+b".toRegex()
regex.matches("cab") // true
regex.matches("caaaaab") // true
regex.matches("cb") // false because it does not have at least one instance of 'a'
As you can see, it matches only those strings that have one or more instances of the 'a' character.
The asterisk quantifier
The example below demonstrates the use of the asterisk character, which matches zero or more occurrences of the preceding character:
val regex = "A[0-3]*".toRegex()
regex.matches("A") // true because the pattern matches zero or more occurrences
regex.matches("A0") // true
regex.matches("A000111222333") // true
As you can see, the asterisk quantifier, unlike the plus quantifier, also lets the pattern match strings that do not contain the "quantified" character at all.
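To make the difference between the two quantifiers concrete, here is a small sketch that applies the same character class with * and with + to the same inputs. The variable names are illustrative:

```kotlin
val withAsterisk = "A[0-3]*".toRegex() // zero or more digits from 0 to 3
val withPlus = "A[0-3]+".toRegex()     // at least one digit from 0 to 3

fun main() {
    println(withAsterisk.matches("A"))   // true
    println(withPlus.matches("A"))       // false, no digit after 'A'
    println(withAsterisk.matches("A12")) // true
    println(withPlus.matches("A12"))     // true
}
```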
In the following example, the pattern describes the string "John" surrounded by any number of arbitrary characters:
val johnRegex = ".*John.*".toRegex() // it matches all strings containing the substring "John"
val textWithJohn = "My friend John is a computer programmer"
johnRegex.matches(textWithJohn) // true
val john = "John"
johnRegex.matches(john) // true
val textWithoutJohn = "My friend is a computer programmer"
johnRegex.matches(textWithoutJohn) // false
So, combined with the dot, the asterisk quantifier can be used to check whether a string contains a substring that matches a pattern. With .* we can skip spaces or any other characters that we do not want to spell out in the pattern.
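If the only goal is to find out whether a pattern occurs somewhere in the text, Kotlin's Regex class also provides containsMatchIn, which does not require the pattern to cover the whole string. The sketch below compares it with the .*John.* approach. Keep in mind that by default the dot does not match line break characters, so the two approaches can give different results on multi-line text. The variable names are illustrative:

```kotlin
val wrapped = ".*John.*".toRegex() // used with matches(): must cover the entire string
val plain = "John".toRegex()       // used with containsMatchIn(): may match anywhere

fun main() {
    val oneLine = "My friend John is a computer programmer"
    println(wrapped.matches(oneLine))        // true
    println(plain.containsMatchIn(oneLine))  // true

    val twoLines = "My friend\nJohn is a computer programmer"
    println(wrapped.matches(twoLines))       // false, the dot does not cross the line break
    println(plain.containsMatchIn(twoLines)) // true
}
```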
Specifying the number of repetitions
Both previous quantifiers have a wide range of applications, but they do not let you specify exactly how many times a character may occur. Fortunately, there is a group of quantifiers that specify the number of instances in curly braces: {n}, {n,m}, and {n,}.
Take a look at the example where we demonstrate how to match exactly n instances of the preceding character using the {n} quantifier:
val regex = "[0-9]{4}".toRegex() // four digits
regex.matches("6342") // true
regex.matches("9034") // true
regex.matches("182") // false
regex.matches("54312") // false
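Several {n} quantifiers can be combined in one pattern to describe a fixed layout. As a sketch, assume a hypothetical date-like format of four digits, a dash, two digits, a dash, and two digits. The format and the name dateLike are chosen for illustration only:

```kotlin
// Hypothetical format: four digits, a dash, two digits, a dash, two digits.
val dateLike = "[0-9]{4}-[0-9]{2}-[0-9]{2}".toRegex()

fun main() {
    println(dateLike.matches("2024-01-31")) // true
    println(dateLike.matches("2024-1-31"))  // false, the second group has only one digit
    println(dateLike.matches("24-01-31"))   // false, the first group has only two digits
}
```

This pattern checks only the shape of the string: it would also accept "9999-99-99", so it is not a real date validator.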
Matching from n to m instances is possible thanks to the {n,m} quantifier. Note that the range specified in curly braces is inclusive at both ends: exactly n instances and exactly m instances both count as a match. This is standard for the regex language regardless of the implementation.
val regex = "1{2,3}".toRegex()
regex.matches("1") // false
regex.matches("11") // true
regex.matches("111") // true
regex.matches("1111") // false
The last example demonstrates how to match at least n instances using the {n,} quantifier:
val regex = "ab{4,}".toRegex()
regex.matches("abb") // false, not enough 'b'
regex.matches("abbbb") // true
regex.matches("abbbbbbb") // true
The quantifier that matches not more than m instances works similarly. Try it yourself.
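After trying it, you can compare your result with this minimal sketch, which uses {0,2} to allow at most two instances of 'b'. The pattern and the variable name are illustrative:

```kotlin
val atMostTwoB = "ab{0,2}".toRegex() // 'a' followed by zero, one, or two 'b'

fun main() {
    println(atMostTwoB.matches("a"))    // true
    println(atMostTwoB.matches("ab"))   // true
    println(atMostTwoB.matches("abb"))  // true
    println(atMostTwoB.matches("abbb")) // false, too many 'b'
}
```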
Conclusions
The key points of this topic are:
- in the regex language, quantifiers allow us to match strings varying in length.
- the asterisk quantifier matches zero or more instances of the preceding character.
- the plus quantifier is almost the same as the asterisk, except that it doesn't match the absence of a character. The minimum number of instances for it is one.
- curly braces allow more careful control of the number of occurrences: you can specify the minimum or the maximum number of instances, or both.