Regular expressions have a very important feature called capturing groups. Kotlin offers many tools to manipulate this feature and use its power. This topic will help you understand those tools. But before that, you should start by getting to know a class called MatchResult. Eager to get started? Let's dive in!
The MatchResult class
Understanding the MatchResult class in Kotlin will help you work efficiently with regular expressions. This class gives you detailed information about the outcome of a match operation. For instance, if you find a match in a string with the find() function, it gives you a MatchResult object, or null if there is no match.
val regex: Regex = """a(\d+)b""".toRegex()
val input = "a123b a4bc"
val firstMatch: MatchResult? = regex.find(input)
Other functions that give you a MatchResult object include matchAt() and matchEntire(). There is also the findAll() function that gives you a sequence of MatchResult objects.
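To see how these functions differ, here is a small sketch that reuses the a(\d+)b pattern from above. The extra input strings are made up for illustration. find() looks for the first match anywhere in the input, matchEntire() succeeds only when the whole input matches, and findAll() yields every match.

```kotlin
val pattern = """a(\d+)b""".toRegex()

fun main() {
    // find() scans the input and returns the first match, or null if there is none
    println(pattern.find("xx a123b yy")?.value)        // a123b

    // matchEntire() succeeds only if the whole input matches the pattern
    println(pattern.matchEntire("a123b")?.value)       // a123b
    println(pattern.matchEntire("a123b a4bc")?.value)  // null

    // findAll() returns a Sequence<MatchResult> covering every match
    println(pattern.findAll("a123b a4bc").map { it.value }.toList()) // [a123b, a4b]
}
```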
The MatchResult class has the following properties and method:
value: This property holds the text the regular expression matched. It's a String that contains the whole match.
println(firstMatch?.value) // prints: a123b
range: This property tells you the range of indices where the match happened. It's an IntRange instance, an inclusive range of integers, so 0..4 covers indices 0 through 4.
println(firstMatch?.range) // prints: 0..4
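Because the range is inclusive, you can pass it straight to substring() to cut out the matched text. This sketch repeats the earlier pattern and input so that it runs on its own:

```kotlin
val pattern = """a(\d+)b""".toRegex()
val text = "a123b a4bc"

fun main() {
    // Stop early if the pattern does not occur in the text at all
    val match = pattern.find(text) ?: return

    // IntRange is inclusive: 0..4 covers indices 0, 1, 2, 3 and 4
    println(match.range.first) // 0
    println(match.range.last)  // 4

    // substring(IntRange) extracts exactly the matched characters
    println(text.substring(match.range))                // a123b
    println(text.substring(match.range) == match.value) // true
}
```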
groups: This property gives you a MatchGroupCollection, a collection that works somewhat like a map: you look up a captured group by its index. The first element in the collection is the whole match, and the ones after it are the captured groups, in order. Each group is a MatchGroup with a value for the matched text and a range for its indices.
Remember, a group in regex is the part inside parentheses.
println(firstMatch?.groups)
// prints: [MatchGroup(value=a123b, range=0..4), MatchGroup(value=123, range=1..3)]
groupValues: This property gets you a list of strings. The first one is the whole match, and the others are the captured groups. If a group captured nothing (for example, if it's optional and didn't appear in the input), its value in the list is an empty string.
println(firstMatch?.groupValues) // prints: [a123b, 123]
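The difference between groups and groupValues shows up with optional groups. In this sketch, the pattern (\d+)([a-z])? is a hypothetical example whose second group does not take part in the match for the input "42": groups reports that group as null, while groupValues reports it as an empty string.

```kotlin
// Hypothetical pattern: digits followed by an optional lowercase letter in group 2
val optionalSuffix = """(\d+)([a-z])?""".toRegex()

fun main() {
    val match = optionalSuffix.find("42")!!

    // groups has no MatchGroup for a group that did not participate
    println(match.groups[2])                 // null

    // groupValues substitutes an empty string for the same group
    println(match.groupValues)               // [42, 42, ]
    println(match.groupValues[2].isEmpty())  // true
}
```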
destructured: This property lets you unpack the group values. It returns a Destructured object whose component functions return the captured group values, so you can unpack them with a destructuring declaration.
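You can declare at most as many variables on the left-hand side of the assignment as there are capturing groups in the pattern, and no more than ten, because Destructured defines only component1() through component10(). Declaring fewer variables is fine; declaring more than there are groups throws an exception at runtime.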
Here's an example:
val regex = """(\d{4})-(\d{2})-(\d{2})""".toRegex()
val input = "2024-01-01"
val secondMatch = regex.find(input)!!
val (year, month, day) = secondMatch.destructured
println("Year: $year") // Year: 2024
println("Month: $month") // Month: 01
println("Day: $day") // Day: 01
In this example, the regex (\d{4})-(\d{2})-(\d{2}) finds dates in yyyy-mm-dd format. The find() function returns a MatchResult for the date "2024-01-01", and the destructured property unpacks the year, month, and day capturing groups into separate variables.
The !! operator asserts that secondMatch is not null. Be careful with it: if the regex finds no match, it throws a NullPointerException.
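destructured is especially convenient together with findAll(). The sketch below parses comma-separated key=value pairs. The input format and the parsePairs() helper are hypothetical, made up for this illustration.

```kotlin
// Hypothetical format: a word, an equals sign, and a number
val pairPattern = """(\w+)=(\d+)""".toRegex()

fun parsePairs(text: String): Map<String, Int> =
    pairPattern.findAll(text)
        .map { match ->
            // Unpack the two capturing groups of each match
            val (key, value) = match.destructured
            key to value.toInt()
        }
        .toMap()

fun main() {
    println(parsePairs("width=640, height=480")) // {width=640, height=480}
}
```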
next(): Use this to find the next match after the current one in the input. If there are no more matches, it returns null.
What's the difference between calling findAll() on a Regex and calling next() on a MatchResult? Good question. findAll() returns a lazy Sequence<MatchResult>, so each match is searched for only when you iterate to it. The next() method gives you the same step-by-step search explicitly: you decide when, and whether, to look for the following match.
println(firstMatch?.next()?.groupValues) // prints: [a4b, 4]
So, the next() method comes in handy when you want to handle each match on its own and possibly stop searching early, for example as soon as you find the match you need.
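Here is a sketch of stopping early. The hypothetical helper firstAbove() walks through the numbers in a string with next() and returns as soon as it finds one above a limit, so the rest of the string is never searched.

```kotlin
// Hypothetical helper: returns the first number greater than limit, or null
fun firstAbove(text: String, limit: Int): Int? {
    var match: MatchResult? = """\d+""".toRegex().find(text)
    while (match != null) {
        val number = match.value.toInt()
        if (number > limit) return number // no further searching happens
        match = match.next()              // resume the search after this match
    }
    return null
}

fun main() {
    println(firstAbove("3 8 15 42 7", 10)) // 15
    println(firstAbove("1 2 3", 10))       // null
}
```

In the first call, the loop stops at 15, so 42 and 7 are never matched.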
Next, let’s use what we've learned with a practical example to see these ideas in action.
Practical example
Let's look at a practical example that shows how to use MatchResult's features to change strings. We will write a program that finds all the numbers in a string and raises them by 10%, rounding the result to two decimal places:
import java.util.Locale

fun main() {
    // Create a regular expression to match numbers with an optional decimal part
    val regex = Regex("""\d+(\.\d+)?""")
    // Define a string with number information
    val input = "The price is $12.99 for the first item, and $9.99 for each additional item."
    // Create a string builder to store the modified string
    val output = StringBuilder()
    // Find the first match in the input string
    var match = regex.find(input)
    // Keep track of the index right after the previous match
    var lastIndex = 0
    // Loop until there are no more matches
    while (match != null) {
        // Append the substring from the input string before the current match
        output.append(input.substring(lastIndex, match.range.first))
        // Convert the matched value to a Double and multiply it by 1.1
        val number = match.value.toDouble()
        val increasedNumber = number * 1.1
        // Round to two decimal places and append it to the output;
        // Locale.US keeps the decimal separator a dot on any system
        output.append("%.2f".format(Locale.US, increasedNumber))
        // Update lastIndex to the position right after the current match
        lastIndex = match.range.last + 1
        // Find the next match using the next() method
        match = match.next()
    }
    // Append the remaining substring from the input string after the last match
    output.append(input.substring(lastIndex))
    // Print the output string
    println(output)
}
In this example, the regular expression \d+(\.\d+)? is used to find numbers in the input string. You start by calling the find function to get the first MatchResult. You then start a loop that goes through each match found by the regular expression. For every match, you use the range property of the MatchResult to find the start and end indexes of the matched substring in the input string. This lets you capture the text before the matched number and add it to a StringBuilder.
Next, you convert the matched text to a Double, multiply it by 1.1, format the result with two decimal places, and append it to the StringBuilder. After handling a match, you call next() on the current MatchResult. This continues the search from the end of the current match and returns the next MatchResult, or null when there are no more matches, which ends the loop.
After you have handled all matches, you add the rest of the input string to the StringBuilder. You find this final part with lastIndex, which holds the index right after the last match, computed from that match's range property.
At the end, you turn the StringBuilder content into a string and print the new output. It will be:
The price is $14.29 for the first item, and $10.99 for each additional item.
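For comparison, the same transformation can be written with Regex.replace(), which accepts a lambda that receives each MatchResult and returns its replacement text. This is a different technique from the manual next() loop above, and the raisePrices() name is hypothetical. The sketch passes Locale.US to format(), an assumption added so that the decimal separator is always a dot.

```kotlin
import java.util.Locale

// Hypothetical helper: raises every number in the text by 10%
fun raisePrices(text: String): String {
    val regex = Regex("""\d+(\.\d+)?""")
    // replace() calls the lambda once per match and splices in its result
    return regex.replace(text) { match ->
        "%.2f".format(Locale.US, match.value.toDouble() * 1.1)
    }
}

fun main() {
    val input = "The price is $12.99 for the first item, and $9.99 for each additional item."
    println(raisePrices(input))
    // The price is $14.29 for the first item, and $10.99 for each additional item.
}
```

replace() takes care of copying the text between matches, so there is no need to track lastIndex or manage a StringBuilder yourself.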
In the next two sections, you will explore two variations of capturing groups: non-capturing groups and named capturing groups.
Non-capturing groups
You already know that groups in regular expressions treat a sequence of characters as a single unit by enclosing them in parentheses. For example, the pattern (abc) creates a single group that includes the characters 'a', 'b', and 'c'.
By default, every group in a regular expression is a capturing group, meaning you can access each group separately after a match through properties such as groups and groupValues. But regular expressions also let you decide which groups to capture using a special syntax.
Non-capturing groups are handy when you need to structure your patterns with groups but don't want to capture their contents.
To show an example, let's look at the regex from the practical example above:
val regex = Regex("""\d+(\.\d+)?""")
If you add println("Groups: ${match.groupValues}") inside the while loop of that example, you'll see an extra capture that isn't needed:
Groups: [12.99, .99]
Groups: [9.99, .99]
Here, the grouping is just for applying the optional ? quantifier, and capturing the decimal part as a separate group isn't needed. To skip capturing this group, use the syntax (?:pattern), which tells the regex engine to group without capturing.
By changing our regex pattern to:
val regex = Regex("""\d+(?:\.\d+)?""")
The groupValues output will now be:
Groups: [12.99]
Groups: [9.99]
This output only includes the full match and leaves out the unnecessary captured group.
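A quick way to confirm the difference is to run both versions of the pattern on the same input. This sketch uses the standalone string "12.99" rather than the full sentence from the practical example:

```kotlin
val capturing = Regex("""\d+(\.\d+)?""")
val nonCapturing = Regex("""\d+(?:\.\d+)?""")

fun main() {
    val a = capturing.find("12.99")!!
    val b = nonCapturing.find("12.99")!!

    // Both patterns match exactly the same text...
    println(a.value == b.value) // true

    // ...but only the capturing version records the decimal part as group 1
    println(a.groupValues)      // [12.99, .99]
    println(b.groupValues)      // [12.99]
}
```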
Named capturing groups
Regular expressions can get complex, and remembering groups by their order can be hard. Kotlin helps by letting you name capturing groups. Named capturing groups make your regex patterns easier to read and manage, making it simpler to refer to parts of the pattern.
To make a named capturing group, you use the syntax (?<name>pattern), where name is the label for the group and pattern is the sequence of characters to match. For instance, (?<digits>\d+) makes a group named "digits" that matches a string of one or more digits.
It's easy to get to named capturing groups: use the groups property and the name of the group as the key.
Here's an example:
val regex = """(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})""".toRegex()
val input = "2023-12-31"
val matchResult = regex.find(input)
println(matchResult?.groups?.get("year")?.value) // prints: 2023
println(matchResult?.groups?.get("month")?.value) // prints: 12
println(matchResult?.groups?.get("day")?.value) // prints: 31
In this case, the regex matches dates in the format yyyy-mm-dd. The find() function gives you a MatchResult object for the match in "2023-12-31". Then you use the groups property of the MatchResult to get to the year, month, and day capturing groups.
You can write the previous snippet more idiomatically with the indexing operator []. Here, !! asserts that a match was found:
val regex = """(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})""".toRegex()
val input = "2023-12-31"
val matchResult = regex.find(input)!!
println(matchResult.groups["year"]?.value)
println(matchResult.groups["month"]?.value)
println(matchResult.groups["day"]?.value)
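Named groups also work well with findAll(). The sketch below collects every date in a string. The CalendarDate class, the parseDates() helper, and the sample text are hypothetical, and the code relies on the same groups[name] lookup used in the snippets above.

```kotlin
// Hypothetical data class used only for this illustration
data class CalendarDate(val year: Int, val month: Int, val day: Int)

val datePattern = """(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})""".toRegex()

fun parseDates(text: String): List<CalendarDate> =
    datePattern.findAll(text).map { match ->
        // Look each group up by name instead of by position
        CalendarDate(
            year = match.groups["year"]!!.value.toInt(),
            month = match.groups["month"]!!.value.toInt(),
            day = match.groups["day"]!!.value.toInt(),
        )
    }.toList()

fun main() {
    println(parseDates("Released 2023-12-31, patched 2024-01-15."))
    // [CalendarDate(year=2023, month=12, day=31), CalendarDate(year=2024, month=1, day=15)]
}
```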
You can also get to captured groups by their indices, where index 0 represents the whole match and the captured groups begin from index 1:
println(matchResult.groups[1]?.value) // same as groups["year"]
Conclusion
Throughout this topic, you've learned about the MatchResult class in Kotlin and looked into its main properties and methods, such as value, range, groups, groupValues, destructured, and next().
You also discovered non-capturing groups for grouping parts of a pattern without capturing them, and named capturing groups for making code clearer and easier to maintain. These tools help you handle complex patterns and text with ease.
With these abilities, you can now skillfully handle the results of match operations.
Happy practicing!