Regular Expressions, or Regex, are a powerful tool for processing text in almost every programming language, including Scala. Regex allows you to define a pattern of characters, and then match that pattern against a string, or find and replace instances of that pattern within a string. Using Regex in Scala, you can perform a wide variety of text manipulation tasks, from simple validations and data extractions to complex parsing and search operations. By learning how to use Regex in Scala, you can significantly improve your text processing capabilities and reduce the amount of manual effort required for manipulating text data. So, let's dive into the world of Regex and learn how to harness this powerful tool to make our text-processing tasks easier and more efficient!
Simple matching
First, we can create a regular expression with the .r method, which converts any Scala string into an instance of the Regex class. This class delegates to the java.util.regex package of the Java Platform:
import scala.util.matching.Regex
val numberPattern: Regex = "[0-9]".r
If you want to check whether the input matches the regex, you can use the matches method from Regex:
numberPattern.matches("1") // true
numberPattern.matches("123") // false
Our regex pattern doesn't match the string "123". This is because the matches method checks whether the entire string fits the pattern, not just a substring, and "[0-9]" describes exactly one digit.
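To make the whole-string rule concrete, here is a minimal sketch, assuming Scala 2.13 or newer (where Regex.matches is available). The names singleDigit and manyDigits are illustrative and not part of the examples above; adding the + quantifier ("one or more") lets the pattern cover the entire input:

```scala
import scala.util.matching.Regex

// Illustrative names, not taken from the examples above
val singleDigit: Regex = "[0-9]".r  // exactly one digit
val manyDigits: Regex = "[0-9]+".r  // one or more digits

val oneCharFits = singleDigit.matches("7")    // true: the whole input is one digit
val tooLong     = singleDigit.matches("123")  // false: two characters are left over
val allDigits   = manyDigits.matches("123")   // true: the quantifier covers every digit
val mixedInput  = manyDigits.matches("12a")   // false: "a" is not a digit
```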
However, for convenience, you can also pass regular expression syntax directly to certain methods of the String class, such as matches. These methods are inherited from java.lang.String and compile the pattern behind the scenes, so you don't need to create a Regex object yourself.
"1".matches("[0-9]") // true
"123".matches("[0-9]") // false
Find patterns in a string
If you want to find or replace matches of your pattern in a string, you can use the various find and replace methods of Regex. For example, you can pattern match on the result of the findFirstIn method:
"[0-9]".r.findFirstIn("super-password-123") match {
case Some(number) => s"Password OK, contains number: $number"
case None => "Password must contain a number"
} // Password OK, contains number: 1
The findFirstIn method returns the first match of the regular expression in the given character sequence, wrapped in Some. It returns None if there is no match.
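If you also need to know where the match occurred, findFirstMatchIn returns an Option[Match] instead of an Option[String]. The sketch below shows one possible use; describeFirstDigit is a hypothetical helper introduced only for illustration:

```scala
import scala.util.matching.Regex

val digit: Regex = "[0-9]".r

// Hypothetical helper: reports the first digit and its position, if any
def describeFirstDigit(input: String): String =
  digit.findFirstMatchIn(input) match {
    case Some(m) => s"Found ${m.matched} at index ${m.start}"
    case None    => "No digit found"
  }

val withDigit    = describeFirstDigit("super-password-123") // Found 1 at index 15
val withoutDigit = describeFirstDigit("super-password")     // No digit found
```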
If you want to find all matches of your pattern in a string, you can use the findAllIn or findAllMatchIn methods. They return a MatchIterator and an Iterator[Match], respectively, both of which iterate over all the matches found in the string.
val pattern = "[0-9]+".r
val text = "I have 10 cats and 2 dogs"
val matchesIterator = pattern.findAllIn(text)
matchesIterator.foreach(println) // 10 2
In regular expressions, parentheses are used to group subpatterns. In Scala, we can use regular expressions directly in pattern matching, which allows you to extract found groups as values:
val numberPattern = "(\\d+)".r
val lowercasePattern = "([a-z]+)".r
val combinedPattern = "([A-Z]+) ([0-9]+)".r
val input: String = "ABC 123"
input match {
case numberPattern(number) => s"Input has only a number: $number"
case lowercasePattern(text) => s"Input has only a lowercase text: $text"
case combinedPattern(letters, digits) => s"Letters: $letters, Digits: $digits"
case _ => "No match"
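} // Letters: ABC, Digits: 123
Regardless of the pattern structure, all extracted groups remain strings, so you sometimes have to convert them yourself with methods like toInt. Such methods are unsafe: toInt throws a NumberFormatException when the text does not fit into an Int.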
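One way to avoid the exception is toIntOption, which returns None instead of throwing. The following is a minimal sketch, assuming Scala 2.13 or newer; itemPattern and parseItem are hypothetical names used only for illustration:

```scala
import scala.util.matching.Regex

val itemPattern: Regex = "([A-Z]+) ([0-9]+)".r

// Hypothetical helper: returns the letters and the parsed number,
// or None when the input has the wrong shape or the number overflows Int
def parseItem(input: String): Option[(String, Int)] =
  input match {
    case itemPattern(letters, digits) => digits.toIntOption.map(n => (letters, n))
    case _                            => None
  }

val parsed   = parseItem("ABC 123")          // Some((ABC,123))
val overflow = parseItem("ABC 99999999999")  // None: the number is too large for Int
val noMatch  = parseItem("abc 123")          // None: lowercase letters don't match
```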
Replace patterns in a string
To replace matches of your pattern in a string, you can use the replaceFirstIn or replaceAllIn methods:
val pattern = "[Jj]ava".r
val text = "Java is a powerful language. I like Java!"
pattern.replaceFirstIn(text, "Scala") // Scala is a powerful language. I like Java!
pattern.replaceAllIn(text, "Scala") // Scala is a powerful language. I like Scala!
Special symbols in pattern
In Scala, regular expressions follow the java.util.regex syntax, so special characters such as dots, question marks, and pipes have the same meanings as in most other regex flavors. Here are some examples of how you can use these special characters in Scala:
"h.t".r.matches("hit") // true
"ab?c".r.matches("ac") // true
"...".r.matches("A1c") // true
"apple|orange".r.matches("apple") // true
"[abc]d[ef]".r.matches("bdf") // true
Let me remind you of some special characters:
. (dot) matches any single character except for a newline.
? matches the preceding element zero or one time.
| (pipe) is used for alternation, which means it matches either the expression on its left or the one on its right.
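The three rules above can be checked with a short sketch, again assuming Scala 2.13 or newer for Regex.matches. All names here are illustrative:

```scala
import scala.util.matching.Regex

// Dot: any single character except a newline
val dotPattern: Regex = "h.t".r
val dotMatchesLetter  = dotPattern.matches("hot")   // true
val dotMatchesNewline = dotPattern.matches("h\nt")  // false: the dot skips newlines by default

// Question mark: the preceding element appears zero or one time
val optionalU: Regex = "colou?r".r
val american = optionalU.matches("color")    // true: "u" appears zero times
val british  = optionalU.matches("colour")   // true: "u" appears once
val doubled  = optionalU.matches("colouur")  // false: "u" appears twice

// Pipe: either the left or the right alternative
val pet: Regex = "cat|dog".r
val isDog  = pet.matches("dog")     // true
val isBoth = pet.matches("catdog")  // false: the whole input must be one alternative
```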
The tricky escape character
If you want to use a special symbol as a regular punctuation mark in your pattern, you can protect it by adding a backslash \ before it in your regular expression pattern. The backslash is known as an escape character because it allows symbols to "escape" their special meaning in the regular expression.
However, implementing such patterns in your Scala program can be more complicated. This is because the backslash \ character functions as an escape character not only in regular expressions but also in String literals. So, to use a backslash in your regular expression, you need to use two backslashes in your Scala code — one to escape the other. For example:
val endRegex = "The End\\.".r
endRegex.matches("The End.") // true
endRegex.matches("The End?") // false
val answerRegex = "Of cours., ..\\.".r
answerRegex.matches("Of course, no.") // true
You can use triple quotes to simplify the use of backslashes. Inside a triple-quoted string, Scala leaves a backslash such as the one in \d untouched, so it reaches the regex engine as written and no doubling is needed. For instance, to match one or more digits, either of the following works:
"""\d+""".r.matches("123") // true
"\\d+".r.matches("123") // true
Conclusion
Great! Now you can create a Regex from a string by calling the .r method. You know how to use Regex in pattern matching to extract data from strings, and you know to watch out for backslash escaping, which needs doubling in regular string literals but not in triple-quoted ones.