
Perl Refreshments ⌘

Regular Expressions #7 - Character Classes

A previous page listed the metacharacters. Here we discuss one pair of them: the square brackets [ ].

In a regex they define a character class, which matches any single character from the list between the brackets.

For example, if you were looking for all the vowels in some text, you might do something like this:

# Upper-case every lower case vowel in $text.
my $text = "this is a long stretch of characters containing some vowels and consonants";
print "\n$text";
$text =~ s/([aeiou])/\U$1/g;    # $1 holds the vowel captured by ( )
print "\n$text\n";

[Figure: regex7.1]

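Here every lower case vowel is replaced by its upper case form: the parentheses capture the matched vowel in $1, the \U escape upper-cases it in the replacement, and the /g modifier applies the substitution to every match in the string (globally) instead of stopping after the first.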
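To underline that a character class stands for exactly one character at its position, here is a minimal sketch. The word list and the variable names are made up for illustration and are not part of the original example.

```perl
use strict;
use warnings;

# [ae] matches exactly one character: either 'a' or 'e'.
my @words   = ("gray", "grey", "gruy", "graey");
my @matched = grep { /^gr[ae]y$/ } @words;
print "matched: @matched\n";

# In list context, a /g match returns every match it finds,
# so the size of the list is the number of vowels.
my $text   = "this is a long stretch of characters";
my @found  = $text =~ /[aeiou]/g;
my $vowels = scalar @found;
print "vowels: $vowels\n";
```

"graey" is rejected because the class consumes only one of the two vowels, leaving the pattern expecting a "y" where the string has an "e".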

A hyphen indicates a range of characters. This class matches every lower case letter:

# Upper-case every lower case letter in $text.
my $text = "this is a long stretch of characters containing some vowels and consonants";
print "\n$text";
$text =~ s/([a-z])/\U$1/g;    # a-z is a range: every letter from a to z
print "\n$text\n";

[Figure: regex7.2]
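Several ranges can sit side by side in one class. As a rough sketch (the hexadecimal check and the names $hex_digit and @candidates are assumptions chosen for illustration), this class accepts a digit or a letter from a to f in either case:

```perl
use strict;
use warnings;

# Three ranges in one class: 0-9, A-F and a-f.
my $hex_digit = qr/[0-9A-Fa-f]/;

my @candidates = ("ff", "7E", "g1", "c0ffee");

# ^ and $ anchor the pattern so every character must be a hex digit.
my @hex = grep { /^$hex_digit+$/ } @candidates;

for my $candidate (@candidates) {
    my $verdict = $candidate =~ /^$hex_digit+$/ ? "hex" : "not hex";
    print "$candidate: $verdict\n";
}
```

Only "g1" fails here, because "g" falls outside all three ranges.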

One character that may confuse you is the caret: ^ .

As the first character inside square brackets it negates the class:
MATCH ANY CHARACTER THAT IS NOT IN THE LIST.

Outside square brackets it anchors the match to the beginning of the string.


# Upper-case every character that is NOT a lower case letter.
my $text = "this is a long stretch of characters containing some vowels and consonants";
print "\n$text";
$text =~ s/([^a-z])/\U$1/g;    # the leading ^ negates the class
print "\n$text\n";

[Figure: regex7.3]

As you can see, nothing appears to change. The class [^a-z] matches only characters that are NOT lower case letters, which in this text means the spaces, and upper-casing a space leaves it as a space.
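To make the matches of the negated class visible, and to contrast the two meanings of the caret, here is a small sketch. Replacing each match with an underscore is just an illustrative choice, and the variable names are hypothetical.

```perl
use strict;
use warnings;

my $text = "this is a long stretch of characters";

# Inside the brackets, a leading ^ negates the class.
# Replace whatever [^a-z] matches with '_' so it shows up.
(my $marked = $text) =~ s/[^a-z]/_/g;
print "$marked\n";

# Outside the brackets, ^ anchors the match to the start of the string.
# Together: "the string starts with a character that is not a vowel".
my $starts_with_non_vowel = $text =~ /^[^aeiou]/ ? "yes" : "no";
print "starts with a non-vowel: $starts_with_non_vowel\n";
```

Every space turns into an underscore, which shows that the substitution did match something even though upper-casing those matches had no visible effect.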

Character classes can contain any combination of characters you can imagine, including punctuation and Unicode characters. Perl also provides shorthand escapes for the most common classes:

word character                                            \w
non-word character                                        \W
digit                                                     \d
non-digit                                                 \D
whitespace (space, tab, form-feed, newline, return)       \s
non-whitespace                                            \S
horizontal whitespace character                           \h
anything but a horizontal whitespace character            \H
vertical whitespace character                             \v
anything but a vertical whitespace character              \V
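A few of these shorthands in action, as a hedged sketch: the sample line and the variable names are invented for illustration.

```perl
use strict;
use warnings;

# A made-up line containing a tab between "shipped" and "on".
my $line = "Order 66 shipped\ton 2024-05-01";

my @numbers = $line =~ /\d+/g;       # runs of digits
my @words   = $line =~ /\w+/g;       # runs of word characters
my @fields  = split /\s+/, $line;    # split on any run of whitespace

print "numbers: @numbers\n";
print "words:   @words\n";
print "fields:  @fields\n";
```

Note that the hyphen is not a word character, so \w+ breaks the date into three pieces, while splitting on \s+ keeps it whole.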

Character Properties ...


-30-