
Perl Refreshments ⌘

Regular Expressions #3 - Pattern Modifiers For 'm//' and 's///'

NOTE: My version of Perl is currently 5.18 so any lists of modifiers, quantifiers, etc. may be different from yours.

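We continue our examination of the 'm//' matching operator. We saw one modifier on the previous page. Here is a list of modifiers for both 'm//' and 's///':

     g    match globally, i.e. find all occurrences
     i    ignore case
     m    multi-line: ^ and $ match at the start and end of each line
     s    single line: . also matches a newline
     x    extended: ignore whitespace and allow comments in the pattern
     o    compile the pattern only once
     c    keep the current position after a failed match (used with g)
     p    preserve the matched string in ${^PREMATCH}, ${^MATCH} and ${^POSTMATCH}
     a, d, l, u    character set modifiers (ASCII, default, locale, Unicode)
     e    's///' only: evaluate the replacement as a Perl expression
     r    's///' only: return the result and leave the original string alone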

Are you getting the sense that regexes are a bit complex?

We don't have examples of all of them, but here are some to give you an idea.

$text="xxxxxxxxx";
while ($text =~ m/x/g) {
     print "Found the 'x' spot.\n";
}

Output: the line Found the 'x' spot. is printed once for every x in the string.
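Here is a small sketch of two other things /g lets you do, using nothing beyond core Perl. The variable names $count and @offsets are my own, chosen for illustration. In list context 'm//g' hands back every match at once, and in scalar context pos() tells you where the last match ended.

```perl
use strict;
use warnings;

my $text = "xxxxxxxxx";

# In list context, m//g returns every match at once,
# so assigning through an empty list gives a count.
my $count = () = $text =~ m/x/g;
print "Found $count x's.\n";

# In scalar context, each pass through the loop resumes
# where the previous match ended; pos() reports that offset.
my @offsets;
while ($text =~ m/x/g) {
    push @offsets, pos($text);
}
print "Matches ended at offsets: @offsets\n";
```

The while loop is the same idea as the example above; the only addition is recording pos() on each pass.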

$text="Change this to all upper case.";
print "Old \$text: $text\n";
$text =~ s/(\w+)/uc($1)/ge;
print "New \$text: $text\n";

Output:

     Old $text: Change this to all upper case.
     New $text: CHANGE THIS TO ALL UPPER CASE.

I've thrown something new at you - again.

The \w is one of the special character classes. It matches a single word character: a letter, a digit, or an underscore. The + after it means "one or more", so \w+ matches a whole word.

The new stuff is the parentheses around \w+. They capture whatever matched and put it into the back reference $1, which is then put into upper case by uc($1).

And since we used the global modifier /g, the substitution is repeated for EVERY match, which in our case is every word in the string.

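The e at the very end tells Perl to evaluate the replacement side as a Perl expression instead of treating it as a plain string. That is what makes uc($1) an actual function call. Without it, here's what we would see:

     Old $text: Change this to all upper case.
     New $text: uc(Change) uc(this) uc(to) uc(all) uc(upper) uc(case).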
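To see the difference side by side, here is a minimal sketch that runs the same substitution with and without /e on copies of the string. The names $source, $plain and $evaluated are mine, not part of the example above.

```perl
use strict;
use warnings;

my $source = "Change this to all upper case.";

# Without /e the replacement is an ordinary string,
# so "uc(" and ")" are inserted literally around each word.
(my $plain = $source) =~ s/(\w+)/uc($1)/g;

# With /e the replacement is evaluated as Perl code,
# so uc() is actually called on each captured word.
(my $evaluated = $source) =~ s/(\w+)/uc($1)/ge;

print "Without /e: $plain\n";
print "With /e:    $evaluated\n";
```

Copying into a new variable first keeps $source untouched, so both versions start from the same text.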

Back References

As the name implies, a back reference refers to a previous match. Very handy to be able to do this.

For example, HTML is based on tags such as <a> to indicate the beginning of an Internet anchor. Most HTML tags must also have a closing tag, in this case </a>.

Word processors work the same way. When you center some text, there are special (hidden) codes inserted around that text to format it that way.

How would a coder check to make sure all his opening HTML anchor tags had a matching closing tag?

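Back references are written in two ways. Inside the pattern itself, use a back-slash and a digit: \1, \2 .... Outside the pattern, in the replacement side of 's///' or in code that runs after the match, use a leading dollar sign $ and a digit: $1, $2 ....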

$text="<a>Here is an anchor.</A>";
if ($text =~ /<(a)>[\w\s\.]+<\/\1>/i) {
     print "\nComplete anchor tag found.\n\n";
}
else {print "\nIncomplete anchor\n\n";}

Output:

     Complete anchor tag found.

However, if there was no closing anchor tag ...

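$text="<a>Here is an anchor.";
if ($text =~ /<(a)>[\w\s\.]+<\/\1>/i) {
     print "\nComplete anchor tag found.\n\n";
}
else {print "\nIncomplete anchor\n\n";}

Output:

     Incomplete anchor

Note we've used \1 for this reference, because it sits inside the pattern. A $1 in that spot would be interpolated before matching begins, so it would hold whatever an earlier successful match captured (or nothing at all), not the current group. The i modifier catches upper or lower case, which is why <a> and </A> count as a pair.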
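Here is a rough sketch of the same idea with the tag name captured instead of hard-coded, so \1 does real work inside the pattern and $1 is read after the match. The sample strings and the @results array are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample strings, for illustration only.
my @samples = (
    "<b>Bold text.</b>",
    "<b>Mismatched close.</i>",
);

my @results;
for my $sample (@samples) {
    # (\w+) captures the tag name; \1 requires the same name
    # to appear again inside the closing tag.
    if ($sample =~ /<(\w+)>[\w\s.]+<\/\1>/i) {
        # After a successful match, $1 holds the captured name.
        push @results, "matched pair of <$1> tags";
    }
    else {
        push @results, "no matching pair";
    }
}
print "$_\n" for @results;
```

The second string fails because the closing tag is </i>, and \1 insists on the b captured by the first group.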

By running something like this on a page of HTML, a coder would see if all the anchor tags were closed.
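One simple way to run that kind of check over a whole page is to count the opening and closing anchor tags and compare the totals. This is only a sketch: the HTML below is invented, and a regex is not a real HTML parser, so treat it as a quick sanity check and nothing more.

```perl
use strict;
use warnings;

# A made-up snippet of HTML, standing in for a real page.
my $html = <<'END_HTML';
<p>See <a href="one.html">page one</a> and
<A href="two.html">page two, which is never closed.</p>
END_HTML

# \b keeps <a ...> from also matching tags such as <abbr>;
# /i accepts <A> as well as <a>, and /g finds every occurrence.
my $opened = () = $html =~ /<a\b[^>]*>/gi;
my $closed = () = $html =~ /<\/a\s*>/gi;

print "Opening anchor tags: $opened\n";
print "Closing anchor tags: $closed\n";
print "Unclosed anchors: ", $opened - $closed, "\n" if $opened > $closed;
```

Counting tells you that something is missing but not where. Pairing each opening tag with its own closing tag, as in the back reference examples, is what narrows it down.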

Regex 4 - tr/// ...


-30-