
Perl Refreshments ⌘

Regular Expressions #6 - Assertions / Positions

To get even more control over your searching and matching, use assertions, sometimes referred to as positions or anchors.

An assertion tests for a particular condition at a position in the target string, and the regex engine then matches the rest of the pattern immediately BEFORE or AFTER that position.

Besides the simple anchors, there are basically 2 kinds of assertions that test the surrounding text: lookahead and lookbehind.

Assertions refer to a condition or position in a string, NOT the actual data.

We've seen 2 of these positional expressions previously with the ^ and $ symbols to indicate the beginning and end of a line.

Since they refer to a position or condition, not a character, these expressions are known as zero-width assertions.
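
To see what zero-width means in practice, here is a minimal sketch that measures a match with Perl's built-in @- and @+ arrays (the start and end offsets of the last successful match). The variable names are only for illustration.

```perl
use strict;
use warnings;

my $text = "cake";

# ^ and $ pin the pattern to both ends but add no characters to the match.
my $width = $text =~ /^cake$/ ? $+[0] - $-[0] : -1;
print "match is $width characters wide\n";    # 4: only the letters count

# An assertion on its own matches a position: start and end offsets are equal.
my ($start, $end) = $text =~ /$/ ? ($-[0], $+[0]) : (-1, -1);
print "\$ matched from offset $start to offset $end\n";
```

The second match succeeds without covering any characters, which is exactly what "zero-width" is describing.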

Valid Assertions

^           matches at the beginning of a line
$           matches at the end of a line, or before an ending newline
\b          matches at a word boundary
\B          matches anywhere that is NOT a word boundary
\A          matches only at the beginning of a string
\Z          matches only at the end of a string, or before an ending newline
\z          matches only at the very end of a string
\G          matches only where the previous m//g left off
(?=expr)    matches if expr would match next (positive lookahead)
(?!expr)    matches if expr would NOT match next (negative lookahead)
(?<=expr)   matches if expr would match just before this position (positive lookbehind)
(?<!expr)   matches if expr would NOT match just before this position (negative lookbehind)
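
The end-of-string anchors in the list are easy to mix up, so here is a small sketch comparing them on a string that ends in a newline, plus ^ against \A on a two-line string. The sample strings and variable names are made up for the illustration.

```perl
use strict;
use warnings;

my $text = "cake\n";

my $dollar  = $text =~ /cake$/  ? "yes" : "no";   # $ tolerates one trailing newline
my $big_z   = $text =~ /cake\Z/ ? "yes" : "no";   # so does \Z
my $small_z = $text =~ /cake\z/ ? "yes" : "no";   # \z insists on the very end

print "\$: $dollar  \\Z: $big_z  \\z: $small_z\n";   # $: yes  \Z: yes  \z: no

my $lines = "pie\ncake";

my $caret_m = $lines =~ /^cake/m  ? "yes" : "no";   # with /m, ^ matches after each newline
my $big_a   = $lines =~ /\Acake/m ? "yes" : "no";   # \A is always the start of the string

print "^ with /m: $caret_m  \\A with /m: $big_a\n";  # ^ with /m: yes  \A with /m: no
```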

A word boundary is the position between a \w character and a \W character, in either order. The very start and end of the string count as \W for this purpose.

If the order is \W\w it's a beginning-of-the-word boundary.

If the order is \w\W it's an end-of-the-word boundary.
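
As a rough illustration of the boundary rules above, the sketch below counts whole-word and embedded occurrences of "cat" in a made-up sample string, then replaces only the whole words.

```perl
use strict;
use warnings;

my $text = "cat concatenate cat's";

# \b on both sides: "cat" must stand alone (the apostrophe is a \W character).
my @whole  = $text =~ /\bcat\b/g;

# \B on both sides: "cat" must be buried inside a longer word.
my @inside = $text =~ /\Bcat\B/g;

print scalar(@whole),  " whole-word matches\n";   # 2
print scalar(@inside), " embedded matches\n";     # 1

# Copy the string, then replace only the stand-alone words.
(my $swapped = $text) =~ s/\bcat\b/dog/g;
print "$swapped\n";                               # dog concatenate dog's
```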

A typical situation - replacing part of a string only IF something specific follows it.

$text = "chicken cake";
print "\n\$text: $text\n";
$text =~ s/chicken (?=cake)/chocolate /;   # if cake comes next, swap chicken for chocolate
print "\n\$text: $text\n";

regex6.1

Looks good. Here's the other side of it:

$text = "chicken milkshake";
print "\n\$text: $text\n";
$text =~ s/chicken (?=cake)/chocolate /;   # no cake comes next, so chicken is left alone
print "\$text: $text\n\n";

regex6.2

Cake did not come right after chicken in our target string, so no substitution was made. (Yuck)

That was positive lookahead. Here is some negative lookahead - the match succeeds only if the lookahead expression DOES NOT match at that position.

$text = "chicken curry";
print "\n\$text: $text\n";
$text =~ s/chicken (?!curry)/chocolate /;   # curry comes next, so chicken is left alone
print "\$text: $text\n\n";

$text = "chicken cake";
print "\n\$text: $text\n";
$text =~ s/chicken (?!curry)/chocolate /;   # no curry comes next, so swap chicken for chocolate
print "\$text: $text\n\n";

regex6.3

The top string is unchanged because chicken WAS followed by curry, so the negative lookahead failed and no substitution was made. The bottom substitution was successful because chicken was NOT followed by curry. (Lucky for us)
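
The list of assertions also includes lookbehind, which works the same way but tests the text just BEFORE the current position. The following is only a sketch with a made-up sample string. Note that the lookbehind expressions here are fixed-length, which is the safe choice across Perl versions.

```perl
use strict;
use warnings;

my $text = "chocolate cake and carrot cake";

# Positive lookbehind: replace cake only when "chocolate " comes right before it.
(my $positive = $text) =~ s/(?<=chocolate )cake/pie/;
print "$positive\n";   # chocolate pie and carrot cake

# Negative lookbehind: replace the first cake NOT preceded by "chocolate ".
(my $negative = $text) =~ s/(?<!chocolate )cake/pie/;
print "$negative\n";   # chocolate cake and carrot pie
```

In both cases only "cake" is replaced, since the text tested by the lookbehind is never part of the match.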
