Search #2

Metacharacters

This is a good place to introduce some special characters (not the usual suspects) used in pattern matching.

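Imagine trying to search for a forward slash (/) or a period (.). As we already know, those characters have special meaning: inside a pattern the forward slash marks where the pattern ends, and the period matches any single character.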

This and many other situations are dealt with using metacharacters. The word 'meta' means "about" or "above" or "beyond", as in "metadata" which means "data about data". So 'metacharacters' are special characters with meaning 'beyond' their representative glyph.

In regular expressions, characters generally match themselves, unless of course they are metacharacters.
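
As a minimal sketch of the slash and period problem mentioned above: putting a backslash in front of a metacharacter makes it match itself. The sample strings and variable names here are made up for illustration and are not part of the tutorial's data.

```perl
use strict;
use warnings;

# Hypothetical sample strings, not part of the tutorial's data.
my @samples = ("usr/local/bin", "version 1.5", "version 105");

my (@slash, @period, @any);
foreach (@samples) {
    push @slash,  $_ if ( /\// );    # \/ matches a literal forward slash
    push @period, $_ if ( /1\.5/ );  # \. matches a literal period
    push @any,    $_ if ( /1.5/ );   # unescaped . matches any single character
}

print "contains a slash:   @slash\n";
print "contains '1.5':     @period\n";
print "1, any char, 5:     @any\n";
```

The unescaped pattern /1.5/ also accepts "version 105", which is why the period needs a backslash when a literal period is wanted.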

To put some of these to use, here are some examples. First we look for the digit '3' at the end of a line (as the last glyph):

foreach (@stats) {
    print "$_\n" if ( /3$/);
}

As you can see, it found the 7 lines that match.
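
Since @stats is defined elsewhere, here is a small self-contained sketch of the same end-of-line test. The array @sample_stats and its contents are invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for @stats; the real data is defined elsewhere.
my @sample_stats = ("Leeds 13", "York 31", "Bath 3", "3 Bridges");

my @ends_in_3;
foreach (@sample_stats) {
    # $ anchors the match to the end of the string
    # (or just before a trailing newline, if there is one)
    push @ends_in_3, $_ if ( /3$/ );
}

print "$_\n" foreach @ends_in_3;
```

Only the entries whose last character is 3 are kept; a 3 elsewhere in the string does not count.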

Now let's look for the substring 'to'.

foreach (@stats) {
    print "$_\n" if ( /to/);
}

Are there any apostrophes in our data?

foreach (@stats) {
    print "$_\n" if ( /'/);
}

A matching pattern is case sensitive by default. So now let's see how it works using the pattern modifier /i to ignore case.

my @londonStr = ("London","lonDON","LONDON","london","LoNdOn","londom");
foreach (@londonStr) {
    print "$_\n" if ( /london/i );
}

Note: the last item, "londom", is not printed because it is misspelled and therefore does not match.

The literal pattern ("london" in this case) can also be replaced by a variable:

my @londonStr = ("London","lonDON","LONDON","london","LoNdOn","londom");
my $pattern = "london";
foreach (@londonStr) {
    print "$_\n" if ( /$pattern/i );
}

And if you find using forward slashes ( / ) confusing, you can choose a different pair of delimiters by putting m in front of the pattern:

my @londonStr = ("London","lonDON","LONDON","london","LoNdOn","londom");
my $pattern = "london";
foreach (@londonStr) {
    print "$_\n" if ( m"$pattern"i );
}
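
Alternative delimiters are most useful when the pattern itself contains forward slashes. The following is only a sketch; the string in $path is a hypothetical example.

```perl
use strict;
use warnings;

my $path = "/usr/local/bin";   # hypothetical example string

# With the default delimiters, every slash in the pattern needs a backslash.
my $with_slashes = ( $path =~ /\/usr\/local/ ) ? 1 : 0;

# With m and a different delimiter, the slashes can be written as they are.
my $with_braces = ( $path =~ m{/usr/local} ) ? 1 : 0;
my $with_hash   = ( $path =~ m#/usr/local# ) ? 1 : 0;

print "all three forms match\n"
    if $with_slashes && $with_braces && $with_hash;
```

All three patterns mean the same thing; the choice of delimiter only affects how readable the pattern is.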

Finding a match at the end of a string ("c" followed by 1 character, followed by "d"):

my $string="barcadbarbarbarcadbarbarcad";
print "found it\n" if $string =~ /c.d$/;
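
To see what the period and the dollar sign each contribute, the same pattern can be tried against a few variations. The strings below are hypothetical test cases, not data from the tutorial.

```perl
use strict;
use warnings;

# Hypothetical test strings chosen to show where /c.d$/ matches.
my @candidates = ("barcad", "barc9d", "barcadbar", "barcd");

my %result;
foreach (@candidates) {
    # c, then exactly one character, then d, anchored at the end of the string
    $result{$_} = ( /c.d$/ ) ? "match" : "no match";
    print "$_: $result{$_}\n";
}
```

"barcadbar" fails because "cad" is not at the end, and "barcd" fails because there is no character between the "c" and the "d".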

Sometimes we may want to match a pattern only if it is present a certain number of times in a row. For this we use the {#} quantifier, where # is the number of repetitions.

To see if the pattern "barbarbar" is in the string "barcadbarbarbarcadbarbar" we would do this:

my $string="barcadbarbarbarcadbarbar";
print "found it\n" if $string =~ /(bar){3}/;

Note that our pattern is enclosed in parentheses, which group it into a single unit (known as a molecule), so the quantifier {3} applies to the whole of "bar". The quantifier and the pattern are BOTH enclosed within the forward slashes.

If it were coded as /(bar{3})/ instead, the quantifier would apply only to the "r", so the pattern would match only "barrr" in a string.

The {#} quantifier applies to the character or molecule immediately before it.
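
The difference between quantifying a molecule and quantifying a single character can be checked directly. This is a small sketch using the string from the example above; the variable names are chosen here for illustration.

```perl
use strict;
use warnings;

my $string = "barcadbarbarbarcadbarbar";

# {3} applies to the whole group: looks for "barbarbar"
my $grouped = ( $string =~ /(bar){3}/ ) ? 1 : 0;

# {3} applies only to the r: looks for "barrr"
my $ungrouped = ( $string =~ /bar{3}/ ) ? 1 : 0;

# The ungrouped pattern does match a string that really contains "barrr"
my $barrr = ( "barrr" =~ /bar{3}/ ) ? 1 : 0;

print "grouped:   ", ( $grouped   ? "found it" : "not found" ), "\n";
print "ungrouped: ", ( $ungrouped ? "found it" : "not found" ), "\n";
print "barrr:     ", ( $barrr     ? "found it" : "not found" ), "\n";
```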

Hopefully this shows you that regular expressions can be very exacting and precise. A regex developed by Jeffrey Friedl for matching an email address against the correct standard format is over 6,000 bytes long and takes a whole page to print (in very small type)!

That concludes our introduction to searching and pattern matching, but there is a LOT more to this topic than covered here.