Search #1


While sorting data is a fundamental procedure, you will quite often need to search for a particular item in a list.

For example, to use our @stats array from a previous page:

my @stats = (
"st. john's:nl:1999:29",
"st. john:nb:2002:12",
# ... remaining entries as defined on the previous page
);

Now we want to pull out only lines with a particular year, say 2000.

We could do this in a loop using the index function:

my @matching;
foreach (@stats) {
    # index returns -1 when the substring is absent, and 0 for a match at the start
    push @matching, $_ if index($_, "2000") >= 0;
}
foreach (@matching) {
    print "$_\n";
}

The index function returns the zero-based position where the substring was found, or -1 if it was not found. A match at the very start of the string returns 0, so the test for success is >= 0. We've used this previously.
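A minimal sketch of how index behaves, in case it helps. The "london" record and the helper name find_matching are made up for this illustration and are not part of the original @stats data.

```perl
use strict;
use warnings;

# Records in the same city:prov:year:staff layout.
# The last entry is hypothetical, added only for this sketch.
my @sample = (
    "st. john's:nl:1999:29",
    "st. john:nb:2002:12",
    "london:on:2000:15",
);

# Hypothetical helper: return the items that contain $needle anywhere.
sub find_matching {
    my ($needle, @items) = @_;
    return grep { index($_, $needle) >= 0 } @items;
}

print index("st. john:nb:2002:12", "2002"), "\n";   # 12: zero-based position
print index("st. john:nb:2002:12", "st."), "\n";    # 0: found at the very start
print index("st. john:nb:2002:12", "2000"), "\n";   # -1: not found

print "$_\n" for find_matching("2000", @sample);    # london:on:2000:15
```

Note that a test of > 0 would silently skip a match at position 0, which is why the comparison uses >= 0.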

As you delve deeper into searching, you will come across the terms regular expressions (also known as regexes) and pattern matching.

These terms describe the complex field of searching for a particular item in a list - the proverbial needle in a haystack.

Perl being Perl, however, there are some very powerful tools to assist you in this. Regular expressions can be so complex that they are sometimes thought of as their own language. A very comprehensive book dealing with them is Mastering Regular Expressions by Jeffrey E.F. Friedl.

Regular expressions and pattern matching are also important for data validation. Checking user-supplied data for proper formatting before submitting it to your complex Perl program may save you hours of frustration.

As well, any web-based forms need to be checked on the server for proper data format and expectations. Web-based forms are a frequent source of problems (such as SQL injection). Each form basically hands control of your application over to the user, and we all know what THAT could result in.

Assume nothing
Check everything
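As a rough sketch of what "check everything" can look like, the snippet below validates one record in the city:prov:year:staff layout before it is used. The helper name is_valid_record and the exact rules (two lower-case letters for the province, four digits for the year) are assumptions made for this example. It also uses anchors and quantifiers, which have not been covered yet.

```perl
use strict;
use warnings;

# Hypothetical validator: accept only "city:prov:year:staff" records where
# prov is two lower-case letters, year is four digits, and staff is digits.
sub is_valid_record {
    my ($record) = @_;
    return $record =~ /^[^:]+:[a-z]{2}:[0-9]{4}:[0-9]+\z/ ? 1 : 0;
}

for my $input ("st. john:nb:2002:12", "st. john:nb:20o2:12", "no colons here") {
    printf "%-22s %s\n", $input, is_valid_record($input) ? "ok" : "rejected";
}
```

Only the first input passes. The second has a letter "o" in the year, and the third has no fields at all.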

pattern matching - regular expressions

Before we get into this topic, we should cover something else first - character classes.

To talk about 'matching' strings, we need to start at the 'atomic' level - characters. If you think about characters a bit, you realize there are a few ways we can organize them.

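For example, [a-z] matches any lower case letter, [A-Z] any upper case letter, and [0-9] any digit. Those are just examples of character classes - we can put any combination of characters between our brackets.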

Shortcuts exist for some common character classes (because programmers are typically lazy).
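A short sketch of the most common shortcuts follows. The helper name classes_found is made up for this illustration, and the code uses the =~ operator, which is introduced just below.

```perl
use strict;
use warnings;

# Hypothetical helper: report which kinds of characters appear in $text.
sub classes_found {
    my ($text) = @_;
    my @found;
    push @found, 'digit'      if $text =~ /\d/;        # for ASCII text, same as [0-9]
    push @found, 'word'       if $text =~ /\w/;        # letters, digits and underscore
    push @found, 'whitespace' if $text =~ /\s/;        # space, tab, newline, ...
    push @found, 'non-digit'  if $text =~ /\D/;        # upper case negates the shortcut
    push @found, 'vowel'      if $text =~ /[aeiou]/;   # a hand-made class
    return @found;
}

print join(", ", classes_found("st. john:nb:2002:12")), "\n";
# digit, word, whitespace, non-digit, vowel
```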

To work with regular expressions, we will introduce some Perl operators specifically for this:

=~, !~, m//, s///

=~ and !~ are binding operators. They determine if our pattern is matched (=~) or not matched (!~) in the string.
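A minimal sketch of the two binding operators, using one record from @stats as the string to test. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $line = "st. john:nb:2002:12";

# =~ is true when the pattern is found somewhere in $line
my $has_2002 = ($line =~ /2002/) ? "yes" : "no";

# !~ is the logical negation: true when the pattern is NOT found
my $lacks_2000 = ($line !~ /2000/) ? "yes" : "no";

print "contains 2002: $has_2002\n";    # contains 2002: yes
print "lacks 2000: $lacks_2000\n";     # lacks 2000: yes
```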

matching m//

The matching operator is perhaps easiest to explore first:

Here we are looking for the pattern "2000" in our @stats array from earlier:

foreach (@stats) {
    print "$_\n" if ($_ =~ m/2000/);
}

As you might expect, here is the result:

To use character classes:

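foreach (@stats) {
    print "Lower case letters found\n" if (/[a-z]/);
    print "Digits found too\n" if (/[0-9]/);
    print "Non-whitespace found\n" if (/[\S]/);
    # [^\t] matches any one character that is not a tab
    print "Non-tab characters found\n" if (/[^\t]/);
    # || is the logical "or"; a single | is the bitwise operator
    print "Airport found\n" if (/yyz/ || /YYZ/);
}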

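Something to note about this code: each test runs independently, so one line can print several messages. Also, [^\t] matches any single character that is not a tab, so that test succeeds for any line containing at least one non-tab character. It does not prove the line is free of tabs.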

To find lines that do not match a particular pattern, use the !~ operator:

foreach (@stats) {
    print "$_\n" if ($_ !~ /london/);
}

Of course, in our foreach block, we could do any number of things to $_ (the whole line), such as split it into separate elements.

my (@cities,@starts,@staff,@provs) = ();
my ($city,$prov,$started,$staff);   # $started matches the name used in the loop
foreach (@stats) {
    ($city,$prov,$started,$staff) = split ":" if ($_ =~ m/2000/);
    print "$city\n" if ($_ =~ m/2000/);
}

Now a bit of a twist: sometimes the $_, =~, and m can be left out entirely in the if condition. The following code results in the same output as the first m// example above:

foreach (@stats) {
    print "$_\n" if ( /2000/);
}

This works because $_ is our default variable for the current item in the loop, and pattern matching is very common, so Perl lets us do things like this. Don't you just love Perl?

The example above (/2000/) searches for the EXACT pattern between the 2 slashes. To emphasize this let's search for letters instead of numbers.

foreach (@stats) {
    print "$_\n" if ( /London/ );
}

Since the pattern "London" is not in our data list, no lines get printed.

Searches will try to match the EXACT pattern IF IT APPEARS ANYWHERE in the string. It is CASE SENSITIVE unless specifically coded.
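To make a search ignore case, the /i modifier can be added after the closing slash. The sketch below assumes a lower case "london" record like the one in our data; its other fields, and the helper name matches_ignoring_case, are invented for this example.

```perl
use strict;
use warnings;

# The fields after "london" are hypothetical, for illustration only.
my @sample = ("london:on:2000:15", "st. john:nb:2002:12");

# Hypothetical helper: case-insensitive search using the /i modifier.
# \Q...\E makes the search text match literally.
sub matches_ignoring_case {
    my ($text, @items) = @_;
    return grep { /\Q$text\E/i } @items;
}

print "$_\n" for grep { /London/ } @sample;                  # prints nothing
print "$_\n" for matches_ignoring_case("London", @sample);   # london:on:2000:15
```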

substitution s///

We mentioned once before that Perl is known for having 'More Than One Way To Do It'. Here is another way to do string substitution using the s/// operator.

Capitalizing "london":

foreach (@stats) {
    $_ =~ s/london/London/;
    print "$_\n";
}

Can we fix all the cities?

foreach (@stats) {
    ($city,$prov,$started,$staff) = (split ":");
    my $cityUC = join ' ', map { ucfirst } split / /, $city;
    # \Q...\E quotes metacharacters such as the "." in "st. john"
    $_ =~ s/\Q$city\E/$cityUC/;
    print "$_\n";
}

Might as well fix the provinces too:

foreach (@stats) {
    ($city,$prov,$started,$staff) = (split ":")[0,1,2,3];
    my $provUC = uc($prov);
    # include the colons so the province is not matched inside a city name
    $_ =~ s/:\Q$prov\E:/:$provUC:/;
    my $cityUC = join ' ', map { ucfirst } split / /, $city;
    $_ =~ s/\Q$city\E/$cityUC/;
    print "$cityUC:$provUC:$started:$staff\n";
}
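One caveat when a variable is interpolated into a pattern: characters such as "." keep their special meaning. The sketch below shows the difference \Q...\E makes. The string in $text is made up purely to show the effect.

```perl
use strict;
use warnings;

my $city = "st. john";
my $text = "stx john:nb / st. john:nb";   # hypothetical input for this sketch

# Without quoting, "." matches any character, so "stx john" is replaced.
(my $plain = $text) =~ s/$city/ST. JOHN/;

# With \Q...\E the "." is literal, so only the real "st. john" is replaced.
(my $quoted = $text) =~ s/\Q$city\E/ST. JOHN/;

print "$plain\n";    # ST. JOHN:nb / st. john:nb
print "$quoted\n";   # stx john:nb / ST. JOHN:nb
```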

Next, meta characters ...