Strings #4


Perl has many functions for handling strings. Perl was designed and written by the American linguist and programmer Larry Wall, and first released in 1987. It borrows some features of Lisp (LISt Processing) and C. The Human Genome Project relied heavily on Perl, since it dealt with complex patterns of text.

Perl is more like a human language than most programming languages. Perl was designed to be easy for humans to write, not for computers to read.

According to 'Programming Perl', the name stands for Practical Extraction and Report Language. Perl is used heavily in applications where lots of complex text needs to be managed or mangled.

String-handling is one of Perl's strong features. There are far too many string functions to discuss here, and many web sites cover them all in detail. The functions I would like to discuss here will round out your string-handling toolbox: reversing strings; comparing strings; splitting a string; joining strings.

String Reversal

You may laugh at this, especially if you're getting to know the way Perl works. Recall how arrays are reversed using the reverse function? Guess what?

The same function is used to reverse a string.

my $string3Rev = reverse($string3);
print "$string3\n$string3Rev\n";
print "\n";

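Strictly speaking, a Perl string is not an array of characters. The reverse function looks at its context: when the result is assigned to a scalar, it reverses the characters of the string; when it is given a list, it reverses the order of the elements.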

Or, you could just print it, without creating a new variable:

print "\nDESREVER:\n" . reverse($string3) . "\n\n";
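One thing to watch for: because reverse depends on context, printing it directly with no concatenation gives list behaviour. Here is a minimal sketch of the difference, using a made-up string in $text (not a variable from the earlier parts):

```perl
use strict;
use warnings;

# $text is a hypothetical example string.
my $text = "Hello, Perl";

# Scalar context: the characters of the string are reversed.
my $reversed = reverse($text);

# List context: reverse sees a one-element list and returns it unchanged.
my @as_list = reverse($text);

# 'scalar' forces scalar context when printing directly.
print scalar reverse($text), "\n";
print "$reversed\n";
print "@as_list\n";
```

The concatenation operator in the line above has the same effect as 'scalar', which is why that print works as expected.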

Comparing Strings

Perl uses different 'operators' depending on whether you are comparing numbers or strings: ==, !=, <, >, <= and >= for numbers, and eq, ne, lt, gt, le and ge for strings.

It is important to use the correct operators for the kind of data you are dealing with. For example, here we compare 2 numbers using a string operator:

my $var1 = 5;
my $var2 = 10;
if ($var1 gt $var2) {
    print "$var1 is greater than $var2\n";
}

Obviously wrong numerically but correct if compared as strings one character at a time - '5' is greater than '1'.

Conversely, using a numeric comparison operator on strings:

my $var1 = "apples";
my $var2 = "oranges";
if ($var1 == $var2) {
    print "$var1 are the same as $var2\n";
}

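Not what you would expect: the message is printed.

With ==, Perl converts both strings to numbers. Neither 'apples' nor 'oranges' looks like a number, so both become 0, and 0 == 0 is true. The mistake passes silently here because the 'use warnings;' line was not included at the beginning of this script.

With 'use warnings;' included, Perl still prints the message, but it also warns that the arguments "apples" and "oranges" aren't numeric in the numeric eq (==) comparison.

Thus a good reason to include that line.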
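To see both families of operators side by side, here is a small sketch. The variable names are my own, and the sort lines go slightly beyond the examples above: <=> and cmp are the numeric and string versions of the three-way comparison that sort uses.

```perl
use strict;
use warnings;

my ($five, $ten) = (5, 10);

# Numeric comparison: 5 is not greater than 10.
my $numeric_gt = ($five > $ten) ? "yes" : "no";

# String comparison: "5" comes after "10" because '5' is greater than '1'.
my $string_gt = ($five gt $ten) ? "yes" : "no";

# eq compares the text itself, so two different words are not equal.
my $same_fruit = ("apples" eq "oranges") ? "yes" : "no";

# <=> compares as numbers, cmp compares as strings.
my @by_number = sort { $a <=> $b } (10, 5, 100);
my @by_string = sort { $a cmp $b } (10, 5, 100);

print "numeric: $numeric_gt, string: $string_gt, fruit: $same_fruit\n";
print "by number: @by_number\n";
print "by string: @by_string\n";
```

Sorted as numbers the list comes out as 5 10 100; sorted as strings it comes out as 10 100 5.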

Splitting A String

This is a very frequent operation when you are dealing with text, especially when taking formatted (structured) text and building reports, which is something Perl is very good at.

We could have introduced this function earlier, with the cities / countries project. But as you can see, there are many ways to accomplish the same result. This one might be easier.

To use the same example, begin with the same list of cities and countries. Start a new text file and name it ''.

Add this code:

use strict;
use warnings;

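my @geog = (
    # Use the same list of 'city:country' strings as before.
    # The two entries below are placeholders for illustration only.
    "Paris:France",
    "Tokyo:Japan",
);
my @cities;
my @countries;
my $ndx = 0;
my $thiscity;
my $thiscountry;

foreach (@geog) {
    $thiscity    = (split ":")[0];
    $thiscountry = (split ":")[1];
    print "$thiscountry :: $thiscity\n";
}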

Save the file. Here is how it works:

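The split function splits a string into a list of strings. The arguments to split are:

  1. the pattern (separator) to split on
  2. the string to split (if omitted, Perl uses the current line, $_)

In English: cut the current string at every ':' and hand back the pieces as a list.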

Recall that 'split' creates an array of strings, and arrays are indexed numerically, beginning with 0. Ah ha!

So those 2 lines tell Perl that in the first 'split' operation we want the string indexed at 0, and in the second 'split' operation we want the string indexed at 1.

Since the lines are arranged in 'city:country' format, the first split gives us the city (0) and the second split gives us the country (1).
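If calling split twice on the same line feels wasteful, the two pieces can be captured in one go. This is only a sketch of an alternative, not the way the script above does it; the @places data and the variable names are made up for the example:

```perl
use strict;
use warnings;

# Hypothetical sample data in the same 'city:country' format.
my @places = ("Oslo:Norway", "Lima:Peru");

my @report;
foreach my $place (@places) {
    # One split, with both fields captured by list assignment.
    my ($city, $country) = split /:/, $place;
    push @report, "$country :: $city";
}

print "$_\n" for @report;
```

Here the string to split is passed explicitly as the second argument, instead of relying on $_.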

Splitting strings on a character is something you will probably do frequently. Many times you may have to search through a log file, or a large spreadsheet that has been saved as a 'CSV' (Comma-Separated Values) file. This is where 'split' shows its power. As an example, here is a list of some files in a directory.

-rwxr-xr-x@ 1 user  staff  1725 Apr  3 13:09
-rwxr-xr-x@ 1 user  staff   709 Apr  3 21:50
-rwxr-xr-x@ 1 user  staff  1288 Apr  5 13:20
-rwxr-xr-x@ 1 user  staff   431 Apr  5 20:50
-rwxr-xr-x@ 1 user  staff  6011 Mar 27 22:13
-rwxr-xr-x@ 1 user  staff  1407 Apr  3 13:07
-rwxr-xr-x@ 1 user  staff  1046 Apr  5 20:17

Notice that the data seems to be in columns, separated by one or more spaces. In fact, that is exactly what it is. So we have essentially some formatted (structured) strings of data.

The columns in order (on my Mac) are 'permissions' 'links' 'owner' 'group' 'size' 'modification month' 'modification day' 'modification time' 'name of file'.

Knowing what we do now about splitting lines of text, here is how we could pull out just the file sizes. The size is the fifth column, which is index 4 (remember we start at 0):

my @files=(
"-rwxr-xr-x@ 1 user  staff  1725 Apr  3 13:09",
"-rwxr-xr-x@ 1 user  staff   709 Apr  3 21:50",
"-rwxr-xr-x@ 1 user  staff  1288 Apr  5 13:20",
"-rwxr-xr-x@ 1 user  staff   431 Apr  5 20:50",
"-rwxr-xr-x@ 1 user  staff  6011 Mar 27 22:13",
"-rwxr-xr-x@ 1 user  staff  1407 Apr  3 13:07",
"-rwxr-xr-x@ 1 user  staff  1046 Apr  5 20:17",
);
foreach (@files) {
    print( (split " ")[4] . "\n" );
}

This is a pretty cool and useful feature to have - get familiar with it.
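It matters that the separator here is the string " " and not the pattern / /. A single space in quotes is a special case that tells split to treat any run of whitespace as one separator. A quick sketch of the difference, using one line copied from the listing (the array names are my own):

```perl
use strict;
use warnings;

# One line from the listing above, with its runs of spaces intact.
my $line = "-rwxr-xr-x@ 1 user  staff   709 Apr  3 21:50";

# split " " treats any run of whitespace as a single separator.
my @by_whitespace = split " ", $line;

# split / / breaks at every single space, so runs of spaces
# produce empty fields and the columns shift.
my @by_single_space = split / /, $line;

print "split \" \": field 4 is '$by_whitespace[4]'\n";
print "split / /: field 4 is '$by_single_space[4]'\n";
print scalar(@by_whitespace), " fields vs ", scalar(@by_single_space), " fields\n";
```

With " " the size lands at index 4 as expected; with / / index 4 holds 'staff' instead, because the extra spaces each create an empty field.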

Joining Strings

What's the point of being able to split strings up if you can't put them back together again?

The join function surprisingly does just that!

Let's use the above file listing to see how it works. First we will split out the names and sizes. Then use join to put them together.

foreach (@files) {
    my $size = (split " ")[4];
    my $file = (split " ")[8];    # the file name is the ninth field (index 8)
    print join(" is ", $file, $size) . " bytes long.\n";
}

From this we can see that the arguments to the join function are:

  1. the expression we want between our strings joining them together
  2. the list of strings we want to join with our expression

Another simple example:

print join(" ","Can","all","these","words","be","joined","into","one","string?");

So much more professional (and less typing) than:

print "Can ";
print "all ";
print "these ";
print "words ";
print "be ";
print "joined ";
print "into ";
print "one ";
print "string?\n";
print "\n";

Or you can specify an 'empty string' to join the others, as in:

print join("","H","e","l","l","o") . "\n";

Or even ...

print join("","H","e","l","l","o","\n");

... avoiding the string concatenation operator ' . '.
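Finally, split and join work well as a pair: split a string on one separator, then join the pieces with another. A minimal sketch, assuming a made-up record in the 'city:country' format (the variable names are my own):

```perl
use strict;
use warnings;

# Hypothetical record in 'city:country' format.
my $record = "Oslo:Norway";

# Split on the colon, then join the pieces with a comma and a space.
my @fields = split /:/, $record;
my $csv    = join(", ", @fields);

# join also accepts an array mixed with other values in its list.
my $sentence = join(" ", "Fields:", @fields, "(" . scalar(@fields) . " total)");

print "$csv\n";
print "$sentence\n";
```

Note that join takes a plain string as its first argument, while split takes a pattern.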