Strings #3

I hope you did try working on the last project - good for you if you did. Programming requires a particular way of thinking, and not everyone is suited to it. But if you are 'organized' and are curious, this might pique your interest.

Our project consisted of a few steps to get the data into a useful state. One of the steps was to sort the list.

We know that is very easy to do if we put the data into an array. I've started a new file in my text editor and called it ''. So, here is what my code looks like:

use strict;
use warnings;

my @geog = (

That puts the data into an array - a good start. I don't know about you, but I'm lazy, so I use cut-and-paste as much as possible. In this case I already gave you the list of strings. Copying that list to your clipboard and pasting it into your Perl script is a lot faster than typing it. Typing is always open to mistakes, so any time I have a chance to avoid mistakes I take it.

Next I declare (set up) the arrays and scalar variables I'll need later. Might as well get this done now:

my @cities; 
my @countries;
my @geogSorted;
my $geogSize; 
my $ndx=0; 
my $thiscity; 
my $thiscountry; 

Now let's get some programming going on:

$geogSize = $#geog+1;
@geogSorted = sort @geog;

print "\n$geogSize records.\n";
print "\nData sorted:\n";
foreach (@geogSorted) {
    print "$_\n";
}

That takes care of finding out how many records we have, and sorting the list (by city since that's how the list is organized). Now some tricky stuff. We need to put both the cities and countries into separate arrays.

To do this we obviously have to 'walk through' the sorted list: pull out the city and put it into one array, then pull out the country and put it into another.

While we're at it, can we capitalize the first letter of each name?

That sounds like a loop. Let's see if we can get this.

foreach (@geogSorted) {
    $thiscity=substr($_,0,index($_,":") );
    $thiscountry=substr($_,index($_,":")+1,length($_) );
    $cities[$ndx] = ucfirst($thiscity);
    $countries[$ndx] = ucfirst($thiscountry);
    $ndx++;
}

When you read over that code, see if you can understand how it's doing the steps we outlined. The order in which you do things affects what you can do later. Well-written Perl is often described as 'self-documenting', which means the code should be (mostly) self-explanatory.

Let's go over it line-by-line. We know we're going to loop over the sorted data to get at the contents - hence:

foreach (@geogSorted) {

Next we separate each string into a city and a country. In this case we start with a nicely formatted list: the cities and countries are separated by a colon ( : ). Structured (formatted) data is ALWAYS easier to deal with. We can use that to our advantage.

    $thiscity=substr($_,0,index($_,":") );
    $thiscountry=substr($_,index($_,":")+1,length($_) );

These two lines pull out the city and country from each line. We use the colon ":" as a separator (field delimiter): index() finds the position of the colon, and substr() extracts the text on either side of it. The string to the left of the ":" is a city; the string to the right of it is a country.
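To see the two functions at work outside the loop, here is a minimal sketch on a single record. The sample string and the variable names ($record, $colon, $city, $country) are made up for illustration and are not part of the project list. It also leaves off the third argument to substr() for the country, which should give the same result as passing length($_), because substr() stops at the end of the string either way.

```perl
use strict;
use warnings;

# Hypothetical sample record, in the same "city:country" shape as the list.
my $record = "ottawa:canada";

# index() returns the zero-based position of the first ":" (6 here),
# or -1 if the string contains no colon at all.
my $colon = index($record, ":");

# Everything before the colon: start at 0, take $colon characters.
my $city = substr($record, 0, $colon);

# Everything after the colon: with no length argument,
# substr() runs to the end of the string.
my $country = substr($record, $colon + 1);

print "$city / $country\n";    # prints "ottawa / canada"
```

Storing the colon position in its own variable means index() only runs once per record instead of twice.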

Next we put each city and country into its own array, but capitalize the first letter:

    $cities[$ndx] = ucfirst($thiscity);
    $countries[$ndx] = ucfirst($thiscountry);
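One thing worth knowing about ucfirst(): it only touches the first character of the whole string, so a name made of two words keeps its second word in lower case. The sketch below shows that behaviour and one possible way around it. The sample names and variable names are assumptions for illustration, and map, split and join are not tools we have covered in this series.

```perl
use strict;
use warnings;

# Hypothetical sample names, not taken from the project list.
my $one_word  = ucfirst("canada");         # "Canada"
my $two_words = ucfirst("new zealand");    # "New zealand": only the first letter changes

# One possible way to capitalize every word: split on single spaces,
# ucfirst each piece, then join the pieces back together.
my $each_word = join " ", map { ucfirst($_) } split / /, "new zealand";

print "$one_word\n$two_words\n$each_word\n";
# prints:
# Canada
# New zealand
# New Zealand
```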

Remember that we set $ndx to be 0 at the beginning of the script. We use $ndx as the index into our two arrays, so we have to increment it each time we go through the loop. Otherwise it stays stuck at 0, every pass overwrites element 0, and you end up with only one item in each array - the last one.

Hence the $ndx++. And we close our loop with the closing brace }.

The '++' is an auto-increment operator. It increments the variable by 1.

If it appears AFTER the variable, it returns the value of the variable, then does the increment.

If it appears BEFORE the variable, it increments the variable, then returns the new value.

There is also an auto-decrement operator: '--' which decrements the variable by 1, following the same rules.

Extremely handy in loops.
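A small sketch makes the before/after difference concrete. The variable names here are made up for the example.

```perl
use strict;
use warnings;

my $n = 5;

my $after  = $n++;    # $after gets the old value 5, then $n becomes 6
my $before = ++$n;    # $n becomes 7 first, then $before gets 7

my $down = $n--;      # $down gets 7, then $n becomes 6

print "after=$after before=$before down=$down n=$n\n";
# prints "after=5 before=7 down=7 n=6"
```

In our loop the increment sits on a line by itself, so $ndx++ and ++$ndx would behave the same there. The difference only matters when you use the returned value.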

Now we have our data the way we want - all we have to do now is print it:

print "\nCities:\n";
foreach (@cities) {
    print "$_\n";
}
print "\nCountries:\n";
foreach (@countries) {
    print "$_\n";
}
print "\nSorted by city:\n";
for ($ndx=0; $ndx < $geogSize; $ndx++) {
    print $cities[$ndx] . " is the capital of " . $countries[$ndx] . "\n";
}
print "\n";

That's it! About 60 lines of code did all that. Not too shabby.

We could have done away with putting the cities and countries into separate arrays, but it was a good opportunity for learning! Apologies.

As you (if you?) continue on with programming in Perl, you will discover ways to shorten your code up, or do the same job a different way.
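As one example of a different way to do the same job, here is a sketch that uses split() and push() instead of substr(), index() and a hand-kept index counter. It is not the approach used above, just an alternative. The three sample records are stand-ins (not the project list), and the names $record, $city, $country and $i are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical sample data in the same "city:country" shape.
my @geog = ("paris:france", "ottawa:canada", "lima:peru");

my (@cities, @countries);

foreach my $record (sort @geog) {
    # split() cuts the record at the colon; the limit of 2 keeps
    # any later colons inside the country part.
    my ($city, $country) = split /:/, $record, 2;

    # push() appends to the end of the array, so no index counter is needed.
    push @cities,    ucfirst($city);
    push @countries, ucfirst($country);
}

# $#cities is the last index, so 0 .. $#cities visits every element.
for my $i (0 .. $#cities) {
    print "$cities[$i] is the capital of $countries[$i]\n";
}
# prints:
# Lima is the capital of Peru
# Ottawa is the capital of Canada
# Paris is the capital of France
```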

In our next refreshment we continue with a few more string-handling tools.