Trick or Treat #2

Storing Data in Your Script

Many times I've worked on a script that needed a set of data to work with, but I didn't want to depend on a separate data file that could get accidentally deleted.

In this case I use a feature of Perl called a special token (the Perl documentation calls these special literals).

There are 2 of these you can use in your script: __DATA__ and __END__.

You put either of these tokens at the end of your script and include your data after it. The script sees __DATA__ and treats everything after it as a data block.

This data block can then be 'read' like a file and used by your script. You read it through a FILE HANDLE named DATA (the token without the underscores).

Let's use some data I happen to have handy - some data we've already used earlier.

Paste this code to the bottom of your script (all executable code must go ahead of this):

__DATA__
ottawa|canada
tokyo|japan
stockholm|sweden
tripoli|libya
athens|greece
hamilton|bermuda
kiev|ukraine
vienna|austria
rome|italy
oslo|norway

The data is 'formatted' so it's easier to work with. I try to use the vertical bar | as a field separator in data like this - it rarely appears in typical data, so it is pretty safe to use.

You never want to use a field separator that may appear in the data you are working with.
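To see why the separator matters, here is a minimal sketch. Both record strings are made up for illustration (they are not part of our data); the city name contains a comma on purpose.

```perl
use strict;
use warnings;

# Hypothetical records: the same city and country, stored with two different separators.
my $bar_record   = "washington, d.c.|usa";
my $comma_record = "washington, d.c.,usa";

# Splitting on the vertical bar gives the 2 fields we expect.
my @good = split /\|/, $bar_record;

# Splitting on a comma cuts the city name in two, so we get 3 fields.
my @bad = split /,/, $comma_record;

print scalar(@good), " fields with |\n";
print scalar(@bad), " fields with ,\n";
```

The comma version has no way to tell a separator from a comma that belongs to the city name.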

Each line has 2 fields: city, country.

We want to read each line of the data, split it up into separate strings, and then perform some stringy things on them.

Here is the code. Paste it AHEAD of the __DATA__ block.

my ($city,$country,@cities,@countries); # declare some variables
while (<DATA>) { # HUH?
	chomp $_; # lose the '\n' at the end of the line
	($city,$country) = split /\|/; # split $_ on a literal | (escaped, because | means "or" in a pattern)
	push(@cities,$city);
	push(@countries,$country);
}

OK, I did throw something new in there. Recall that DATA is a FILE HANDLE. A file handle is the name Perl uses to refer to an open file. Here there is no separate file name, because the data is stored inside the script itself.

It's a simple while loop - each pass reads the next line from DATA into $_, and the loop ends when there are no lines left.

chomp is a function that removes ONLY the end-of-record marker (usually a newline "\n") from the end of a string.
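If the short form of the loop looks like magic, this sketch spells out what Perl does behind the scenes. One assumption to note: it reads from an in-memory file handle ($fh, a name I made up) as a stand-in for DATA, so that it runs on its own.

```perl
use strict;
use warnings;

# Assumption: an in-memory file handle stands in for DATA here.
my $text = "ottawa|canada\ntokyo|japan\n";
open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";

my @lines;
# while (<$fh>) { ... } is shorthand for this explicit form:
while (defined($_ = <$fh>)) {
    push @lines, $_;    # $_ holds the current line, newline included
}
close $fh;

print "Read ", scalar(@lines), " lines\n";
```

The defined test is what lets the loop survive a line that happens to be "0" or empty-looking but still valid.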
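A quick sketch of how chomp behaves, compared with its cousin chop. The variable names are just for illustration.

```perl
use strict;
use warnings;

my $line = "ottawa|canada\n";

my $removed = chomp $line;   # removes the trailing newline, returns how many characters it removed
my $again   = chomp $line;   # nothing left to remove this time

my $word = "oslo";
chop $word;                  # chop removes the last character, whatever it is

print "line is '$line', removed $removed then $again\n";
print "word is now '$word'\n";
```

That is why chomp is the safe choice here: calling it on a line that has no newline does no harm.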

Just to make sure we have the data, let's see it:

print join (",",@cities);
print "\n";
print join (",",@countries);

Now we can split up each line and continue our project - sorting the names by city, and capitalizing where needed.

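my @data = map { "$cities[$_]|$countries[$_]" } 0..$#cities; # rebuild the city|country records
my @dataS = sort @data; # get sorted array
my $ndx = 0; # index into @cities and @countries
foreach (@dataS) {
	($city,$country) = split /\|/;
	$cities[$ndx]=ucfirst($city); #capitalize
	$countries[$ndx]=ucfirst($country); #capitalize
	$ndx++;
}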
print "\n"x2; # what the heck is that?
for my $i (0..$#cities) {
    print "$cities[$i] is the capital of $countries[$i]\n"
}

Job Done!
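As an aside, here is a different way to get the same kind of report. This is only a sketch of an alternative, not the approach used above: it keeps the pairs in a hash (%country_of and @report are names I made up) and uses a few sample records so it runs on its own.

```perl
use strict;
use warnings;

# Assumption: a small hand-typed sample instead of the full DATA block.
my %country_of = (
    ottawa => 'canada',
    tokyo  => 'japan',
    athens => 'greece',
);

my @report;
for my $city (sort keys %country_of) {
    # ucfirst capitalizes only the first letter of the string
    push @report, ucfirst($city) . " is the capital of " . ucfirst($country_of{$city});
}

print "$_\n" for @report;
```

With a hash, each city stays attached to its country, so there is no index to keep in step between 2 arrays.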

I keep throwing new stuff at you - this time in the print statement print "\n"x2;

That's a handy little trick to remember - x is the repetition operator. It returns the left operand repeated the number of times given by the right operand, joined into one string. In this case it prints 2 newlines.

If the left operand is a list in parentheses, x does some neat things too: it repeats the whole list instead of building a string.
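A few more string examples, as a small sketch. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $rule = "-" x 20;    # a row of 20 dashes, handy for report headings
my $echo = "ab" x 3;    # the whole string is repeated, not just one character
my $none = "ab" x 0;    # repeating zero times gives an empty string

print "$rule\n";
print "$echo\n";
```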

my @fives = (1) x 3; # a list of 3 1's
foreach (@fives) { print "\$_ is: $_\n"; }
@fives = (5) x @fives; # set everything in @fives to 5
print "\n"x2;
foreach (@fives) { print "\$_ is: $_\n"; }

In this case, we are using a list in parentheses (1), causing x to work as a list replicator, not a string replicator. This is a neat way to initialize arrays (or, through a slice, hashes) to the same value, even if you don't know the size of the array in advance.
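Here is a minimal sketch of the hash case, plus what happens if you forget the parentheses. The names @names, %visited and @wrong are made up for the example.

```perl
use strict;
use warnings;

my @names = qw(ottawa tokyo oslo);

# Hash slice: give every key in @names the same starting value.
my %visited;
@visited{@names} = (0) x @names;

# Without the parentheses, x builds one string instead of a list.
my @wrong = 1 x 3;    # a single element: "111"

print "$_ => $visited{$_}\n" for sort keys %visited;
print "wrong has ", scalar(@wrong), " element: $wrong[0]\n";
```

On the right side of x, @names is read as a number (its size), which is why this works without knowing the count up front.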