Directory and File Handling #6

Directories & Files

Our last page showed how to read specified lines from a file, but the line numbers had to be hard-coded into the script. That was just to show you how it could be done. The script would be much more useful if you could tell it what to look for right on the command line when you run it.

That's what we'll do next - pass a request to the script and have the script pull out what we want.

I'm going to use the same data as before - a partial list of my music. Here is an example of one CD in the list. Each CD is a kind of data 'record', and the records are separated from each other by a blank line:

/Volumes/Backup/MouthBreather/Plex/Jazz/Bill Frisell - Further East-Further West (Live 2003-2004):
101 Lookout For Hope.mp3
102 Monroe.mp3
103 Big Shoe.mp3
104 Egg Radio.mp3
201 Lost Highway.mp3
202 Masters of War.mp3
203 What The World Needs Now.mp3
204 Somewhere Over The Rainbow.mp3
205 Prelude - Body and Soul.mp3
206 Paradox.mp3
207 Cluck Old Hen.mp3

The list was created using the ls command. The very first line of each 'record' is the full path of where the CD is located. In my case it is on an external drive /Volumes/Backup, buried in 3 sub-directories /MouthBreather/Plex/Jazz/, and finally in its own directory Bill Frisell - Further East-Further West (Live 2003-2004).

The lines that follow are the filenames in that directory.

It is very important when searching through data that the data is CONSISTENT.

By that I mean that there is a definite pattern you can determine.

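The set of data we are going to use is somewhat 'structured': each record starts with a full path line, is followed by the filenames in that directory, and ends with a blank line.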

There may be variations in some of the data, but you need to know exactly what they are - you have to write the code to deal with it.

For example, in some of this data the track numbers are wrapped in square brackets, like [01].

Others may have 3 digits, like 101 to specify CD1 track 1, or 2 digits followed by a dot, like 11.

The code has to be able to recognize all these slight 'differences' in order to do its job properly.

The rest of the lines we don't worry about for now.

To solve this little project, let's break it down into manageable steps:

  1. make sure there is a command-line argument & get it
  2. open the specified file
  3. search the file for the argument
  4. if found, search for tracks only
  5. stop searching if the line is empty

The first step in this process requires an introduction to another 'special' Perl variable: @ARGV.

@ARGV is used exclusively for holding command-line arguments - things you want the script to use. In this case it is the artist name we are looking for. Using shift we can grab the first element in the array, which gets assigned to $Input. See Arrays #2 for a reminder.
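As a minimal sketch of that idea, the snippet below feeds a few simulated argument lists through shift. The helper name get_query and the simulated command lines are assumptions made for illustration; they are not part of the script on this page.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# get_query is a hypothetical helper, not part of the script on this page.
# It returns the first argument, or undef if none (or an empty one) was given.
sub get_query {
    my @args  = @_;
    my $first = shift @args;    # removes and returns the first element
    return undef unless ( defined($first) && $first ne "" );
    return $first;
}

# Simulated command lines, so the sketch runs without real arguments.
# In a real script you would call get_query(@ARGV).
my @command_lines = (
    ['Bill Frisell'],       # script.pl 'Bill Frisell'  -> one argument
    [ 'Bill', 'Frisell' ],  # script.pl Bill Frisell    -> two arguments
    [],                     # script.pl                 -> no arguments
);

for my $args (@command_lines) {
    my $query   = get_query(@$args);
    my $message = defined($query)
        ? "Looking for '$query' ...\n"
        : "Artist name required.\n";
    print $message;
}
```

Because the shell splits unquoted arguments on spaces, a name containing a space needs quotes on the command line; otherwise shift only returns the first word.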

Some of these steps will go into a loop, since there may be several CDs by the same artist.

For step one, we can use the 'Artist name' as it appears after the last forward slash. We need to do it this way, because if we just search for the name, it may appear on a line that isn't what we want.

So, using the example above we would enter 'Bill Frisell' or even 'Bill Fris'. Just using 'Bill' will find 'Bill Evans', 'Bill Holman Band', and 'Bill Frisell'. But looking for 'Frisell' won't work, because the search only matches text that comes immediately after a forward slash.
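The behaviour described above can be checked with a small sketch. The helper name matches_dir is hypothetical, and the \Q...\E quoting is my own addition (the script on this page interpolates $Input directly); it simply makes characters such as '(' or '.' in a name match literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample path line from the top of this page.
my $line = '/Volumes/Backup/MouthBreather/Plex/Jazz/'
         . 'Bill Frisell - Further East-Further West (Live 2003-2004):';

# matches_dir is a hypothetical helper: true when $input appears
# immediately after a forward slash somewhere in $line.
sub matches_dir {
    my ( $line, $input ) = @_;
    return $line =~ m{/\Q$input\E} ? 1 : 0;
}

for my $input ( 'Bill Frisell', 'Bill Fris', 'Bill', 'Frisell', 'Jazz' ) {
    printf "%-14s %s\n", "'$input'",
        matches_dir( $line, $input ) ? 'match' : 'no match';
}
```

Note that 'Jazz' also matches, since the pattern accepts the start of any directory name in the path, not only the last one. That is worth keeping in mind when choosing what to type.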

We are going to use the forward slashes as 'delimiters' in our search to break up the first line. Since we are only interested in the Artist name at this point, we know that the artist name starts after the last forward slash.

Steps 1 & 2:

#! /usr/bin/perl
use strict;
use warnings;
open (STDERR,">>","$0.txt") or die "Can't open $0.txt: $!\n"; # send errors to a log file
print STDERR "\n", scalar localtime, "\n"; # timestamp this run in the log
my $File="JazzB.txt"; # file to search
my $Artist; # what we find after the last '/'
my ($start,$end,@keepers); # used to keep track of where we are
my $Want=6; # how many '/' we need to find
my $count=0; # how many have we found
my $pat = "/"; # pattern to use to find $Artist in our search
my $Input=shift (@ARGV); # command-line input

if (!defined $Input or $Input eq "") { # with no argument, $Input is undef
    print "Artist name required.\n";
    exit; # quit if no input
}
open (my $FILE,"<",$File) or die "Can't open $File: $!\n";
print "Looking for '$Input' ...\n";
while (<$FILE>) {

Once we've checked for input, we can open our file and start a loop (one of 3).

Step 3:

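    @keepers=(); # start each pass with an empty track list
    chomp; # delete the newline from the end of the line
    if ($_ =~ m{/$Input}) { # finds $Input as directory
        $Artist=((split $pat,$_)[$Want]); # gets Artist - CD
        print "$Artist:\n";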

The first line above, @keepers=();, empties the array that will hold the tracks of any CD matching our query. Next we delete the 'newline' from the end of each line we read: chomp;

Now we test the line for our input. Note that we are looking for '$Input' immediately following a forward slash: if ($_ =~ m{/$Input}). This eliminates other lines, such as track names that happen to contain the same text.

If we find a match we now need to split out the Artist from the line: $Artist=((split $pat,$_)[$Want]);

We split the line ($_) on our pre-defined pattern ($pat) and keep element number 6 ($Want) of the resulting list, which is the text after the 6th forward slash - the Artist name & CD title. So let's print it!
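To see why the index is 6, here is a small sketch that splits the sample path from this page and prints every piece with its index. The helper last_component is hypothetical; taking element [-1] instead of a hard-coded index is a variation of mine, not what the script on this page does.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The sample path line from the top of this page.
my $path = '/Volumes/Backup/MouthBreather/Plex/Jazz/'
         . 'Bill Frisell - Further East-Further West (Live 2003-2004):';

# Splitting on '/' keeps the empty string before the leading slash,
# so that empty string is element 0.
my @parts = split m{/}, $path;

for my $i ( 0 .. $#parts ) {
    printf "%d: '%s'\n", $i, $parts[$i];
}

# last_component is a hypothetical helper: it takes the last piece,
# whatever the depth of the path, instead of a fixed index.
sub last_component {
    my ($p) = @_;
    return ( split m{/}, $p )[-1];
}

print "Last piece: ", last_component($path), "\n";
```

A fixed index only works while every path in the file has the same depth; the [-1] form is one way to loosen that assumption.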

Step 4:

        while (<$FILE>) { # restart
            chomp; # delete the newline again
            last if /^$/;
            push @keepers,$_ if ($_ =~ m/^\[?\d\d/); #match [##] ##. ##
        } # end of the inner while loop

This isn't a typo - we actually start another loop reading the file. What this does is get the next line in the file, but the loop gives us a chance to perform a few things with each line from here on.

Again, we delete the newline from the end of the line.

If the line is blank we exit the loop: last if /^$/;, because a blank line is our record separator.

That pattern should be familiar - it matches the beginning and end of a line, with nothing between them.
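A quick sketch of how that pattern behaves on a few inputs follows. The helper name is_separator and the sample strings are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# is_separator is a hypothetical helper wrapping the test used in the script.
sub is_separator {
    my ($line) = @_;
    return $line =~ /^$/ ? 1 : 0;
}

# Each pair is a description followed by the string being tested.
my @samples = (
    [ 'empty string (after chomp)', "" ],
    [ 'newline only (no chomp)',    "\n" ],
    [ 'a single space',             " " ],
    [ 'a track line',               "101 Lookout For Hope.mp3" ],
);

for my $sample (@samples) {
    my ( $label, $text ) = @$sample;
    printf "%-28s %s\n", $label,
        is_separator($text) ? 'separator' : 'not a separator';
}
```

The pattern also matches a line holding only a newline, because $ can match just before a trailing newline. A line containing a stray space, however, is not treated as a separator, which is one more reason the data has to be consistent.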

At this point we have a line that isn't blank, so we 'push' it into our array @keepers, but ONLY if it matches how a track is listed in our file: if ($_ =~ m/^\[?\d\d/);.

We know our data has tracks listed on lines starting with at least 2 digits, optionally preceded by an opening square bracket. That's what m/^\[?\d\d/ checks for.

This also eliminates other lines, such as any image files or PDFs included in the directory.
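Here is a short sketch that runs the track pattern over a handful of filenames. Only the first filename comes from the sample record above; the others, and the helper name is_track, are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# is_track is a hypothetical helper wrapping the pattern from the script.
sub is_track {
    my ($line) = @_;
    return $line =~ m/^\[?\d\d/ ? 1 : 0;
}

my @samples = (
    '101 Lookout For Hope.mp3',    # from the sample record on this page
    '[01] Made Up Title.mp3',      # invented: bracketed track number
    '11. Another Made Up.mp3',     # invented: 2 digits and a dot
    'cover.jpg',                   # invented: image file
    'booklet.pdf',                 # invented: PDF
    '7 Single Digit.mp3',          # invented: only 1 digit
    '2003 tour poster.jpg',        # invented: not a track, starts with digits
);

for my $name (@samples) {
    printf "%-28s %s\n", $name, is_track($name) ? 'keep' : 'skip';
}
```

The last sample shows the limit of the pattern: any filename that starts with two digits is kept, track or not, and a single-digit track number is skipped. Whether that matters depends on how consistent your own data is.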

        foreach my $tr (@keepers)  { print "\t$tr\n"; }
    } # end of: if ($_ =~ m{/$Input})
} # end of the outer while loop
close $FILE or die "Can't close $File: $!\n";

Now that we have only the track data saved in our @keepers array, we can print them out.

After that we can close the if block we started earlier, if ($_ =~ m{/$Input}), and then the outer while loop.

And finally close our file: close $FILE or die "Can't close $File: $!\n";.

A note about that last statement: $! in the 'die' output holds the system error message from the most recent failed system call. Another debugging trick.
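To watch $! in action without stopping the program, the sketch below tries to open a file that should not exist and returns the message instead of dying. The helper name open_error and the made-up filename are assumptions; the exact wording of the message depends on your operating system.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# open_error is a hypothetical helper: it returns the system error text
# if the file cannot be opened for reading, or an empty string on success.
sub open_error {
    my ($path) = @_;
    open( my $fh, '<', $path ) or return "$!";
    close $fh;
    return '';
}

# A made-up name that is very unlikely to exist ($$ is the process ID).
my $missing = "no-such-file-$$.txt";

my $error = open_error($missing);
if ( $error ne '' ) {
    print "Can't open $missing: $error\n";
}
else {
    print "Opened $missing without trouble.\n";
}
```

Read $! straight after the call that failed; a later system call can overwrite it.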

All done!

Here is what this script produces on my iMac using my data:
Note: I'm looking for 'Bill Evans', not 'Bill Frisell'

You may see that the first track is number [101] but the rest do not have brackets. This was intentional, to check that our code actually caught a track number beginning with a bracket - and it does!

If you organize your music collection (or perhaps movie collection) in a similar way, you could build a search script like this. To create your list of files to search through, use ls on a Mac or Linux machine (the path-then-filenames layout shown above looks like the output of the recursive form, ls -R), or dir on a Windows machine. Redirect the output to a file, then point to that file in the code with the variable $File.
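As a starting point for your own version, here is a rough sketch that folds the steps from this page into one subroutine. Everything in it is an assumption made for illustration: the name find_cds, the invented listing, the use of an in-memory filehandle so the sketch runs without a data file, the [-1] index, and the \Q...\E quoting. It is not the script from this page, just one way the same steps could be arranged.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# find_cds is a hypothetical helper. It reads records from an open
# filehandle and returns one hash per matching CD: title plus track list.
sub find_cds {
    my ( $fh, $input ) = @_;
    my @found;
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ m{/\Q$input\E};      # step 3: find the artist
        my $title = ( split m{/}, $line )[-1];    # text after the last '/'
        my @tracks;
        while ( my $track = <$fh> ) {             # step 4: read this record
            chomp $track;
            last if $track eq '';                 # step 5: blank line ends it
            push @tracks, $track if $track =~ m/^\[?\d\d/;
        }
        push @found, { title => $title, tracks => \@tracks };
    }
    return @found;
}

# An invented listing in the same layout as the data on this page.
my $listing = <<'END';
/music/Jazz/Artist One - First CD:
01 Opening.mp3
[02] Second.mp3
cover.jpg

/music/Jazz/Artist Two - Other CD:
101 Something.mp3

/music/Jazz/Artist One - Second CD:
11. Closing.mp3
END

# Open the string as if it were a file.
open( my $data, '<', \$listing ) or die "Can't open in-memory listing: $!\n";
for my $cd ( find_cds( $data, 'Artist One' ) ) {
    print "$cd->{title}\n";
    print "\t$_\n" for @{ $cd->{tracks} };
}
close $data or die "Can't close in-memory listing: $!\n";
```

To use it on a real list, you would open your own file in place of the in-memory string and pass the value taken from @ARGV as the second argument.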

Of course, at some point you may want to put your data into a regular database. Perhaps we will cover that in future pages of refreshment.