Directory and File Handling #1


Directories & Files

In our last set of pages we introduced the hash structure for managing data of various complexities.

While hashes may be suitable for some types (and volumes) of data, at some point you may find your script slowing to a crawl.

This would be a good time to investigate other methods of storing and accessing your data. Building a database is one option, but certainly not for the timid and inexperienced.

Another option might be to put your data into an external file, and have your script read the file and then do whatever you like with the data.

This has some advantages, depending on the type of data you have and the tools you already have available.

For example, if all you want is a viewable list of your music collection, here is something to consider.

My particular collection is organized at the directory level. I have various genres of music saved into separate directories.

Then under each genre I have directories named after the artist and CD title: Artist - CD Title. This directory then contains tracks and other files.

It's very simple to get a list of your files saved into a separate file, which can then be used by your Perl script.

This step is done from a shell window. I will concentrate on OSX or Linux methods here.

One of the first things to understand is how Perl relates to a physical file or directory.

Directories are just special types of files

To open a file we must use a FILE HANDLE. This is similar to a variable with 1 major exception:

We've already seen the use of a file handle in our BEGIN statement at the top of our scripts: open (STDERR

There are 2 more that we have access to without even declaring them: STDIN and STDOUT:
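As a minimal sketch of how these predefined handles behave, the lines below print to STDOUT and STDERR without opening anything first. The message text is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# STDOUT is where a plain print goes; naming it explicitly is optional.
my $ok = print STDOUT "This goes to standard output\n";
print "So does this, because STDOUT is the default\n";

# STDERR is a separate stream, normally used for errors and diagnostics.
print STDERR "This goes to standard error\n";

# STDIN is read with the <> operator, for example:
#   my $line = <STDIN>;
# (left as a comment here so the script does not wait for input)
```

Run from a shell, both streams normally appear in the same window, but they can be redirected independently.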

Using directories and files typically involves 3 steps:

  1. open the file/directory via the file handle
  2. do something with it
  3. close the file/directory handle

It is considered 'good practice' to always check the result of opening AND closing directory handles and file handles.

opendir (my $DIR,$dirCur) or die "Can't open $dirCur: $!\n";
closedir ($DIR) or die "Can't close $dirCur: $!\n";
open (my $FH,">>","myFile.txt") or die "Can't open myFile.txt: $!\n";
close ($FH) or die "Can't close myFile.txt: $!\n";
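To see the three steps working end to end, here is a small sketch that writes a file and then reads it back. The temporary directory (from the core File::Temp module) and the file name are assumptions made only so the example cleans up after itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# A throwaway directory, removed automatically when the script ends.
my $dir  = tempdir(CLEANUP => 1);
my $name = "$dir/myFile.txt";    # hypothetical file name

# 1. open the file for writing (">" creates or truncates it)
open(my $OUT, ">", $name) or die "Can't open $name: $!\n";
# 2. do something with it
print $OUT "first line\n", "second line\n";
# 3. close it, checking the result
close($OUT) or die "Can't close $name: $!\n";

# The same three steps again, this time reading ("<") the file back.
open(my $IN, "<", $name) or die "Can't open $name: $!\n";
my @lines = <$IN>;
close($IN) or die "Can't close $name: $!\n";

print "Read ", scalar(@lines), " lines\n";
```

Note that the file handle and the file name are kept in separate variables, so the error message can always say which file was involved.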

Let's jump in, shall we? We're going to open the current directory and get a list of files in it.

#! /usr/bin/perl
use strict;
use warnings;
use Cwd;
	open (STDERR,">> $0.txt");
	print STDERR "\n", scalar localtime, "\n";

my ($dirCur,$dirSource,$file);
$dirCur = getcwd(); # ask Cwd for the current working directory
opendir (my $DIR,"$dirCur") or die "Can't open $dirCur:$!\n";
print "Currently in ",$dirCur, "\n";
closedir ($DIR) or die "Can't close $dirCur:$!\n";

To give us some extra utility we use a Perl module that is already part of the Standard installation - Cwd.

Cwd is a set of platform-independent functions to tell us some things about the Current working directory. We use it here to 'get the current working directory', no matter where we are.
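As a short sketch of Cwd on its own: the module exports getcwd by default, and the value it returns can be tested like any other path. The -d file test used here is standard Perl and is true when the path is a directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;    # exports getcwd() by default

# getcwd returns the absolute path of the current working directory.
my $dirCur = getcwd();
print "Currently in $dirCur\n";

# -d is true when the path exists and is a directory.
print "That is a directory\n" if -d $dirCur;
```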

Executing that script should simply print out what directory you are currently in.

To get a list of files in this directory, we need to add the following code:

print join "\n", readdir($DIR);

after the 'opendir' statement and before the 'closedir' statement.

Now executing the script should also give you a list of the files in this directory (the list may be sorted or not).

Notice the 2 'files' named . and .. - the dot and double-dot. They often appear first, although readdir does not guarantee any particular order. They stand for 'the current directory' and the 'parent of the current directory'. That is standard across several operating systems.
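If you would rather not see the dot entries, and want the names in a predictable order, one possible approach is to filter the readdir list with grep and then sort it. This is only a sketch; the variable name @entries is made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd;

my $dirCur = getcwd();
opendir(my $DIR, $dirCur) or die "Can't open $dirCur: $!\n";

# grep keeps every entry that is not exactly "." or ".."
my @entries = grep { !/^\.\.?$/ } readdir($DIR);

closedir($DIR) or die "Can't close $dirCur: $!\n";

# readdir makes no promise about order, so sort the names ourselves.
@entries = sort @entries;
print "$_\n" for @entries;
```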

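You may not want to see the whole list of files - there may be hundreds or thousands. Instead of reading every name with readdir, we can ask for only the names that match a pattern by using the glob function. Note that glob does not use the directory handle at all; it matches names relative to the current working directory.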

To get a specific list of files, we need to do something like this:

opendir (my $DIR,$dirCur) or die "Can't open $dirCur: $!\n";
print "Currently in ",$dirCur, "\n";

print join "\n",glob ('l*.pl'),"\n";

closedir ($DIR) or die "Can't close $dirCur: $!\n";

Here, I'm only looking for files that start with l and have an extension of .pl. The * is a special matching character (a wildcard) that matches any run of characters, including none. So basically all my Perl files starting with 'l'.
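To try the pattern without depending on what is in your own directory, here is a sketch that builds a throwaway directory with a few made-up file names and then globs inside it. File::Temp is a core module; the file names are purely illustrative. It also shows that the pattern can carry a directory part, so no opendir is required.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Build a temporary directory containing four empty, hypothetical files.
my $dir = tempdir(CLEANUP => 1);
for my $name (qw(list.pl load.pl other.pl notes.txt)) {
    open(my $FH, ">", "$dir/$name") or die "Can't open $dir/$name: $!\n";
    close($FH) or die "Can't close $dir/$name: $!\n";
}

# Only names starting with 'l' and ending in '.pl' should match.
# (glob splits its pattern on whitespace, so this assumes $dir has no spaces.)
my @matches = glob("$dir/l*.pl");
print "$_\n" for @matches;
```

Each name returned includes the directory part that was given in the pattern.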

The opendir and readdir functions only provide access to the filenames. Other data (size, date, permissions, etc.) is metadata stored in an inode, and has to be accessed via different means. For that we need to use the stat function.

Since we have the ability to open a directory, and get its contents, we can easily put that data into an array. Add the following 2 lines before the closedir statement, then execute the script again:

my @files = glob('*.pl');
print join ", ",@files;

If you are in the directory with all your Perl files, you should see a list of them separated by a comma, as indicated by the print join statement.

And of course once we have access to files in a directory, we can do several things with each one. Some useful things to know would be the size of the file, when it was last modified, and the permissions of the file.

For that we have to use the stat function mentioned earlier, since this information is stored separately from the filename.

stat is a function that returns a list of 13 pieces of data about a file, each of which can be picked out by its index:

  0: device number
  1: inode number
  2: file mode (file type and permissions)
  3: number of hard links to the file
  4: numeric user ID (owner of the file)
  5: numeric group ID of the file's designated group
  6: device number (special files only)
  7: size of the file in bytes
  8: last access time in seconds since the epoch
  9: last modify time in seconds since the epoch
  10: inode change time in seconds since the epoch (NOT creation time)
  11: file system's block size for files
  12: number of blocks allocated for this file
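One way to make the list above less abstract is to assign all 13 values to named variables in a single statement. The sketch below does that for a small temporary file created just for the example (File::Temp is a core module); the variable names are conventional but are only labels for the positions listed above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small scratch file so there is something to inspect.
my ($FH, $name) = tempfile(UNLINK => 1);
print $FH "hello\n";
close($FH) or die "Can't close $name: $!\n";

# One variable per position, in the same order as the list (0 to 12).
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
    $size, $atime, $mtime, $ctime, $blksize, $blocks) = stat($name)
    or die "Can't stat $name: $!\n";

# The low bits of the mode hold the permissions, usually shown in octal.
printf "%s: %d bytes, permissions %04o\n", $name, $size, $mode & 07777;
```

On OSX or Linux the size reported should be 6, one byte for each character of "hello" plus the newline.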

I can see your eyes starting to glaze over so let me clear a few things up.

First, unless you're a system administrator, you will most likely only need a few of these items (or perhaps none at all - just the file name):

To see how we could use the stat function let's use our list of Perl files from above. After closedir add the following lines:

foreach my $file (@files) {
    next if $file =~ /^\.\.?$/; # skip dot and double-dot
    my ($size,$mtime) = (stat $file)[7,9]; # <-- list slice: size and modify time
    print qq{$file|$size bytes|},scalar gmtime($mtime),"\n";
}

This time when you execute the script you should see some useful data about each file.

We've used the next operator and a match pattern in our loop to avoid displaying our 2 'dot' directories.

There is a special quoting operator 'qq' in the print statement. It treats the string enclosed in braces { } as if it were in double quotes, so any variables in the string are interpolated.
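A quick sketch comparing qq with ordinary double quotes and with its non-interpolating cousin q may help. The file name and size are made-up values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my ($file, $size) = ("list.pl", 120);    # hypothetical values

my $with_qq     = qq{$file|$size bytes};  # interpolates, like double quotes
my $with_quotes = "$file|$size bytes";    # the same string, written the usual way
my $with_q      = q{$file|$size bytes};   # single q: variables are NOT interpolated

print "$with_qq\n";    # list.pl|120 bytes
print "$with_q\n";     # $file|$size bytes
print "qq and double quotes agree\n" if $with_qq eq $with_quotes;
```

The main attraction of qq is that the string itself can contain double quotes without any escaping.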

To print the time $mtime in a meaningful manner we have to use the gmtime function in a scalar context.
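The difference that context makes can be sketched with a fixed time value. Here $mtime is set to 0 (the epoch itself) purely so the result is predictable; in the real script it comes from stat.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $mtime = 0;    # the epoch itself, used here only as a known value

# Scalar context: gmtime returns one readable string (in UTC).
my $stamp = gmtime($mtime);

# List context: gmtime returns 9 separate fields (sec, min, hour, ...).
my @parts = gmtime($mtime);

print "$stamp\n";                        # Thu Jan  1 00:00:00 1970
print scalar(@parts), " fields\n";       # 9 fields

# localtime works the same way but uses your local time zone.
print scalar localtime($mtime), "\n";
```

Inside a print statement the arguments are in list context, which is why the script has to say scalar explicitly.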

Here is the list of my Perl files with the size and time stamp added:

Note that I've used the 'vertical bar' | as a separator between each piece of data. This will come in handy later.
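One way a separator like this might be used is to break a line back into its pieces with split. The record below is a made-up example in the same format the loop prints. Because | has a special meaning in a match pattern, it has to be escaped with a backslash.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A hypothetical line in the file|size|time format.
my $record = "list.pl|120 bytes|Thu Jan  1 00:00:00 1970";

# \| matches a literal vertical bar; a bare | would mean 'or' in a pattern.
my ($file, $size, $when) = split /\|/, $record;

print "File: $file\n";
print "Size: $size\n";
print "Time: $when\n";
```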

Without using gmtime to convert the time, here's what it would look like:

The time is the last modification time of the file, in seconds since the epoch! Very handy to know.
