Sorting #2

#0 | #1 | #2

map - sort - map

In our last page (Sorting #1) we mentioned some Perl magick.

We have covered the ingredients that make up this next bit of code, but it is used in a way that is more efficient. Sorting can be very intensive so any chance we have to reduce the number of comparisons, the better.

For example, sorting 10 elements requires about 46 comparisons. However, sorting 1,000 elements requires about 14,000 comparisons! This could easily make your sorting routine crawl to a halt.

Something we will be asked to do is to sort a list by 'a computable field'.

This situation occurs for example, when you have a list to be sorted, but pulling out the piece you want to sort on is a complicated function. You don't want to be calling that complicated function in your sort routine.

Pull out the piece separately first, then run the sort.

The routine to do that uses map and sort in a way now known as the Schwartzian Transform. It is more efficient because it doesn't use 'temporary' arrays to hold items.

Randal L. Schwartz came up with the solution to a problem and his name was given to it. The algorithm is also used in other languages, such as LISP and awk, and the Unix shell.

Recall how we used an array to hold our similar elements together previously (cities, provinces, start dates, staff numbers). But they are not really necessary.

Instead, we use a map-sort-map combination, as follows:

my @quickly_sorted_files =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map  { [$_, -s $_] }
    @files;

Using our data from @stats, here is how to sort by staff number:

my @statsOrdered =  
    map {$_->[0] } # map the whole line back
    sort { $a->[4] <=> $b->[4] } # 4 is staff
    map { [$_, (split ":"),$_,[4,3,2,1] ]} # split $_ into 4
    @stats;
}

Another new item for you: #

Everything after the # (hash mark, octothorpe) on a line is considered a "comment", and is not used by Perl.

Use them to explain something to yourself or others.

Read this code from the bottom up - if it were written on 1 line it would be read from right to left.

Recall that $a and $b are internal variables used to determine the the sorting order: ascending or descending.

Now we print the ordered array:

foreach (@statsOrdered) {
    print "$_\n";

Here's what it produces:

To sort the data by start year instead, we merely change the sort line to work with the correct field. In this case, start year is field #3.

my @statsOrdered =  
    map {$_->[0] } # map the whole line back
    sort { $a->[3] <=> $b->[3] } # 3 is start
    map { [$_, (split ":"),$_]}
    @stats;

We run into the same situation here as earlier - we have several branches starting in the same year. But we can use the same sort technique to fall through to a secondary comparison block.

In this case when 2 branches have the same start date, we will then sort by city; if a city has more than 1 branch starting the same year, we will then sort by the number of staff.

my @statsOrdered =  # what to end up with
    map {$_->[0] } # map the whole line back
    sort 
        { $a->[3] <=> $b->[3] # start date
        or 
        $a->[1] cmp $b->[1] # city
        or 
        $a->[4] <=> $b->[4] } # staff
    map { [$_, (split ":"),$_]} # split each line on the colon
    @stats; # what to start with

print "\ncity\tprov\tstart\tstaff\n";
foreach (@statsOrdered) {
    print "$_\n";
}

Note that we use the correct comparison operator depending on whether we're comparing strings or numbers.