Archie - Just One of the Gang

Copyright © 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003 Amer Neely

In my last article I introduced you to FTP, the Internet tool for transferring files quickly from one computer to another. As with many complex topics, talking about just one part of the Internet is difficult - so much of it is based on a large body of knowledge that's been around for years, and a lot of what happens is hidden from the end user. Knowing how to use FTP is like knowing what you want and where the haystacks are, but having no magnifying glass.

The haystacks are the anonymous FTP sites introduced last issue. There are thousands of these sites on the Internet, packed with software for you to download. You can get a list of the sites by browsing the news.answers newsgroup and looking for the multi-part posting. You can also get a list by visiting Perry Rovers' Web site.

So now you know where the haystacks are, but you still need a magnifying glass. This is where Archie comes in. It's the search tool you need to find files on FTP archive sites (remember - Archie is "archive" without the "v"). You're quite free to browse the anonymous sites to see what they have, but if you're really looking for a particular program, Archie is the way to go.

Who is this Archie character?

Archie was initially developed by computer science students at McGill University in Montreal, but is now maintained by a Montreal company called Bunyip. Hmmm, wonder if ... nahhh, must be just a coincidence. There are roughly 60 servers in 20 countries worldwide. Twice a month a program gathers a list of the files on each of the FTP sites listed in a database. The resulting database searched by Archie comprises about 1,500 FTP sites, holding over 5 million files. Yes, that's FIVE MILLION files! So every time you do an Archie search, you are searching a very large index.

Why would you use Archie?

Let's say you're looking for a calculator program for your computer - the one that comes with it just doesn't do what you need. You could spend a lot of time surfing the Web looking for something, but you would most likely be wasting your time. After all, you're looking for a program, not a document.

How does it work?

Twice a month, a program visits each of the FTP sites listed in a database and collects a list of the files it finds there. This becomes the database searched by Archie. Each Archie server contains the same information, so it really doesn't matter which one you choose for your search. Of course, it makes more sense to choose a server geographically close to you, although it might seem cool to perform your search on a server in Australia. Be a good netizen and stay close to home when performing searches if possible. Nowadays each server collects listings only from the sites close at hand and shares the data with other Archie servers as needed.
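To make the idea concrete, here is a minimal Python sketch of such an index. It is a toy model, not Archie's actual code: the host names, the file listings, and the function names build_index and find are all invented for illustration.

```python
# Toy model of an Archie-style index. The hosts and file paths below are
# invented for illustration; a real server would gather them from FTP sites.
SITE_LISTINGS = {
    "ftp.example.org": ["pub/utils/calculator.zip", "pub/docs/readme.txt"],
    "ftp.example.ca": ["pub/games/chess.zip", "pub/utils/Calc.exe"],
}

def build_index(site_listings):
    """Flatten the per-site listings into (filename, host, directory) records."""
    index = []
    for host, paths in site_listings.items():
        for path in paths:
            directory, _, filename = path.rpartition("/")
            index.append((filename, host, directory))
    return index

def find(index, pattern):
    """Return the records whose filename contains pattern, ignoring case."""
    needle = pattern.lower()
    return [record for record in index if needle in record[0].lower()]

index = build_index(SITE_LISTINGS)
for filename, host, directory in find(index, "calc"):
    print(host, directory, filename)
```

The point to notice is that only filenames are searched. Nothing inside the files is ever looked at, which is why a good guess at the filename matters so much.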

Remember, you are searching for filenames. If you have ever used the "find a file" feature of your computer, you are well on the way to getting the hang of Archie.

Where is Archie anyway?

Like other Internet applications, Archie can be accessed several ways. If you have a computer running Windows or a Macintosh, you may already have an Archie "client" that you can run on your computer. It may have been part of your package when you signed up with your Internet service provider. A popular one for the Windows 3.x crowd is WS-ARCHIE. Windows 95 users have three choices, and Macintosh users have a couple of choices available at TUCOWS in the Macintosh section.

If you don't have your own, don't worry. There are public Archie servers you can access to perform your search. Here's a list of some you can telnet to:

archie.au            Australia
archie.univie.ac.at  Austria
archie.luth.se       Sweden
archie.doc.ic.ac.uk  United Kingdom
archie.internic.net  United States

Telnet

If you can telnet to one of these sites, log in as "archie" (except in the case of the U.S. site, where you log in as "guest"). Most times you will not be asked for a password. If you are, just press the [Enter] key.

You might be able to point your browser to one of these by entering, for example, "telnet://archie.au" in its "goto" or "open" feature.

Once logged in, you should get a prompt such as "archie.au>". Read the introductory screen(s) completely. They tell you what the default settings are for the Archie search and any other important information you should know.

At the prompt, the simplest command is "prog" (for program!) followed by the search string you are looking for. In our case it would be "prog calculator". If "prog" doesn't work, try "find".

Archie Client

If you have your own client program, you still have to point it to an archie server before performing a search.

World Wide Web

And last but not least - yes, you can even do Archie searches over the Web (and even by e-mail). Here are a couple of Archie sites available from within your Web browser:

The nitty gritty

OK, here's the part we've all been waiting for - what exactly do you type in to do your search? Well, in our case, we're looking for a calculator program, so how about using "calculator" as our search term? Hmmm, no rocket science here. In fact, a bit of imagination and creativity would do you well. Other people may not have the same sense as you when it comes to naming files intuitively, so you might have to play around a little. But "calculator" is as good as any starting point.

What you get as a result may differ slightly from what I got, but essentially you get a list of files / directories that match your search string. The default settings are usually to match any sub-string, and case is not important. Thus if you were to search for "calc", that would find matches for "calcification" as well as "calculus", "Calcutta", "decalcify", and "calculator".

Here is a partial list of my results:

Host ftp.fh-rosenheim.de

   In Directory pub/systems/WIN95/Utils
          Directory calculator 1024 Jan 3 13:52

Host ftp.uni-mannheim.de

   In Directory systems/msdos
          Directory calculator 512 Apr 17 15:35

Host ftp.uwo.ca

   In Directory pub/unix/X-windows/X.V11R5/contrib/lib/xview3/bitmaps
          File calculator 1283 Oct 5 1991

   In Directory pub/unix/X-windows/X.V11R5/mit/include/bitmaps
          File calculator 1274 Jul 23 1989

Note that the host name is given first, then all matches of your query at that site. The first two above are in Germany (.de is the country code for Germany) and the last two are here at the University of Western Ontario. However, only the ones in Germany look promising for our purposes. Check out those directories to see what's in them.
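If you save a listing like this to a file, a few lines of Python can turn it into records you can sort or filter. This is only a sketch: it assumes the output looks exactly like the sample above (a "Host" line, an "In Directory" line, then "File" or "Directory" entries) and that names contain no spaces. The function name parse_results is made up for illustration.

```python
# Two entries copied from the results above, used as test input.
SAMPLE = """\
Host ftp.uni-mannheim.de

   In Directory systems/msdos
          Directory calculator 512 Apr 17 15:35

Host ftp.uwo.ca

   In Directory pub/unix/X-windows/X.V11R5/mit/include/bitmaps
          File calculator 1274 Jul 23 1989
"""

def parse_results(text):
    """Turn an Archie-style listing into a list of dictionaries."""
    records = []
    host = directory = None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue  # skip blank lines
        if words[0] == "Host":
            host = words[1]
        elif words[:2] == ["In", "Directory"]:
            directory = words[2]
        elif words[0] in ("File", "Directory") and len(words) >= 3:
            records.append({
                "host": host,
                "directory": directory,
                "kind": words[0],          # "File" or "Directory"
                "name": words[1],
                "size": int(words[2]),
                "date": " ".join(words[3:]),
            })
    return records

for rec in parse_results(SAMPLE):
    print(rec["host"], rec["directory"], rec["kind"], rec["name"])
```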

Now that you've found a couple of possibilities, you can use FTP to go to these sites and download the software to see if it's what you want. Which brings up a good point - it's not JUST software that's available at FTP sites. You may also find image files, source code, documents, statistical data, or other information. Try using other search strings to see what you find.
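If you would rather script that step than type FTP commands by hand, Python's standard ftplib module can do an anonymous login and list a directory. Treat this as a hedged sketch: the helper names ftp_url and list_directory are my own, the directory is one of the German hits from the results above, and the listing call only works if the site is still reachable.

```python
from ftplib import FTP

def ftp_url(host, directory, name=""):
    """Build an ftp:// address you can paste into a browser or FTP client."""
    path = "/".join(part.strip("/") for part in (directory, name) if part)
    return f"ftp://{host}/{path}"

def list_directory(host, directory, timeout=30):
    """Log in anonymously and return the names found in one directory."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()          # no arguments means an anonymous login
        ftp.cwd(directory)
        return ftp.nlst()

print(ftp_url("ftp.uni-mannheim.de", "systems/msdos", "calculator"))

# To look inside that directory (this needs a live connection):
#     print(list_directory("ftp.uni-mannheim.de", "systems/msdos/calculator"))
```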

Fine tuning

Most Archie servers let you adjust a few parameters before starting a search. For example, you can specify that case IS important, or that the query must match your request exactly, not just as a substring. Some servers even let you sort the results, for example by host name or date. You can also restrict the search to certain domains (.com or .ca, for example). This is useful if your search returns a large number of "hits", because you can narrow them down to those that reflect your needs. An Italian version of Eudora, for example, would most likely be found on domains ending in ".it".

If you are accessing Archie by telnet, here are some commands you should know about. Client users may be able to configure their program to reflect the same settings.

quit
      Quits the Archie program.

help
      By itself gets you a "help" prompt. From there you can enter "?"
      to see a list of help topics, eg you can enter "help set" to see
      help on that topic. To get back to the regular archie prompt,
      enter "done" or press the [Enter] key until you do.

prog [pattern] *** this might be renamed to "find". ***
      This is the actual command to do the search. pattern is the actual
      search string you are looking for, without the brackets.

show search
      Tells you what the current setting is for the type of search
      (exact, regex, sub, subcase).

set search [type]
      Apply a different rule to the search. Must be one of [exact,
      regex, sub, subcase].

      "exact" matches exactly, and case must match. eg "calcu" only
      matches "calcu", not "Calculus" or "calculator".

      "regex" uses Unix regular expression. This may be the default.
      Very powerful. Similar in purpose to MS-DOS wildcards, but the
      syntax is different. By default regex assumes ".*" at the
      beginning and end of a string unless a caret (^) or dollar sign
      ($) is used.
      +     the dot (.) stands for any single character.
      +     the asterisk (*) stands for zero or more occurrences
            of the preceding regular expression.
      +     a caret (^) as the first character of the search
            string matches strings that begin with the search
            string. eg "^calc" matches "calculator" but not
            "decalcify".
      +     a dollar sign ($) as the last character of the search
            string matches strings that end with your search string.
            eg "calculat.*\.exe$" matches "calculator.exe" and
            "calculate.exe" but not "calculator.doc" or
            "calculator.txt".
      +     square brackets ([]) can be used to surround a set of
            characters you want to match. eg "[trj]une" matches
            "tune", "rune" and "june" but not "dune".
      +     to specify a set of characters to exclude, wrap them
            in square brackets, and begin the set with a caret
            (^). eg "[^rj]une" would return "tune" but not "rune"
            or "june".
      +     to search for one of the regex special characters (. *
            [ ] ^ $), put a backslash (\) in front of it. eg
            "temp\$" searches for "temp$".

      "sub" looks for any string containing your query within it. Case
      is ignored. Most useful for general-purpose searches. eg "calc"
      matches "Calculator" and "decalcify".

      "subcase" matches anything containing your query within it, but
      case must match. eg "calc" matches "decalcify" and "calculator"
      but not "Calculator".

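The four search types described for "set search" can be mimicked in a few lines of Python, which may help you predict what a query will return before you send it. This is a sketch under stated assumptions: the function name matches is invented, and Python's re module stands in for Archie's regex engine. The two dialects are not identical, although the patterns shown in this article behave the same way in both.

```python
import re

def matches(name, query, mode="sub"):
    """Return True if a filename matches the query under the given search type."""
    if mode == "exact":
        return name == query                    # whole name, case must match
    if mode == "sub":
        return query.lower() in name.lower()    # substring, case ignored
    if mode == "subcase":
        return query in name                    # substring, case must match
    if mode == "regex":
        # re.search looks anywhere in the name, like the implied ".*"
        # at both ends of an Archie regex.
        return re.search(query, name) is not None
    raise ValueError(f"unknown search type: {mode}")

names = ["calculator", "Calculus", "decalcify", "calcu"]
for mode in ("exact", "sub", "subcase"):
    print(mode, [n for n in names if matches(n, "calc", mode)])
print("regex", [n for n in names if matches(n, "^calc", "regex")])
```

Running it shows how much the result list shrinks as you move from "sub" to "subcase" to "exact" with the same query.
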
set match_domain [pattern]
      Only look on servers matching the domain specified. To limit the
      search to Canadian servers, try "set match_domain ca".
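In effect this is a filter on the end of the host name. The short Python sketch below shows the idea using the three hosts from the sample results earlier. It assumes the pattern is a plain domain suffix such as "ca", and the function name in_domain is invented; a real server may accept richer patterns.

```python
def in_domain(host, domain):
    """Return True if host belongs to the given domain (a suffix such as "ca")."""
    host = host.lower().rstrip(".")
    domain = domain.lower().strip(".")
    # Require a dot before the suffix so "ca" does not match "ftp.monica".
    return host == domain or host.endswith("." + domain)

hosts = ["ftp.fh-rosenheim.de", "ftp.uni-mannheim.de", "ftp.uwo.ca"]
print([h for h in hosts if in_domain(h, "ca")])
print([h for h in hosts if in_domain(h, "de")])
```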

set match_path [pattern]
      Limit the directories searched to match "pattern".

set sortby [filename, hostname, none, size, time]
      Sorts the output according to one of the above. "none" is usually
      the default. You can also reverse the sort by prefixing the string
      with the letter "r". eg "set sortby rfilename" sorts the results
      in reverse by filename.
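The "r" prefix is easy to picture in code. In this Python sketch the function name sort_results and the record layout are invented, the third record is made up, and the dates are rewritten as year-month-day strings so that they sort correctly as text. It also assumes every key sorts in ascending order unless prefixed with "r"; a real server may use a different default direction for size or time.

```python
# How to pull the sort key out of a record for each "sortby" value.
SORT_KEYS = {
    "filename": lambda rec: rec["name"].lower(),
    "hostname": lambda rec: rec["host"].lower(),
    "size": lambda rec: rec["size"],
    "time": lambda rec: rec["date"],
}

def sort_results(records, sortby="none"):
    """Sort records the way "set sortby" describes; an "r" prefix reverses."""
    if sortby == "none":
        return list(records)
    reverse = sortby not in SORT_KEYS and sortby.startswith("r")
    key_name = sortby[1:] if reverse else sortby
    if key_name not in SORT_KEYS:
        raise ValueError(f"unknown sortby value: {sortby}")
    return sorted(records, key=SORT_KEYS[key_name], reverse=reverse)

records = [
    {"name": "calculator", "host": "ftp.uwo.ca", "size": 1283, "date": "1991-10-05"},
    {"name": "calculator", "host": "ftp.uwo.ca", "size": 1274, "date": "1989-07-23"},
    {"name": "calc.zip", "host": "ftp.example.org", "size": 20480, "date": "1996-01-03"},
]

for rec in sort_results(records, "rsize"):
    print(rec["size"], rec["host"], rec["name"])
```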

list
      By itself, shows the list of FTP sites in the database on the
      current server. Just in case you're curious. You can practise your
      regex usage here too. eg "list .*\.ca.*" would list the sites
      with ".ca" in their names.

servers
      Show the list of archie servers known to the site you are currently on.

Other interesting sights / sites

There are a number of starting points for finding software, including:

ArchiePlex
WWW Archie Servers
ftpsearch

And for a list of software collections you might try my list.

Wrap it up

Whew! That was quite a lot of stuff to digest. I never told you this would be easy. But remember, you don't HAVE to know all this stuff - but it's here if you need it.

As mentioned last column, I'll be covering the topics of file compression and viruses soon. In the meantime have fun. Happy trails and bcnu ...


Amer Neely