awk awk awk

awk Fundamentals

To awk, a text file is a series of records; by default, each line is one record. So if you need to work with any kind of structured text file, awk is worth a look.

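Each record is split into fields, which are numbered from 1 and referenced as $1, $2, and so on ($0 is the entire record). By default, fields are separated by whitespace: any run of spaces or tabs counts as a single separator, and leading or trailing whitespace is ignored. In other words, each 'word' is a field.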
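A quick way to watch field splitting is to pipe a line into awk. The input lines in this sketch are made up; note that the extra spaces do not create empty fields, because the default separator treats any run of whitespace as one delimiter.

```shell
# Third whitespace-separated field: the four spaces before "gamma" count as one separator
echo "alpha beta    gamma" | awk '{print $3}'

# Leading whitespace is skipped, so $1 is "alpha", not an empty string
echo "   alpha beta" | awk '{print $1}'

# $0 is the whole record, exactly as it was read
echo "alpha beta    gamma" | awk '{print $0}'
```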

An awk program is made up of pattern {action} statements (sometimes called rules or conditional filters).

The pattern is a condition tested against each record (line); if it matches, the action is taken.

The default action is print, which prints the whole record.
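Here is a small sketch of both forms of a rule, using three throwaway input lines invented for the example: one rule with a pattern and an explicit action, and one with a pattern only, where awk falls back to printing the matching record.

```shell
# Pattern plus action: print field 2 of the line whose first field is "b"
printf 'a 1\nb 2\nc 3\n' | awk '$1 == "b" {print $2}'

# Pattern only: the default action prints the entire matching line, "b 2"
printf 'a 1\nb 2\nc 3\n' | awk '$1 == "b"'
```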

Since awk is a line-oriented command-line program, we will do all of our programming in a shell window. On Windows you will need to download and install awk yourself; gawk (GNU awk) is a popular, free implementation. Mac and Linux computers typically have a version already installed.

You can also write awk scripts, much like shell or Perl scripts, but for now we will begin with simple command-line 'one-liners'.
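For comparison, this is roughly what the script route looks like: the awk program lives in its own file and is run with awk's -f option. The file name lengths.awk and the sample input are only illustrations.

```shell
# Write a tiny awk program to a file (lengths.awk is an example name)
cat > lengths.awk <<'EOF'
# For each input line, print its length followed by the line itself
{ print length($0), $0 }
EOF

# Run the program file with -f; prints "3 awk", "3 sed", "4 grep"
printf 'awk\nsed\ngrep\n' | awk -f lengths.awk
```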

awk has a number of built-in variables that can be used in scripts or one-liners. Here are some of the most common ...
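A few built-ins you will meet constantly are NR (the current record number), NF (the number of fields in the current record), FS (the input field separator) and OFS (the output field separator). The sketch below exercises them on made-up input.

```shell
# NR is the record (line) number, NF the number of fields on that line;
# prints "1 2" then "2 3"
printf 'one two\nthree four five\n' | awk '{print NR, NF}'

# BEGIN runs once before any input is read; OFS is placed between
# comma-separated print items, so this prints "a-b-c"
echo "a b c" | awk 'BEGIN {OFS = "-"} {print $1, $2, $3}'
```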

Since awk is a text-processing tool, we need a file to work with. Here is a list of sample co-workers and their phone extensions:

Marty:5687
Sarah:7878
Bill:9951
Gerry:8520
Jerry:6310
Valery:7774

Copy and paste that text into a file named 'names.txt'. Be sure to use a plain-text editor, NOT a word processor. Save it somewhere, then open your shell window and cd to that directory.
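If you would rather skip the editor, a here-document in a POSIX-style shell (sh, bash, zsh) can create the same file straight from the command line:

```shell
# Write the six sample records to names.txt in the current directory
cat > names.txt <<'EOF'
Marty:5687
Sarah:7878
Bill:9951
Gerry:8520
Jerry:6310
Valery:7774
EOF

# Quick check: the file should contain 6 lines
wc -l < names.txt
```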

To give you an idea of how to construct an awk command, here is how to print a particular extension to the screen. Type (or copy and paste) the command below at your shell prompt and press Enter to find Gerry's extension.

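awk 'BEGIN {FS = ":"} $1 == "Gerry" {print $2}' names.txt

If all went well, you should simply see '8520' displayed on your screen, which of course is Gerry's extension. Here's how it works ...

The FS (Field Separator) in this file is a colon (:), so we need to tell that to awk. That's what the first set of curly braces does. Putting it after the special BEGIN pattern makes awk run it once, before reading any input, so every record, including the first, is split on colons. Setting FS in an ordinary rule, without BEGIN, only takes effect from the next record onward.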
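Setting FS inside an ordinary rule, as in {FS = ":"}, and setting it in a BEGIN block are not equivalent. POSIX specifies that a record is split using the FS in effect when that record was read, so in a conforming awk such as gawk an assignment made while processing line 1 only applies from line 2. The -F option is a command-line shorthand for setting FS before any input is read. This sketch recreates names.txt first so it runs on its own.

```shell
# Recreate the sample file so this snippet is self-contained
printf 'Marty:5687\nSarah:7878\nBill:9951\nGerry:8520\nJerry:6310\nValery:7774\n' > names.txt

# FS set in a normal rule: under POSIX rules (as in gawk) line 1 was already
# split on whitespace, so its $1 is the whole line "Marty:5687"
awk '{FS = ":"} {print $1}' names.txt | head -n 2

# FS set in BEGIN: every line, including the first, splits on ":"
awk 'BEGIN {FS = ":"} {print $1}' names.txt | head -n 2

# -F: sets FS from the command line; prints 8520
awk -F: '$1 == "Gerry" {print $2}' names.txt
```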

Recall that awk commands are a two-part affair: [pattern] [action]. In this case the pattern is $1 == "Gerry", which compares the first field of each record (the co-worker's name) with the string "Gerry".

The equality operator '==' tests whether its two sides are equal. Be sure to type two equals signs; a single '=' is the assignment operator.
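Typing = instead of == is an easy slip, and awk will not complain about it. This sketch contrasts the two; the second command is a deliberate mistake, included only to show the symptom.

```shell
# Recreate the sample file so this snippet is self-contained
printf 'Marty:5687\nSarah:7878\nBill:9951\nGerry:8520\nJerry:6310\nValery:7774\n' > names.txt

# Correct: == compares, so only Gerry's line matches and 8520 is printed
awk -F: '$1 == "Gerry" {print $2}' names.txt

# Mistake: = assigns "Gerry" to $1 on every line; the result is a non-empty
# string, which awk treats as true, so all six extensions are printed
awk -F: '$1 = "Gerry" {print $2}' names.txt
```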

Because awk is now splitting on colons, each record (line) has two fields.

The second field is the extension, and printing it is the action we want: print $2.
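To confirm the two-field split, and to show that fields can be printed in any order, here is a short sketch using the built-in NF variable on the same data.

```shell
# Recreate the sample file so this snippet is self-contained
printf 'Marty:5687\nSarah:7878\nBill:9951\nGerry:8520\nJerry:6310\nValery:7774\n' > names.txt

# NF is the number of fields; with ":" as separator every line has 2,
# so the de-duplicated output is a single "2"
awk -F: '{print NF}' names.txt | sort -u

# Fields in any order; the comma inserts a space, giving "8520 Gerry"
awk -F: '$1 == "Gerry" {print $2, $1}' names.txt
```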

If the pattern is missing, the action is applied to every record (line).

If the action is missing, the default is to print every record (line) that matches the pattern.
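Both shortcuts in one sketch. The numeric threshold 8000 is arbitrary; awk compares $2 as a number here because the field looks numeric.

```shell
# Recreate the sample file so this snippet is self-contained
printf 'Marty:5687\nSarah:7878\nBill:9951\nGerry:8520\nJerry:6310\nValery:7774\n' > names.txt

# No pattern: the action runs on every record, printing all six names
awk -F: '{print $1}' names.txt

# No action: matching records are printed whole ("Bill:9951", "Gerry:8520")
awk -F: '$2 > 8000' names.txt
```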

So our pattern tests the first field, $1, against the string "Gerry". When $1 matches, the action prints the second field ($2).
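Two natural variations, sketched below: passing the name in with awk's -v option so the program does not need editing for each lookup (the variable name who is just an illustration), and using the ~ regular-expression match operator to match more than one name at once.

```shell
# Recreate the sample file so this snippet is self-contained
printf 'Marty:5687\nSarah:7878\nBill:9951\nGerry:8520\nJerry:6310\nValery:7774\n' > names.txt

# -v who=Jerry sets the awk variable "who" before input is read; prints 6310
awk -F: -v who=Jerry '$1 == who {print $2}' names.txt

# ~ tests a regular expression: matches both Gerry and Jerry
awk -F: '$1 ~ /^[GJ]erry$/ {print $1, $2}' names.txt
```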

Well done!