awk awk awk

Field Separators

One thing awk is good at is pulling pieces out of structured text. A lot of the text you might have to deal with is formatted into fields or columns separated by some character. Each line is called a record, and the pieces within it are called fields. The separating character/s (called the Field Separator or FS) can be anything you define, but typically are a colon, semicolon, tab, or space.
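Here is a minimal sketch of records and fields, using made-up input and awk's default separator (runs of spaces or tabs). NR is awk's built-in record number and NF is the number of fields in the current record.

```shell
# Two illustrative records; the words are placeholders, not from any real file
printf 'alpha beta\ngamma delta epsilon\n' | awk '{ print NR, NF, $1 }'
# 1 2 alpha
# 2 3 gamma
```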

I like to use the pipe symbol, or vertical bar (|) because it is a character not usually found in regular text.

For example, if you have some text using the pipe symbol '|' as a delimiter or Field Separator, how do we indicate the FS?

To define or indicate a FS, we use the -F option like this:

awk -F"|"
or
awk -F'|'

Note that we can use either double quotes or single quotes in this case. So to run awk against a text file ...

bufar.txt =
abc|123
def|456

How can we pull out just the text on either side of the FS (Field Separator)?

Typically we might try...

awk -F"|" '{print $1}' bufar.txt
abc
def

Voila! That also works with single quotes.
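To see both sides of the separator at once, here is a small sketch. It assumes sample data shaped like bufar.txt, fed in through printf so nothing has to exist on disk; the variable name sample is just for illustration. It also shows that setting FS in a BEGIN block should do the same job as -F.

```shell
# Illustrative data (an assumption, shaped like bufar.txt)
sample='abc|123
def|456'

# -F sets the field separator on the command line
printf '%s\n' "$sample" | awk -F'|' '{ print NF, $1, $2 }'
# 2 abc 123
# 2 def 456

# Assigning FS before any input is read gives the same split
printf '%s\n' "$sample" | awk 'BEGIN { FS = "|" } { print NF, $1, $2 }'
```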

But what happens when the FS is 2 pipe symbols (||), so that the lines in bufar.txt read abc||123 and def||456?

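A separator longer than one character is treated by awk as a regular expression, and in a regular expression the pipe is one of several 'metacharacters' (it means "or"). To get a literal pipe we need to use the Escape character \.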

awk -F"\|\|" '{print $1}' bufar.txt
awk: warning: escape sequence `\|' treated as plain `|'
abc||123
def||456

Hmmm. No workie. Single quotes don't work either. The warning gives us a clue.

awk -F'\\|\\|' '{print $1}' bufar.txt
abc
def

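We had to use 2 escape characters: one escapes the escape character. awk first processes the -F value like a string, which turns \\ into \, and only then reads what is left (\|\|) as a regular expression.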

Excellent. But be careful if you use double-quotes ...

awk -F"\\|\\|" '{print $1}' bufar.txt
awk: warning: escape sequence `\|' treated as plain `|'
abc||123
def||456

Same warning, same result. awk still needs the escape character in front of each pipe to know that the pipe symbol is a literal character, not a regex operator, but this time the backslashes never reached awk intact.

It turns out the back-slash itself has to be escaped when enclosed in double-quotes!
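One way to check this is to have the shell print back exactly what it would hand to awk. This is only a sketch using printf with %s, which prints its argument without interpreting backslashes:

```shell
# Single quotes: the shell passes every backslash through untouched
printf '%s\n' '\\|\\|'        # prints \\|\\|

# Double quotes: the shell turns each \\ into a single \
printf '%s\n' "\\|\\|"        # prints \|\|

# Double quotes with doubled backslashes: same result as the single-quoted case
printf '%s\n' "\\\\|\\\\|"    # prints \\|\\|
```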

awk -F"\\\\|\\\\|" '{print $1}' bufar.txt
abc
def
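If counting backslashes gets tedious, a bracket expression is a possible alternative that is not part of the walkthrough above. Inside [ ] the pipe is an ordinary character, so no escaping is needed and both quote styles should behave the same. The sample data is again an assumption for illustration.

```shell
# Illustrative data (an assumption, shaped like the || example)
sample='abc||123
def||456'

# [|] matches one literal pipe, so [|][|] matches the two-pipe separator
printf '%s\n' "$sample" | awk -F'[|][|]' '{ print $1, $2 }'
# abc 123
# def 456

# Same pattern in double quotes; there are no backslashes for the shell to consume
printf '%s\n' "$sample" | awk -F"[|][|]" '{ print $1, $2 }'
```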

This should get you started on defining and using Field Separators. Good luck and play!