⌘ Web Mechanic ⌘

Bash Scripting

User Input & Validation

Getting input from a user is one of the most useful (and risky) things programmers do.

It is risky because whenever you ask the user for input, you are handing control of your script over to the user.

This is where the process of validating input comes in.

Serious data validation is beyond the scope of this site, but we will cover some basic input checks to get you started.

This site is designed for a single user doing scripting at home, so we will continue on that premise.

You may just be curious about bash programming, or you may really need to learn more about it. For example, I use bash scripts to enter data into several personal databases I maintain at Web Mechanic.

If you are the only person entering text into your scripts and running them, then you should be fine.

read

echo and printf make up one half of the system's I/O (output). This is the other half: getting user input.

Getting input from a user is a common task, so without further ado, let's get at it.

You need to get a birth date from a user in your script. Here is one way.

As always, our scripts begin with the shebang and location of the bash shell:

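#!/usr/local/bin/bash
# read -r keeps any backslashes in the input exactly as typed
printf "%s" "Enter your date of birth (yyyy-mm-dd): "
read -r BirthDate
printf "%s" "Is $BirthDate correct? (y/n): "
read -r BirthResponse
echo "$BirthResponse"
echo
# -s: don't echo what is typed; -p: print this prompt first
read -r -s -p "Please enter your password: " passwd
echo
# %s is replaced by the argument after the format string
printf "\nI think your password is -->%s<--\n" "$passwd"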

Here we've asked for a date of birth, and assigned it to the variable BirthDate.

Then we asked for a y or n response, and assigned it to the variable BirthResponse.

Lastly, we've asked for a password, and assigned it to passwd.

Two useful options to the read command are -s and -p, which we used when asking for the password.

The -s does not echo the response to the screen (silent), so anyone looking over your shoulder won't know what you're typing.

And -p prints the prompt before reading the input.
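read can also fill several variables from one line of input. The sketch below is only an illustration (the sample values are made up): it splits one line into three variables, and a here-string (<<<) stands in for the keyboard so it runs without waiting for you.

```bash
#!/usr/bin/env bash
# -r keeps backslashes as typed. -p's prompt is only shown when input
# comes from a terminal, so with a here-string it stays silent.
# Leave off the <<< part to type the values yourself.
read -r -p "Enter year month day: " Year Month Day <<< "1990 07 15"
printf 'Year=%s Month=%s Day=%s\n' "$Year" "$Month" "$Day"

# With more words than variables, the last variable gets the rest.
read -r First Rest <<< "Come As You Are"
printf 'First=%s Rest=%s\n' "$First" "$Rest"
```

By default read splits on spaces and tabs, so "Come As You Are" leaves First as "Come" and Rest as "As You Are".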

User Input Validation 01

But we haven't checked whether the user entered y or n (or something else) for BirthResponse.

One test we can do is check whether a variable contains anything or is empty.

Note that it DOES NOT check whether the input is ASCII text, just whether ANYTHING has been entered.

#!/usr/local/bin/bash
# input2.sh
# test input for content
VAR="$1"
if [ "$VAR" ]
then
	echo Not empty
else
	echo Zero length;echo Please enter something
fi

Type (or copy and paste) the above code into a new file named input2.sh in your text editor, save it to your ~/bin/bash directory, and make it executable with chmod +x input2.sh.

Setting VAR to "$1" picks up the first argument given when you run the script.

You can run the script with an argument in double quotes, or with nothing. Do both to see what happens.

./input2.sh
Zero length
Please enter something

./input2.sh "There is some text in here somewhere"
Not empty

In the then block of code, you can provide whatever code is suitable for the situation.

And the else block of code requires your own decision as well.

A rather simple test, but it may be all you need. More complex testing is possible though.
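Going back to the y/n question from the first script, here is one way to keep asking until the answer is usable. It is a sketch, and ask_yes_no is a hypothetical name; it combines the empty-input test above with a case statement that accepts only y or n.

```bash
#!/usr/bin/env bash
# ask_yes_no: hypothetical helper that keeps asking until it gets y or n.
# Returns 0 for yes, 1 for no, 2 if the input runs out.
ask_yes_no() {
    local prompt="$1" answer
    while true; do
        # read returns non-zero at end of input, so we stop instead of looping forever
        if ! read -r -p "$prompt" answer; then
            return 2
        fi
        case "$answer" in
            [Yy]) return 0 ;;
            [Nn]) return 1 ;;
            "")   echo "Zero length, please enter y or n" ;;
            *)    echo "Please answer y or n, not '$answer'" ;;
        esac
    done
}

# Example: three answers arrive on standard input; the first two are rejected.
if printf '%s\n' "" "maybe" "Y" | ask_yes_no "Is 1990-07-15 correct? (y/n): "; then
    echo "Confirmed"
else
    echo "Not confirmed"
fi
```

In a real script you would call it without the printf pipe, for example `if ask_yes_no "Is $BirthDate correct? (y/n): "; then ...`, so the answers come from the keyboard.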

Input From a File

Getting input from a user via the keyboard is one way of getting data, but much of scripting depends on doing something with data that already exists in a file.

For example, we may have:

a log file from an application or system

a spreadsheet file (.csv)

a few hundred text files

source code for a program

Another source of input may be the output of a command we run in Terminal, which we then want to process further. More about this in Mathematics and Redirection.

As long as the file is made of pure ASCII text we can do many things with the content.

We can get input from a text file in a few ways. One is to redirect it into our script or command with '<'.

We have a file of track names and play times from a CD:

01 - Vira Poeira Burning To Dust.flac|0:04:47
02 - Come As You Are.flac|0:04:46
03 - 700 Years.flac|0:04:39
04 - After These Messages.flac|0:03:03
05 - Street Vendors D'JMBO.flac|0:05:03
06 - Wake Up Now.flac|0:05:03
07 - Homeless Around The Fire.flac|0:06:19
08 - Samba 4 Sale.flac|0:06:59
09 - Ginga Sem Fronteira.flac|0:04:42

Copy and paste that text into a new text file named 'track-times.txt' in a directory. Then 'cd' to that directory if you aren't already there.
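As a minimal sketch of redirecting a file into a script, the loop below numbers each line of track-times.txt. It assumes the file is in the current directory, and number_lines is a made-up name.

```bash
#!/usr/bin/env bash
# number_lines: hypothetical helper that numbers each line of a file.
number_lines() {
    local in_file="$1" line count=0
    # validate first: it must be a regular, readable file
    if [ ! -f "$in_file" ] || [ ! -r "$in_file" ]; then
        echo "Cannot read '$in_file'" >&2
        return 1
    fi
    # '< "$in_file"' redirects the file into the loop's standard input.
    # IFS= keeps leading/trailing spaces; -r keeps backslashes.
    # '|| [ -n "$line" ]' still handles a last line with no newline.
    while IFS= read -r line || [ -n "$line" ]; do
        count=$((count + 1))
        printf '%2d: %s\n' "$count" "$line"
    done < "$in_file"
    echo "$count lines read from $in_file"
}

number_lines "${1:-track-times.txt}"
```

With the nine tracks above, the last line printed should be "9 lines read from track-times.txt".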

In Terminal, doing a 'cd' to another directory (perhaps on another drive) can involve quite a bit of typing, which is an excellent source of errors. This is where I use a little 'magik'.

In a Finder window, open the directory you want to 'cd' to.

Then in the Terminal window, enter 'cd ' - make sure there is a space after 'cd'.

Then in Finder, drag the icon of the target directory into the Terminal window, and drop it after the 'cd ' you just entered.

Now hit 'return' / 'enter' in Terminal, and it will do it for you.

A handy use of drag-and-drop.

We see that the running time of each track is separated from its name by the pipe | character. Using that, we can write a script to split each line in two, and write the parts to a new file.

In this case, the pipe symbol is merely a character in some text, and not a metacharacter.

Now to write the script...

You are encouraged to actually type all of this code in, rather than copy and paste it into a new script.

#!/usr/local/bin/bash
# split text into 2 on the '|'
clear
if [ -z "$1" ]
then
    echo 'Please give me a filename to work on eh!'
    exit 1    # non-zero exit status signals an error
fi

# is it a text file? file -b leaves the filename out of the
# description; grep -q only reports whether 'text' was found
if file -b "$1" | grep -q text
then
    echo "$1 is a text file"
else
    echo "$1 isn't a text file"
    exit 1
fi

# if we get this far, it's a text file
InFile="$1"

echo "Complete lines:"
cat "$InFile"; echo    # print whole file
jot -b '=' -s '' 20    # print '=' 20 times
echo "Tracks only:"
# awk with a Field Separator (-F) of '|': $1 is the track name
awk -F'|' '{print $1}' "$InFile"
jot -b '=' -s '' 20    # print '=' 20 times
echo "Times only:"
awk -F'|' '{print $2}' "$InFile" # >> times.txt
jot -b '=' -s '' 20    # print '=' 20 times

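Well, a few new things in there.
jot is a tool used to write sequential data (usually numbers); I've used it here to print "=" 20 times.
awk splits each line into fields using its field separator (FS), here set to '|', so $1 is the text before the pipe and $2 the text after it.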
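jot comes with macOS and the BSDs, but many Linux systems don't have it installed. Here is a sketch of a portable alternative in plain bash; rule is a hypothetical name.

```bash
#!/usr/bin/env bash
# One-off: printf repeats its format once per argument, and '%.0s'
# prints none of each argument, so only the '=' shows, 20 times.
printf '=%.0s' {1..20}; echo

# rule: hypothetical helper for any character and count.
rule() {
    local char="${1:-=}" count="${2:-20}" line="" i
    for (( i = 0; i < count; i++ )); do
        line+="$char"
    done
    printf '%s\n' "$line"
}

rule            # 20 '=' characters, like: jot -b '=' -s '' 20
rule '-' 10     # 10 '-' characters
```

The brace expansion {1..20} needs a literal number, which is why the function uses a counting loop instead.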

But it does what we wanted:

track-times.txt is a text file
Complete lines:
01 - Vira Poeira Burning To Dust.flac|0:04:47
02 - Come As You Are.flac|0:04:46
03 - 700 Years.flac|0:04:39
04 - After These Messages.flac|0:03:03
05 - Street Vendors D'JMBO.flac|0:05:03
06 - Wake Up Now.flac|0:05:03
07 - Homeless Around The Fire.flac|0:06:19
08 - Samba 4 Sale.flac|0:06:59
09 - Ginga Sem Fronteira.flac|0:04:42
====================
Tracks only:
01 - Vira Poeira Burning To Dust.flac
02 - Come As You Are.flac
03 - 700 Years.flac
04 - After These Messages.flac
05 - Street Vendors D'JMBO.flac
06 - Wake Up Now.flac
07 - Homeless Around The Fire.flac
08 - Samba 4 Sale.flac
09 - Ginga Sem Fronteira.flac
====================
Times only:
0:04:47
0:04:46
0:04:39
0:03:03
0:05:03
0:05:03
0:06:19
0:06:59
0:04:42
====================

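If the output of a step needs to be saved to a file, add a redirection to the end of that step's command. The 'Times only' command in the script ends with the comment '# >> times.txt'; just remove the '#' and the times are written to times.txt instead of the screen. As a standalone command, that is:

awk -F'|' '{print $2}' track-times.txt >> times.txt

Recall that '>>' appends output to a file, so each run adds another copy of the times; a single '>' replaces the file's contents instead.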
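awk isn't the only way to split on the '|'. Here is a sketch of the same split in pure bash, setting IFS to '|' just for read; print_field is a hypothetical helper, not part of the script above.

```bash
#!/usr/bin/env bash
# print_field: hypothetical pure-bash alternative to the awk lines.
# Prints field 1 (track) or field 2 (time) of each 'track|time' line.
print_field() {
    local field="$1" in_file="$2" track time
    # IFS='|' applies only to this read, splitting each line on the pipe.
    # '|| [ -n "$track" ]' also handles a last line with no newline.
    while IFS='|' read -r track time || [ -n "$track" ]; do
        if [ "$field" -eq 1 ]; then
            printf '%s\n' "$track"
        else
            printf '%s\n' "$time"
        fi
    done < "$in_file"
}

if [ -r track-times.txt ]; then
    print_field 1 track-times.txt              # tracks only
    print_field 2 track-times.txt > times.txt  # times only; '>' overwrites times.txt
fi
```

For a few hundred lines either approach is fine; awk tends to be faster on large files.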

Validation 02

Equality

Many times we want to compare a user input or a variable to an existing value. There are 2 forms of this test:

For string comparison:

= or == matches

!= does not match

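< less than (sorts before; inside [ ] write it as \<, or use [[ ]])

> greater than (sorts after; inside [ ] write it as \>, or use [[ ]])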

-n is not null (has length > 0)

-z is null (has length = 0)

For numeric comparison

-eq equal to

-lt less than

-le less than or equal to

-gt greater than

-ge greater than or equal to

-ne not equal to
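To see why the two forms exist, here is a sketch that compares the same pair of values as strings and as numbers. describe_pair is a hypothetical name, and it expects two whole numbers.

```bash
#!/usr/bin/env bash
# describe_pair: hypothetical demo; expects two whole numbers.
describe_pair() {
    local a="$1" b="$2"
    # String tests compare character by character, so "10" sorts before "9".
    if [[ "$a" < "$b" ]]; then
        echo "as strings, $a < $b"
    else
        echo "as strings, $a >= $b"
    fi
    # Inside single brackets '<' must be escaped as '\<'; unescaped, the
    # shell would treat it as input redirection from a file named "$b".
    if [ "$a" \< "$b" ]; then
        echo "[ ] says: $a sorts before $b"
    fi
    # Numeric tests compare values, so 10 is greater than 9.
    if [ "$a" -gt "$b" ]; then
        echo "as numbers, $a > $b"
    elif [ "$a" -lt "$b" ]; then
        echo "as numbers, $a < $b"
    else
        echo "as numbers, $a = $b"
    fi
}

describe_pair 10 9
```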

Save this code in a new file input3.sh as before.

#!/usr/local/bin/bash
# input3.sh
# test input for equality
MyVar="OICURMT"
VAR="$1"
if [ "$VAR" == "$MyVar" ]
then
	echo "Yep! They're the same"
else
	echo "Nope, try again..."
fi

When you run it with and without an argument, you should see:

./input3.sh
Nope, try again...

./input3.sh "oicurmt"
Nope, try again...

./input3.sh "OICURMT"
Yep! They're the same

To check numeric input:

change the '==' to '-eq'

change MyVar to "1337"

save and execute.

#!/usr/local/bin/bash
# input3.sh
# test input for equality
MyVar="1337"
VAR="$1"
if [ "$VAR" -eq "$MyVar" ]
then
	echo "Yep! They're the same"
else
	echo "Nope, try again..."
fi

./input3.sh
./input3.sh: line 6: [: : integer expression expected
Nope, try again...

./input3.sh 1733
Nope, try again...

./input3.sh 1337
Yep! They're the same
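The error in the first run comes from -eq receiving an empty string. One way to avoid it is to check that the argument is all digits before comparing. This is a sketch, with check_number as a hypothetical name; it treats only unsigned whole numbers as valid.

```bash
#!/usr/bin/env bash
# check_number: hypothetical variant of input3.sh that validates first.
check_number() {
    local MyVar=1337 VAR="$1"
    # ^[0-9]+$ means one or more digits and nothing else:
    # no sign, no decimal point, no spaces, not empty
    if [[ ! "$VAR" =~ ^[0-9]+$ ]]; then
        echo "Please enter a whole number"
    elif [ "$VAR" -eq "$MyVar" ]; then
        echo "Yep! They're the same"
    else
        echo "Nope, try again..."
    fi
}

check_number "$1"
```

Run with no argument, it now prints "Please enter a whole number" instead of the "integer expression expected" error.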

Validation 03

Binary or Text

An important thing to know about user input is whether it is plain text (ASCII) or binary data, such as a .jpg file.

This may not be a concern if YOU are the one entering the data, but if you are writing code to be run on a different computer, or if others have access to your computer, this would be something to be concerned with.

One command that can be used is file.

#!/usr/local/bin/bash
# whatis.sh
# determine type of file
file "$1"

Recall that $1 refers to the first argument supplied to the script on the command line, so running the script amounts to typing:

file "WhatKindOfFileIsThis.fubar"

Run the script like this:

~/bin/bash/whatis.sh "01 - Omax 1 (Tokyo).flac"
01 - Omax 1 (Tokyo).flac: FLAC audio bitstream data, 16 bit, stereo, 44.1 kHz, 19328736 samples
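whatis.sh hands its argument straight to file without checking it. Since this page is about validation, here is a sketch of a more defensive version; describe_file is a hypothetical name, and the two checks (an argument was given, the path exists) are only examples.

```bash
#!/usr/bin/env bash
# whatis.sh, defensive variant (a sketch)
# describe_file: hypothetical helper that checks its argument first.
describe_file() {
    # -z: no argument, or an empty one
    if [ -z "$1" ]; then
        echo "Usage: whatis.sh FILE" >&2
        return 1
    fi
    # -e: the path must exist (file, directory, and so on)
    if [ ! -e "$1" ]; then
        echo "No such file: $1" >&2
        return 1
    fi
    file "$1"
}

describe_file "$1"
```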

Depending on the file, the description can be quite long. Here are the results I got for a range of file formats:

.afpub data
.afdesign data
.ai PDF document, version 1.5, 1 pages
.ape Monkey's Audio compressed format version 3970 with high compression, stereo, sample rate 44100
.asf Microsoft ASF
.avi RIFF (little-endian) data, AVI, 640 x 360, 25.00 fps, video XviD, audio MPEG-1 Layer 3 (stereo, 48000 Hz)
.app directory
.azw3 Mobipocket E-book
.bat ASCII text, with no line terminators
.bz2 bzip2 compressed data, block size = 900k
.cfg data
.cdf ASCII text, with CRLF line terminators
.com COM executable for DOS
.css ASCII text
.csv Unicode text, UTF-8 (with BOM) text
.cue ASCII text, with CRLF line terminators
.db SQLite 3.x database, last written using SQLite version 3047000, file counter 6569, database pages 235, 1st free page 22, free pages 98, cookie 0x41, schema 4, UTF-8, version-valid-for 6569
.dbf amd 29k coff prebar executable
.dll MS-DOS executable, NE for MS Windows 3.x (DLL or font)
.dmg zlib compressed data
.dsf data
.eml HTML document text, ASCII text, with CRLF line terminators
.epub EPUB document
.exe MS-DOS executable
.flac FLAC audio bitstream data, 16 bit, stereo, 44.1 kHz, 5453112 samples
.gif GIF image data, version 89a, 467 x 467
.gsp data
.gz gzip compressed data, was "acr-diag", last modified Wed Oct 22 06:12:19 2003, from Unix, original size modulo 2^32 32560
.html HTML document text, Unicode text, UTF-8 text
.ico MS Windows icon resource
.indd Adobe InDesign Document
.iso ISO 9660 CD-ROM filesystem data (DOS/MBR boot sector) 'Linux Mint 21.2 Cinnamon 64-bit' (bootable)
.jar Java archive data (JAR)
.json JSON data
.jpg JPEG image data, JFIF standard 1.01, aspect ratio, density 144x144, segment length 16, Exif Standard [TIFF image data, big-endian, direntries=4, xresolution=62, yresolution=70, resolutionunit=2], baseline, precision 8, 1330x1208, components 3
.jpeg JPEG image data, JFIF standard 1.01, resolution (DPI), density 72x72, segment length 16, baseline, precision 8, 472x341, components 3
.js ASCII text
.log ASCII text, with CRLF line terminators
.m3u ASCII text, with CRLF line terminators
.mobi Mobipocket E-book "Practical SVG", 509033 bytes uncompressed, version 6, codepage 65001
.mod MS-DOS executable PE32 executable (console) Intel 80386, for MS Windows, MZ for MS-DOS
.mov ISO Media, Apple QuickTime movie, Apple QuickTime (.MOV/QT)
.mp3 Audio file with ID3 version 2.4.0, contains MPEG ADTS, layer III, v1, 192 kbps, 44.1 kHz, Stereo
.mp4 ISO Media, MP4 Base Media v1 [ISO 14496-12:2003]
.mpeg MPEG sequence, v1, system multiplex
.mpg MPEG sequence, v1, system multiplex
.mkv Matroska data
.numbers Zip archive data, at least v2.0 to extract, compression method=store
.odb OpenDocument Database
.odf OpenDocument Formula
.odg OpenDocument Drawing
.odp OpenDocument Presentation
.ods OpenDocument Spreadsheet
.odt OpenDocument Text
.pages Zip archive data, at least v2.0 to extract, compression method=store
.pdf PDF document, version 1.4, 0 pages
.pl Perl script text executable
.pm Perl5 module source text, ASCII text, with CRLF line terminators
.png PNG image data, 1410 x 1395, 8-bit/color RGB, non-interlaced
.pps Composite Document File V2 Document, Little Endian, Os Windows, Version 5.1, Code page 1252, Title Diapositive 1, Last Saved By peglehn, Revision Number 2, Last Saved Time/Date Tue Apr 25 15:09:30 2006, Number of Words 99
.ppt Composite Document File V2 Document, Little Endian, Os MacOS, Version 10.3, Code page 10000,
.prj ASCII text, with no line terminators
.rb Ruby script text executable, ASCII text
.sh Bourne-Again shell script text executable, ASCII text
.shp ESRI Shapefile version 1000 length 19302 type Polygon
.shx ESRI Shapefile version 1000 length 850 type Polygon
.sql Unicode text, UTF-8 text
.srt Unicode text, UTF-8 (with BOM) text, with CRLF line terminators
.svg SVG Scalable Vector Graphics image
.sys DOS executable (character device driver,close media-,control strings-support)
.tar POSIX tar archive
.tif TIFF image data, little-endian, direntries=14, height=3353, bps=182, compression=none, PhotometricIntepretation=RGB, width=2551
.txt Unicode text, UTF-8 text
.webarchive Apple binary property list
.wma Microsoft ASF
.wmv Microsoft ASF
.xls Composite Document File V2 Document, Little Endian, Os Windows, Version 1.0, Code page -535,
.zip Zip archive data, at least v1.0 to extract, compression method=store
.7z 7-zip archive data, version 0.4

With this knowledge, we should be able to construct a test to determine what to do with that file.

It seems that the files that really are ASCII / text have 'text' in the response. SVG and JSON files are also plain text.

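Note that .odt (OpenDocument Text) files are NOT plain text or ASCII, even though the description includes the word 'Text'. Because that 'Text' is capitalized, a case-sensitive search for lower-case 'text' passes over it.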

By accident, I also had a '.cue' file named with a '.que' extension, but file described it just as it did the correctly named one. That is because file examines a file's contents (for example the 'magic' bytes at its start), not just its name.

So we can check for 'text' in the response, or if the file extension is .svg or .json we can assume they are ASCII as well.
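The rule just described can be sketched as a small function. is_text is a hypothetical name, and the rule is only as good as the file descriptions listed above; -b (brief) is a standard file option that leaves the filename out of the output, so a name containing "text" can't cause a false match.

```bash
#!/usr/bin/env bash
# is_text: hypothetical helper. Returns 0 (true) if file's description
# contains lower-case "text", or if the name ends in .svg or .json.
is_text() {
    local path="$1" description
    [ -f "$path" ] || return 1    # missing, or not a regular file
    description=$(file -b "$path")
    # case patterns are case-sensitive, so "OpenDocument Text" won't match
    case "$description" in
        *text*) return 0 ;;
    esac
    case "$path" in
        *.svg|*.json) return 0 ;;
    esac
    return 1
}

for f in "$@"; do
    if is_text "$f"; then
        echo "$f: looks like text"
    else
        echo "$f: not text (or not a regular file)"
    fi
done
```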

If you deal with files other than those above, your mileage may vary, so be aware.

This assumes you are working on your own computer with input you provided yourself.
Assuming that ANYTHING provided by a user is safe is not a best practice.

In a directory (test) I have several files of various formats:

Bruce Katz Band - Get Your Groove.cue
conway-life-glider.svg
cover.jpg
ffmpeg-115034-ge09164940e.7z
FileZilla_3.68.1_macos-x86.app.tar.bz2
id_rsa.1684544711.pub
Joel Ross - early.flac
linuxmint-21.3-cinnamon-64bit-edge.iso
platypus.zip
Plex-1.6.5.1097-3bb9dc68-x86_64.zip
Plex.app
redirect.pages
ruby2.rb
shotcut-macos-241117.dmg
UnicodeData.txt

Adding a bit of magic to the whatis.sh script:

#!/usr/local/bin/bash
# whatis.sh
# determine type of file

file test/* | grep 'text'

By piping the output of file into grep, we accomplish several steps in one line of code. When I run the script, the results are:

whatis.sh
test/Bruce Katz Band - Get Your Groove.cue:  ASCII text, with CRLF line terminators
test/UnicodeData.txt:                        ASCII text
test/ruby2.rb:                               Ruby script text executable, ASCII text

This goes a long way to helping us determine what kind of file we might be working with.

At this point we would need to check each response to see if it contains the string 'text' in lower case.

I will leave this as an exercise for the reader.