JavaScript Scratches

Regular Expressions

regex

If you have done any programming or read about other languages, you may have come across something called a regular expression or regex.

A 'regular expression' is a series of characters that defines a pattern describing what you are looking for.
They can ONLY be used for searching TEXT (strings). In JavaScript that text is Unicode, so a regex is not limited to ASCII characters.

Regular expressions are not something you throw together. They are built carefully, one step at a time, with patience and attention.

You might use them for searching log files or configuration files.
Or you might have a huge list of references or citations (as a text file) and want to find all references to a particular author, but you're not quite sure of their last name.
Maybe you have a web site asking for a phone number and want to confirm it's in a particular format before sending it to the server, or, as BEST PRACTICE, check it again once it reaches the server.
The Human Genome Project used Perl extensively, thanks largely to its powerful regular expression support.
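For the phone-number case, here's a minimal sketch. It assumes a simple North American style format of three digits, three digits and four digits separated by hyphens (e.g. 555-123-4567); the function name isValidPhone is hypothetical, and a real form would probably need to accept more formats than this.

```javascript
// Hypothetical helper: true only if the WHOLE string is in the
// assumed format NNN-NNN-NNNN.
function isValidPhone(input) {
  // ^ and $ anchor the pattern to the start and end of the string,
  // so extra text before or after the number makes it fail.
  const phonePattern = /^\d{3}-\d{3}-\d{4}$/;
  return phonePattern.test(input);
}

console.log(isValidPhone("555-123-4567"));          // true
console.log(isValidPhone("5551234567"));            // false: no hyphens
console.log(isValidPhone("call 555-123-4567 now")); // false: extra text around it
```

The same function could run in the browser before sending, and again on the server.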

The word 'regular' is somewhat misleading, though. Here is what a typical regex might look like:

^\w+=[^\n\\]*(\\\n[^\n\\]*)*

By the end of this section you *might* be able to decipher that, or I could tell you now:

This regex matches a line that starts with a key made of word characters, followed by an equals sign and then a value. The value can be a single line, or it can be continued across multiple lines by ending each line with a backslash followed by a newline. Apart from those continuations, the value cannot contain newlines or backslashes. Really.
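To see that description in action, here's a small sketch running the same pattern in JavaScript. The sample inputs are made up for illustration; remember that inside a JavaScript string "\\" is one backslash and "\n" is a newline.

```javascript
// The pattern from above, written as a JavaScript regex literal.
const keyValue = /^\w+=[^\n\\]*(\\\n[^\n\\]*)*/;

// A value continued onto a second line with backslash + newline.
const continued = "name=first part \\\n  second part";
console.log(keyValue.exec(continued)[0] === continued); // true: the whole text matched

// A stray backslash NOT followed by a newline ends the match early.
const stray = "path=C:\\temp";
console.log(keyValue.exec(stray)[0]); // path=C:

// No word-character key at the start, so there is no match at all.
console.log(keyValue.exec("=value")); // null
```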

Regexes are a special way of describing a particular text pattern. They can be quite complex, but therein lies their power. Whole books have been written about regular expressions. In fact, regex syntax IS a mini programming language of its own.

I first came across them in my Perl programming days, so I was curious how JavaScript treated them. Regular expressions are used in many programming languages, and JavaScript's syntax is inspired by Perl's. Hopefully the explanations and examples here give you a good starting point.

JavaScript's flavor of regex differs slightly from those of other languages and tools. In JavaScript a literal regex is the pattern surrounded by forward slashes, like this: /hello/.
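A quick sketch of a literal in use. The sample strings are made up; test() is the built-in RegExp method that returns true or false. The i after the closing slash is a flag that turns off case sensitivity.

```javascript
// A regex literal: the pattern goes between forward slashes.
const greeting = /hello/;

console.log(greeting.test("Well, hello there")); // true
console.log(greeting.test("Hello there"));       // false: matching is case-sensitive

// A flag after the closing slash changes how matching works;
// i makes it case-insensitive.
const anyCase = /hello/i;
console.log(anyCase.test("Hello there"));        // true
```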

regex 00

Before we begin building regexes, we need to grok some fundamental concepts:

Others:

regex 01

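JavaScript regular expressions can be built 2 ways:

A regex literal, with the pattern between forward slashes: /hello/
The RegExp constructor, with the pattern passed as a string: new RegExp("hello")

Note that using the constructor method we DO NOT use the forward-slash delimiters.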
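Here's a sketch of both ways building the same pattern. The one gotcha worth showing: because the constructor takes an ordinary string, every backslash in the pattern has to be doubled.

```javascript
// 1. Regex literal: pattern between forward slashes.
const literal = /\d+/;

// 2. RegExp constructor: pattern as a plain string, no slashes.
//    In a string, \d must be written "\\d" so the backslash survives.
const constructed = new RegExp("\\d+");

console.log(literal.source === constructed.source);                 // true: both are \d+
console.log(literal.test("room 101"), constructed.test("room 101")); // true true

// Forgetting to double the backslash leaves "d+", which looks for the letter d.
const mistake = new RegExp("\d+");
console.log(mistake.source); // d+
```

The constructor is handy when the pattern is only known at run time, for example when it is built from user input.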

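// Sprint() is not built into JavaScript. Assumed definition for a
// browser page: append the given HTML to the end of <body>.
function Sprint(text) { document.body.insertAdjacentHTML("beforeend", text); }

let myPattern;
myPattern = new RegExp("t{3}");       // exactly three t's in a row: 'ttt'
myPattern = new RegExp("(b|t){4}");   // any 4 b's/t's in a row: bbbb, tttt, btbt, ...
myPattern = new RegExp("456");        // '456' anywhere
myPattern = new RegExp("(b|t)ers");   // 'b' or 't' followed by 'ers'
myPattern = new RegExp("ers");        // 'ers' anywhere
myPattern = new RegExp("ote.$");      // 'ote' plus any one character, at the end

const myInput = "This line contains numbers (0123456789) and some letttters (AbCDfgHiImnoPqR) and a cat and a coyote.";
Sprint(myInput + '<br/>');
Sprint("Looking for " + myPattern + " ...<br/>");
// test() returns true if the pattern occurs anywhere in myInput
if (myPattern.test(myInput)) {
	Sprint('Yes! Found your pattern.<br/>');
}
else {
	Sprint("Sorry, couldn't find " + myPattern + ".<br/>");
}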

Play with the above code in your own file, commenting out various 'myPattern' assignments, to see the results for yourself.

Remember that only the last assignment to myPattern takes effect when the search runs.
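Rather than commenting assignments in and out, you could try every pattern in one pass. This sketch uses console.log instead of the Sprint() helper, and adds one pattern, 'dog', that is not in the text so you can see a miss.

```javascript
// Same input as above.
const myInput = "This line contains numbers (0123456789) and some letttters (AbCDfgHiImnoPqR) and a cat and a coyote.";

// The patterns from the example, plus 'dog', as constructor strings.
const patterns = ["t{3}", "(b|t){4}", "456", "(b|t)ers", "ers", "ote.$", "dog"];

// Record the first match for each pattern (null when nothing matches).
const results = {};
for (const source of patterns) {
  const found = myInput.match(new RegExp(source));
  results[source] = found ? found[0] : null;
  console.log(source, "->", found ? found[0] : "no match");
}
```

Without the g flag, match() reports only the first match: (b|t)ers finds 'bers' in 'numbers' before it ever reaches 'ters' in 'letttters'.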

regex 02

As mentioned, regexes let you define a pattern which can then be used to find a possible match in some text.

Starting with the following 'expressions', we can already do quite a bit (NOTE: case is important, so \d and \D mean opposite things):

.
A single period (dot) matches any single character (letter, number, space, or other symbol) except a newline
\w
Any word character including a-z, A-Z, the digits 0-9, and the underscore (_)
\W
Opposite of '\w': any character NOT a word character
\d
Any digit 0-9
\D
Opposite of '\d': any character EXCEPT a digit
\s
Any white-space character, such as a space, tab, carriage return, or new line
\S
Opposite of '\s': anything BUT space, tab, carriage return, new line
^
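Matches at the beginning of the string (the pattern must start there)
$
Matches at the end of the string (the pattern must finish there)
\b
A word boundary: the position between a word character (\w) and a non-word character (\W), or the start or end of the string next to a word character. It matches a position, not a character, so it never consumes a space, +, =, or '.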
[]
Match any single character between the brackets. Use a hyphen for a range of characters: [d-m], [3-7]. [0-9] is the same as \d.
[^]
Match any character EXCEPT one in the brackets. [^aeiou] matches any character that is NOT a lower-case vowel.
|
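Match either the whole expression BEFORE or the whole expression AFTER the pipe symbol: cat|dog matches 'cat' or 'dog'. Use parentheses to limit its reach, as in (b|t)ers.
\
The escape symbol. Put it before a special character to match that character literally: \. matches a real period.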
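Here's a quick way to try each of those tokens. The sample strings are made up; each row lists the pattern, the text, the result you should get, and what it shows.

```javascript
// Each entry: [pattern, sample text, expected result, what it shows]
const examples = [
  [/c.t/,      "cut",            true,  ". matches any one character"],
  [/\w\w\w/,   "a_1",            true,  "letters, digits and _ are word characters"],
  [/\d/,       "no digits here", false, "\\d needs a digit 0-9"],
  [/\D/,       "2024",           false, "\\D needs a non-digit"],
  [/\s/,       "two words",      true,  "\\s finds the space"],
  [/^cat/,     "cat food",       true,  "^ anchors at the beginning"],
  [/^cat/,     "a cat",          false, "'cat' is not at the beginning"],
  [/cat$/,     "a cat",          true,  "$ anchors at the end"],
  [/\bcat\b/,  "concatenate",    false, "'cat' is buried inside a word"],
  [/\bcat\b/,  "the cat sat",    true,  "'cat' is a whole word"],
  [/[3-7]/,    "x5y",            true,  "5 is in the range 3-7"],
  [/[^aeiou]/, "aeiou",          false, "every character is a vowel"],
  [/cat|dog/,  "hotdog",         true,  "either side of | may match"],
  [/\./,       "no period",      false, "\\. only matches a real period"],
];

for (const [pattern, text, expected, note] of examples) {
  const actual = pattern.test(text);
  const flag = actual === expected ? "" : "  <-- unexpected";
  console.log(`${pattern} on "${text}": ${actual} (${note})${flag}`);
}
```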

regex 03

Any time you need to search or replace text, a regex may be the tool you need. It's the simple 's/this/that/' on steroids.
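In JavaScript, the rough equivalent of Perl's (or sed's) 's/this/that/' is the string method replace() with a regex. A minimal sketch with a made-up sentence:

```javascript
const text = "this and this and THIS";

// Without the g (global) flag, only the first match is replaced.
const firstOnly = text.replace(/this/, "that");
console.log(firstOnly);  // that and this and THIS

// With g, every match is replaced; matching is still case-sensitive.
const allLower = text.replace(/this/g, "that");
console.log(allLower);   // that and that and THIS

// Adding i ignores case as well.
const allAnyCase = text.replace(/this/gi, "that");
console.log(allAnyCase); // that and that and that
```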

First we have to learn some grammar. Here are some of the punctuation characters that have special meaning in regular expressions:

^ $ . * + ? = ! | \ ( ) [ ] { }
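Because these characters have special meanings, you need the \ escape to search for them literally. A small sketch, using a made-up price string:

```javascript
const price = "Total: $4.99";

// Unescaped, $ means "end of string" and . means "any character".
// Nothing can follow the end of the string, so this can never match.
const unescaped = /$4.99/;
console.log(unescaped.test(price)); // false

// Escaped with \, they match a literal dollar sign and a literal period.
const escaped = /\$4\.99/;
console.log(escaped.test(price));   // true

// An unescaped . also accepts other characters, which is easy to miss.
console.log(/4.99/.test("4599"));   // true, probably not what you wanted
console.log(/4\.99/.test("4599"));  // false
```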

You may recognize some of these from other languages (Perl, Bash) and know what they represent. But just to be sure, we'll go through them here.

Character Classes []

A character class matches exactly ONE character, chosen from the set you put inside the square brackets.

Within the brackets, indicate precisely which characters, digits, or punctuation you want to match.
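A short sketch of character classes. The words are just sample inputs; the + after the class in the second pattern means "one or more".

```javascript
// [ae] matches exactly ONE character: either a or e.
const grey = /gr[ae]y/;
console.log(grey.test("gray"), grey.test("grey"), grey.test("groy")); // true true false
console.log(grey.test("graey")); // false: the class stands for one character only

// Ranges and single characters can be mixed inside one class.
// ^ and $ make every character of the string come from the class.
const hex = /^[0-9a-fA-F]+$/;
console.log(hex.test("1aF9"), hex.test("12G4")); // true false
```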

Quantifiers / Repetition {n,m}

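A quantifier says how many times the item immediately before it may repeat.

Typically indicated thus: {min,max}. {n} means exactly n times and {n,} means n or more. The shorthands * (same as {0,}), + (same as {1,}) and ? (same as {0,1}) are also common.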
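Here's a sketch reusing 'letttters' from the earlier example. Note that quantifiers are greedy by default: they take as many repetitions as they are allowed.

```javascript
const word = "letttters"; // four t's in a row

console.log(word.match(/t{3}/)[0]);   // ttt  (exactly 3)
console.log(word.match(/t{2,3}/)[0]); // ttt  (2 to 3, greedy, so it takes 3)
console.log(word.match(/t{2,}/)[0]);  // tttt (2 or more)

// Shorthand quantifiers
console.log(/colou?r/.test("color"), /colou?r/.test("colour")); // true true (? = 0 or 1)
console.log("Order 66 shipped".match(/\d+/)[0]);               // 66 (+ = 1 or more)
console.log(word.match(/x*/)[0] === "");                        // true (* allows 0, so an empty match)
```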

regex Tools

The command-line tools grep and egrep are built around regexes. The name grep comes from the ed editor command g/re/p: 'global regular expression print'.

Some Examples