If you have done any programming or read about other languages, you may have come across something called a regular expression or regex.
Regular expressions are not something you throw together. They are built carefully, one step at a time, with patience and attention.
You might use them for searching log files or configuration files.
Or you have a huge list of references or citations (as a text file) and want to find all references to a particular author, but you're not quite sure of their last name.
Maybe you have a web site asking for a phone number and want to confirm it's in a particular format before sending it to the server. BEST PRACTICE: check it again once it reaches the server.
The Human Genome Project used Perl extensively because of its powerful regular expression support.
'regular' is somewhat misleading though. Here is what a common regex might look like:
By the end of this section you *might* be able to decipher that, or I could tell you now:
This regex matches a line that starts with a word-character key, followed by an equals sign, and then a value that can be a single line or can be continued across multiple lines using a backslash followed by a newline. The value itself cannot contain unescaped newlines or backslashes. Really.
Regexes are a special way of describing a particular text pattern. They can be quite complex, but therein lies their power. Whole books have been written about regular expressions. In fact, they ARE a mini programming language of their own.
I first came across them in my Perl programming days, so I was curious how JavaScript treated them. Regular expressions are used in many programming languages, and JavaScript's syntax is inspired by Perl. Hopefully I can explain them and show you examples that give you a good starting point.
Regexes in JavaScript vary slightly from the regexes of other languages. In JavaScript a literal regex is the pattern surrounded by forward slashes, like this: /hello/.
Before we begin building regexes, we need to grok some fundamental concepts:
Others:
JavaScript regular expressions can be built 2 ways:
Note that with the constructor method we DO NOT use the forward slash delimiters.
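Here is a minimal sketch of the two approaches side by side, assuming the two ways meant here are the regex literal and the RegExp constructor. The variable names are mine, chosen only for illustration. It also shows one thing to watch for with the constructor: the pattern is an ordinary string, so a backslash has to be doubled.

```javascript
// Two ways to build the same pattern: 'b' or 't' followed by 'ers'.
const literalPattern = /(b|t)ers/;                 // literal: pattern between forward slashes
const constructedPattern = new RegExp("(b|t)ers"); // constructor: pattern as a string, no slashes

// Both objects hold the same pattern text.
const sameSource = literalPattern.source === constructedPattern.source;

// Escaping a literal period: one backslash in a literal,
// two in a string (the string itself consumes the first one).
const periodLiteral = /ote\.$/;
const periodConstructed = new RegExp("ote\\.$");
const samePeriodSource = periodLiteral.source === periodConstructed.source;

console.log(sameSource);                             // true
console.log(samePeriodSource);                       // true
console.log(constructedPattern.test("letters"));     // true
console.log(periodConstructed.test("a coyote."));    // true
console.log(periodConstructed.test("a coyotes"));    // false
```

The literal form is usually easier to read. The constructor earns its keep when the pattern is only known at run time, for example when it comes from user input.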
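var myPattern;
// Each assignment replaces the previous one, so only the last one is tested.
myPattern = new RegExp("t{3}"); // 'ttt'
myPattern = new RegExp("(b|t){4}"); // four characters in a row, each 'b' or 't' (e.g. 'tttt' or 'btbt')
myPattern = new RegExp("456"); // '456' anywhere
myPattern = new RegExp("(b|t)ers"); // 'b' or 't' followed by 'ers'
myPattern = new RegExp("ers"); // 'ers' anywhere
myPattern = new RegExp("ote.$"); // 'ote' plus any one character at the end (an unescaped . matches any character)
const myInput = "This line contains numbers (0123456789) and some letttters (AbCDfgHiImnoPqR) and a cat and a coyote.";
Sprint(myInput + '<br/>');
Sprint("Looking for " + myPattern + " ...<br/>");
// match() returns null when nothing matches, so the result works as a condition.
if (myInput.match(myPattern)) {
    Sprint('Yes! Found your pattern.<br/>');
}
else {
    Sprint("Sorry, couldn't find " + myPattern + ".<br/>");
}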
Play with the above code in your own file, commenting out various 'myPattern' assignments, to see the results for yourself.
Remember that each assignment overwrites the previous one, so the last uncommented assignment is the pattern that gets tested.
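If commenting lines in and out gets tedious, one alternative is to try every pattern in a single run. This is only a sketch: it uses console.log in place of the Sprint helper, and the last pattern ("dog") is my own addition to show a pattern that does not match.

```javascript
const input = "This line contains numbers (0123456789) and some letttters (AbCDfgHiImnoPqR) and a cat and a coyote.";

// The pattern strings from the example above, plus "dog" (hypothetical, added to show a miss).
const patterns = ["t{3}", "(b|t){4}", "456", "(b|t)ers", "ers", "ote.$", "dog"];

// Build a RegExp from each string and record whether it matches the input.
const results = patterns.map((source) => {
  const pattern = new RegExp(source);
  return { source, found: pattern.test(input) };
});

for (const { source, found } of results) {
  console.log(`${source}: ${found ? "found" : "not found"}`);
}
```

Here test() is used instead of match() because only a yes-or-no answer is needed.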
As mentioned, regexes let you define a pattern which can then be used to find a possible match in some text.
Starting with the following 'expressions', we can already do quite a bit: (NOTE: matching is case-sensitive)
Any time you need to search or replace text, a regex may be the right tool. It's the simple 's/this/that/' on steroids.
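For a feel of what 's/this/that/' looks like in JavaScript, here is a small sketch using the string replace method. The sample strings and variable names are made up for illustration. The letters after the closing slash are flags: g replaces every match instead of only the first, and i ignores case.

```javascript
const text = "this and this";

// Without flags, only the first match is replaced.
const firstOnly = text.replace(/this/, "that");

// The g flag replaces every match.
const everyMatch = text.replace(/this/g, "that");

// Adding the i flag makes the match ignore case as well.
const anyCase = "This and this".replace(/this/gi, "that");

console.log(firstOnly);  // that and this
console.log(everyMatch); // that and that
console.log(anyCase);    // that and that
```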
First we have to learn some grammar. Here are some of the punctuation characters that have special meaning in regular expressions:
You may recognize some of those, and what they represent, from other languages (Perl, Bash). But just to be sure, we'll list them here.
Square brackets provide a way to specify an exact set or range of characters to match.
Within the brackets, indicate precisely which characters, digits, or punctuation you want to match.
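A few bracket expressions in action, as a rough sketch. The patterns and test strings are my own examples. Each bracket expression matches exactly one character from the set it lists.

```javascript
const vowel = /[aeiou]/;    // any one lowercase vowel
const digit = /[0-9]/;      // any one digit, written as a range
const catOrCot = /c[ao]t/;  // 'c', then 'a' or 'o', then 't'
const notDigit = /[^0-9]/;  // a ^ right after the [ means "anything except these"

console.log(vowel.test("rhythm"));                 // false
console.log(digit.test("route 66"));               // true
console.log(catOrCot.test("a cat and a coyote"));  // true
console.log(catOrCot.test("cut"));                 // false
console.log(notDigit.test("12345"));               // false
console.log(/[A-Z]/.test("abc"));                  // false: case matters
```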
These set how many times the preceding part of a pattern may repeat in a match.
Typically written thus: {min,max}
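Here is a short sketch of the {min,max} form and its two relatives, {n} (exactly n) and {min,} (at least min). The patterns and test strings are my own examples. Note the ^ and $ anchors in the second pattern: without them, a regex is happy to find a shorter run inside a longer one.

```javascript
const exactlyThree = /t{3}/;   // 'ttt'
const atLeastTwo = /t{2,}/;    // 'tt' or more
const twoToFour = /^a{2,4}$/;  // the whole string is 2 to 4 'a' characters

console.log(exactlyThree.test("letttters")); // true
console.log(exactlyThree.test("letter"));    // false
console.log(atLeastTwo.test("letter"));      // true
console.log(twoToFour.test("a"));            // false
console.log(twoToFour.test("aaaa"));         // true
console.log(twoToFour.test("aaaaa"));        // false
console.log(/a{2,4}/.test("aaaaa"));         // true: unanchored, it matches part of the run
```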
The tools grep and egrep are built around regexes. The name grep comes from the ed editor command g/re/p: globally search for a regular expression and print the matching lines.
Some Examples