Perl Refreshments ⌘

Regular Expressions #1 - Metacharacters / Special Characters

You may have read or heard about regular expressions as something programmers sometimes need. But what are they?

Perl was designed to deal with all kinds of text, and uses regular expressions a lot. When you hear the term, think pattern matching.

Metacharacters are something you will hear about whenever you talk about regular expressions (regexes) or pattern matching. They are the funny-looking symbols you put inside the patterns. These patterns are constructed to search for a particular string (set of characters) inside another string.

Regular expressions are what you pass to the match operator m// and the substitution operator s/// for matching and substituting.
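To make the two operators concrete, here is a minimal sketch. The sample sentence and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $text = "I like coffee in the morning.";

# m// tests whether the pattern occurs anywhere in the string.
my $found = ($text =~ m/coffee/) ? 1 : 0;

# s/// replaces the first match of the pattern with new text.
# Copying into $changed first leaves $text untouched.
(my $changed = $text) =~ s/coffee/tea/;

print "found: $found\n";        # found: 1
print "changed: $changed\n";    # changed: I like tea in the morning.
```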

Regexes are not regular Perl code, but their own language inside of Perl (and many other languages). If you don't believe me, or wish to know more about them, I recommend the book Mastering Regular Expressions by Jeffrey E.F. Friedl.

As an example, any time you've been asked to enter an email address on a web form, chances are the coder used a regular expression to ensure it at least looks like a legitimate email.

This is an extremely difficult thing to confirm! In the book mentioned above, there is a regex consisting of 6,598 bytes of text to check an email. In fine print it takes up a whole page.

Verifying that it is a real address is a whole other ballgame and beyond this refreshment.
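For a feel of what a "looks like an email" check might be, here is a deliberately loose sketch. The helper name looks_like_email is hypothetical, and the pattern is nowhere near the full regex from the book; it only asks for something, an @, something, a dot, and something more.

```perl
use strict;
use warnings;

# Hypothetical helper: a loose "looks like an email" test, nothing more.
sub looks_like_email {
    my ($address) = @_;
    # One or more characters that are neither whitespace nor @, then an @,
    # then more of the same, a literal dot, and a final run of the same.
    return $address =~ /^[^\s\@]+\@[^\s\@]+\.[^\s\@]+$/ ? 1 : 0;
}

print looks_like_email('user@example.com'), "\n";   # 1
print looks_like_email('not an email'), "\n";       # 0
```

A check like this will accept plenty of addresses that do not exist, which is exactly the point made above.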

The metacharacters are:


      
\   (backslash)       escape the next metacharacter so it matches literally: /\\/
|   (bar)             alternation: /me|you/
( ) (parentheses)     grouping: (...)
[ ] (brackets)        character class, match 1 character from a set: /[a-zA-Z]/
{ } (braces)          holds a quantity:
    {m}               match exactly m times
    {m,}              match at least m times
    {m,n}             match at least m times but no more than n times
    {m,}?             match at least m times, as few as possible (non-greedy)
    {m,n}?            match at least m times but no more than n times, as few as possible (non-greedy)
^   (caret)           beginning of a string; with the /m modifier, also after any newline
$   (dollar)          end of a string, or before a newline at the very end; with the /m modifier, also before any newline
.   (dot)             match one character except newline
*   (asterisk)        match 0 or more times
+   (plus sign)       match 1 or more times
?   (question mark)   match 0 or 1 time
*?                    match 0 or more times, as few as possible (non-greedy)
+?                    match 1 or more times, as few as possible (non-greedy)
??                    match 0 or 1 time, preferring 0 (non-greedy)
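A few of these metacharacters in action, as a small sketch. The sample strings and variable names are invented for illustration.

```perl
use strict;
use warnings;

# Alternation: matches if either "me" or "you" appears.
my $alt = ("thank you" =~ /me|you/) ? 1 : 0;

# Character class plus {m,n}: two to four letters, anchored at both ends.
my $class = ("Perl" =~ /^[a-zA-Z]{2,4}$/) ? 1 : 0;

# Greedy vs non-greedy: .+ grabs as much as it can, .+? as little as it can.
my $html = "<b>bold</b>";
my ($greedy) = $html =~ /<(.+)>/;
my ($lazy)   = $html =~ /<(.+?)>/;

# Anchors: without /m, ^ matches only at the start of the string;
# with /m it also matches right after a newline.
my $lines  = "first\nsecond";
my $no_m   = ($lines =~ /^second/)  ? 1 : 0;
my $with_m = ($lines =~ /^second/m) ? 1 : 0;

print "alt=$alt class=$class\n";          # alt=1 class=1
print "greedy=$greedy lazy=$lazy\n";      # greedy=b>bold</b lazy=b
print "no_m=$no_m with_m=$with_m\n";      # no_m=0 with_m=1
```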

In a regex, any single character that is not a metacharacter matches itself.
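Here is a short sketch of the difference between an ordinary character and a metacharacter, using made-up sample strings.

```perl
use strict;
use warnings;

# Ordinary characters match themselves, so /cat/ finds "cat" inside a longer word.
my $literal = ("concatenate" =~ /cat/) ? 1 : 0;

# An unescaped dot is a metacharacter and matches any character (here the "x").
my $any_char = ("3x14" =~ /3.14/) ? 1 : 0;

# Escaped with a backslash, the dot only matches a real dot.
my $real_dot = ("3x14" =~ /3\.14/) ? 1 : 0;

print "literal=$literal any_char=$any_char real_dot=$real_dot\n";
# literal=1 any_char=1 real_dot=0
```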

The anatomy of a regex consists of several parts:

Special Characters

As you can see, regular expressions can get complicated. I suggest practicing them inside an eval, which traps errors such as an invalid pattern instead of letting them kill your program.
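One way to practice with eval might look like the sketch below. The helper name try_pattern is hypothetical; it compiles a pattern with qr// inside an eval block and reports failure rather than dying.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled regex, or undef if the pattern is invalid.
sub try_pattern {
    my ($pattern) = @_;
    # qr// compiles the pattern; a syntax error in it dies, and eval traps that.
    my $regex = eval { qr/$pattern/ };
    unless (defined $regex) {
        print "Bad pattern '$pattern'\n";
        return;
    }
    return $regex;
}

my $good = try_pattern('a{2,3}');       # compiles fine
my $bad  = try_pattern('(unclosed');    # missing ")" is trapped, $bad is undef

print "good pattern matches\n" if defined $good && "caaat" =~ $good;
```

When the eval fails, the error message itself is left in $@, which is worth printing while you are experimenting.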

Before you go, here are some special characters that Perl defines (not the Usual Suspects).

NOTE: They MUST be escaped with a backslash.
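To show why the backslash matters, here is a sketch that assumes the characters meant include the common shorthand classes \d, \w, and \s. That choice is my assumption, since they are not named above.

```perl
use strict;
use warnings;

my $sample = "Room 42";

# \d (with the backslash) matches a digit; a bare d is just the letter "d".
my ($digits)     = $sample =~ /(\d+)/;
my $has_d_letter = ($sample =~ /d/) ? 1 : 0;

# \w matches a "word" character and \s matches whitespace.
my ($first_word) = $sample =~ /^(\w+)\s/;

print "digits=$digits has_d_letter=$has_d_letter first_word=$first_word\n";
# digits=42 has_d_letter=0 first_word=Room
```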


Back References

Sarand Business Software
A comprehensive page of regular expression and pattern matching syntax

Regex 2


-30-