
Continuing our tour of regular expressions, properties are a very expressive and detailed part of the bestiary.
Properties are available through \p{PROP}, which matches any character that has the property, and its complement \P{PROP}, which matches any character that does not.
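As a quick sketch of the two forms, the snippet below counts letters and non-letters in a made-up sample string (the string and variable names are just for illustration). It also uses the fact that \pL, \p{L} and \p{Letter} are different spellings of the same property.

```perl
use strict;
use warnings;

# Illustrative sample: three Greek letters, three Latin letters, three digits.
my $sample = "\x{3b1}\x{3b2}\x{3b3} abc 123!";

# \p{Letter} (also spelled \p{L} or \pL) matches any character that is a letter.
my @letters = $sample =~ /\p{Letter}/g;

# \P{L} is the complement: any character that is NOT a letter.
my @others = $sample =~ /\P{L}/g;

printf "%d letters, %d non-letters\n", scalar @letters, scalar @others;
```

Every character falls on one side or the other, so the two counts always add up to the length of the string.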
Perl closely follows the Unicode Standard, and new properties are added to Perl when they appear. And as if that weren't enough, you can even design your own.
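To give a feel for designing your own, here is a minimal sketch of a user-defined property. Perl expects such a property to be a subroutine whose name begins with In or Is and which returns the code points it covers as hexadecimal numbers, one entry per line. The name IsAsciiVowel and the test word are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical user-defined property: the ASCII vowels, upper and lower case.
# Each line of the returned string is one code point in hexadecimal.
sub IsAsciiVowel {
    return <<"END";
0041
0045
0049
004F
0055
0061
0065
0069
006F
0075
END
}

my $word = "consonants";

# Count the matches; the empty-list assignment forces list context.
my $count = () = $word =~ /\p{IsAsciiVowel}/g;

# Mask the vowels with asterisks in a copy of the word.
(my $masked = $word) =~ s/\p{IsAsciiVowel}/*/g;

print "$word has $count vowels: $masked\n";
```

Once defined, the property works with \P{...} as well, so \P{IsAsciiVowel} would match everything that is not one of those ten characters.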
The alpha and word properties each cover over 100,000 characters!
Here are the one-character categories of the Unicode General Category property:
| Short Property | Long Property | Meaning |
|---|---|---|
| C | Other | Crazy control codes etc. |
| L | Letter | Letters & ideographs |
| M | Mark | Combining marks |
| N | Number | Numbers |
| P | Punctuation | Punctuation marks |
| S | Symbol | Symbols, signs, & sigils |
| Z | Separator | Separators (Zeparators?) |
There are about 30 'finer' properties under the 'general' ones above, but we won't cover them here. For those, you can consult the perlunicode reference.
Now, I know you're going to need an example of this (I sure did), so ...

    $text = "Isn't this is a long stretch of 'characters' containing some vowels and consonants, with 104 characters?";
    print "\n$text\n";
    $text =~ s/[\pP\pN]/*/g;
    print "$text\n\n";

We replaced the punctuation marks (\pP) and the numeric characters (\pN), here the digits of 104, in $text with asterisks.
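As a follow-up sketch, we can tally the same string against each of the seven one-character categories. The %count hash and the loop are just one way to do it. The property name is interpolated into the pattern, so \p{$cat} becomes \p{L}, \p{N} and so on.

```perl
use strict;
use warnings;

my $text = "Isn't this is a long stretch of 'characters' containing some vowels and consonants, with 104 characters?";

my %count;
for my $cat (qw(C L M N P S Z)) {
    # The empty-list assignment forces list context, giving the number of matches.
    $count{$cat} = () = $text =~ /\p{$cat}/g;
}

printf "%s: %d\n", $_, $count{$_} for sort keys %count;
# L is 81, N is 3, P is 5, Z is 15 (the spaces); C, M and S are 0.

printf "total: %d of %d characters\n", $count{L} + $count{N} + $count{P} + $count{Z}, length $text;
```

The four non-zero counts add up to 104, so the string keeps its promise.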
(Great comic btw :)