Friday, July 24, 2009

Perl: Explanation of your regular expression



CPAN has a module called "YAPE::Regex::Explain" that helps you understand what a Perl regular expression is doing.

Let's say you have the regex [A-Za-z0-9]+ and want an explanation of it. Here is the sample code:

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new('[A-Za-z0-9]+')->explain;


Running this prints the following explanation, which makes it easy to see what is going on:

The regular expression:

(?-imsx:[A-Za-z0-9]+)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  [A-Za-z0-9]+             any character of: 'A' to 'Z', 'a' to 'z',
                           '0' to '9' (1 or more times (matching the
                           most amount possible))
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
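To check this explanation against real behaviour, here is a minimal sketch that uses only core Perl (no YAPE modules needed) and runs the same pattern against a few sample strings. The sub name alnum_runs and the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The same pattern that was explained above.
my $re = qr/[A-Za-z0-9]+/;

# Hypothetical helper: return every run of ASCII letters/digits in a string.
# The + is greedy, so each run is taken as long as possible.
sub alnum_runs {
    my ($text) = @_;
    my @runs = $text =~ /$re/g;   # list context + /g returns all matches
    return @runs;
}

for my $sample ('foo_bar-42', 'Hello, World 2009', '+++') {
    my @runs = alnum_runs($sample);
    printf "%-20s => (%s)\n", "'$sample'", join(', ', map { "'$_'" } @runs);
}
```

Each run of letters and digits comes back whole, which is what "1 or more times (matching the most amount possible)" means: 'foo_bar-42' gives foo, bar and 42, while '+++' gives nothing at all.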



Let's try a slightly more complex regular expression: ST[^;]*(?:[A-Za-z0-9]+|[\.,\s]+)ET

The code to explain this regex is:

use YAPE::Regex::Explain;
print YAPE::Regex::Explain->new('ST[^;]*(?:[A-Za-z0-9]+|[\.,\s]+)ET')->explain;


The output is below, and it is still very easy to understand:

The regular expression:

(?-imsx:ST[^;]*(?:[A-Za-z0-9]+|[\.,\s]+)ET)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ST                       'ST'
----------------------------------------------------------------------
  [^;]*                    any character except: ';' (0 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    [A-Za-z0-9]+             any character of: 'A' to 'Z', 'a' to
                             'z', '0' to '9' (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  |                        OR
----------------------------------------------------------------------
    [\.,\s]+                 any character of: '\.', ',', whitespace
                             (\n, \r, \t, \f, and " ") (1 or more
                             times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
  ET                       'ET'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
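Again, here is a minimal core-Perl sketch that runs this pattern against a few inputs so you can see the explanation in action. The sub name first_match and the sample strings are hypothetical, chosen only to illustrate the behaviour.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The pattern explained above, compiled once with qr//.
my $re = qr/ST[^;]*(?:[A-Za-z0-9]+|[\.,\s]+)ET/;

# Hypothetical helper: return the substring the pattern matches,
# or undef if it does not match anywhere in the string.
sub first_match {
    my ($text) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

for my $sample ('ST hello ET', 'ST abc; ET', 'STxET', 'STET') {
    my $m = first_match($sample);
    printf "%-14s => %s\n", "'$sample'", defined $m ? "'$m'" : 'no match';
}
```

'ST hello ET' and 'STxET' match in full. 'ST abc; ET' does not, because neither [^;]* nor either alternative can get past the ';'. 'STET' does not either, because the (?:...) group must match at least one character between 'ST' and 'ET'.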


Running it from the command line:

If you don't want to write a script for this, you can do the same thing from the command line with a Perl one-liner. For the expression above, the command is:

perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new('ST[^;]*(?:[A-Za-z0-9]+|[\.,\s]+)ET')->explain;"


How to install the module:

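To install the module from your command prompt with ppm (ActivePerl's package manager), install YAPE::Regex first, since YAPE::Regex::Explain depends on it:
ppm install YAPE::Regex

And then type
ppm install YAPE::Regex::Explain

You can also install both with the cpan client, in the same order:
cpan YAPE::Regex
cpan YAPE::Regex::Explain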
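After installing, a quick way to confirm that both modules actually load is a small core-Perl check like the sketch below. The sub name module_installed is my own, not part of either YAPE module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return 1 if the named module can be loaded, else 0.
sub module_installed {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # YAPE::Regex -> YAPE/Regex.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# YAPE::Regex::Explain depends on YAPE::Regex, so check both.
for my $module ('YAPE::Regex', 'YAPE::Regex::Explain') {
    printf "%-22s %s\n", $module,
        module_installed($module) ? 'installed' : 'NOT installed';
}
```

If YAPE::Regex shows as NOT installed, install it before YAPE::Regex::Explain.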




2 comments:

mithaldu said...

Thanks for writing this up. It looks insanely useful for creating automatic documentation for complex expressions.

oylenshpeegul said...

Note that to install it from CPAN, you have to install YAPE::Regex first and then YAPE::Regex::Explain. YAPE::Regex::Explain depends on YAPE::Regex, but it's not packaged correctly, so it simply fails without it.