Mechanical

Bad start

You know you're having a bad start to the day when your hungover self is doing laundry and you pull a wine glass out of the washer.

Fortunately, this was prior to the washing and it hadn't broken, but that could have been a disaster.

And to relax while fending off my hangover, I eliminated the bug in a regular expression I was working on. Believe it or not, these are much easier to write than you would think. The trick is to build smaller regexes and combine them into larger ones. Read Jeffrey Friedl's book "Mastering Regular Expressions" and you'll understand how trivial the following really is, despite how intimidating it looks.

(?x-ism:
    ((?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b)))
    \s+
    ((?-xism:(?:=|is|[<>]=?)))
    \s+
    ((?x-ism:
    (?-xism:(?:(?-xism:(?:(?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))|(?-xism:(?-xism:\()\s*(?x-ism:
    (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
    \s*
    (?:
        (?-xism:[-+*/%]) 
        \s*
        (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
        \s*
    )*
)\s*(?-xism:\)))))|(?-xism:(?-xism:\()\s*(?x-ism:
    (?-xism:(?:(?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))|(?-xism:(?-xism:\()\s*(?x-ism:
    (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
    \s*
    (?:
        (?-xism:[-+*/%]) 
        \s*
        (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
        \s*
    )*
)\s*(?-xism:\)))))
    \s*
    (?:
        (?-xism:[-+*/%]) 
        \s*
        (?-xism:(?:(?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))|(?-xism:(?-xism:\()\s*(?x-ism:
    (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
    \s*
    (?:
        (?-xism:[-+*/%]) 
        \s*
        (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
        \s*
    )*
)\s*(?-xism:\)))))
        \s*
    )*
)\s*(?-xism:\)))))
    \s*
    (?:
        (?-xism:[-+*/%])
        \s*
        (?-xism:(?:(?-xism:(?:(?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))|(?-xism:(?-xism:\()\s*(?x-ism:
    (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
    \s*
    (?:
        (?-xism:[-+*/%]) 
        \s*
        (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
        \s*
    )*
)\s*(?-xism:\)))))|(?-xism:(?-xism:\()\s*(?x-ism:
    (?-xism:(?:(?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))|(?-xism:(?-xism:\()\s*(?x-ism:
    (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
    \s*
    (?:
        (?-xism:[-+*/%]) 
        \s*
        (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
        \s*
    )*
)\s*(?-xism:\)))))
    \s*
    (?:
        (?-xism:[-+*/%]) 
        \s*
        (?-xism:(?:(?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))|(?-xism:(?-xism:\()\s*(?x-ism:
    (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
    \s*
    (?:
        (?-xism:[-+*/%]) 
        \s*
        (?-xism:(?!\.(?![0-9]))(?:$RE{num}{real}\b|\b(?-xism:[[:upper:]][[:alnum:]_]*)\b|\b(?-xism:_)\b))
        \s*
    )*
)\s*(?-xism:\)))))
        \s*
    )*
)\s*(?-xism:\)))))
        \s*
    )*
))
    (?=[,.])
)
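
To make the build-small-and-combine trick concrete, here is a minimal sketch in Perl. It is not the code that generated the regex above: every variable name is made up for illustration, the number pattern is a simplified stand-in for $RE{num}{real}, and only one level of parentheses is handled.

```perl
use strict;
use warnings;

# Hypothetical building blocks (names and patterns are assumptions).
my $number   = qr/\d+(?:\.\d+)?/;                 # stand-in for $RE{num}{real}
my $variable = qr/\b[[:upper:]][[:alnum:]_]*\b/;  # Prolog variable, leading capital
my $anon     = qr/\b_\b/;                         # anonymous variable
my $term     = qr/(?:$number|$variable|$anon)/;
my $op       = qr{[-+*/%]};
my $compare  = qr/(?:=|is|[<>]=?)/;

# A flat expression: term (op term)*
my $simple = qr/$term \s* (?: $op \s* $term \s* )*/x;

# Allow each operand to be a parenthesized flat expression (one level only).
my $group   = qr/\( \s* $simple \s* \)/x;
my $operand = qr/(?:$group|$term)/;
my $expr    = qr/$operand \s* (?: $op \s* $operand \s* )*/x;

# Left side, comparison, right side, then a lookahead for "," or ".".
my $statement = qr/^ ($term) \s+ ($compare) \s+ ($expr) (?=[,.])/x;

if ('X is 3 * (2 + Y).' =~ $statement) {
    print "$1 | $2 | $3\n";   # prints: X | is | 3 * (2 + Y)
}
```

Each piece is short enough to test on its own. Every extra level of nesting means wrapping the previous expression in another layer, so the printed result balloons even though no single step is complicated.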
  • Current Mood: geeky
It's pretty, but...
...really, far too readable. That would never get you anywhere in an obfuscated Perl contest. ;)

Of course I realize this was not your aim. I'm just sort of obviously trying to provoke you into blinding us all with your mad regex skills. Or perhaps I should write a regex to obscure your regex?

Is this getting too meta? Do I need to get some sleep?
Re: It's pretty, but...
no, cpan really needs a regex obfuscator. if javascript can have a built-in code obfuscator, perl can have a regex obfuscator.

All I ask is that it have an option to run the resultant regex through Acme::Bleach. :)
That sounds more like my life than what I know of yours.

Although my washer would have had a cocktail glass in it.
oh dear lord....i DO wish i had a clue what that says! :P

when you gonna download the pics?? i should have mine back by Wed.

you know, that regexp would be far scarier if you eliminated the usage of Regexp::Common and expanded those $RE{num}{real}. :)

one question, however: wtf are you matching against? :)
Actually, when I printed the final regex, it didn't have $RE{num}{real}. I used vim to put that in since otherwise there was waaaaay too much horizontal scroll.

And what I'm matching against is a subset of Prolog (the math, in this case). I've realized that the easiest way to extend the grammar in AI::Prolog is to not extend it. Instead, I have created a pre-processor that finds the bits my current parser doesn't understand and rewrites them:

X is 3 * (2 + Y). % becomes: is(X, mult(3, plus(2, Y))).
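
For anyone who wants to play with that rewrite, here is a rough sketch of the idea. It is not the actual AI::Prolog pre-processor (which is regex-based): it swaps in a tiny recursive-descent parser, assumes the usual precedence of * / % over + and -, and all sub names are made up. Only "mult" and "plus" come from the example above; the other functor names are guesses.

```perl
use strict;
use warnings;

# Only "mult" and "plus" appear in the post; the rest are assumptions.
my %name = ('+' => 'plus', '-' => 'minus', '*' => 'mult', '/' => 'div', '%' => 'mod');

# Break the math into numbers, variables, operators and parentheses.
# Anything else is silently skipped, which is fine for a sketch only.
sub tokenize {
    my ($src) = @_;
    return $src =~ m{\s*(\d+(?:\.\d+)?|[[:upper:]_][[:alnum:]_]*|[-+*/%()])}g;
}

sub parse_expr {      # expr := term (('+'|'-') term)*
    my ($tokens) = @_;
    my $left = parse_term($tokens);
    while (@$tokens && $tokens->[0] =~ /^[-+]$/) {
        my $op    = shift @$tokens;
        my $right = parse_term($tokens);
        $left = "$name{$op}($left, $right)";
    }
    return $left;
}

sub parse_term {      # term := factor (('*'|'/'|'%') factor)*
    my ($tokens) = @_;
    my $left = parse_factor($tokens);
    while (@$tokens && $tokens->[0] =~ m{^[*/%]$}) {
        my $op    = shift @$tokens;
        my $right = parse_factor($tokens);
        $left = "$name{$op}($left, $right)";
    }
    return $left;
}

sub parse_factor {    # factor := number | variable | '(' expr ')'
    my ($tokens) = @_;
    my $tok = shift @$tokens;
    die "unexpected end of expression\n" unless defined $tok;
    return $tok unless $tok eq '(';
    my $inner = parse_expr($tokens);
    my $close = shift @$tokens;
    die "missing ')'\n" unless defined $close && $close eq ')';
    return $inner;
}

# Rewrite "Var is <math>." and leave every other line untouched.
sub rewrite {
    my ($line) = @_;
    my ($lhs, $rhs) = $line =~ /^\s*(\S+)\s+is\s+(.+?)\s*\.\s*$/
        or return $line;
    my @tokens = tokenize($rhs);
    return "is($lhs, " . parse_expr(\@tokens) . ").";
}

print rewrite('X is 3 * (2 + Y).'), "\n";   # prints: is(X, mult(3, plus(2, Y))).
```

Lines that don't look like "Var is ... ." pass through unchanged, which matches the idea of only touching the bits the existing parser can't handle.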

It gets pretty bizarre, though. Eventually I want to use my pre-processor to handle what are known as "DCGs", or Definite Clause Grammars. These are conceptually similar to Perl 6 grammars, but they're built into Prolog. My current parser can't handle them, but they're really just syntactic sugar around something called difference lists, which in turn can be represented by predicates that my parser can handle. Which means that I'd be writing a mini-grammar to parse a subset of Prolog that handles parsing and then handing that off to my parser. Got that?

I think my brain just went flat.
I wish this extra piece had been in the post, but I'm glad I ran across it anyways.

In my office, I'm generally the regex guru, but my skills pale in comparison to your amazing ability there. I understand it for the most part, but think it'd take me forever to write.

I really need to pick up MRE.
Seriously, go look at the use.perl.org journal entry I made about this. The code there is the precursor to what you see above (and what you see above is not the final form.) Once you see how I build it, you'll understand how it's much easier than it looks.

I should really put together a talk about this.
You're right, that's really not bad at all. It's just funny to look at it in the final form above and try to read it then. It was quite amusing.

You *should* put together a talk about it, it'd be really interesting. I loved Prolog when I was exposed to it, but never delved deeply into it, so I always quite enjoy postings of yours relating to your Prolog work in Perl.
I tried to pass this through "perl -MO=Deparse" with no luck. Dang.

Since each part seems to start with "?-xism:", wouldn't it be cleaner to split(/\?-xism:/) and parse the bits?

Tom
Actually, I would be surprised if Deparse would show anything there.

The various parts don't really start with ?-xism: in the sense of a delimiter. That's one way of embedding a regex's trailing switches (x, i, s, m) into the pattern itself in order to limit their effect to that group. If you build the regex programmatically and print it out, you see those:

$ perl -le 'print qr/foo/'
(?-xism:foo)
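
A small sketch of why that wrapper matters when regexes are combined: each compiled piece keeps its own switches when interpolated into a larger pattern. The variable names are made up for illustration. Note that Perl 5.14 and later print the wrapper as (?^:foo) rather than (?-xism:foo), so the exact stringified form depends on the Perl version.

```perl
use strict;
use warnings;

my $inner = qr/bar/i;            # case-insensitive piece
my $outer = qr/^foo$inner\z/;    # case-sensitive wrapper around it

print "$inner\n";                # shows the embedded switches for this piece

# The /i travels with $inner only; the literal "foo" stays case-sensitive.
print "fooBAR matches\n"        if 'fooBAR' =~ $outer;
print "FOObar does not match\n" unless 'FOObar' =~ $outer;
```

This is what makes it safe to build a big regex from parts written with different switches, such as mixing /x pieces with whitespace-sensitive ones.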