Whee! I've just had perl.com accept an article proposal I sent in a couple of months ago. The technical details are pretty straightforward. When working with text, we often need to extract information from it, but the text itself is horribly complicated and there's no guarantee that a parser for it is available. For example, consider the following SQL (this is from an actual problem someone had):
select the_date as "date", round(months_between(first_date,second_date),0) months_old ,product,extract(year from the_date) year ,case when a=b then 'c' else 'd' end tough_one from XXX
The programmer had a pile of SQL similar to this, all of it automatically generated by code that several programmers had written, each with a slightly different style. Assuming that you can't actually execute the SQL, how do you extract a list of the column aliases? If you're familiar with Perl, you might think to use SQL::Statement (which uses SQL::Parser). However, the parser doesn't understand CASE expressions. So how do you get the column aliases?
Writing one huge regex is complicated, and the result is likely to be inflexible and difficult to maintain. I'll be writing about a technique called lexing, which breaks the text into a stream of small labeled tokens. It's easy to understand and makes extracting that information trivial.
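To give a flavour of the idea, here is a minimal sketch, written in Python rather than Perl, and not the code from the article. The names TOKEN_SPEC, lex and column_names are made up for illustration. It also leans on an assumption that happens to hold for the query above but not for SQL in general: the last word of each top-level item in the select list is that column's alias (or its plain name when there is no alias, as with product).

```python
import re

# Token patterns, tried in order; the first alternative that matches wins.
# All names here are illustrative, not taken from any library.
TOKEN_SPEC = [
    ("SPACE",  r"\s+"),
    ("STRING", r"'(?:[^']|'')*'"),        # 'c', with '' as an escaped quote
    ("QUOTED", r'"[^"]*"'),               # "date"
    ("NUMBER", r"\d+(?:\.\d+)?"),
    ("WORD",   r"[A-Za-z_][A-Za-z0-9_$#]*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("COMMA",  r","),
    ("OTHER",  r"."),                     # any other single character, e.g. =
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(sql):
    """Turn SQL text into a list of (kind, text) tokens, dropping whitespace."""
    return [(m.lastgroup, m.group())
            for m in TOKEN_RE.finditer(sql)
            if m.lastgroup != "SPACE"]


def column_names(sql):
    """Return the last word of each top-level item in the select list.

    Assumption: that last word is the alias (or the bare column name).
    """
    names, item, depth, in_select = [], [], 0, False

    def finish():
        # Only accept an identifier as a name; skip items ending in ")" etc.
        if item and item[-1][0] in ("WORD", "QUOTED"):
            names.append(item[-1][1].strip('"'))

    for kind, text in lex(sql):
        if not in_select:
            in_select = kind == "WORD" and text.lower() == "select"
            continue
        if kind == "LPAREN":
            depth += 1
        elif kind == "RPAREN":
            depth -= 1
        if depth == 0 and kind == "WORD" and text.lower() == "from":
            break                          # the real FROM, not extract(... from ...)
        if depth == 0 and kind == "COMMA":
            finish()
            item = []
        else:
            item.append((kind, text))
    finish()
    return names


sql = ('select the_date as "date", '
       'round(months_between(first_date,second_date),0) months_old ,product,'
       'extract(year from the_date) year ,'
       "case when a=b then 'c' else 'd' end tough_one from XXX")
print(column_names(sql))
# ['date', 'months_old', 'product', 'year', 'tough_one']
```

The point of splitting the work in two is that the lexer knows nothing about SQL grammar, and the code that walks the tokens never has to look at individual characters. Tracking the parenthesis depth is what keeps the from inside extract(...) and the commas inside round(...) from being mistaken for the end of the select list or of an item.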