Category: regex

  • In praise of Regexp::Assemble

    …and of the Perl modules in general. I had the following problem: given a list of 16-character alphanumeric IDs, find all the lines from a large-ish (~6GB) logfile which contain at least one of the IDs. The naive approach was to construct a big regular expression like \W(\QID1\E|\QID2\E|\QID3\E…)\W and match it against every line…
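
    The naive approach described above can be sketched in Python as follows. This is an illustration only, not the code from the post: the function names and sample data are hypothetical, `re.escape` stands in for Perl's literal quoting, and the IDs are assumed to be delimited by non-word characters (`\W`) on both sides.

```python
import re

def build_naive_pattern(ids):
    """Join all IDs into one big alternation delimited by non-word characters."""
    # re.escape quotes each ID so it is matched literally (assumption:
    # this mirrors the literal quoting used in the Perl original).
    alternation = "|".join(re.escape(i) for i in ids)
    return re.compile(r"\W(" + alternation + r")\W")

def matching_lines(lines, ids):
    """Return the lines that contain at least one of the IDs."""
    pattern = build_naive_pattern(ids)
    return [line for line in lines if pattern.search(line)]

# Hypothetical sample data, made up for the example.
sample_ids = ["ABCDEFGH12345678", "ZZZZZZZZ00000000"]
sample_lines = [
    "x ABCDEFGH12345678 y",
    "nothing here",
    "id=ZZZZZZZZ00000000;",
]
found = matching_lines(sample_lines, sample_ids)
```

    Because `\W` consumes a character, an ID sitting at the very start or end of a line would not match in this sketch.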

  • RegEx which matches strings not containing a substring

    This is an interesting problem which can appear in certain cases (although not very often). A little searching around led me to many posts stating that there is no easy solution, and to the following easy solution: ^((?!my string).)*$ It works as follows: the matching string must contain zero or more characters which are not preceded…
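
    A small check of the quoted pattern, written in Python as an assumption (the post does not say which engine it used). The helper name and sample strings are hypothetical. The negative lookahead `(?!my string)` is tested before each character is consumed, so the match can only reach the end of the line if the substring never begins anywhere in it.

```python
import re

# Pattern quoted from the post: each character is consumed only at a
# position where "my string" does not begin.
NOT_CONTAINING = re.compile(r"^((?!my string).)*$")

def lacks_substring(line):
    """Return True when `line` does not contain 'my string'."""
    return NOT_CONTAINING.match(line) is not None

# Hypothetical sample inputs.
samples = ["hello world", "this is my string here", ""]
results = [lacks_substring(s) for s in samples]
```

    Note that `.` does not match a newline by default, so this sketch is meant for single lines.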

  • Optimizing regular expressions with PHP

    I was intrigued by the following text in the PHP reference, especially because there is considerable regex use in the wehoneypot project: S When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is…

  • Alternative regular expression syntax

    For a long time I was a believer in the “Perl way” of doing regular expressions and an avid reader of perlre. All other implementations I viewed as a “poor man’s copy” of the one true idea. However, after reading the Lua Patterns Tutorial, I found it quite enlightening. Even though it is called “patterns”…

  • Javascript regex quirk

    When I wrote the SMOG analyzer JavaScript, I found a quirk of JavaScript, and this recent post inspired me to share it: the JavaScript regex specification doesn’t have the s modifier. This is necessary when you want to match multiple lines with a construct like .*. The suggested workaround I found was to specify the…

  • The big java regex shoutout

    I discovered recently that the built-in Java regex library has problems with some expressions, so I set out to find alternatives. Searching for regex benchmarks, I found the following page: Java Regular expression library benchmarks (it also has an older version). The original IBM article also contains a benchmark. However, both of these resources are…

  • Regular Expressions in Java

    I was wondering why the gnu.regexp package exists, when Java already includes libraries for it. One thing I can think of is the fact that they were added only in 1.4. While searching around I found some surprising facts about the built-in regex libraries (the site goes up and down, so here is the…

  • Regex magic

    First of all I want to apologize to my readers (both of them :-)) for being AWOL, but real life sometimes interferes pretty badly. I have always been a big fan of regular expressions, and one of the main reasons I love Perl is because they are so deeply integrated in it and are natural to…

  • Input validation

    The month of PHP bugs is over, but you should still watch the PHP-Security blog, since there are good things coming from there, like this article: Holes in most preg_match() filters. Go read it if you are using regular expressions for input validation. Two tips to avoid these pitfalls: Cast your input to the datatype…

  • Moving to Ubuntu – The Regex Coach

    After reaching 21 posts and catching up with the Security Now! episodes, I thought that it’s time to start a new series. I am what I consider a pro Windows user and lately I started moving to Ubuntu. I toyed with Linux distros before, but this is the first I feel that I can learn.…