-
In praise of Regexp::Assemble
…and of the Perl modules in general. I had the following problem: given a list of 16-character alphanumeric IDs, find all the lines from a large-ish (~6GB) logfile which contain at least one of the IDs. The naive approach was to construct a big regular expression like \W(\QID1\E|\QID2\E|\QID3\E…)\W and match it against every line…
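A rough Python sketch of the naive approach described above (the post itself is about Perl, so this is only an illustration). The function names `build_naive_pattern` and `matching_lines` are made up for this example, and `re.escape` is used here to quote each ID literally.

```python
import re

def build_naive_pattern(ids):
    """Build one big alternation of literally-quoted IDs (illustrative helper)."""
    # re.escape quotes any regex metacharacters in an ID.
    alternation = "|".join(re.escape(i) for i in ids)
    # \W on both sides requires a non-word character around the ID,
    # so an ID at the very start or end of a line will not match.
    return re.compile(r"\W(" + alternation + r")\W")

def matching_lines(lines, ids):
    """Return the lines that contain at least one of the IDs."""
    pattern = build_naive_pattern(ids)
    return [line for line in lines if pattern.search(line)]
```

With thousands of IDs, a regex engine may have to try many alternatives at each position, which is what makes this approach slow on a large file.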
-
RegEx which matches strings not containing a substring
This is an interesting problem which can appear in certain cases (although not very often). A little searching around led me to many posts stating that there is no easy solution and the following easy solution: ^((?!my string).)*$ It works as follows: the matching string must contain zero or more characters which are not preceded…
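A minimal Python sketch of the pattern quoted above, assuming the literal substring is `my string`; the helper name `lacks_substring` is invented for illustration. The negative lookahead `(?!my string)` is tested before each character is consumed, so the whole line can match only if the substring starts at no position in it.

```python
import re

# Matches only lines that do NOT contain the substring "my string".
NOT_CONTAINING = re.compile(r"^((?!my string).)*$")

def lacks_substring(line: str) -> bool:
    """Return True if the line does not contain "my string" (illustrative helper)."""
    return NOT_CONTAINING.match(line) is not None
```

For a fixed literal substring, a plain `"my string" not in line` check is simpler; the regex form is mainly useful when the condition has to be embedded in a larger pattern.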
-
Optimizing regular expressions with PHP
I was intrigued by the following text in the PHP reference, especially because there is considerable regex use in the webhoneypot project: "S: When a pattern is going to be used several times, it is worth spending more time analyzing it in order to speed up the time taken for matching. If this modifier is…
-
Alternative regular expression syntax
For a long time I was a believer in the “Perl way” of doing regular expressions and an avid reader of perlre. All other implementations I viewed as a “poor man’s copy” of the one true idea. However, after reading the Lua Patterns Tutorial, I found it quite enlightening. Even though it is called “patterns”…
-
Javascript regex quirk
When I wrote the SMOG analyzer JavaScript, I found a quirk of JavaScript, and this recent post inspired me to share it: the JavaScript regex specification doesn’t have the s modifier. This is necessary when you want to match multiple lines with a construct like .*. The suggested workaround I found was to specify the…
-
The big java regex shoutout
I recently discovered that the built-in Java regex library has problems with some expressions, so I set out to find alternatives. Searching for regex benchmarks, I found the following page: Java Regular expression library benchmarks (it also has an older version). The original IBM article also contains a benchmark. However, both of these resources are…
-
Regular Expressions in Java
I was wondering why the gnu.regexp package exists, when Java already includes libraries for it. One thing I can think of is the fact that they were only added in 1.4. While searching around I found some surprising facts about the built-in regex libraries (the site goes up and down, so here is the…
-
Regex magic
First of all I want to apologize to my readers (both of them :-)) for being AWOL, but real life sometimes interferes pretty badly. I have always been a big fan of regular expressions, and one of the main reasons I love Perl is that they are so deeply integrated in it and are natural to…
-
Input validation
The month of PHP bugs is over, but you should still watch the PHP-Security blog, since there are good things coming from there, like this article: Holes in most preg_match() filters. Go read it if you are using regular expressions for input validation. Two tips to avoid these pitfalls: Cast your input to the datatype…
-
Moving to Ubuntu – The Regex Coach
After reaching 21 posts and catching up with the Security Now! episodes, I thought that it’s time to start a new series. I am what I consider a pro Windows user and lately I started moving to Ubuntu. I toyed with Linux distros before, but this is the first one I feel that I can learn.…