Monday, February 18, 2008

Nifty Regular Expressions (RegExp)

I personally use TextPad as my default text editor. There are probably some much fancier ones out there, but TextPad has all the features that I demand, like great regExp support, vertical cut/paste, and a few other nifty features. It is free to download, and only something like $15 to buy. Well worth it. After all, that is not much more than that venti Caramel Macchiato that you normally order. And TextPad has great help on regular expressions. I strongly encourage you to read their regExp help. And by the way, if you try TextPad, then I would suggest that you first change these 4 settings:
1) Under Preference/General, set Context Menu, so you can quickly send any file to TextPad
2) Under Preference/Editor, set Microsoft compatible.
3) Under Preference/View, set line numbers.
4) Under Preference/Assoc Files, add any file type you wish to open with TextPad (e.g. .txt)

OK, back to our scheduled programming.
Below are a few tricks I've come to appreciate with regular expressions. I do expect the reader to have basic familiarity with regexp (^ means beginning of line, $ means end of line, . means any character, etc.).

Problem: Remove all lines containing a certain text string.
Find: ^.*FINDME.*\n
Replace:
Explained: Locate the string you want, and select the entire line (including the newline character). Simply replace with nothing. Note that if you chose $ instead of \n, then you would end up with a lot of blank lines instead, but assuming that you want to completely remove these lines, you need to use \n at the end.
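If you are not in TextPad, the same idea can be sketched in Python. This is only an illustration, not TextPad's own behavior: the function name remove_lines_containing is made up for this example, and I made the trailing newline optional (\n?) so that a matching last line without a newline is removed too.

```python
import re

def remove_lines_containing(text: str, needle: str) -> str:
    """Drop every line that contains `needle`, newline included.

    Hypothetical helper for illustration. re.escape() treats the needle
    as literal text, and re.MULTILINE makes ^ match at each line start.
    """
    pattern = re.compile(r"^.*" + re.escape(needle) + r".*\n?", re.MULTILINE)
    return pattern.sub("", text)

sample = "keep 1\ndrop FINDME here\nkeep 2\n"
cleaned = remove_lines_containing(sample, "FINDME")
print(cleaned, end="")
```

Because the newline is part of the match, no blank lines are left behind, which is the same reason for using \n rather than $ in the Find string.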

Problem: Insert line numbers in front of all lines
Find: ^
Replace: \i(100,10)\t
Explained: Find beginning of line, replace with a numeric counter starting at 100 and incrementing by 10 followed by a tab. So now you'll have a first column with numbers 100,110,120, etc. in it. By the way, \i by default starts at 1 and increments by 1.
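Since \i is a TextPad feature, here is a rough Python equivalent for when you are using something else. Treat it as a sketch under my own assumptions: number_lines is a hypothetical name, and the (?!\Z) part is something I added so that no extra number is written after a trailing newline at the very end of the text.

```python
import itertools
import re

def number_lines(text: str, start: int = 1, step: int = 1) -> str:
    """Prefix each line with a counter and a tab, like \\i(start,step)\\t.

    Hypothetical helper for illustration. ^ with re.MULTILINE matches at
    every line start; (?!\\Z) skips the empty position after a final newline.
    """
    counter = itertools.count(start, step)
    return re.sub(r"^(?!\Z)", lambda m: f"{next(counter)}\t", text,
                  flags=re.MULTILINE)

sample = "alpha\nbeta\ngamma\n"
numbered = number_lines(sample, 100, 10)
print(numbered, end="")
```

The defaults of start=1 and step=1 mirror what \i does when you give it no arguments.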

Problem: You have a text file with dates in a DDMMYYYY (day, month, year) format in column 1 and you would like to convert them to a YYYYMMDD format. This is quickly done using a regular expression.
Find: ^\(..\)\(..\)\(....\)
Replace: \3\2\1
Explained: Create 3 match sets of 2, 2, and 4 characters respectively from the beginning of the line. Simply put the 3 match sets in the desired order. If you are unfamiliar with match sets, then \( and \) define each set and \1 refers to the first set, \2 to the second, etc. Please note that if you use POSIX style regexp then you do not need to escape the parentheses (i.e. use (..) instead of \(..\) ) to create the match sets.
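The same reordering of match sets can be sketched in Python, where parentheses are not escaped. One assumption of mine that differs from the Find string above: I match digits (\d) rather than any character (.), so lines that do not start with 8 digits are left alone. The function name is made up for this example.

```python
import re

def ddmmyyyy_to_yyyymmdd(text: str) -> str:
    """Rewrite a leading DDMMYYYY on each line as YYYYMMDD.

    Hypothetical helper for illustration. Groups 1, 2, 3 capture day,
    month, and year; the replacement emits them in reverse order.
    """
    return re.sub(r"^(\d{2})(\d{2})(\d{4})", r"\3\2\1", text,
                  flags=re.MULTILINE)

sample = "18022008\tfirst row\n01122007\tsecond row"
converted = ddmmyyyy_to_yyyymmdd(sample)
print(converted)
```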


3 comments:

Jack said...

Useful post. Yes I am aware of this helpful feature that increases the typing speed and spelling accuracy. I will try this feature on the email signature that I have created.

praosv said...

\i(100,10)\t is not incrementing the line numbers by 10, but adding 10 after every line number, like '1100,10'; '2100,10'; '3100,10'; etc.
I have tried this with TextPad. So what exactly should the string be?

Thomas Gemal said...

Sorry, but the \i syntax is TextPad specific (and I have since migrated to Notepad++). I can't see an easy standard substitute for \i, so you'll have to Google it. Good luck. And please feel free to post the answer here in a comment.