Wednesday, January 16, 2008

Command line tools for dummies

Let me start by saying that I receive no kickbacks or any other compensation from any of the products that I mention. I would certainly like to, but I don't :-) So the products mentioned are simply products that I have come to love and simply can't live without in my daily life.

Back in the days at the school of engineering at Santa Clara University, I took a great course in UNIX command line tools (grep, awk, sed, etc.), and ever since I have not been able to live without them. I mean, how do you quickly view the last few lines of a very large log file without a cool command line tool like tail? Or how do you quickly find which XSLT file contains the XML tag named recordIdentifier without a tool like grep?



So when my career took a turn from a UNIX environment to a Windows environment, I was scrambling to find some cool command line tools for Windows. I found my comfort in the MKS Toolkit, which is a collection of all the beloved UNIX command line tools compiled for Windows. It is not the cheapest tool ($479), but it is worth every penny if, like me, you do a lot of text file handling. It comes with hundreds of UNIX command line tools, shells, etc.

So the next few blog entries will be dedicated to learning a few nifty tricks on how to use UNIX command line tools. It is not going to be extremely advanced, but it will solve a lot of the everyday problems you encounter. Warning: Once you learn these little tricks, your co-workers will regard you as a complete nerd or even a hacker :-) If you are working on a UNIX platform, the tools are of course readily available at the prompt. If you are on Windows, go purchase a set of command line tools (like the MKS Toolkit I mentioned). If you work on a Mac, you probably haven't heard the term command line before :-)

Anyway, let's start out with the very basics by first introducing a few simple tools. The power of command line tools is in stringing them together, but before we get there, we need to understand what each tool does. Please note that you can always access the online help for each tool by typing "man toolname". These online help pages (aka man pages) contain descriptions, available options, and examples, and they are a great source for exploring the tools even further.

For all the examples below, assume that you have a very simple tab-delimited text file named file.txt with 3 columns. An easy way to create it is to build it in Excel and then paste it into your favorite text editor. The file content is shown below (the columns are separated by tabs):
line1 A Mike
line2 C Chang
line3 A Tom
line4 B Mike
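
If you don't have Excel handy, the file can also be built straight from the prompt. This is only a sketch, and it assumes a UNIX-style shell (sh, bash, or similar) with printf available, rather than the plain Windows prompt:

```shell
# Build the sample file with real tab characters between the columns.
# printf reuses its format string for every group of three arguments.
printf '%s\t%s\t%s\n' \
  line1 A Mike \
  line2 C Chang \
  line3 A Tom \
  line4 B Mike > file.txt

# Show the result to double-check it.
cat file.txt
```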

OK, here we go with the top 7 command line tools I use the most. Each tool has a large number of options, but each example shows just one of them.

grep: Find lines containing or not containing a text string

-i: option to ignore case (i.e. case insensitive)
C:\>grep -i mike file.txt
line1 A Mike
line4 B Mike
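
The "not containing" half works through the -v option, which inverts the match. Here is a small sketch, assuming a UNIX-style shell; the first line just recreates the sample file so the snippet runs on its own:

```shell
# Recreate the sample file (tab-delimited, 3 columns).
printf 'line1\tA\tMike\nline2\tC\tChang\nline3\tA\tTom\nline4\tB\tMike\n' > file.txt

# -v inverts the match: print only the lines that do NOT contain "mike".
# -i still makes the comparison case insensitive.
grep -v -i mike file.txt
```

This should leave only the Chang and Tom lines.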

wc (word count): Show number of lines, words, etc.
-l: option to get only the line count
C:\>wc -l file.txt
4 file.txt

tail: Show only last few lines.
head: Show only first few lines.
-10: option to specify how many lines you want (10 in this case)
C:\>tail -2 file.txt
line3 A Tom
line4 B Mike
C:\>head -1 file.txt
line1 A Mike
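
A side note: the -2 and -1 shorthand above is the historical spelling. The POSIX form is -n followed by the count, which is the safer habit if your version of the tools complains about the short form. A minimal sketch, again assuming a UNIX-style shell:

```shell
# Recreate the sample file (tab-delimited, 3 columns).
printf 'line1\tA\tMike\nline2\tC\tChang\nline3\tA\tTom\nline4\tB\tMike\n' > file.txt

# Same results as "tail -2" and "head -1", using the -n spelling.
tail -n 2 file.txt
head -n 1 file.txt
```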

cut: Cut file vertically by characters or fields
-f: option to specify fields to cut
C:\>cut -f1,3 file.txt
line1 Mike
line2 Chang
line3 Tom
line4 Mike
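
cut assumes tabs between fields unless you tell it otherwise. The sketch below shows the -d option for a different delimiter and the -c option for cutting by character position. The file people.csv is made up purely for this illustration:

```shell
# A hypothetical comma-separated file, just for this example.
printf 'line1,A,Mike\nline2,C,Chang\n' > people.csv

# -d sets the field delimiter (a comma here); -f picks fields 1 and 3.
cut -d, -f1,3 people.csv

# -c cuts by character position instead: keep characters 1 through 5.
cut -c1-5 people.csv
```

The first command prints line1,Mike and line2,Chang; the second prints just line1 and line2.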

uniq: Show unique or repeated lines (only adjacent lines are compared)
-c: option to count number of times a line occurs
C:\>uniq -c file.txt
1 line1 A Mike
1 line2 C Chang
1 line3 A Tom
1 line4 B Mike
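
Every count above is 1 because no two lines in file.txt are identical. One thing to keep in mind is that uniq only compares a line with the one right before it, so duplicates that are not next to each other are counted separately. A small sketch, using a made-up file names.txt:

```shell
# A hypothetical one-column file: "Mike" appears twice in a row, then once more later.
printf 'Mike\nMike\nChang\nMike\n' > names.txt

# The two adjacent Mike lines collapse into one entry with a count of 2.
# The last Mike is not adjacent to them, so it gets its own entry with a count of 1.
uniq -c names.txt
```

The exact spacing in front of the counts differs between versions of uniq.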

sort: Sort input lines
-k: specify which fields to sort on
C:\>sort -k2 file.txt
line1 A Mike
line3 A Tom
line4 B Mike
line2 C Chang
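
Note that -k2 means "from field 2 to the end of the line". To sort on exactly one field, give both a start and an end field. Here is a sketch that sorts on the name column only, assuming a UNIX-style shell:

```shell
# Recreate the sample file (tab-delimited, 3 columns).
printf 'line1\tA\tMike\nline2\tC\tChang\nline3\tA\tTom\nline4\tB\tMike\n' > file.txt

# -k3,3 starts the sort key at field 3 and stops it at the end of field 3.
sort -k3,3 file.txt
```

Chang comes first, then the two Mike lines, then Tom.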

OK, that was the very basic stuff. A single command on its own is not terribly useful. We'll look at the really nifty stuff in the next blog entry.
