Saturday, August 2, 2008

Grep'ing for a column value

A few days ago I needed to get those lines from a file that only contained a certain value in a certain field. To simplify the example, let's say that I have a file containing id, first name, last name, state, and phone number, and that I am only interested in those entries containing a state of California (e.g. ca or CA).

Here is the sample file named in.txt:
line1 joe smith ca 4085551212
line2 joe carlson az 3334445555
line3 carl smith ny 2049998888
line4 joe smith or 5035551234
line5 mike erwin ca 4159876543
line6 mike erwin CA 4159876543

I can not simply do a "grep -i ca in.txt" as I would get all lines containing "ca" even if "ca" occurred in the name fields (e.g. like in carl or carlson). In the above case I would get all lines, except line 4, which is of course incorrect.

If I did "cut -f4 in.txt | grep -i ca" I would get the correct number of results (i.e. 3), but I would only get the 3 "ca" values and not the whole line as I cut everything else away.

So one way to solve the problem (there may be other more clever ones I'm not aware of), is to use the "awk" command instead of the "grep" command.
cat in.txt | awk '{if ($4=="ca") print $0;}'

I think the above is self explanatory even for non-awk folks, the only trick is that $0 shows the entire line. If you only wanted to get certain columns in the output, you could decide to only print $2,$3,$5 for example to get first name, last name and phone number.

If you want to make the solution case insensitive, simply use the awk command "tolower()" in your if statement:
cat in.txt | awk '{if (tolower($4)=="ca") print $0;}'

Nifty.

3 comments:

Derek Brewer said...

Great tip. I've done this in the past with multiple commands and fiddling around with variables. This is nice and clean.

jleslie48 said...

to expand on this, what if you want to wild card some of the characters of the field, for example if you want a list of containing all lines where somewhere in the phone number is the string "876" ?

Thomas Gemal said...

@jleslie48 you can do this 2 ways. either expand on your AWK using a regular expression (RegEx), or simply use grep as it looks for the value anywhere in the string.