UNIX has wonderfully powerful text processing capabilities. There are numerous ways to solve the same problem. Frequently, for example, it is necessary to extract a single column of data from a text file or output stream. This tech-recipe will present several solutions to this problem.
Many data files have data fields delimited by a single character like a tab or colon. To extract the full name field out of /etc/passwd, the fifth colon-delimited field, use:
cut -d : -f 5 /etc/passwd
The cut command allows a great deal of flexibility in cutting data. In this case, -d : directs cut to use a colon as the delimiter, and -f 5 directs cut to extract only the fifth field. The field parameter makes cut extremely flexible: -f 2-5 extracts fields 2 through 5, and -f 1,3,7 extracts the first, third, and seventh fields.
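As a minimal sketch of these field selections, the following runs cut against a single made-up record in /etc/passwd format instead of the real file. The account jdoe and all of its values are hypothetical and exist only for illustration.

```shell
# Hypothetical sample record in /etc/passwd format (not a real account).
line='jdoe:x:1001:1001:Jane Doe:/home/jdoe:/bin/sh'

# Fifth field only: the full name.
name=$(printf '%s\n' "$line" | cut -d : -f 5)

# Fields 1, 3, and 7; cut joins the selected fields with the delimiter.
picked=$(printf '%s\n' "$line" | cut -d : -f 1,3,7)

# Fields 2 through 5.
range=$(printf '%s\n' "$line" | cut -d : -f 2-5)

printf '%s\n' "$name" "$picked" "$range"
```

The three lines printed are Jane Doe, jdoe:1001:/bin/sh, and x:1001:1001:Jane Doe.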
To extract a fixed set of columns, for example the column numbers 44 through 49 from a long directory listing (ls -l), use the following command:
ls -l | cut -c 44-49
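On many UNIX systems, these columns represent the modification date, although the exact positions shift with the widths of the owner, group, and size columns. Like the -f parameter, the -c parameter accepts lists and ranges, such as -c 5-8 or -c 1,3,7. Note that cut always writes the selected characters in the order they appear in the input, so -c 5,7,6,8 produces the same output as -c 5-8.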
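A short sketch of how cut treats the order of a character list, using a made-up ten-character string. cut writes the selected characters in input order no matter how the list is written. The awk substr call is one possible alternative when a different output order is actually needed, and is shown here only as an assumption about what a reader might want.

```shell
# Hypothetical sample text; position 5 is 'e', 6 is 'f', 7 is 'g', 8 is 'h'.
text='abcdefghij'

# The list order is ignored: both commands select characters 5 through 8.
listed=$(printf '%s\n' "$text" | cut -c 5,7,6,8)
ranged=$(printf '%s\n' "$text" | cut -c 5-8)

# To really emit positions 5, 7, 6, 8 in that order, build the string with awk.
reordered=$(printf '%s\n' "$text" |
    awk '{print substr($0,5,1) substr($0,7,1) substr($0,6,1) substr($0,8,1)}')

printf '%s\n' "$listed" "$ranged" "$reordered"
```

Here listed and ranged both hold efgh, while reordered holds egfh.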
One of the trickier column extractions involves a variable amount of whitespace between fields. To extract the process ID (second) field from a process listing (ps -ef), cut will not work, because it treats every single delimiter character as a field boundary and so sees empty fields between consecutive spaces. Another powerful text manipulator in UNIX is awk, which by default treats any run of spaces or tabs as a single field separator. To extract the PID from ps -ef output, use:
ps -ef | head | awk '{print $2}'
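Awk is an incredibly powerful tool, and this is a trivial but useful application of it. The head command in the pipeline only limits the example to the first ten lines of the listing; omit it to process every line.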
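The difference between the two tools on space-padded input can be sketched with canned text that imitates ps -ef output. The sample lines below are invented, and real ps output varies between systems. The NR > 1 condition, which skips the header line, is an optional addition and not part of the command above.

```shell
# Hypothetical text imitating ps -ef output, header line included.
sample='UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 09:00 ?        00:00:01 /sbin/init
jdoe      4321     1  0 09:05 pts/0    00:00:00 -sh'

# awk splits on runs of whitespace; NR > 1 skips the header line.
pids=$(printf '%s\n' "$sample" | awk 'NR > 1 {print $2}')

# cut counts every single space as a delimiter, so "field 2" of this
# line is the empty string between the first two spaces after "root".
row='root         1     0  0 09:00 ?        00:00:01 /sbin/init'
with_cut=$(printf '%s\n' "$row" | cut -d ' ' -f 2)

printf 'awk found:\n%s\n' "$pids"
printf 'cut found: [%s]\n' "$with_cut"
```

With this sample, awk reports 1 and 4321, while cut returns an empty field.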