Awk: Separating delimited files on the command line
By Eric Downing. Filed in Linux, Scripting, Utilities |I have had to process data files from customers that have come in as delimited files. The files are usually in CSV formats and the delimiters might change depending on the content of the files. Sometimes the delimiters are usually tabs”t”, pipes “|”, comma “,” or semi-colon “;”. A useful tool for processing this type of file is to use awk.
Example command line:
awk -F 't' '{OFS="|"; if (NR != 1) print $3,$2,$1}'
The ‘-F’ argument is followed by the character that you have the file separated on. The example above uses a tab escape sequence ‘t’ but you could have any character. One potential problem is if the data contains the same character that you are attempting to split on. As in a comma separated list using commas within a long text sequence.
The Second sequence of quoted text above starts the processing of the file. The if statement uses the NF keyword which stands for Number of Row. This tells the script that it should not fire for the first row. The print statement allows you to pick and choose the columns(numbered from 1) to use or allows you to reorder them.
The ‘OFS’ part of the command line that helps with the print statement describes the Output Field Separator. In the case above it is using a pipe character ‘|’ to separate the text as it is being printed out. This allows you comma separate the fields that you want instead of having to use a more sophisticated printf statement.
The input file is of the form
Label,name,address
And the output is going to be of the form:
address,name,Label