Liveblogging: Senior Skills: Grok awk

[author's note: personally, I use awk a bunch in MySQL DBA work, for tasks like scrubbing data from a production export for use in qa/dev, but usually have to resort to Perl for really complex stuff. Now I know how to do more of it in awk.]
Basics:
By default, fields are separated by runs of whitespace (spaces or tabs). The -F option to awk changes the separator on the command line.
Print the first field, with fields separated by a colon:
awk -F: '{print $1}' /etc/passwd
Print the first and fifth field:
awk -F: '{print $1,$5}' /etc/passwd
awk can pattern match and read files itself, so you can replace:
grep foo /etc/passwd | awk -F: '{print $1,$5}'
with:
awk -F: '/foo/ {print $1,$5}' /etc/passwd
NF = built-in variable (no $) that holds the number of fields in the current line
This will print the first and last fields of lines where the first field matches “foo”
awk -F: '$1 ~/foo/ {print $1,$NF}' /etc/passwd
NF = number of fields, i.e. “7”
$NF = value of last field, i.e. “/bin/bash”
(similarly, NR is record number)
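To see NR, NF and $NF side by side without depending on what is in a real /etc/passwd, here is a minimal sketch. The two passwd-style lines and the users alice and bob are made up for illustration.

```shell
# Two made-up passwd-style records; alice and bob are hypothetical users.
sample='alice:x:1000:1000:Alice A:/home/alice:/bin/bash
bob:x:1001:1001:Bob B:/home/bob:/bin/sh'

# NR is the record number, NF the number of fields, $NF the last field.
fields=$(printf '%s\n' "$sample" | awk -F: '{print NR, $1, NF, $NF}')
echo "$fields"
```

Each output line shows the record number, the login name, the field count (7 for both lines) and the shell.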

Awk makes assumptions about input, variables, and processing that you’d otherwise have to code yourself.
- “main loop” of input processing is done for you
- awk initializes variables for you, to 0 (or the empty string, depending on how they are used)
- input is viewed by awk as ‘records’ which are splittable into ‘fields’
This all makes a lot of operations very concise in awk. Many things can be done with a one-liner that would otherwise require several lines of code.
awk key points:
- splits text into fields
- default delimiter is “any number of spaces”
- reference fields
- $0 is entire line
- create filters using ‘addresses’ which can be regexps (similar to sed)
- Turing-complete language
- has if, while, for, do-while, etc
- built-in math like exp, log, rand, sin, cos
- built-in string functions like sub, split, index, toupper/tolower
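A few of the string functions from the list above can be tried in one go. This is only a sketch on a made-up input line; the variable names n, pos, parts and count are illustrative, and count is deliberately never assigned to show the default value.

```shell
# The input line "hello world" is made up for illustration.
strings=$(printf 'hello world\n' | awk '{
    n = split($0, parts, " ")    # parts[1]="hello", parts[2]="world", n=2
    pos = index($0, "world")     # 1-based position of the substring
    sub(/world/, "awk")          # replaces the first match in $0
    print n, pos, toupper($0), count + 0   # count was never set, so it is 0
}')
echo "$strings"
```

This prints the number of pieces, the position of "world", the upper-cased line after the substitution, and the untouched counter.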
Patterns and actions
Pattern is first, then action(s)
Actions are enclosed in {}
only a pattern, no action:
'length>42'
but the default action is to print the whole line, so this will actually do something: print lines where the length of the line is > 42. With no argument, length returns the length of $0.
only action, no pattern:
{print $2,$1}
do this to all lines of input
NR % 3 == 0
print every third line (the pattern is “NR mod 3 equals 0”)
{print $1, $NF, $(NF-1)}
print the first field, last field, and 2nd to last field
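The pattern-only and pattern-plus-action forms above can be checked on a small made-up input. This is a sketch: the six three-letter lines and the length threshold of 5 are chosen only to keep the output short.

```shell
# Six made-up lines; only lines 3 and 6 satisfy NR % 3 == 0.
every_third=$(printf 'a b c\nd e f\ng h i\nj k l\nm n o\np q r\n' |
    awk 'NR % 3 == 0 {print $1, $NF, $(NF-1)}')
echo "$every_third"

# Pattern with no action: the default action prints the whole line.
long_lines=$(printf 'short\nmuch longer line\n' | awk 'length > 5')
echo "$long_lines"
```

The first command prints the first, last and second-to-last fields of lines 3 and 6. The second prints only the line longer than 5 characters.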
built-in variables
NF, NR we’ve done
FS = field separator (can be regexp)
OFMT = output format for numbers (default %.6g)
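FS and OFMT are usually set in a BEGIN block so they take effect before the first line is read. A minimal sketch, with a made-up input line and an arbitrary two-decimal format:

```shell
# FS as a regexp: split on either a colon or a comma.
split_out=$(printf 'a:b,c\n' | awk 'BEGIN { FS = "[:,]" } {print NF, $2}')
echo "$split_out"

# OFMT applies to non-integer numbers printed with print; integers are unaffected.
fmt_out=$(awk 'BEGIN { OFMT = "%.2f"; print 3.14159, 42 }')
echo "$fmt_out"
```

The first command sees three fields in "a:b,c". The second rounds 3.14159 for display but leaves 42 alone.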
Patterns
- used to filter lines processed by awk
- can be regexp
/^root/ is the pattern in the following
awk -F: '/^root/ {print $1,$NF}' /etc/passwd
- Patterns can use fields and relational operators
To print 1st, 4th and last field if value of 4th field >10:
awk -F: '$4 > 10 {print $1, $4, $NF}' /etc/passwd
awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}' /etc/passwd
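The second command combines two conditions with &&: skip commented lines and require $4 > 10. A small sketch on made-up records (note, alice and bob are hypothetical) shows which line survives both tests:

```shell
# Made-up records: a comment line, one with $4 <= 10, one with $4 > 10.
combined=$(printf '#note:x:0:50:n/a\nalice:x:1:5:/bin/sh\nbob:x:2:20:/bin/bash\n' |
    awk -F: '$0 !~ /^#/ && $4 > 10 {print $1, $4, $NF}')
echo "$combined"
```

The comment line is rejected even though its fourth field is 50, and alice is rejected because 5 is not greater than 10.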
Range patterns
sed-like addressing: you can have start and end addresses
awk 'NR==1,NR==3'
prints only first three lines of the file
You can use regular expressions in range patterns:
awk -F: '/^root/,/^daemon/ {print $1,$NF}' /etc/passwd
start printing at the line that starts with “root”, the last line that is processed is the line starting with “daemon”
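Since the order of lines in /etc/passwd varies between systems, here is the same kind of regexp range as a sketch on five made-up lines:

```shell
# Five made-up records; the range opens at "root" and closes at "daemon".
range_out=$(printf 'bin:x\nroot:x\nsys:x\ndaemon:x\nmail:x\n' |
    awk -F: '/^root/,/^daemon/ {print NR, $1}')
echo "$range_out"
```

Lines 2 through 4 are printed. The lines before the start and after the end of the range are skipped.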
Range pattern “gotcha” – can’t mix a range with other patterns:
To do “start at a non-commented line where the value of $4 is at most 10, end at the first line where the value of $4 is greater than 10”
This does not work!
awk -F: '$0 !~ /^#/ $4 <= 10, $4 > 10' /etc/passwd
This is how to do it. {next} is an action that skips the rest of the program for the current line and moves on to the next one:
awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4}' /etc/passwd
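A sketch of the {next}-then-range approach on made-up records (the names a through d and the values in field 4 are illustrative):

```shell
# Made-up records: a comment line first, then values of $4 of 5, 8, 20, 30.
skipped=$(printf '#c:x:0:99\na:x:0:5\nb:x:0:8\nc:x:0:20\nd:x:0:30\n' |
    awk -F: '$0 ~ /^#/ {next} $4 <= 10, $4 > 10 {print $1, $4}')
echo "$skipped"
```

The comment line never reaches the range pattern. The range opens at a, closes at c (the first line with $4 > 10), and d is not printed because it does not open a new range.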
Basic Aggregation
awk -F: '$3 > 100 {x+=1; print x}' /etc/passwd
This gives a line of output as each matching line is processed. This gives a running total of x.
awk -F: '$3 > 100 {x+=1} END {print x}' /etc/passwd
This processes the “{print x}” action only after the entire file has been processed. This gives only the final value of x.
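The difference between the running total and the END total is easy to see on three made-up records, two of which have $3 > 100 (u1, u2 and u3 are hypothetical users):

```shell
# Three made-up records; $3 is 50, 150 and 500.
input='u1:x:50
u2:x:150
u3:x:500'

# Prints x every time a line matches: a running count.
running=$(printf '%s\n' "$input" | awk -F: '$3 > 100 {x+=1; print x}')
echo "$running"

# Prints x once, after the last line has been read.
final=$(printf '%s\n' "$input" | awk -F: '$3 > 100 {x+=1} END {print x}')
echo "$final"
```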
Arrays:
awk supports arrays.
Technically multi-dimensional arrays are not supported, but array indexes are strings, so every array is an associative array and you can build keys out of whatever you like.
Example:
awk -F, '{x[$1] = $2*($4 - $3)} END {for(key in x) {print key, x[key]}}' stocks.txt
The part before the END creates the associative array, the part after the END prints the array.
Extreme data munging:
awk -F, '{x[$1]=($2*($4 - $3))} END {for(z in x) {print z, x[z]}}' stocks.txt
input (stocks.txt)
ABC,100,12.14,19.12
FOO,100,24.01,17.45

output (the order of a for-in loop is not guaranteed)
ABC 698
FOO -656
For the line “ABC,100,12.14,19.12”
the assignment becomes
x[ABC] = 100 * (19.12 - 12.14) = 698
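The same calculation can be reproduced for the two sample rows without creating stocks.txt. This is a sketch: the rows are piped in with printf, and the output goes through sort only because awk does not guarantee the order of a for-in loop.

```shell
# The two sample rows, inlined so the sketch is self-contained.
totals=$(printf 'ABC,100,12.14,19.12\nFOO,100,24.01,17.45\n' |
    awk -F, '{x[$1] = $2 * ($4 - $3)} END {for (key in x) print key, x[key]}' |
    sort)
echo "$totals"
```

For FOO the fourth field is smaller than the third, so its result is negative: 100 * (17.45 - 24.01) = -656.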
Aggregate across multiple variables:
awk -F, '{x[$1] = $2*($4 - $3); y+=x[$1]} END {for(z in x) {print z, x[z]}; print "Net:"y}' stocks.txt
Note that y is a running *sum* (not a running count like before).
Now, the above is hard to read. As a script, this is much easier:

#!/usr/bin/awk -f

BEGIN { FS="," }
{
    x[$1] = $2*($4 - $3)
    y+=x[$1]
}
END {
    for(z in x) {
        print z, x[z]
    } # end for loop
    print "Net:"y
} # end END block
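One way to try a script like this without installing it anywhere is to pass it to awk with -f. The sketch below writes a version of the script with the Net line inside the END block to a scratch file, runs it on the two sample rows, and keeps only the last line of output. The scratch files come from mktemp and are removed afterwards; the variable names script, data and net_line are illustrative.

```shell
# Hypothetical scratch files; mktemp picks the actual names.
script=$(mktemp)
data=$(mktemp)

cat > "$script" <<'EOF'
BEGIN { FS = "," }
{
    x[$1] = $2 * ($4 - $3)   # per-symbol result
    y += x[$1]               # running sum across all symbols
}
END {
    for (z in x) {
        print z, x[z]
    }
    print "Net:" y
}
EOF

printf 'ABC,100,12.14,19.12\nFOO,100,24.01,17.45\n' > "$data"

# The Net line comes last because it follows the loop inside END.
net_line=$(awk -f "$script" "$data" | tail -n 1)
echo "$net_line"
rm -f "$script" "$data"
```

With the two sample rows the per-symbol results are 698 and -656, so the net is 42.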

This was liveblogged, so please point out any issues, as they may be typos on my part….
