Friday, May 07, 2010

Quick profiling script to gather performance metrics in ruby

So over at work we are using an interpreted language for performing many tasks. The interpreter has a debug mode wherein it prints out START and END messages for every function executed with a timestamp in milliseconds. Here's a sample -

LOG : 1221489714064 : foo() : START
LOG : 1221489714286 : bar1() : START
LOG : 1221489714411 : baz1() : START
LOG : 1221489714676 : baz1() : END
LOG : 1221489714812 : baz2() : START
LOG : 1221489715478 : baz2() : END
LOG : 1221489715574 : bar1() : END
LOG : 1221489715698 : bar2() : START
LOG : 1221489718073 : bar2() : END
LOG : 1221489720092 : foo() : END

You can see what is going on there. The huge number is a milliseconds timestamp, followed by the function name and a START or an END depending on whether it entered or exited the function.

This is pretty useful information when you are trying to squeeze out every bit of performance out of these functions.. Unfortunately looking at the log files and performing mental subtraction between 13-14 digit numbers on the fly is really not something I wanted to do, EVER.

So I wrote this simple ruby script in a jiffy. It converts the previous bit of information into something a lot easier to parse -

foo() == 6028 msec
-->bar1() == 1288 msec
-->-->baz1() == 265 msec
-->-->baz2() == 666 msec
-->bar2() == 2375 msec

As you can see, the script gives you the elapsed times for all the functions. It supports nesting and depicts the function stack graphically. It makes it easy, for example, to see that foo calls two more functions bar1 and bar2 which take more than half of the processing time of foo. Here's the script -

$pat = /LOG : ([\w.]+) : ([\w.()]+) : (START|END)/
$re = Regexp.new($pat)

stk = []
out = [[]]
File.open('log').each do |line|
if line =~ $re
if ($3 == 'START')
out.push []
stk.push $~
elsif ($3 == 'END') and not stk.empty?
# gather all information
end_msec = $1
$~ = stk.pop
elapsed = end_msec.to_i-$1.to_i
children = out.pop
out.last.push("#{"-->"*stk.length}#{$2} == #{elapsed} msec\n")
out[out.length-1] = out.last + children
end
end
end
out.each {|ls|ls.each{|l|print l}}


It expects to find the START and END statements in a file called 'log'.

All the manipulation with the stack are needed to support nested functions and still support correct function call order.

The only tricky bits in the code are really the pushing and popping of regular expression results. Hats off to Ruby for allowing us to do that! Really. Lesser languages would have made this task harder than it needs to be, by requiring us to extract and then save the regex match results into our own data structure.

No comments: