OProfile is cool - it lets you dig into where your CPU time is being spent. But aggregating statistics can be aggravating. (Yes yes, it was bad, I know..)
Take this example from cacheboy:
CPU: Core 2, speed 2194.48 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
samples   %        image name    symbol name
216049    6.5469   libc-2.7.so   memcpy
115581    3.5024   libc-2.7.so   _int_malloc
103345    3.1316   libc-2.7.so   vfprintf
85197     2.5817   squid         memPoolAlloc
64652     1.9591   libc-2.7.so   memchr
60720     1.8400   libc-2.7.so   strlen
Now, these tell you how much CPU is being spent in each function (which is great), but it's not the entire picture. The trouble is this: there are 527 functions in the top-level list, and 25 of them each account for 1 or more percent of total runtime. Those top 25 account for ~45% of the total CPU time - so the other 55% is being spent in the 502 functions remaining.
You may now ask yourself what the problem with that is - just optimise those top 25 functions and you'll be fine. Unfortunately, those top 25 functions aren't being called in one place - they're being called all over the shop.
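The tally above (how many symbols cross the 1% line, and what they add up to) is easy to reproduce from the flat report. Here's a minimal Python sketch, assuming each row is "samples, percent, image, symbol" separated by whitespace and that symbol names contain no spaces. The helper names parse_flat and summarise are mine, not part of OProfile, and SAMPLE holds only the six rows quoted above, not the full 527.

```python
# The six flat-profile rows quoted in the post: samples, percent, image, symbol.
SAMPLE = """\
samples % image name symbol name
216049 6.5469 libc-2.7.so memcpy
115581 3.5024 libc-2.7.so _int_malloc
103345 3.1316 libc-2.7.so vfprintf
85197 2.5817 squid memPoolAlloc
64652 1.9591 libc-2.7.so memchr
60720 1.8400 libc-2.7.so strlen
"""


def parse_flat(text):
    """Return (samples, percent, image, symbol) tuples, skipping header lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Assumption: a data row has exactly four fields and starts with a count.
        if len(parts) != 4 or not parts[0].isdigit():
            continue
        samples, percent, image, symbol = parts
        rows.append((int(samples), float(percent), image, symbol))
    return rows


def summarise(rows, threshold=1.0):
    """Return (number of hot symbols, their combined percent, number of the rest)."""
    hot = [row for row in rows if row[1] >= threshold]
    hot_share = sum(row[1] for row in hot)
    return len(hot), hot_share, len(rows) - len(hot)


if __name__ == "__main__":
    hot_count, hot_share, rest = summarise(parse_flat(SAMPLE))
    print(f"{hot_count} symbols >= 1%, {hot_share:.2f}% combined, {rest} below")
```

Fed the whole report instead of these six rows, the same two functions should give the "25 symbols, ~45%" style of summary directly.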
Here's an example from the call graph output. Notice the strlen time:
  496      13.7816   squid         httpRequestFree
  773      21.4782   squid         httpHeaderPutStrf
9518        0.3432   libc-2.7.so   vsnprintf
  85433    55.6846   libc-2.7.so   vfprintf
  18212    11.8704   libc-2.7.so   strchrnul
  16037    10.4528   libc-2.7.so   _IO_default_xsputn
  13351     8.7021   libc-2.7.so   _itoa_word
  10872     7.0863   libc-2.7.so   strlen
  9518      6.2038   libc-2.7.so   vsnprintf [self]
Note that the CPU times above "vsnprintf" are from the functions which call it, and the CPU times below "vsnprintf" are for the calls which it makes. It's not immediately obvious from the top-level trace that I have to optimise the "vsnprintf" calls: most of the *printf() calls in the code are to "vsnprintf" (which itself shows up at only 0.3% of CPU time) rather than to "vfprintf" and friends, which is where the time is actually recorded.
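To make the callers-above, callees-below reading concrete, here's a rough Python sketch that splits one such block apart. It assumes the block contains a "[self]" row and uses that row to identify the primary symbol, which is my own shortcut for this pasted text rather than anything OProfile documents. The names split_block and inclusive_samples are hypothetical.

```python
# The call-graph block quoted in the post, one row per line.
BLOCK = """\
496 13.7816 squid httpRequestFree
773 21.4782 squid httpHeaderPutStrf
9518 0.3432 libc-2.7.so vsnprintf
85433 55.6846 libc-2.7.so vfprintf
18212 11.8704 libc-2.7.so strchrnul
16037 10.4528 libc-2.7.so _IO_default_xsputn
13351 8.7021 libc-2.7.so _itoa_word
10872 7.0863 libc-2.7.so strlen
9518 6.2038 libc-2.7.so vsnprintf [self]
"""


def split_block(text):
    """Return (primary symbol, primary row, caller rows, callee rows).

    Each row is (samples, percent, symbol, is_self).
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 4:
            continue
        is_self = parts[-1] == "[self]"
        entries.append((int(parts[0]), float(parts[1]), parts[3], is_self))
    # Assumption: the "[self]" row names the function the block is about.
    primary = next(e[2] for e in entries if e[3])
    # The primary row is the first non-[self] row carrying that symbol.
    idx = next(i for i, e in enumerate(entries) if e[2] == primary and not e[3])
    return primary, entries[idx], entries[:idx], entries[idx + 1:]


def inclusive_samples(callees):
    """Samples spent in the function plus everything it calls."""
    return sum(e[0] for e in callees)


if __name__ == "__main__":
    primary, own, callers, callees = split_block(BLOCK)
    print(primary, "self:", own[0], "inclusive:", inclusive_samples(callees))
    print("called from:", [c[2] for c in callers])
```

On the block above, the callee rows add up to 153423 samples against 9518 for vsnprintf itself, which is exactly the gap the top-level list hides.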
It's obvious here that finding the places which call the *printf() functions in performance-critical code - and then exorcising them - will probably help quite a bit.
What about the rest of the 500-odd functions? What I'd like to do is build aggregates of CPU time spent in different functions, including their called functions, and figure out which execution stacks are chewing the most CPU. That's something to do after Cacheboy-1 is stable, and then only after my June exams.
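I haven't written that tool yet, but the aggregation itself might look something like this Python sketch. It assumes the profile data has already been turned into whole call stacks with a sample count each, which is a simplification of what OProfile's call graph gives you. The function names come from the output above, but the stacks and the sample counts are made up purely for illustration.

```python
from collections import Counter

# Hypothetical input: (call stack, outermost frame first) and a sample count.
# The counts are invented; they are not measurements from Squid.
STACKS = [
    (("httpHeaderPutStrf", "vsnprintf", "vfprintf"), 70),
    (("httpHeaderPutStrf", "vsnprintf", "vfprintf", "strlen"), 10),
    (("httpRequestFree", "vsnprintf", "vfprintf"), 15),
    (("httpRequestFree", "memPoolAlloc"), 5),
]


def inclusive_by_function(stacks):
    """Samples where each function was anywhere on the stack."""
    totals = Counter()
    for frames, samples in stacks:
        # set() so a recursive function is counted once per stack.
        for name in set(frames):
            totals[name] += samples
    return totals


def inclusive_by_prefix(stacks):
    """Samples under each call path, keyed by the path from the outermost frame."""
    totals = Counter()
    for frames, samples in stacks:
        for depth in range(1, len(frames) + 1):
            totals[frames[:depth]] += samples
    return totals


if __name__ == "__main__":
    for name, samples in inclusive_by_function(STACKS).most_common():
        print(f"{samples:6d}  {name}")
    for path, samples in inclusive_by_prefix(STACKS).most_common(3):
        print(f"{samples:6d}  {' -> '.join(path)}")
```

Sorting the per-path totals is what would answer the "which execution stacks are chewing the most CPU" question; the per-function totals show which functions are expensive once their callees are counted.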
The important thing here is that I have the data to figure out where Squid does things poorly and, given enough time, I'm going to start fixing them in the next Cacheboy release.