Here's the output trace. I'm running it on a Sun X2100 with a flavour of Ubuntu; it's doing ~300 Mbit full-duplex at about 9,000 req/sec (tiny transactions!) with 1,000 concurrent connections. I'm specifically trying to separate the management overhead from the data-copying overhead. This load has maxed out both thttpd on the server side and the TCP proxy itself.
Gah, look at all of those mallocs and stdio calls doing "stuff"...
root@rachelle:/home/adrian/work/cacheboy/branches/CACHEBOY_PRE/app/tcptest# opreport -l ./tcptest | less
CPU: AMD64 processors, speed 2613.43 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        image name     symbol name
96851    11.3738  libc-2.6.1.so  vfprintf
62317     7.3182  libc-2.6.1.so  _int_malloc
37556     4.4104  tcptest        comm_select
35405     4.1578  tcptest        commSetEvents
32901     3.8638  libc-2.6.1.so  _int_free
30245     3.5518  tcptest        commSetSelect
29890     3.5102  tcptest        commUpdateEvents
28812     3.3836  libc-2.6.1.so  _IO_default_xsputn
20360     2.3910  tcptest        sslSetSelect
17279     2.0292  libc-2.6.1.so  malloc_consolidate
16610     1.9506  libc-2.6.1.so  epoll_ctl
16307     1.9150  tcptest        sslReadServer
16154     1.8971  libc-2.6.1.so  fcntl
14601     1.7147  tcptest        xstrncpy
12003     1.4096  libc-2.6.1.so  memset
11617     1.3643  tcptest        memPoolAlloc
10931     1.2837  libc-2.6.1.so  calloc