Friday, January 16, 2009

Refcounted string buffers!

Those of you who have been watching may have noticed a few String tidyups going into CACHEBOY_HEAD recently (one of which caused a bug in the first cacheboy-1.6 stable release that made it very non-stable!)

This is all in preparation for more sensible string and buffer handling. Unfortunately the Cacheboy codebase inherited a lot of dirty string handling and it needed some house cleaning before I could look towards the future.

Well, the future is here now (well, in /svn/branches/CACHEBOY_HEAD_strref ...) - I brought in my refcounted buffer routines from my previous attempts at all of this and converted String.[ch] over to use it.

For now, the refcounted string implementation doubles the malloc overhead for new strings (since it has to create a small buf_t and a string buffer) but stringDup() becomes essentially free. Since in a lot of cases, the stringDup() occurs when copying string headers and basically leaving them alone, this saves on a bunch of memory copying.

Decent performance benefits will only come with a whole lot of work:
  • Remove all of the current assumptions in code which uses String that the actual backing buffer (accessible via strBuf()) is NUL-terminated;
  • Rewrite sections of the code which go between String and C string buffers (with copying, etc) to use String where applicable. Unfortunately a whole lot of the original client_side.c code which handles parsing the request involves a fair bit of crap - so..
  • .. writing replacement request and reply HTTP parsers is probably the next thing to do;
  • Shuffling around the client-side code and the http code to use a buf_t as a incoming socket buffer, instead of how they currently do things (in an ugly way..)
  • Propagate down the incoming socket buffer to the request/reply parsing code, so said code can simply create references to the original socket buffer, bypassing any and all requirement for copying the request/reply data seperately.
I'm reasonably excited about the future benefits this code holds, but for now I'm going to remain reasonably conservative and leave the current String improvements where they are. I don't mind if these and the next round of changes to the MemBuf code reduce performance but improve the code; I know that the medium-term goal is going to provide some pretty decent benefits and I want to keep things stable and usable in production whilst I get there.

Next on my list though; looking at removing the places where *printf() is used in critical sections..

Friday, January 9, 2009

More profiling!

The following info is for a 10,000 concurrent connections, keep-alived, of just a fetch of an internal icon object from Squid. This is using my apachebench-adrian package which can handle such traffic loads.

The below accounts for roughly 60% of total CPU time (ie, 60% of the CPU is spent in userspace) on one core.
With oprofile, it hits around 12,300 transactions a second.

I have much, much hatred for how Squid uses *printf() everywhere. Sigh.




CPU: AMD64 processors, speed 2613.4 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples cum. samples % cum. % image name symbol name
5383709 5383709 4.5316 4.5316 libc-2.6.1.so vfprintf
4025991 9409700 3.3888 7.9203 libc-2.6.1.so memcpy
3673722 13083422 3.0922 11.0126 libc-2.6.1.so _int_malloc
3428362 16511784 2.8857 13.8983 libc-2.6.1.so memset
3306571 19818355 2.7832 16.6815 libc-2.6.1.so malloc_consolidate
2847887 22666242 2.3971 19.0787 squid memPoolFree
2634120 25300362 2.2172 21.2958 libm-2.6.1.so floor
2609922 27910284 2.1968 23.4927 squid memPoolAlloc
2408836 30319120 2.0276 25.5202 libc-2.6.1.so re_search_internal
2296612 32615732 1.9331 27.4534 libc-2.6.1.so strlen
2265816 34881548 1.9072 29.3605 libc-2.6.1.so _int_free
1826493 36708041 1.5374 30.8979 libc-2.6.1.so _IO_default_xsputn
1641986 38350027 1.3821 32.2800 libc-2.6.1.so free
1601997 39952024 1.3484 33.6285 squid httpHeaderGetEntry
1575919 41527943 1.3265 34.9549 libc-2.6.1.so memchr
1466114 42994057 1.2341 36.1890 libc-2.6.1.so re_string_reconstruct
1275377 44269434 1.0735 37.2625 squid clientTryParseRequest
1214714 45484148 1.0225 38.2850 squid httpMsgFindHeadersEnd
1185932 46670080 0.9982 39.2832 squid statHistBin
1170361 47840441 0.9851 40.2683 squid urlCanonicalClean
1169694 49010135 0.9846 41.2529 libc-2.6.1.so strtok
1145933 50156068 0.9646 42.2174 squid comm_select
1128595 51284663 0.9500 43.1674 libc-2.6.1.so __GI_____strtoll_l_internal
1116573 52401236 0.9398 44.1072 squid httpHeaderIdByName
956209 53357445 0.8049 44.9121 squid SQUID_MD5Transform
915844 54273289 0.7709 45.6830 squid memBufAppend
907609 55180898 0.7640 46.4469 squid stringLimitInit
898666 56079564 0.7564 47.2034 libc-2.6.1.so strspn
883282 56962846 0.7435 47.9468 squid urlParse
852875 57815721 0.7179 48.6647 libc-2.6.1.so calloc
819613 58635334 0.6899 49.3546 squid clientWriteComplete
800196 59435530 0.6735 50.0281 squid httpMsgParseRequestLine

Thursday, January 8, 2009

FreeBSD TPROXY works!

The FreeBSD TPROXY support (with a patched FreeBSD kernel for now) works just fine in testing.

I'm going to commit the changes to FreeBSD in the next couple of days. I'll then bring in the TPROXY4 support from Squid-3, and hopefully get functioning TPROXY2, TPROXY4 and FreeBSD TPROXY support into the upcoming Cacheboy-1.6 release.

Wednesday, January 7, 2009

TPROXY support

G'day,

I've done a bit of shuffling in the communication code to include a more modular approach to IP source address spoofing.

There's (currently) untested support for some FreeBSD source IP address spoofing that I'm bringing over courtesy of Julian Elischer; and there's a Linux TPROXY2 module.

I'll look at porting over the TPROXY4 support from Squid-3 in a few days.

I think this release is about as close to "stable" as Cacheboy-1.6 is going to get, so look forward to a "stable" release as soon as the FreeBSD port has been setup.

I already have a list of things to do for Cacheboy-1.7 which should prove to be interesting. Stay tuned..

Sunday, January 4, 2009

next steps..

I've been slowly fixing whatever bugs creep up in my local testing. The few people publicly testing Cacheboy-1.6 have reported that all the bugs have been fixed. I'd appreciate some further testing but I'll get what I can for now. :)

I'll be doing a few things to the CACHEBOY_HEAD branch after the 1.6 release which will hopefully lead towards a mostly thread-safe core. I'm also busy documenting various things in the core libraries which I haven't yet gotten around to. I also really should sort out some changes to the apple HeaderDoc software to support generating slightly better looking documents from C source.

I've almost finished the first round of code reorganisation. The stuff that I would have liked to have done this round includes shuffling out the threaded async operations from the AUFS code and building a proper disk library, sort of like what Squid-2.2 "had" and what Squid-3.0 almost did; but instead use them to implement general disk IO versus specialised modules just for storage. I'd like to take advantage of threaded/non-blocking disk IO in a variety of situations, including logfile writing.



Sunday, December 28, 2008

Cacheboy-HEAD updates

I've finished cleaning up the bits of the IPv6 work from CACHEBOY_HEAD - it should be just a slightly better structured Squid-2.HEAD / Cacheboy-1.5.

Right now I'm pulling out as much of the HTTP related code from src/ into libhttp/ before the 1.6 release. I'm hoping to glue together bits and pieces of the HTTP code into a very lightweight (for Squid) HTTP server implementation which can be used to test out various things like thread safe-ness. Of course, properly testing thread-safeness in production relies on a lot of the other code being thread-safe, like the comm code, the event registration code, the memory allocation code, the debugging and logging code ... aiee, etc. Oh well, I said I wanted to..

I'm also going through and adding some headerDoc comments to various library files. headerDoc (from apple) is actually rather nice. It lacks -one- function - the ability to merge multiple files together (say, libsqinet/sqinet.[ch]) into one "module" for documentation. I may look at doing that in some of my spare time.

Saturday, December 27, 2008

Reverting IPv6 for now; moving forward with structural changes

I've been working on the IPv6 support in Cacheboy for a couple months now and I've come to the conclusion that I'm not getting anywhere near as far along the development path as I wanted to be.

So I've taken a rather drastic step - I've branched off CACHEBOY_HEAD from the last point along the main codebase where the non-intrusive IPv6 changes had occured and I'm going to pursue Cacheboy-1.6 development from that.

The primary short-term goal with Cacheboy was to restructure the codebase in such a way as to make further development much, much simpler. I sort of lost track with the IPv6 development stuff and I rushed it in when the codebase obviously wasn't ready.

So, the IPv6 changes will stay in the CACHEBOY_PRE branch for now; development will continue in CACHEBOY_HEAD. I'll continue the restructuring work and stability work towards a Cacheboy-1.6 release come January 1. I'll then look at merging over the IPv6 infrastructure work into CACHEBOY_HEAD far before I merge in the client and server related code - specifically, completing the DNS updates, the ipcache/fqdncache updates, port over the IPv6 SNMP changes from Squid-3, and look at modularising the ACL code in preparation for IPv6'ifying that. The goal is less to IPv6-ify Cacheboy; its more to tidy up the code to the point where IPv6 becomes trivial.