Tuesday, July 29, 2008

Benchmarking is available!

I've begun benchmarking Cacheboy-1.4. The details are available at http://www.cacheboy.net/benchmarks.html. They aren't spectacular - I'm mainly doing them to keep track of development and make sure I'm not introducing regressions anywhere.

I'm not all that happy with 50% CPU (on one CPU too!) at 500 req/sec. Alas, that's what I have to work with - I can't push these disks or the polygraph hosts any harder at the present time. Maybe if I spent two weeks fixing polygraph so it used kqueue() instead of poll() ..

Saturday, July 26, 2008

Commercial work updates!

I've just completed the development and local testing of the client-side delay pools. That'll go into Squid-2.HEAD in the next few days. I'll try untangling the client-side delay pools from the class 5 delay pool work (which shouldn't be -that- difficult, just slightly tedious) and commit them as two separate chunks.

I'll post more details on my company blog - http://xenionhosting.blogspot.com/ - as I think the details of my current and future commercial Squid stuff should be detailed over there.

Wednesday, July 23, 2008

Surviving polymix-4..

I'm putting Cacheboy-1.4 through a basic polymix-4 polygraph workload. So far so good - it's just unfortunate that polygraph still uses poll() / select(). Most of the process CPU time is spent in those two system calls and not doing any useful work.

So far, so good at ~ 500 req/sec (with <10% CPU usage..) I'm going to resolve a few strange issues I'm seeing and then begin publishing some actual performance numbers over the next few weeks. I'll also start publishing some microbench numbers comparing Squid-2.6, Squid-2.7, Squid-3.0, Squid-3.1 and Cacheboy. Cacheboy will come out on top, of that I'm quite sure. :)

Tuesday, July 22, 2008

Threading Squid - initial observations

My next task after some IPv6 related reshuffling is to bring in the bare essentials needed to make Squid^WCacheboy SMP-happy.

There are a few potential ideas:


  • Leave Squid single-threaded. Stop it from doing its own disk/memory caching; push that out to a shared external process and abuse sysvshm IPC/anonymous mmap/etc to share large amounts of data efficiently;

  • Thread Squid entirely. Allow multiple concurrent copies of squid running in threads - whichever "model" of thread helpers you choose - and parallelise everything;

  • Provide basic thread services but leave Squid monolithic - push certain things into threads for now, figure out what benefits from being run in parallel;

  • A mix of all of the above.



Some of the problems that are faced!

cbdata



The cbdata type makes threading a pain in the ass. Specifically, anything which wants to be shared between threads needs to be able to be 'locked' into memory until the thread hands it back, either completed or cancelled.

cbdata doesn't give you any guarantees that the pointer is pointing to something even remotely valid - even if you cbdataLock()'ed the item, the owner (or not! That's how horrible the code can get) can cbdataFree() the underlying pointer and suddenly you're pointing at gunk. It might smell mostly right, it might even have somewhat valid data, but it's still freed gunk, and that's not good enough.
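To make the "locked into memory until the thread hands it back" requirement concrete, here's a minimal sketch of the kind of handle a thread would need. This is not Squid's cbdata API; `pinned_t` and every `pinned_*` function below are hypothetical names made up for illustration. The idea is simply that cancelling marks the object invalid but the memory only goes away when the last holder lets go.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical handle, NOT Squid's cbdata: the payload stays allocated
 * until the owner and every worker thread have released it. */
typedef struct pinned {
    atomic_int refs;                 /* owner + each thread holding it */
    atomic_bool valid;               /* cleared when the owner cancels */
    void (*destroy)(void *payload);  /* optional, runs once at last release */
    void *payload;
} pinned_t;

/* Create a handle with one reference, held by the owner. */
pinned_t *pinned_new(void *payload, void (*destroy)(void *))
{
    pinned_t *p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    atomic_init(&p->refs, 1);
    atomic_init(&p->valid, true);
    p->destroy = destroy;
    p->payload = payload;
    return p;
}

/* Take an extra reference before handing the object to a thread. */
void pinned_ref(pinned_t *p)
{
    atomic_fetch_add(&p->refs, 1);
}

/* Drop a reference. Returns true if this call freed the object. */
bool pinned_unref(pinned_t *p)
{
    int before = atomic_fetch_sub(&p->refs, 1);
    assert(before > 0);
    if (before == 1) {
        if (p->destroy != NULL)
            p->destroy(p->payload);
        free(p);
        return true;
    }
    return false;
}

/* Workers check this before acting on the payload. */
bool pinned_is_valid(pinned_t *p)
{
    return atomic_load(&p->valid);
}

/* Owner gives up: mark invalid, then drop the owner's reference.
 * The memory survives while any thread still holds a reference. */
bool pinned_cancel(pinned_t *p)
{
    atomic_store(&p->valid, false);
    return pinned_unref(p);
}
```

The point of the split between `pinned_cancel()` and `pinned_unref()` is that a worker which wakes up late still finds real memory behind its pointer, sees the object is no longer valid, and hands it back instead of scribbling on freed gunk.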

Shared Statistics



Squid keeps a lot of statistics and histograms. Something needs to be done to allow these to be kept in multiple threads without lots of fine-grain locks and/or stalling.

I may just get rid of a lot of the complicated statistics and require them to be post-process derived externally.
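One common way to dodge fine-grain locks for plain counters is to give every thread its own slot and only add the slots up when somebody asks. A rough sketch, assuming a fixed number of worker threads and 64-byte cache lines; `shard_stats_t`, `STAT_SLOTS` and the function names are invented here and don't exist in Squid:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

#define STAT_SLOTS 8  /* hypothetical upper bound on worker threads */

/* One slot per thread, aligned to an assumed 64-byte cache line so
 * threads updating their own counters don't bounce the same line. */
typedef struct {
    alignas(64) _Atomic uint64_t requests;
    _Atomic uint64_t bytes;
} stat_slot_t;

typedef struct {
    stat_slot_t slot[STAT_SLOTS];
} shard_stats_t;

void shard_stats_init(shard_stats_t *s)
{
    for (int i = 0; i < STAT_SLOTS; i++) {
        atomic_init(&s->slot[i].requests, 0);
        atomic_init(&s->slot[i].bytes, 0);
    }
}

/* Called by a worker; touches only that worker's slot, no lock taken. */
void shard_stats_hit(shard_stats_t *s, unsigned thread_id, uint64_t nbytes)
{
    assert(thread_id < STAT_SLOTS);
    atomic_fetch_add_explicit(&s->slot[thread_id].requests, 1,
                              memory_order_relaxed);
    atomic_fetch_add_explicit(&s->slot[thread_id].bytes, nbytes,
                              memory_order_relaxed);
}

/* Called by whoever reports statistics; sums every slot. */
uint64_t shard_stats_requests(shard_stats_t *s)
{
    uint64_t total = 0;
    for (int i = 0; i < STAT_SLOTS; i++)
        total += atomic_load_explicit(&s->slot[i].requests,
                                      memory_order_relaxed);
    return total;
}

uint64_t shard_stats_bytes(shard_stats_t *s)
{
    uint64_t total = 0;
    for (int i = 0; i < STAT_SLOTS; i++)
        total += atomic_load_explicit(&s->slot[i].bytes,
                                      memory_order_relaxed);
    return total;
}
```

The totals are only approximately consistent while threads are busy, which is fine for counters but is one more reason the fancier histograms might be better derived externally.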

Memory Pools



The memory pools framework will be a nightmare to thread efficiently. Well, memory allocators in general are. I -could- just fine-grain lock it, but it gets a -lot- of requests and so I'd have to first fix the pool abusers before I consider this. (I'm going to do it anyway, but not so I can then fine-grain thread mempools.) I could figure out the best way to thread it - or run multiple pools per pool, one per thread - but damnit, this is 2008, there are better malloc implementations out there by people who understand concurrency issues better than I. It's a waste of time to try and thread it until I understand the workload and implications better.

So I'll -probably- be turfing mempools as it stands and replacing it with just enough to keep statistics before going direct to malloc(). See the statistics section above. I won't do this until I've modified the heaviest mempool abusers to -not- put such large demands on the allocator system, so it'll be a win/win situation everywhere.
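Roughly what "just enough to keep statistics before going direct to malloc()" could look like is sketched below. It's a single-threaded illustration only; `thin_pool_t` and its functions are hypothetical names, not the existing mempools interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical thin "pool": no free lists, no caching, just counters
 * wrapped around the system allocator. */
typedef struct {
    const char *name;     /* label used when reporting */
    size_t obj_size;      /* size of each object in bytes */
    size_t in_use;        /* objects currently allocated */
    size_t total_allocs;  /* allocations over the pool's lifetime */
    size_t high_water;    /* peak value of in_use */
} thin_pool_t;

void thin_pool_init(thin_pool_t *p, const char *name, size_t obj_size)
{
    p->name = name;
    p->obj_size = obj_size;
    p->in_use = 0;
    p->total_allocs = 0;
    p->high_water = 0;
}

/* Allocate one zeroed object straight from calloc() and count it. */
void *thin_pool_alloc(thin_pool_t *p)
{
    void *obj = calloc(1, p->obj_size);
    if (obj == NULL)
        return NULL;
    p->in_use++;
    p->total_allocs++;
    if (p->in_use > p->high_water)
        p->high_water = p->in_use;
    return obj;
}

/* Hand the object straight back to free() and update the counters. */
void thin_pool_free(thin_pool_t *p, void *obj)
{
    if (obj == NULL)
        return;
    assert(p->in_use > 0);
    p->in_use--;
    free(obj);
}
```

The counters here aren't thread-safe as written; in a threaded core they'd need the same treatment as the rest of the statistics.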

more to come..

Tuesday, July 15, 2008

Commercial projects and such..

I've got a few commercial projects to finish up on Squid over the next few weeks which will be taking my time away from Cacheboy development.

Specifically:


  • I'm adding client-side -write- delay pools, so you can rate limit the replies sent back to clients whether they are a cache hit or miss (specifically for reverse proxies, but I'm sure forward proxies will have a use for them);

  • Buffering POST requests a bit before connecting to the back-end origin server, which matters when your back-end server pays a high price for holding a connection open with no data going over it;

  • Finally - some log reporting tools (hopefully written in Lua! :) for basic WebUI logfile reporting in a fast, sensible manner
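For anyone wondering what rate limiting the replies boils down to, the general shape is a token bucket: a bucket of bytes that refills at a fixed rate, and each write to the client takes what it can get. The sketch below is a generic illustration under that assumption, not the code going into Squid-2.HEAD; `write_bucket_t` and its functions are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-client bucket for limiting reply bytes. */
typedef struct {
    int64_t level;    /* bytes that may be written right now */
    int64_t max;      /* bucket capacity in bytes */
    int64_t restore;  /* bytes added back per second */
} write_bucket_t;

/* Start with a full bucket. */
void write_bucket_init(write_bucket_t *b, int64_t restore, int64_t max)
{
    assert(restore >= 0 && max > 0);
    b->restore = restore;
    b->max = max;
    b->level = max;
}

/* Refill for the elapsed time, never going above capacity. */
void write_bucket_tick(write_bucket_t *b, int64_t elapsed_sec)
{
    if (elapsed_sec <= 0)
        return;
    b->level += b->restore * elapsed_sec;
    if (b->level > b->max)
        b->level = b->max;
}

/* Ask to write `want` bytes; returns how many may actually be sent. */
int64_t write_bucket_take(write_bucket_t *b, int64_t want)
{
    if (want <= 0)
        return 0;
    int64_t grant = want < b->level ? want : b->level;
    b->level -= grant;
    return grant;
}
```

Because the limit is applied at the point of writing to the client, it doesn't matter whether the bytes came from a cache hit or a miss.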



I've got a few other possibilities which might creep up over the next couple of months but nothing yet concrete.

Client-side IPv6, HTTP/1.1 and a threaded core will have to wait until I've completed the paid work I'm afraid! OSS coders have to eat too!

Sunday, July 13, 2008

Watching things evolve..

I'm finding it interesting to watch myself "evolve" the Cacheboy roadmap over time. Take the previous two cacheboy-users posts: first I thought Cacheboy-1.4 would get the IPv6 enabled core, but after doing the latest set of changes I've decided the best thing to do is to get Cacheboy-1.4 out with the current code layout, sort out whatever bugs crept in, then build the IPv6 enabled core in Cacheboy-1.5 and IPv6 client-side support in Cacheboy-1.6.

I have a general idea where I'd like to take things and I have a specific set of goals in mind along the way, but everything is still evolving with time. It's an interesting experience - there are dozens of areas in the codebase which I'd like to spend time working on but I have to keep the medium and long-term project goals in mind.

Which isn't to say I won't get distracted from time to time and break out a test branch to play with something, like one of the branches playing around with memory allocation overheads. I just treat that, like the last 10 or so years of experimenting with the codebase, as a way to get more of an idea what work needs to be done.

Saturday, July 12, 2008

Cacheboy: shuffling around the DNS code

I'm shuffling around the DNS code in preparation for some work toward an IPv6 core. Strictly speaking, I could have just left the DNS code in src/ and IPv6'ed the raw network/socket layer but I've decided "basic" functional IPv6 support will require DNS support and so be it. It'll let me write test cases to make sure that the new code handles IPv4 and IPv6 DNS "right". I still don't know what "right" entails and I'm sure that journey will be very enlightening!

It's been more tedious than complicated. There's a bunch of config file parsing which needs to stay in src/ and I've split out the "libsqdns" DNS initialisation from the "squid" DNS initialisation. It compiles and runs here, resolving DNS requests happily, so I guess I'm mostly on track. I had to shuffle around some config variables so it's entirely possible I've screwed that up somewhere.

This highlights the requirement for a much more sensible configuration management framework. It doesn't even have to be that complicated - just not the "one great big Config struct" that Squid currently has. I've got some plans in the back of my head to generic-ify that much later on down the track but it'll have to wait a while. It'll probably come in when the ACL code is split out into squid-specific and generic ACL types. (A lot of the ACL types aren't really specific to HTTP and in reality can be reused in a variety of network applications.)

So tomorrow I'll find some time to get the external DNS code working again, which I hope will be slightly easier than the internal DNS code. Then I can let this codebase simmer for a bit, push Cacheboy-1.4 out the door and wait for it to stabilise before my next round of changes towards IPv6.