Adrian Chadd's Ramblings: July 2008

Tuesday, July 29, 2008

Benchmarking is available!

I've begun benchmarking Cacheboy-1.4. The details are available at http://www.cacheboy.net/benchmarks.html. They aren't spectacular - I'm mainly doing them to keep track on development and make sure I'm not introducing regressions anywhere.

I'm not all that happy with 50% CPU (on one CPU too!) at 500 req/sec. Alas, thats what I have to work with - I can't push these disks nor the polygraph hosts any harder at the present time. Maybe if I spent two weeks fixing polygraph so it used kqueue() instead of poll() ..

Saturday, July 26, 2008

Commercial work updates!

I've just completed the development and local testing of the client-side delay pools. That'll go into Squid-2.HEAD in the next few days. I'll try untangling the client-side delay pools from the class 5 delay pool work (which shouldn't be -that- difficult, just slightly tedious) and commit them as two seperate chunks.

I'll post more details on my company blog - http://xenionhosting.blogspot.com/ - as I think the details of my current and future commercial Squid stuff should be detailed over there.

Wednesday, July 23, 2008

Surviving polymix-4..

I'm putting Cacheboy-1.4 through a basic polymix-4 polygraph workload. So far so good - its just unfortunate that polygraph still uses poll() / select(). Most of the process CPU time is spent in those two system calls and not doing any useful work.

So far, so good at ~ 500 req/sec (with <10% CPU usage..) I'm going to resolve a few strange issues I'm seeing and then begin publishing some actual performance numbers over the next few weeks. I'll also start publishing some microbench numbers comparing Squid-2.6, Squid-2.7, Squid-3.0, Squid-3.1 and Cacheboy. Cacheboy will come out on top, of that I'm quite sure. :)

Tuesday, July 22, 2008

Threading Squid - initial observations

My next task after some IPv6 related reshuffling is to bring in the bare essentials needed to make Squid^WCacheboy SMP-happy.

There are a few potential ideas:

Leave Squid single-threaded. Stop it from doing its own disk/memory caching; push that out to a shared external process and abuse sysvshm IPC/anonymous mmap/etc to share large amounts of data efficiently;

Thread Squid entirely. Allow multiple concurrent copies of squid running in threads - whichever "model" of thread helpers you choose - and parallelise everything;

Provide basic thread services but leave Squid monolithic - push certain things into threads for now, figure out what benefits from being run in parallel;

A mix of all of the above.

Some of the problems that are faced!

cbdata

The cbdata type makes it a pain in the ass. Specifically, anything which wants to be shared between threads needs to be able to be 'locked' into memory until the thread hands it back either completed, or cancelled.

cbdata doesn't give you any guarantees that the pointer is pointing to something even remotely valid - even if you cbdataLock()'ed the item, the owner (or not! Thats how horrible the code can get) can cbdataFree() the underlying pointer and suddenly you're pointing at gunk. It might smell mostly right, it might even have somewhat valid data, but its still freed gunk, and thats not good enough.

Shared Statistics

Squid keeps a lot of statistics and histograms. Something needs to be done to allow these to be kept in multiple threads without lots of fine-grain locks and/or stalling.

I may just get rid of a lot of the complicated statistics and require them to be post-process derived externally.

Memory Pools

The memory pools framework will be a nightmare to thread efficiently. Well, memory allocators in general are. I -could- just fine-grain lock it, but it gets a -lot- of requests and so I'd have to first fix the pool abusers before I consider this. (I'm going to do it anyway, but not so I can then fine-grain thread mempools.) I could figure out the best way to thread it - or run multiple pools per pool, one per thread - but damnit, this is 2008, there are better malloc implementations out there by people who understand concurrency issues better than I. Its a waste of time to try and thread it until I understand the workload and implications better.

So I'll -probably- be turfing mempools as it stands and replacing it with just enough to keep statistics before going direct to malloc(). See the statistics section above. I won't do this until I've modified the heaviest mempool abusers to -not- put such large demands on the allocator system, so it'll be a win/win situation everywhere.

more to come..

Tuesday, July 15, 2008

Commercial projects and such..

I've got a few commercial projects to finish up on Squid over the next few weeks which will be taking my time away from Cacheboy development.

Specifically:

I'm adding client-side -write- delay pools, so you can rate limit the replies sent back to clients whether they are a cache hit or miss (specifically for reverse proxies, but I'm sure forward proxies will have a use for them);

Buffering POST requests a bit before connecting to the back-end origin server, which matters when your back-end server pays a high price for holding a connection open with no data going over it;

Finally - some log reporting tools (hopefully written in Lua! :) for basic WebUI logfile reporting in a fast, sensible manner

I've got a few other possibilities which might creep up over the next couple of months but nothing yet concrete.

Client-side IPv6, HTTP/1.1 and a threaded core will have to wait until I've completed the paid work I'm afraid! OSS coders have to eat too!

Sunday, July 13, 2008

Watching things evolve..

I'm finding it interesting to watch myself "evolve" the Cacheboy roadmap over time. Take the previous two cacheboy-users posts: first I thought Cacheboy-1.4 will get the IPv6 enabled core, but after doing the latest set of changes I've decided the best thing to do is to get Cacheboy-1.4 out with the current code layout, sort out whatever bugs crept in, then build the IPv6 enabled core in Cacheboy-1.5 and IPv6 client-side support in Cacheboy-1.6.

I have a general idea where I'd like to take things and I have a specific set of goals in mind along the way, but everything is still evolving with time. Its an interesting experience - there are dozens of areas in the codebase which I'd like to spend time working on but I have to keep the medium and long-term project goals in mind.

Which isn't to say I won't get distracted from time to time and break out a test branch to play with something, like one of the branches playing around with memory allocation overheads. I just treat that, like the last 10 or so years of experimenting with the codebase, as a way to get more of an idea what work needs to be done.

Saturday, July 12, 2008

Cacheboy: shuffling around the DNS code

I'm shuffling around the DNS code in preparation for some work toward an IPv6 core. Strictly speaking, I could have just left the dns code in src/ and IPv6'ed the raw network/socket layer but I've decided "basic" functional IPv6 support will require DNS support and so be it. It'll let me write test cases to make sure that the new code handles IPv4 and IPv6 DNS "right". I still don't know what "right" entails and I'm sure that journey will be very enlightening!

Its been more tedious than complicated. There's a bunch of config file parsing which needs to stay in src/ and I've split out the "libsqdns" DNS initialisation from the "squid" DNS initialisation. It compiles and runs here, resolving DNS requests happily, so I guess I'm mostly on track. I had to shuffle around some config variables so its entirely possible I've screwed that up somewhere.

This highlights the requirement for a much more sensible configuration management framework. It doesn't even have to be that complicated - just not the "one great big Config struct" that Squid currently has. I've got some plans in the back of my head to generic-ify that much later on down the track but it'll have to wait a while. It'll probably come in when the ACL code is split out into squid-specific and generic ACL types. (A lot of the ACL types aren't really specific to HTTP and in reality can be reused in a variety of network applications.)

So tomorrow I'll find some time to get the external DNS code working again which I hope will be slightly easier than the internal DNS code. Then I can let this codebase simmer for a bit, push Cacheboy-1.4 out the door and wait for it to stabilise before my next round of changes towards IPv6.

Saturday, July 5, 2008

The Squid Callback Data Type

One of the issues programmers frequently face is knowing whether some piece of data you have is actually valid. Modern languages provide a variety of methods for creating an "invariant condition" about the validity of your data - reference counting, for example, allows you to ensure that data is not free'd before all references to it have been removed. This invariant is not always what you first think. The invariant condition for reference counting, for example, is that the data is either referenced by something or by nothing at all. Generally the programmer will treat the transition from "referenced by something" to "referenced by nothing" as the important transition and do something like removing the object from whatever lists its on, notifying other objects that its going away, cleaning up allocated memory, etc.

Take traditional "callback" type programming. The programmer decides that some function is to be called after an event has occured (for example, "the ACL lookup has completed", or "the network write has completed") and this function needs some sort of "state" to know what its operating on. You could view this state as a sort of object. The trouble in C is that the language itself doesn't give you any tools to know that the supplied pointer is valid or not. Now, think about this - firstly, whats "valid" mean. The pointer is pointing to some region of memory that hasn't been freed? What about the state of the object? What if the object state changed between the callback being scheduled and the callback being executed? Is this "valid"?

Squid implements "callback data". Initially, this "callback data" (called cbdata in the code) was a registry for callback data pointers. Pointers were reference counted when passed in as part of a scheduled callback; they would be decremented before the callback was about to run, and the callback would only be executed if the callback pointer was "valid". The "owner" (for whatever meanings of "own" you'd like to try and define) could "free" the data pointer - in which case the callback data registry would mark that pointer as invalid; subsequent checks for the validity of said pointer would return invalid, and any callbacks that were going to occur could be ignored. Eventually, the reference count would hit 0 and at 0 the memory at the pointer would be freed.

Expressed as code:

ptr = cbdataAlloc(type);
...
doSomething(someFunc, ptr)

which would:
state->cb = someFunc;
state->cbdata = ptr;
cbdataLock(ptr);

.. then, when the chain of events which doSomething() started would finish, this would occur:

if (state->cb && cbdataValid(ptr)) {
state->cb(state->cbdata);
}
cbdataUnlock(ptr);

This way, the callback would only occur IFF there was a callback and the callback data was still valid.

cbdataAlloc() returns a pointer with refcount = 0 and valid = true ; cbdataLock() incremented refcount; cbdataUnlock() decremented refcount and would free the pointer if (valid == false && refcount == 0); cbdataValid() returned (valid == true); cbdataFree() would set valid = false and free the object if (valid == false && refcount == 0)

This worked out to be quite helpful in preventing callbacks from being run if the data was freed. It however introduces a few assumptions which make certain things difficult to debug and implement.

Firstly, you don't have any guarantee that the callback will be called when you schedule for the call. So in the above code, if something calls cbdataFree(ptr) between the callback registration and the completion of the action initiated by doSomething(), the action will complete but the callback won't be made. The programmer needs to make sure that the code can handle not having the callback ever be made. Traditionally, you would instead either cancel the operation explicitly instead of letting it continue to completion and handle the situation where it couldn't be cancelled, or let the operation complete before transitioning to some "dying" state.

Secondly, generally the "object destructor" here is called not by the cbdata reference count hitting 0, but by some explicit destruction call elsewhere in the code. For example, you would have this in the code:

fooComplete(foo *ptr)
{
free(ptr->data);
cbdataFree(ptr);
}

There still may be references to the callback data but no callbacks will occur on it because cbdataFree() marks that ptr as invalid. So cbdata isn't quite behaving traditional reference counted "types" behave.

Here's where this gets ugly: but it can - you can register a function to be called just before the ptr is finally freed. _SOME_ areas of code do this. _SOME_ areas of code do not. You can't assume that the behaviour for a given cbdata pointer type will be one or the other.

Thirdly, if the action initiated by doSomething() requires some part of ptr to be valid then it will need to wrap every access to the data inside ptr with a if (cbdataValid(ptr)) check. This doesn't always happen :) and has been the cause of all sorts of silly bugs because although the memory pointed to by ptr is still valid, the object may have gone through its "destruction" phase and whats left in memory (which again, hasn't been freed) is actually the last traces of object state. This may be valid, this may be invalid. Who knows. I can't guarantee that accesses to cbdata pointer dereferences are always done conditional to said pointer being valid. That would be a fun thing to hack in as a valgrind module!

This all started rearing its ugly head in Squid-3 as a few things were converted from cbdata type pointers to more traditional reference counted types. The programmers assumed the behaviour was equivalent when it wasn't and all kinds of strange bugs arose some of which took over 12 months to find and fix.

What would I like to see? Thats a good question and will probably form the basis of further improvements in Cacheboy..

Wednesday, July 2, 2008

libevent httperf!

I decided to poke httperf a little as a testing suite - and it uses select()! What the hell?

Four hours later, I think I have a libevent enabled httperf.

http://code.google.com/p/httperf-adrian/