This is the sort of stuff lurking in the Squid codebase which really needs to be cleaned out.
I'm going through the legacy memory allocator uses and pushing the allocator initialisation out to the modules themselves, rather than having some of them globally initialised. This will let me move the buffer allocator code (i.e. the "rest" of the current legacy memory allocator uses) outside of the Squid/cacheboy src/ directory so it can be reused by other modules. I can then begin pushing more code out of src/ and into libraries, which makes dependencies saner, code reuse easier and unit testing much easier.
One of these types is MEM_LINK_LIST.
A FIFO-type queue was implemented using a singly-linked list. An SLIST has O(1) dequeue behaviour but O(n) enqueue behaviour: the whole list has to be traversed to find the tail before a new item can be appended. That traversal touches potentially dirty pages, which may also stall the bus a little. (I haven't measured that in my testing, btw; my benchmarking focused on the memory-hit/miss pathway and left ACLs/disk access out, so that's all currently conjecture!)
The FIFO implementation allocated a temporary list object (MEM_LINK_LIST) to hold the next and data pointers. This was mempooled and thus "cached", rather than hitting the system malloc each time.
The only user is the threaded aufs storage code, which uses it to store the pending disk read and write operations for a given open storage file.
Now, the "n" in O(n) shouldn't be that large, as not very many operations are queued on an open file: generally there's one read pending on a store file, and potentially many writes pending (if the object is large and data is arriving faster than 4 kbytes every few milliseconds.) In any case, I dislike unbounded cases like this, so I created a new function in the doubly-linked list type which pops the head item off the dlink list and returns it (or NULL if the list is empty), and re-worked the aufs code to use it. The above link is the second half of the work - read the previous few commits for the dlink changes.
Note: I really want to move all of this code over to the BSD queue/list types. ARGH! But I digress.
Initial testing shows that I haven't screwed anything up too badly. (~400 req/sec to a pair of 10,000 RPM 18gig SCSI disks, ~50mbit client traffic, 80% idle CPU.)