Saturday, November 29, 2008

Working on NNRP proxies, or writing threaded C code..

So it turns out that I'm working on some closed-source NNRP proxy code. It sits between clients/readers and backend spool servers and directs/balances requests to the correct backend servers as required.

News is a kind of interesting setup. There are servers with just lots of articles, indexed via message ID (or a hash thereof). There are servers with the overview databases, which keep track of article IDs, message IDs, group names, and all that junk. The client reader interface has a series of commands which may invoke a combination of access to both the overview databases and the article spools.

I'm working on a bit of code which began life as a reader -> spool load balancer; I'm turning it into a general reader- and client-facing bit of software which speaks enough NNRP to route connections to the relevant backend servers. The architecture is pretty simple - one thread per backend connection, one thread per client connection, and "message queues" sitting between all of these, using pthread primitives to serialise access. For the number of requests and the concurrency involved, it scales quite well. It won't scale to 100,000 connections by any means, but considering the article sizes (megabytes at a time), a 10GE pipe will be filled far, far before that sort of connection and request rate is reached.
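
Since those message queues are the heart of the design, here's roughly what one looks like in pthread terms - a minimal sketch with made-up names (the real code differs), just a linked list guarded by a mutex, with a condition variable to wake consumers:

#include <pthread.h>

struct msg {
    struct msg *next;
    void *payload;              /* request or reply data */
};

struct msgqueue {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    struct msg *head, *tail;
};

void
msgqueue_push(struct msgqueue *q, struct msg *m)
{
    m->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail)
        q->tail->next = m;
    else
        q->head = m;
    q->tail = m;
    pthread_cond_signal(&q->cond);      /* wake one waiting consumer */
    pthread_mutex_unlock(&q->lock);
}

struct msg *
msgqueue_pop(struct msgqueue *q)
{
    struct msg *m;

    pthread_mutex_lock(&q->lock);
    while (q->head == NULL)             /* loop: spurious wakeups happen */
        pthread_cond_wait(&q->cond, &q->lock);
    m = q->head;
    q->head = m->next;
    if (q->head == NULL)
        q->tail = NULL;
    pthread_mutex_unlock(&q->lock);
    return m;
}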

So, today's post is going to cover a few things I've learnt whilst writing this bit of code. It's in C, so by its very nature it's going to be horrible. The question is whether I can make it less horrible to work with.

Each client thread sits in a loop reading requests, parsing them, figuring out what needs to happen, then queuing messages to the relevant spool or reader server queue to be handled by one of the connection threads. It's relatively straightforward. The trick is to figure out how to keep connections around long enough so that the thing you've sent the request to is still there when it replies.

There are a couple of options in use in the codebase.

The first is what the previous authors did - they would parse an article request (ARTICLE, HEAD, BODY), create a request, push it onto the queue, and wait one second for a reply. If the reply didn't arrive in that time they would push another request to another server. The idea is to minimise latency on article fetches - instead of waiting around for a potentially overloaded server, they just queue requests to the other servers which may have the article, and stop queuing requests once one issues a reply. The rest of the replies then have to be dequeued and tossed away.
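
In pthread terms, that "wait one second for a reply" is just a timed wait on the client's reply queue. Building on the queue sketch above (again, hypothetical names, not the actual code):

#include <errno.h>
#include <time.h>

/* Like msgqueue_pop(), but gives up after 'secs' seconds and returns NULL
 * so the caller can go and queue the request to another spool server. */
struct msg *
msgqueue_pop_timed(struct msgqueue *q, int secs)
{
    struct msg *m = NULL;
    struct timespec ts;

    clock_gettime(CLOCK_REALTIME, &ts); /* timedwait takes an absolute time */
    ts.tv_sec += secs;

    pthread_mutex_lock(&q->lock);
    while (q->head == NULL) {
        if (pthread_cond_timedwait(&q->cond, &q->lock, &ts) == ETIMEDOUT)
            break;
    }
    if (q->head != NULL) {
        m = q->head;
        q->head = m->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    pthread_mutex_unlock(&q->lock);
    return m;                           /* NULL means "timed out" */
}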

The second is what I did for the reader/overview side - I would parse a client request (GROUP, STAT, XOVER, etc), create a request to the backend, push it onto the queue, and wait for the reply. The backend code took care of trying the set of commands required to handle that client request (eg a client STAT might need a GROUP and then a STAT on the backend if that connection wasn't already in the right group, or just a STAT if it was), with explicit timeouts. If the request hadn't completed by then, the backend reader thread would send a "timeout" reply to the client thread, and then attempt to complete the transaction before dequeuing the next.
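
The state tracking doesn't have to be fancy; a small per-connection enum is enough for the backend thread to know where it is in the NNTP exchange and drain an abandoned transaction before moving on. Something along these lines (illustrative only, not the real structures):

/* Per-backend-connection transaction state, so a request that has already
 * been reported to the client as "timeout" can still be run to completion
 * (with its data tossed away) before the next queued request is started. */
enum nntp_txn_state {
    TXN_IDLE,                   /* no outstanding command */
    TXN_SENT_GROUP,             /* GROUP sent, waiting on the status line */
    TXN_SENT_CMD,               /* STAT/XOVER/etc sent, waiting on the status line */
    TXN_READING_MULTILINE       /* multi-line reply in flight, reading to "." */
};

struct backend_conn {
    int fd;
    enum nntp_txn_state state;
    int client_gone;            /* client was already sent "timeout"; any
                                 * further data for this transaction is
                                 * read and discarded */
};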

There are some implications from the above!

The first method is easier to code and easier to understand conceptually - the client handles timeouts and throws away unwanted responses. The backend server code is easy - dequeue, attempt the request until completion or error, return. The problem is that there is no guaranteed time in which the client will be notified of the completion of the request.

The second method is trickier. The backend thread handles timeouts and sends them to the client thread. The backend then needs to track the NNTP transaction state so it can resume it and run the request to completion, tossing away whatever data was being returned. The benefit is that the client -will- get a message from the backend in the specified time period.

These approaches aren't mutually exclusive, either. The first works better for article fetches, where there isn't (yet) any code to monitor server performance and issue requests to the servers that are responding quickly - I'm going to add that code in soon anyway. The second approach works great for the reader commands because they're either nice and quick, or they're extremely long-lived. Article replies generally max out at a few megabytes; overview commands can serve back tens or hundreds of megabytes of database information, and that can take time.

One of the important implications is when the client thread can be freed. In the first method, the client thread MUST stay around until all the pending article requests have been replied to in some fashion. In the second method, the client thread waits for a response to its message immediately after queuing it, so it doesn't have to reap queue events on connection shutdown.
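
One way to make that first-method lifetime rule mechanical is a pending-reply counter on the client state: the client thread only tears down its queue once every outstanding article request has been answered or discarded. A sketch, with assumed names (free_msg() is a hypothetical helper):

struct client_state {
    struct msgqueue replies;    /* queue the spool threads answer on */
    int pending;                /* article requests still outstanding */
};

/* On shutdown, drain every in-flight reply before freeing anything;
 * otherwise a spool thread may queue a reply to a destroyed queue. */
static void
client_drain_and_free(struct client_state *c)
{
    while (c->pending > 0) {
        struct msg *m = msgqueue_pop(&c->replies);  /* from the sketch above */
        c->pending--;
        free_msg(m);            /* hypothetical: toss the unwanted reply */
    }
    /* ... now it's safe to destroy the queue and free 'c' ... */
}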

The current crash bugs I've seen seem to be related to message queuing. I'm seeing both junk being dequeued from the client reader queue (when there should be NO messages pending in that queue once a command has been fully processed!) and I'm seeing article responses being sent to queues for clients which have been destroyed for one reason or another. I'm going to spend some time over the next few hours putting in assert()ions to track these conditions down and naff them on the head before the stack gets scrambled and I end up with a 4 gigabyte core which gives me absolutely no useful traceback. :P
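
A cheap way to catch both of those conditions is to stamp each message with a magic number and its intended destination queue, and assert on them at enqueue/dequeue time - my sketch, not the actual code:

#include <assert.h>

#define MSG_MAGIC 0x4e4e5250    /* "NNRP" */

struct msg_hdr {
    unsigned int magic;         /* stamped on alloc, cleared on free */
    struct msgqueue *dest;      /* queue this message is expected to be on */
};

static void
msg_check(const struct msg_hdr *h, const struct msgqueue *q)
{
    assert(h->magic == MSG_MAGIC);  /* catches freed/scribbled-on messages */
    assert(h->dest == q);           /* catches replies aimed at the wrong
                                     * (or already-destroyed) client queue */
}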

Oh look, the application cored again, this time in the GNU malloc code! Time to figure out what is going on again..

Saturday, November 22, 2008

Updates!

A few updates!

I've fixed a few bugs in CACHEBOY_PRE which will be back-ported to CACHEBOY_1.5. This is in line with my current goal of stability before features. CACHEBOY_PRE and CACHEBOY_1.5 have passed all the polygraph runs I've been throwing at them and there aren't any outstanding stability issues in the Issue tracker.

I'll roll CACHEBOY_1.6.PRE3 and CACHEBOY_1.5.2 releases in the next day or two and get those out there.

Thursday, October 16, 2008

Serving IPv6 from Cacheboy-1.6.PRE2

I've done the very minimum amount of work required to get Cacheboy-1.6.PRE2 to the point where it'll handle IPv6 client requests. I've put it in front of http://www.cacheboy.net/ which now has v4 and v6 records.

There's still plenty of work to do to bring it up to par with the Squid-3 IPv6 support, but that will have to wait a while. Specifically (if anyone feels up to handling it), the dns, ipcache and fqdncache code all needs to be massaged to support both IPv4 and IPv6 handling. It shouldn't be that much work.

Cacheboy-1.6 is definitely now in the "freeze and fix bugs as they creep up" stage. I'll continue the memory allocator and HTTP parser code reimplementation in their respective branches and get them ready for merge once I'm happy 1.6 is stable. The rest of the IPv6 support will also have to wait.

Friday, October 3, 2008

Cacheboy IPv6 update

I've made some progress on the IPv6 reorganisation in cacheboy. I've converted the ACL, authentication and ident code over to support v4/v6. I'm now going to convert over the client_db code and the request_t structure, and then the related stuff like logging, X-Forwarded-For, etc. I'll then revisit what else is required before I enable v6 sockets on the http client-side. It -should- be pretty minimal - persistent connections/connection pinning (just for assembling the hash key) and some SNMP code to gloss over IPv6 connections for the time being.

Hm, I was hoping to have this all done by the end of September but I've been a bit busy with paid work. I'll hopefully have this done just after NYCBSDCON. I hope. :)

Sunday, September 21, 2008

IPv6 ACL code, sort of!

I'm just doing a spot of testing with my new IPv6 ACL code.

Take a look at this:


(adrian) agnus:~/work/cacheboy/playpen/ipv6_acl/tools% ./squidclient mgr:config@PASSWORD | grep acl
acl all src 0.0.0.0/0.0.0.0
acl all6 src6 ::/::
acl lclnet6 src6 fe80::/fff0::
acl test1 src6 2a01:348:147:5::/ffff:ffff:ffff:ffff::
acl test1 src6 fe80::/fff0::


That there is an IPv6 "src6" ACL (well, three of them) with somewhat unfriendly netmask display code. I'll tidy that up later. Importantly, the IPv6 code seems to be coming along fine. I'm going to generate some large random IPv4 and IPv6 ACLs tomorrow to make sure they load into and display out of the splay tree correctly, then I'll look at writing some test cases for all of this.

The last bit of code that needs converting before -very basic- client-side IPv6 support can be enabled is to convert the ACL checklist struct's "src_addr" and "my_addr" over to sqaddr_t IPv6 types. This will probably require a whole lot of horrible code changes, but luckily I can convert most of them to just be "assign that an IPv4 address thx" and everything should work as before. Although I need to remind myself to make sure aclMatchIp() checks the _type_ of the ACL it's looking up against - doing an IPv4 lookup against an IPv6 splay tree won't really work out.
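
The guard itself is tiny - conceptually something like this (an illustrative sketch, not the real aclMatchIp(); do_splay_lookup() is hypothetical):

#include <sys/socket.h>         /* AF_INET, AF_INET6 */

static int do_splay_lookup(const void *tree, const struct sockaddr *key);

/* Each src/src6 ACL owns a splay tree built from a single address family,
 * so a lookup key of the other family must fail fast rather than being
 * compared against incompatible nodes. */
static int
acl_match_ip(const void *tree, int tree_family, const struct sockaddr *key)
{
    if (key->sa_family != tree_family)
        return 0;               /* IPv4 key vs IPv6 tree (or vice versa): no match */
    return do_splay_lookup(tree, key);  /* otherwise, splay lookup as before */
}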

(Amos / Squid-3 has a single IPv6 "type" for this, and IPv4 addresses are merged into the IPv6 address space. The ACL lookup for IP src/dst/myip is then -always- an IPv6 type lookup. I decided to keep separate IPv4/IPv6 ACL types for now to make testing and development easier. It will double up on the ACL sizes a little - holy crap, I'm doing something less efficient than Squid-3?!? - but that's a small price to pay at the moment for an easier to migrate codebase. Basically, if you compile this up and listen on an IPv6 address, but don't configure an IPv6 ACL, you won't get surprised by IPv6 requests being let through when they shouldn't be..)

Friday, September 5, 2008

Cacheboy-1.5: IPv6 DNS servers

I'm just debugging the last couple of issues with the IPv6-aware UDP/TCP DNS code. The internal DNS resolver still only understands IPv4 records (and, more importantly, so does the ipcache/fqdncache layer!) but the code itself will communicate with IPv4/IPv6 DNS servers.

I think I'll stop the development here and concentrate on getting the Cacheboy-1.5 release out the door. I'll then work on IPv6 record resolution in a separate branch in preparation for Cacheboy-1.6. I may even break out the ipcache/fqdncache code into external libraries so I can reuse/debug/test that code during development.

Tuesday, September 2, 2008

Upcoming Cacheboy-1.5.PRE3 development release

(Yes, I've been slack in posting about this stuff.)

I'm just about to roll the next Cacheboy-1.5 development pre-release. Cacheboy-1.5 is probably the last "almost but not quite squid-2.HEAD" release. Besides the IPv6 core, Cacheboy-1.5 resembles the Squid code but with a more sensible layout of modules and libraries.

Its main difference is the inclusion of core comm layer changes to support IPv6, in preparation for IPv6 client and server support. This particular pre-release includes some changes to the internal DNS code to decouple it from a few routines in src/ relating to TCP socket connections. It's possible I've busted stuff - just run cacheboy with "debug_options ALL,1 78,2" for a while to see if you're falling back to TCP DNS properly.

I'm about to put Cacheboy-1.5.PRE3 in production for a couple of clients to get some real world feedback.