Wednesday, April 21, 2010

Modifying the HTTP header parser in Lusca

I've been slowly working towards a variety of medium-term goals in Lusca. I've resisted committing various bits of partially finished work, partly because they're works in progress and partly because I'm not happy with how the code fits together.

One of these areas is the HTTP header parser and management routines. The main issues I have with the parser and management code are listed below.
  • Each header entry is represented by separate strings in memory;
  • Each header entry has a small, separately allocated object (HttpHeaderEntry), one per header;
  • Parsing the header entries uses various stdio routines to iterate over characters, and these may be slower than necessary (because they handle unicode/wide/UTF/locale support) when all that's needed here is 7-bit ASCII;
  • There are some sanity checks in the header parser - specifically, the duplicate Content-Length check - which are likely better done once the headers have been parsed.
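To illustrate the last two points, here is a minimal sketch rather than Lusca code. The names (ascii_lower, ascii_ieq, parsed_hdr, count_content_length) are made up for this example. It compares header names with a plain ASCII case fold instead of locale-aware routines, and counts Content-Length headers in a separate pass after parsing.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only lowercase: no locale lookup, 7-bit input assumed. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* Case-insensitive compare of a length-delimited name against a
 * NUL-terminated string. Returns 1 on match, 0 otherwise. */
static int ascii_ieq(const char *a, size_t alen, const char *b)
{
    size_t i;
    for (i = 0; i < alen; i++) {
        if (b[i] == '\0')
            return 0;
        if (ascii_lower((unsigned char) a[i]) != ascii_lower((unsigned char) b[i]))
            return 0;
    }
    return b[i] == '\0';
}

/* Hypothetical parsed header: pointers into the original buffer. */
typedef struct {
    const char *name;
    size_t name_len;
    const char *value;
    size_t value_len;
} parsed_hdr;

/* Post-parse sanity check: how many Content-Length headers are there?
 * The caller decides what to do when the answer is greater than 1. */
static size_t count_content_length(const parsed_hdr *h, size_t n)
{
    size_t i, count = 0;
    assert(h != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (ascii_ieq(h[i].name, h[i].name_len, "content-length"))
            count++;
    }
    return count;
}
```

Keeping the check out of the parser means the parser only has to split lines into names and values. Policy decisions (reject, ignore, pick one) can then live in one place.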
I've been working on the first two items in separate branches. One converts the HttpHeaderEntry items into a single allocated array, which is grown if needed. Another takes the current String API and turns it into fully reference-counted strings. Both of these work fine for me. But shoe-horning them into the current HTTP parser code - which expects individually allocated/created HttpHeaderEntry items which it can destroy on a whim before they're considered a part of the HTTP header set - is overly hackish and prone to introducing bugs.
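Roughly, the two ideas look something like the sketch below. This is not the code from either branch. The types and functions (rc_str, hdr_entry, hdr_array and friends) are hypothetical names, and the initial capacity and doubling growth policy are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical reference-counted string; not Lusca's String API. */
typedef struct {
    int refcount;
    size_t len;
    char buf[];                 /* C99 flexible array member */
} rc_str;

static rc_str *rc_str_new(const char *s, size_t len)
{
    rc_str *r = malloc(sizeof(*r) + len + 1);
    if (r == NULL)
        return NULL;
    r->refcount = 1;
    r->len = len;
    memcpy(r->buf, s, len);
    r->buf[len] = '\0';
    return r;
}

static rc_str *rc_str_ref(rc_str *r)
{
    r->refcount++;
    return r;
}

static void rc_str_unref(rc_str *r)
{
    assert(r->refcount > 0);
    if (--r->refcount == 0)
        free(r);
}

/* Entries live by value inside one array: no per-header allocation. */
typedef struct {
    rc_str *name;
    rc_str *value;
} hdr_entry;

typedef struct {
    hdr_entry *items;
    size_t count;
    size_t capacity;
} hdr_array;

static void hdr_array_init(hdr_array *a)
{
    a->items = NULL;
    a->count = 0;
    a->capacity = 0;
}

/* Append an entry, taking a new reference on both strings.
 * Returns 0 on success, -1 if the array could not be grown. */
static int hdr_array_append(hdr_array *a, rc_str *name, rc_str *value)
{
    if (a->count == a->capacity) {
        size_t ncap = a->capacity ? a->capacity * 2 : 8;
        hdr_entry *n = realloc(a->items, ncap * sizeof(*n));
        if (n == NULL)
            return -1;
        a->items = n;
        a->capacity = ncap;
    }
    a->items[a->count].name = rc_str_ref(name);
    a->items[a->count].value = rc_str_ref(value);
    a->count++;
    return 0;
}

/* Drop every reference held by the array and release the array itself. */
static void hdr_array_clear(hdr_array *a)
{
    size_t i;
    for (i = 0; i < a->count; i++) {
        rc_str_unref(a->items[i].name);
        rc_str_unref(a->items[i].value);
    }
    free(a->items);
    hdr_array_init(a);
}
```

The awkward part described above shows up here too. Once entries live inside the array, a parser can no longer create an entry on its own and throw it away before it joins the header set. It has to append it or never build it.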

It's taking me quite a bit of time to slowly change the HTTP parser code to be ready for the new management code. Well, it's taken me about 6 months to slowly modify it in a way that doesn't require rewriting everything and potentially changing expected behaviour and/or introducing subtle bugs.

The upshot? Things take time, but the code hopefully will be tidier, cleaner and easier to understand. Oh, and it won't include bugs.

Saturday, April 3, 2010

State of the Cygwin/Windows port!

A rather nice chap has been ploughing through the source and making it work under Windows/Cygwin. I've been committing bits and pieces of his work into LUSCA_HEAD as time permits.

You can find the main port details in Issue 94.

Thanks!

Friday, April 2, 2010

Hunting down method_t bugs..

It all started with Issue 99. There was a random crash in the logging code. It looked like a bug in the method handling changes which made it into Squid-2.HEAD a year or two ago. I've been patching issues in the method handling - specifically with NULL and uninitialised method pointers appearing in places - but this time the method_t pointed to junk data.

A bit of digging found that the pointer value did point to a valid method_t structure instance - but something free'd it. Hm. A little further digging found what was going on:
  1. A METHOD_OTHER appeared (an RTSP method) which resulted in a new method_t being malloc'ed;
  2. The pointer was copied to the request_t structure;
  3. The request was processed;
  4. The initial method_t pointer was freed, but the request_t method pointer still pointed to it;
  5. The logging code then logged whatever the request_t method pointer pointed to - but that memory had already been free'd. Sometimes it'd be junk, sometimes it'd be the original contents.
The original method code (and the "known" methods) all throw around pointers - and copies of pointers - to statically allocated structures which never go away. Unfortunately this logic wasn't changed when the dynamic "other" methods appeared.
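One way to make copied pointers safe for both kinds of method is to give the dynamic ones a reference count and treat the static ones as immortal. The sketch below is only an illustration of that idea, not the fix that went into Lusca. method_sketch, request_sketch and the helper functions are hypothetical names, and the 32-byte name buffer is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for method_t. */
typedef struct {
    int is_static;      /* known methods: never freed */
    int refcount;       /* only used for dynamic ("other") methods */
    char name[32];
} method_sketch;

/* A "known" method: statically allocated, lives forever. */
static method_sketch METHOD_GET_SKETCH = { 1, 0, "GET" };

/* Allocate a dynamic method; the caller owns the first reference. */
static method_sketch *method_other_new(const char *name)
{
    method_sketch *m = malloc(sizeof(*m));
    if (m == NULL)
        return NULL;
    m->is_static = 0;
    m->refcount = 1;
    strncpy(m->name, name, sizeof(m->name) - 1);
    m->name[sizeof(m->name) - 1] = '\0';
    return m;
}

/* Every copy of the pointer goes through here. */
static method_sketch *method_ref(method_sketch *m)
{
    if (!m->is_static)
        m->refcount++;
    return m;
}

/* Drop one reference. Returns 1 if the structure was actually freed. */
static int method_unref(method_sketch *m)
{
    if (m->is_static)
        return 0;
    assert(m->refcount > 0);
    if (--m->refcount > 0)
        return 0;
    free(m);
    return 1;
}

/* Hypothetical stand-in for request_t. */
typedef struct {
    method_sketch *method;
} request_sketch;
```

With this shape, step 4 above becomes harmless. The parser's method_unref() drops the count from two to one, and the memory stays valid until the request (and so the logging code) is finished with it.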

So I've been quite busy tidying up the method handling code in preparation for the change in how they're handled. LUSCA_HEAD now has some code which logs potential memory leaks when handling the dynamic methods. I'm going to see if I can come up with a way (or two) to log potentially risky situations where items are dereferenced after being free'd. But hopefully I can fix the issue without introducing any further bugs.
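One possible way to catch the use-after-free case in a debug build is to poison the structure instead of releasing it, and check a magic value on every access. This is a sketch of that general technique only. dbg_method, the magic constants and the stale_accesses counter are invented for the example, and nothing here is taken from LUSCA_HEAD.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DBG_MAGIC_LIVE 0x4d455448u
#define DBG_MAGIC_DEAD 0xdeadbeefu

/* Hypothetical debug-build method structure. */
typedef struct {
    unsigned magic;
    char name[32];
} dbg_method;

/* Number of stale accesses seen so far. */
static int stale_accesses = 0;

static dbg_method *dbg_method_new(const char *name)
{
    dbg_method *m = malloc(sizeof(*m));
    if (m == NULL)
        return NULL;
    m->magic = DBG_MAGIC_LIVE;
    strncpy(m->name, name, sizeof(m->name) - 1);
    m->name[sizeof(m->name) - 1] = '\0';
    return m;
}

/* Debug "free": poison the object but keep the memory, so that later
 * checks read valid (if dead) memory rather than freed memory. */
static void dbg_method_free(dbg_method *m)
{
    assert(m->magic == DBG_MAGIC_LIVE);   /* catches double frees */
    m->magic = DBG_MAGIC_DEAD;
}

/* Really return the memory, e.g. at shutdown. */
static void dbg_method_release(dbg_method *m)
{
    free(m);
}

/* Accessor used instead of touching m->name directly. */
static const char *dbg_method_name(const dbg_method *m, const char *where)
{
    if (m->magic != DBG_MAGIC_LIVE) {
        stale_accesses++;
        fprintf(stderr, "%s: method %p used after free\n",
                where, (const void *) m);
        return "(stale)";
    }
    return m->name;
}
```

The cost is that poisoned objects are deliberately kept around, so this only makes sense as a debugging aid. The poisoned objects would need a bounded quarantine or a release at shutdown.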