Monday, June 29, 2009

Current Downtime/issues

There's a current issue with content not being served correctly. It stemmed from a ZFS-related panic on one of the backend servers (note to self: update to the very latest FreeBSD-7-stable code; these are all fixed!). The server then came back up with lighttpd running but no ZFS mounts, so lighttpd started returning 404s.

I'm now watching the backend(s) throw random connection failures, and the Lusca caches then cache the error rather than the object.

I've fixed the troublesome backend so it won't start up in that failed mode again, and I've set negative caching on the Lusca cache nodes to 30 seconds instead of the default 5 minutes. Hopefully the traffic levels now pick back up to where they're supposed to be.

EDIT: The problem is again related to the Firefox range requests and Squid/Lusca's inability to cache range request fragments.

The backend failure(s) removed the objects from the cache. The problem now is that the objects aren't re-entering the cache because they are all range requests.

I'm going to wind down the Firefox content serving for now until I get some time to hack up Lusca "enough" to cache the range request objects. I may just do something dodgy with the URL rewriter to force a full object request to occur in the background. Hm, actually..
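
A rough sketch of what that URL rewriter hack might look like, as a helper that never rewrites anything and only tells a background process which URLs were requested. It assumes the Squid-2 style rewriter protocol (one request per line on stdin starting with the URL, and an empty reply line meaning "no change"). The fetcher address, the function names and the GET-only filter are illustrative assumptions, not anything Lusca ships.

```python
import socket

# Hypothetical address of the background fetcher; not from the post.
FETCHER_ADDR = ("127.0.0.1", 9999)


def make_notifier(addr=FETCHER_ADDR):
    """Return a function that sends one URL per UDP datagram to addr."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def notify(url):
        try:
            sock.sendto(url.encode("utf-8"), addr)
        except OSError:
            # Fire and forget: a dead fetcher must never stall the cache.
            pass

    return notify


def handle_line(line, notify):
    """Process one rewriter request line and return the reply line.

    Assumes input of the form "URL client/fqdn ident method ...".
    The reply is always empty, which means "leave the URL unchanged".
    """
    fields = line.split()
    if not fields:
        return ""
    # Only GETs are worth prefetching; if the method is missing, notify anyway.
    method = fields[3] if len(fields) >= 4 else "GET"
    if method == "GET":
        notify(fields[0])
    return ""


def serve(stdin, stdout, notify):
    """Answer rewriter requests until the cache closes our stdin."""
    for line in stdin:
        stdout.write(handle_line(line, notify) + "\n")
        stdout.flush()  # the cache waits for each reply
```

A real helper would finish with something like `serve(sys.stdin, sys.stdout, make_notifier())`. The rewriter never sees the Range header, so it reports every GET and leaves it to the receiving side to decide what actually needs a full fetch.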

2 comments:

  1. Hi Adrian. We've actually done exactly that - we use a URL rewriter to farm off a tiny UDP request to a "Fetcher server".

    It then decides if the request is already queued for "full download" and then farms the work off to a WGET process that fetches the full file.

    Works well enough....

  2. Yes I may just do that hack for the short short term.

    The first hack I want to add to Lusca is the option to have downloads happen at the fastest speed possible instead of the slowest speed possible. Having them download at the slowest speed possible may keep your disk system happier but it certainly screws up everyone else's downloads. :)

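
The "Fetcher server" described in the first comment could be sketched roughly as below: a UDP listener that ignores URLs already queued and hands each new one to a wget process. The class and function names, the port, and the idea of pointing wget at the cache through `http_proxy` are all assumptions made for illustration; the comment does not say how the real one is built.

```python
import os
import socket
import subprocess
import threading


class FetchQueue:
    """Tracks URLs that are queued or in flight so each is fetched once."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()

    def offer(self, url):
        """Return True if url was newly queued, False if already pending."""
        with self._lock:
            if url in self._pending:
                return False
            self._pending.add(url)
            return True

    def done(self, url):
        """Forget url so a later request can queue it again."""
        with self._lock:
            self._pending.discard(url)


def wget_command(url):
    """Build a wget invocation that downloads url and discards the body."""
    return ["wget", "-q", "-O", os.devnull, url]


def fetch(url, queue, proxy=None):
    """Run wget for one URL, then release it from the queue."""
    env = dict(os.environ)
    if proxy:
        # Assumption: pull the object through the cache so it gets stored.
        env["http_proxy"] = proxy
    try:
        subprocess.call(wget_command(url), env=env)
    except OSError:
        pass  # wget is not installed or could not be started
    finally:
        queue.done(url)


def run_fetcher(bind_addr=("127.0.0.1", 9999), proxy=None):
    """Receive one URL per datagram and start a fetch for each new one."""
    queue = FetchQueue()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(bind_addr)
    while True:
        data, _ = sock.recvfrom(65535)
        url = data.decode("utf-8", "replace").strip()
        # Only plain http:// URLs are accepted, so nothing can pose as a wget option.
        if url.startswith("http://") and queue.offer(url):
            worker = threading.Thread(target=fetch, args=(url, queue, proxy))
            worker.daemon = True
            worker.start()
```

If the fetch goes back through the cache, the rewriter will report the same URL again; the pending set absorbs that while the download is in flight, but a real deployment would also want to remember recently completed URLs for a while.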