Monday, June 29, 2009
Saturday, June 27, 2009
Friday, June 26, 2009
Here's the Lusca HEAD install under FreeBSD-7.2 + TPROXY patch. This is a basic configuration with minimal customisation. There's roughly 600 GB of data in the cache and around 5 TB of total disk storage.
You can see when I turned on half of the users, then all of the users. I think there's now around 10,000 active users sitting behind this single Lusca server.
Tuesday, June 23, 2009
I'm using the patches and ipfw config available at http://tproxy.no-ip.org/ .
The latest Lusca fixes some of the method_t related crashes due to some work done last year in Squid-2.HEAD. It seems quite stable now. The bugs only get tickled with invalid requests - so they show up in production but not with local testing. Hm, I need to "extend" my local testing to include generating a wide variety of errors.
Getting back on track, I've also helped another Lusca user deploy full transparency using the TPROXY4 support in the latest Linux kernel (I believe under Debian-unstable?). He helped me iron out some of the bugs which I've just not seen in my local testing. The important ones (method_t in particular) have been fixed; he's been filing Lusca issues in the Google Code tracker so I don't forget them. Ah, if all users were as helpful. :)
Anyway. It's nice to see Lusca in production. My customer should be turning it on for their entire satellite link (somewhere between 50 and 100mbit I think) in the next couple of days. I believe the other user has enabled it for 5000-odd users. I'll be asking them both for some statistics to publish once the cache has filled and has been tuned.
Stay tuned for example configurations and tutorials covering how this all works. :)
Wednesday, June 17, 2009
And the geoip summary:
From Sun Jun 7 00:00:00 2009 to Sun Jun 14 00:00:00 2009
Tuesday, June 16, 2009
The following is a snapshot of the per destination AS traffic information I'm keeping.
If you're peering with any of these ASes and are willing to sponsor a cacheboy node or two then please let me know. How well I can scale things at this point is rapidly becoming limited by where I can push traffic from, rather than by anything intrinsic to the software.
From Sun Jun 7 00:00:00 2009 to Sun Jun 14 00:00:00 2009
|ASN||MBytes||Requests||% of overall||AS name|
|AS3320||602465.01||1021975||3.26||DTAG Deutsche Telekom AG|
|AS7132||583164.05||778259||3.16||SBIS-AS - AT&T Internet Services|
|AS19262||459322.30||603127||2.49||VZGNI-TRANSIT - Verizon Internet Services Inc.|
|AS3215||330962.95||553299||1.79||AS3215 France Telecom - Orange|
|AS3269||317534.06||333114||1.72||ASN-IBSNAZ TELECOM ITALIA|
|AS9121||259768.32||434932||1.41||TTNET TTnet Autonomous System|
|AS22773||244573.65||283427||1.32||ASN-CXA-ALL-CCI-22773-RDC - Cox Communications Inc.|
|AS12322||224708.25||343686||1.22||PROXAD AS for Proxad/Free ISP|
|AS3352||206093.84||305183||1.12||TELEFONICADATA-ESPANA Internet Access Network of TDE|
|AS812||204120.74||166633||1.10||ROGERS-CABLE - Rogers Cable Communications Inc.|
|AS8151||198918.22||328632||1.08||Uninet S.A. de C.V.|
|AS6327||197906.53||152861||1.07||SHAW - Shaw Communications Inc.|
|AS3209||191429.18||303787||1.04||ARCOR-AS Arcor IP-Network|
|AS20115||182407.09||225151||0.99||CHARTER-NET-HKY-NC - Charter Communications|
|AS577||181167.02||152383||0.98||BACOM - Bell Canada|
|AS12874||172973.42||108429||0.94||FASTWEB Fastweb Autonomous System|
|AS6389||165445.73||236133||0.90||BELLSOUTH-NET-BLK - BellSouth.net Inc.|
|AS6128||165183.07||210300||0.89||CABLE-NET-1 - Cablevision Systems Corp.|
|AS2856||164332.96||219267||0.89||BT-UK-AS BTnet UK Regional network|
Query content served: 5234195.61 mbytes; 6878234 requests (ie, what was displayed in the table.)
Total content served: 18473721.25 mbytes; 26272660 requests (ie, the total amount of content served over the time period.)
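The "% of overall" column can be re-derived from the two totals above. A minimal sketch in Python, assuming the column is simply each row's MBytes divided by the total content served (the function name is made up for illustration; the numbers are copied from the table and summary lines):

```python
# Totals copied from the two summary lines (MBytes).
TOTAL_MBYTES = 18473721.25
QUERY_MBYTES = 5234195.61


def percent_of_overall(mbytes, total=TOTAL_MBYTES):
    """Return a row's share of all traffic as a percentage, rounded to 2 places."""
    return round(100.0 * mbytes / total, 2)


# Three rows copied from the table: (ASN, MBytes).
rows = [("AS3320", 602465.01), ("AS7132", 583164.05), ("AS19262", 459322.30)]

for asn, mbytes in rows:
    print(asn, percent_of_overall(mbytes))

# Share of all traffic covered by the ASes shown in the table.
print("table share:", percent_of_overall(QUERY_MBYTES))
```

By the same arithmetic, the ASes listed in the table account for roughly 28% of the total content served over the week.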
Saturday, June 13, 2009
I'm now actively looking for some more Cacheboy CDN nodes in the United States and Canada. I've got around 3gbit of available bandwidth in Europe, 1gbit of available bandwidth in Japan but only 300mbit of available bandwidth in North America.
I'd really, really appreciate a couple of well-connected North American nodes so I can properly test the platform and software that I'm building. The majority of traffic is still North American in destination; I'm having to serve a fraction of it from Sweden and the United Kingdom at the moment. Erk.
Please drop me a line if you're interested. The node requirements are at http://www.cacheboy.net/node_requirements.html . Thank you!
Friday, June 12, 2009
The changes I've made to the Lusca load shedding code (ie, being able to disable it :) work well for this workload. Migrating the backend to lighttpd (and fixing up the ETag generation to be properly consistent between 32 bit and 64 bit platforms) fixed the initial issues I was seeing.
The network pushed out around 850mbit at peak. Not a lot (heck, I can do that on one CPU of a mid-range server without a problem!) but it was a good enough test to show that things are working.
I need to teach Lusca a couple of new tricks, namely:
- It needs to be taught to download at the fastest client speed, not the slowest; and
- Some better range request caching needs to be added.
The former isn't too difficult - that is a weekend 5 line patch. The latter is more difficult. I don't really want to shoehorn range request caching into the current storage layer. It would look a lot like how Vary and ETag are currently handled (ie, with "magical" store entries acting as indexes to the real backend objects.) I'd rather put in a dirtier hack that is easy to undo now and use the opportunity to tidy up the whole storage layer a whole lot. But the "tidying up" rant is not for this blog entry, it's for the Lusca development blog.
The hack will most likely be a little logic to start downloading the full object when the first range request for an uncached object comes in, so subsequent range requests for that object are "glued" to the download already in progress. It means that subsequent requests will "stall" until enough of the object has been transferred to satisfy their range. The alternative is to pass each range request through to a backend until the full object has been transferred. That would improve initial performance, but at some point the backend could be overloaded with range requests for highly popular objects, and that starts affecting how fast full objects are transferred.
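To make the "glue" idea concrete, here is a rough sketch of the decision logic in Python. This is not Lusca code (Lusca is written in C); the class and method names are hypothetical, and it assumes the full object is fetched sequentially from byte 0, so a range can be served as soon as its last byte has arrived.

```python
class InFlightObject:
    """One full-object download that later range requests are glued onto."""

    def __init__(self, url):
        self.url = url
        self.received = 0  # bytes fetched from the backend so far

    def on_backend_data(self, nbytes):
        """Record that another nbytes arrived from the backend."""
        self.received += nbytes

    def can_serve(self, first, last):
        """True once bytes first..last (inclusive) have arrived.

        The download is assumed sequential from offset 0, so only the
        last byte of the range needs checking.
        """
        return self.received > last


class RangeGlue:
    """Tracks in-progress full-object downloads, keyed by URL."""

    def __init__(self):
        self.in_flight = {}  # url -> InFlightObject

    def handle_range(self, url, first, last):
        """Decide what to do with one range request for an uncached object.

        Returns (action, obj) where action is:
          'start_fetch' - first request seen: begin the full download
          'serve'       - enough data has arrived to answer this range
          'stall'       - glued to the download, waiting for more data
        """
        obj = self.in_flight.get(url)
        if obj is None:
            obj = InFlightObject(url)
            self.in_flight[url] = obj
            return "start_fetch", obj
        if obj.can_serve(first, last):
            return "serve", obj
        return "stall", obj
```

The trade-off described above shows up directly: the backend only ever sees one request per object, but a request for a range near the end of a large file stays in the 'stall' state until the download reaches it.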
As a side note, I should probably do up some math on a whiteboard here and see if I can model some of the potential behaviour(s). It would certainly be a good excuse to brush up on higher math clue. Hm..!
Thursday, June 11, 2009
In theory, once all of the caching stuff is fixed, the backends will spend most of their time revalidating objects.
But for some weird reason I'm seeing TCP_REFRESH_MISS on my Lusca edge nodes and generally poor performance during this release. I look at the logs and find this:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10\r\n
If-Modified-Since: Wed, 03 Jun 2009 15:09:39 GMT\r\n
[HTTP/1.0 200 OK\r\n
Last-Modified: Wed, 03 Jun 2009 15:09:39 GMT\r\n
Date: Fri, 12 Jun 2009 04:25:40 GMT\r\n
X-Cache: MISS from mirror1.jp.cacheboy.net\r\n
Via: 1.0 mirror1.jp.cacheboy.net:80 (Lusca/LUSCA_HEAD)\r\n
Notice the different ETags? Hm! I wonder what's going on. On a hunch I checked the ETags from both backends. master1 for that object gives "1721454571"; master2 gives "1687308715". They both have the same size and same timestamp. I wonder what is different?
Time to go digging into the depths of the lighttpd code.
EDIT: the etag generation is configurable. By default it uses the mtime, inode and filesize. Disabling inode and inode/mtime didn't help. I then found that earlier lighttpd versions have different etag generation behaviour based on 32 or 64 bit platforms. I'll build a local lighttpd package and see if I can replicate the behaviour on my 32/64 bit systems. Grr.
Meanwhile, Cacheboy isn't really serving any of the mozilla updates. :(
EDIT: so it turns out the bug is in the ETag generation code. They create an unsigned 32-bit integer hash value from the etag contents, then shovel it into a signed long for the ETag header. Unfortunately for FreeBSD-i386, "long" is a signed 32-bit type, and thus things go awry from time to time. Grrrrrr.
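The signedness problem is easy to show in isolation. The following Python sketch is not lighttpd's code; it only mimics what happens when the same unsigned 32-bit value is stored in a signed long that is 32 bits wide on one platform and 64 bits wide on another. The function name and the sample hash values are made up for illustration.

```python
import ctypes


def etag_as_long(hash32, long_bits):
    """Render an unsigned 32-bit hash as it would print from a signed long.

    long_bits is the width of "long" on the platform being imitated:
    32 (e.g. i386) or 64 (e.g. amd64). ctypes integer types do no
    overflow checking, so the 32-bit case wraps exactly like C would.
    """
    if long_bits == 32:
        return str(ctypes.c_int32(hash32).value)
    return str(ctypes.c_int64(hash32).value)


# Hypothetical hash values, one below 2**31 and one at or above it.
for h in (0x12345678, 0x80000000):
    print(hex(h), etag_as_long(h, 32), etag_as_long(h, 64))
```

Hashes below 2**31 print the same on both platforms and only the larger ones turn negative on the 32-bit side, which fits the "from time to time" nature of the problem.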
EDIT: fixed in a newly-built local lighttpd package; both backend servers are now doing the right thing. I'm going back to serving content.
Tuesday, June 2, 2009
mirror2.uk is thanks to UK Broadband, who have graciously given me access to a few hundred megabits of traffic and space on an ESX server.
mirror3.uk (due to be turned up today!) is thanks to a private donor named Alex who has given me a server in his colocation space and up to a gigabit of traffic.
Shiny! Thanks to you both.