Wednesday, July 8, 2009

Storage rebuilding / logging project - proposal

I've put forward a basic proposal to the fledgling Lusca community to get funding to fix up the storage logging and rebuilding code.

Right now the storage logging (ie, "swap.state" logging) is done using synchronous IO and this starts to lag Lusca if there is a lot of disk file additions/deletions. It also takes a -long- time to rotate the store swap log (which effectively halts the proxy whilst the logs are rotated) and an even longer time to rebuild the cache index at startup.

I've braindumped the proposal here - .

Now, the good news is that I've implemented the rebuild helper programs and the results are -fantastic-. UFS cache dirs will still take forever to rebuild if the logfile doesn't exist or is corrupt but the helper programs speed this up by a factor of "LOTS". It also parallelises correctly - if you have 15 disks and you aren't hitting CPU/bus/controller limits, all the cache dirs will rebuild at full speed in parallel.

Rebuilding from the log files takes seconds rather than minutes.

Finally, I've sketched out how to solve the COSS startup/rebuild times and the even better news is that fixing the AUFS rebuild code will give me about 90% of what I need to fix COSS.

The bad news is that integrating this into the Lusca codebase and fixing up the rebuild process to take advantage of this parallelism is going to take 4 to 6 weeks of solid work. I'm looking for help from the community (and other interested parties) who would like to see this work go in. I have plenty of testers but nothing to help -coding- along and I unfortunately have to focus on projects that provide me with some revenue.

Please contact me if you're able to help with either coding or funding for this.

No comments:

Post a Comment