The problem stems from the disk IO interface inherited from Squid. With Squid, there's no explicit cancel-and-wait-for-cancellation in either the network or disk IO code, so the async disk IO read code allocates its own read buffer, reads into that, and then hands that buffer to the completion callback, which copies the data out. If the request is cancelled while a worker thread is mid-read(), the data lands in the worker's own buffer rather than a potentially free()'d buffer belonging to the owner. It's a bit inefficient, but in the grand scheme of Squid CPU use it's not that big a waste on modern hardware.
In the short term, I'm going to re-jig the async IO code to not zero buffers that are involved in the aioRead() path. In the longer term, I'm not sure. I prefer cancels which may fail - ie, if an operation is in progress, let it complete; if not, return immediately. I'd like this for the network code too, so I can use async network IO threads for network IO with fewer copies (eg FreeBSD and aio_read() / aio_write()); but there's a significant amount of existing code which assumes things can be cancelled immediately and assumes temporary copies of data are made everywhere. Sigh.
Anyway - grr'ing aside, fixing the pointless zeroing of buffers should drop the CPU use for large file operations reasonably noticeably - by at least 10% to 15%. I'm sure that'll be a benefit to someone.