Wednesday, May 23, 2012

Fixing BAR handling and handling corner cases..

Part of the general 802.11n requirements is correctly handling TX aggregation and failures - where we need to pause the TX queue, send a BAR (Block Ack Request) to forcibly move the remote end's block-ack window back in sync with the transmitter's idea of it, and then continue sending frames.

This exposed a very annoying problem - what if the driver runs out of ath_buf entries to schedule TX frames? Or, what if the network stack runs out of mbufs? If we need to allocate an ath_buf/mbuf to send a BAR frame, but they're all allocated and unavailable, the driver/wireless stack will come to a grinding halt. Typically these ath_buf entries are sitting in the software queue, waiting for the BAR TX (or a power-save wakeup) before their frames can be sent.

So, I haven't fixed this. It's on my (very) short-term to-do list. But it did expose some issues in how the net80211 BAR send code (ieee80211_send_bar()) works. In short - it didn't handle resource allocation failures at all. It worked fine if the driver send method (ic->ic_raw_xmit()) succeeded but the frame then failed to transmit. But if it couldn't allocate an mbuf, or if the driver send method itself failed.. things just stopped. And when the BAR TX just stopped, the ath(4) software TX queue would just keep buffering frames, right until all the TX ath_buf entries were consumed.

This is obviously .. sub-optimal.
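The failure paths described above can be sketched as a small userland simulation in C. This is not the net80211 code and not the actual fix: every name here (bar_ctx, fake_env, fake_alloc, fake_raw_xmit, bar_try_send) is hypothetical, the frame length is an arbitrary placeholder, and the buffer ownership rule (the caller keeps the frame when the send fails) is an assumption of the sketch rather than a statement about ic_raw_xmit(). The point is only that both "could not allocate" and "driver refused the frame" end in a defined state instead of silently stopping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define BAR_FRAME_LEN 24   /* placeholder size, for illustration only */

enum bar_state { BAR_IDLE, BAR_RETRY_PENDING, BAR_SENT, BAR_FAILED };

/* Fake environment so a test can force a number of failures of each kind. */
struct fake_env {
    int alloc_failures_left;
    int xmit_failures_left;
};

struct bar_ctx {
    enum bar_state state;
    int attempts;
    int max_attempts;
};

/* Stand-in for an mbuf allocation that may fail. */
void *fake_alloc(struct fake_env *env, size_t len)
{
    if (env->alloc_failures_left > 0) {
        env->alloc_failures_left--;
        return NULL;
    }
    return malloc(len);
}

/*
 * Stand-in for the driver send method. Sketch assumption: on success the
 * frame is consumed; on failure the caller still owns it.
 */
int fake_raw_xmit(struct fake_env *env, void *frame)
{
    if (env->xmit_failures_left > 0) {
        env->xmit_failures_left--;
        return -1;
    }
    free(frame);
    return 0;
}

/*
 * One BAR send attempt. Returns 0 if the frame was handed to the driver.
 * On any failure, records whether a retry should be scheduled or whether
 * the attempt budget is used up, so the caller never waits forever.
 */
int bar_try_send(struct bar_ctx *ctx, struct fake_env *env)
{
    void *frame;

    assert(ctx != NULL && env != NULL);
    ctx->attempts++;
    frame = fake_alloc(env, BAR_FRAME_LEN);
    if (frame != NULL) {
        if (fake_raw_xmit(env, frame) == 0) {
            ctx->state = BAR_SENT;
            return 0;
        }
        free(frame);   /* driver refused it; release our copy */
    }
    ctx->state = (ctx->attempts < ctx->max_attempts) ?
        BAR_RETRY_PENDING : BAR_FAILED;
    return -1;
}
```

In a real driver the BAR_RETRY_PENDING state would presumably arm a timer to call the send path again, and BAR_FAILED would trigger whatever recovery the stack chooses; both are left to the caller here.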

But this raises an interesting point - how much of your kernel and/or userland application code handles resource shortages correctly? I've seen plenty of userland software just not check whether malloc() returned NULL and I've seen some that specifically terminates (non-gracefully) if malloc()/calloc() fails - Squid does this. But what about your network stack? How does it handle mbuf shortages? What about the driver stack? What about net80211 (ew)? What if the kernel malloc() API has to sleep because there's no free memory available?

I don't (currently) have an answer - it's a difficult, cross-discipline problem. What I -can- do though (at least in my corner of the FreeBSD world - net80211 and ath(4)) is to start testing some of these corner cases, where I force some resource shortages and ensure that the wireless stack and driver(s) recover somewhat gracefully. 802.11n is very unforgiving if you start dropping frames involved in an active aggregation session. So it's best I try and address these sooner rather than later.
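As a rough illustration of what "forcing a resource shortage" can look like, here is a userland sketch in C. It is not how the kernel, net80211 or ath(4) do it: fault_injector, fi_malloc, frame_queue, queue_frame and queue_flush are all hypothetical names invented for this example. The allocator wrapper succeeds a configured number of times and then fails until the test resets it, and the queue code has to survive that without corrupting its state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fault injector for allocations (illustration only). */
struct fault_injector {
    int fail_after;   /* successes remaining before failing; negative = never fail */
    int failures;     /* count of injected failures so far */
};

/* malloc() wrapper that returns NULL once the success budget is used up. */
void *fi_malloc(struct fault_injector *fi, size_t len)
{
    assert(fi != NULL);
    if (fi->fail_after == 0) {
        fi->failures++;
        return NULL;
    }
    if (fi->fail_after > 0)
        fi->fail_after--;
    return malloc(len);
}

struct frame {
    struct frame *next;
    int seq;
};

struct frame_queue {
    struct frame *head, *tail;
    int depth;     /* frames currently queued */
    int dropped;   /* frames rejected because allocation failed */
};

/* Queue a frame; on allocation failure, count the drop and leave the queue intact. */
int queue_frame(struct frame_queue *q, struct fault_injector *fi, int seq)
{
    struct frame *f = fi_malloc(fi, sizeof(*f));

    if (f == NULL) {
        q->dropped++;
        return -1;
    }
    f->next = NULL;
    f->seq = seq;
    if (q->tail != NULL)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
    q->depth++;
    return 0;
}

/* Free everything still queued. */
void queue_flush(struct frame_queue *q)
{
    struct frame *f;

    while ((f = q->head) != NULL) {
        q->head = f->next;
        free(f);
    }
    q->tail = NULL;
    q->depth = 0;
}
```

A test then sets fail_after to a small number, pushes frames until the injected failure hits, checks the counters, sets fail_after to -1 to end the shortage, and confirms that queueing works again.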

1 comment:

  1. Hi Adrian,

    Great job with the ath driver, way better than my experience with madwifi/ath5k on Linux. I need a bit of help squeezing more speed out of my card; let me know what way of contact you prefer.
    The issue: I have an Atheros PCIe card based on the 9287 chip. The card is installed in a FreeBSD 9.0 box with a kernel updated to HEAD yesterday. The card connects and I get a transfer rate of at most 3MBytes/sec. I know this is good, but I'm looking for ways to increase this speed. As I see with ifconfig, the card is connecting at a maximum of 54Mbps on an HT/20 channel. The AP is a Cisco EPC 3925 that has 1 stream only and a max transfer rate of 150Mbps. The one thing I find weird is that the 9287 will not connect on HT/40, which should increase the max speed. How can I get it to connect on HT/40? On the Cisco router this is enabled, and my laptop with an Intel 5300AGN can connect at 144Mbps to the Cisco router.

    thanks for helping,