Monday, June 11, 2012

A tale of two sequence numbers, or "when QoS seqno and CCMP PN don't match up"..

Many moons ago (say, 3 or 4 weeks - so hm, most-of-a-moon-ago actually) I found a rather curious failure condition in the ath(4) TX aggregation path. The colourful history is documented in FreeBSD kern/166190. In short - there are situations where sequence numbers were allocated in a different order to how frames were being added to the block-ack window tracking, and if you got unlucky, you'd cause the stack to think a frame was (far) outside the BAW.

The 30 second explanation:

Imagine you allocated four frames - sequence numbers 1, 2, 3 and 4. They have to be added to the block-ack window in precisely that order. Ie:

  1. Starting condition: Window is at 0:63 (64 frame window, starting at 0, so ending at 63)
  2. Add 1: Window is now at 0:63, starting at 1
  3. Add 2: Window is now at 0:63, starting at 2
  4. Add 3: Window is now at 0:63, starting at 3.
The reason the window pointer isn't moving along is because although you've sent the frames (or you're about to), you can't advance it until the other end has ACKed them (via a block-ack or a normal ACK.) For more information, google how 802.11n aggregation works.

The important bit here is that the window is still 0:63 and the starting point is now '3'. This continues all the way to trying to queue frame 64, where it will be outside of the current BAW and not be allowed to be transmitted. It'll sit in the software queue and wait until frame '0' has been ACKed and the BAW has been advanced to be 1:64 - at which point 64 will fall inside the window and will be transmitted.

So yes, the sender is tracking two things - the BAW itself, and how far into the BAW it has added frames so far (the starting point).
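A minimal sketch of the window check described above, assuming the usual 12-bit 802.11 sequence space (0 to 4095). The names `seq_distance` and `baw_within` are made up for illustration and are not the ath(4) code; the point is only that "inside the window" is a modulo-4096 distance from a reference point, so a seqno just behind that point looks like it is 4095 ahead of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SEQ_SPACE 4096u	/* 802.11 sequence numbers are 12 bits wide */

/* Forward distance from 'from' to 'to', wrapping modulo 4096. */
static inline uint16_t
seq_distance(uint16_t from, uint16_t to)
{
	return (uint16_t)((to - from) & (SEQ_SPACE - 1));
}

/*
 * Hypothetical helper: is 'seqno' inside a window of 'baw_size' frames
 * whose left edge is 'left_edge'?
 */
bool
baw_within(uint16_t left_edge, uint16_t baw_size, uint16_t seqno)
{
	/* A window wider than half the space would make the check ambiguous. */
	assert(baw_size > 0 && baw_size <= SEQ_SPACE / 2);
	return seq_distance(left_edge, seqno) < baw_size;
}
```

With a left edge of 0 and a 64 frame window, `baw_within(0, 64, 63)` is true and `baw_within(0, 64, 64)` is false; once the left edge moves to 1, `baw_within(1, 64, 64)` becomes true.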

Now, imagine instead of (1, 2, 3, 4) on the software queue, I somehow get preempted (or race between two sending threads, when using SMP) between 'allocated seqno' and 'queue to software queue'. In the existing code, a lock was held when:
  • Allocating a sequence number, then it was dropped; then
  • Adding it to the software queue.
Now because there was a period where no lock was being held, it's quite possible that what ends up on the software queue is (2, 1, 3, 4.) So:
  1. Starting condition: Window is at 0:63
  2. Add 2: Window is now 0:63, starting at 2.
  3. Add 1: Window is 0:63, starting at 2; 1 is outside of the BAW (it's treated as a 'wraparound', so imagine it's 4095 seqnos away) so TX stalls.
This was the cause of the TX stalls that I was seeing originally in kern/166190. I "fixed" it by only allocating sequence numbers when the frame was about to be transmitted for the first time, and then adding it to the BAW right there. Since both sequence number allocation and adding to the BAW happened inside the same lock, everything was sweet.
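Roughly, the idea of "allocate and track inside the same lock" looks like the sketch below. This is a userland illustration using a pthread mutex, not the driver code: `struct tid_state`, `tid_init` and `tid_alloc_and_track` are hypothetical names, and the `tracked` array just stands in for whatever records the order in which frames were added to the BAW.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define TRACK_MAX 128	/* arbitrary size for this sketch */

struct tid_state {
	pthread_mutex_t	lock;
	uint16_t	next_seqno;		/* next 12-bit seqno to hand out */
	uint16_t	tracked[TRACK_MAX];	/* order frames were tracked in */
	size_t		ntracked;
};

void
tid_init(struct tid_state *tid)
{
	pthread_mutex_init(&tid->lock, NULL);
	tid->next_seqno = 0;
	tid->ntracked = 0;
}

/*
 * Allocate a seqno and record it in one critical section. Because the
 * lock is never dropped between the two steps, no other thread can slip
 * a later seqno into the tracking order ahead of this one.
 */
uint16_t
tid_alloc_and_track(struct tid_state *tid)
{
	uint16_t seqno;

	pthread_mutex_lock(&tid->lock);
	assert(tid->ntracked < TRACK_MAX);
	seqno = tid->next_seqno;
	tid->next_seqno = (uint16_t)((tid->next_seqno + 1) & 4095);
	tid->tracked[tid->ntracked++] = seqno;
	pthread_mutex_unlock(&tid->lock);

	return seqno;
}
```

The broken version is the same code with an unlock/lock pair between the allocation and the append; that gap is where (2, 1, 3, 4) comes from.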

Except, I totally forgot about CCMP PN. So under high enough UDP TX loads (say, > 200 Mbit/s), I'd hit the same race, but between 802.11 sequence numbers and CCMP PN sequence numbers.

CCMP PN is assigned during 802.11 encapsulation time, in the driver. In the ath(4) case, it's done during transmit and before being queued to the software queue. And it was being done outside of any locking. So it's very possible that frames would end up on the software queue with 802.11 and CCMP PN sequence numbers out of lock-step.

What would happen?

Simply - after the 802.11n reordering occurred on the receive side, the CCMP PN replay detector would notice sequence numbers out of order, and start tossing said out of order frames. Lots of packet loss ensued.
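To make that concrete, here is a small sketch of a receive-side replay check, assuming the simplest possible rule: a frame is accepted only if its PN is strictly greater than the last accepted PN. `struct rx_frame` and `count_replay_drops` are hypothetical names, and the input is assumed to already be in 802.11 seqno order (that is, after the reorder buffer has done its job).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rx_frame {
	uint16_t seqno;	/* 802.11 sequence number */
	uint64_t pn;	/* CCMP packet number (48 bits on the air) */
};

/*
 * Walk frames in the order the reorder buffer releases them and count
 * how many a strictly-increasing PN check would throw away.
 */
size_t
count_replay_drops(const struct rx_frame *frames, size_t n)
{
	uint64_t last_pn = 0;	/* assumes the first valid PN is >= 1 */
	size_t drops = 0;

	assert(frames != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (frames[i].pn <= last_pn) {
			drops++;	/* looks like a replay: toss it */
			continue;
		}
		last_pn = frames[i].pn;
	}
	return drops;
}
```

If the sender assigned PN 2 to seqno 1 and PN 1 to seqno 2, the receiver sees PNs in the order 2, 1, 3, 4 and drops the second frame even though nothing was actually replayed.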

So, I sat down and started trying to address it. The simplest thing - wrap the whole encapsulation path between ieee80211_crypto_encap(), 802.11 sequence number assignment and software/hardware queueing behind the TID (well, hardware TXQ) lock. It took some time; I had to revert two earlier commits which introduced the delayed sequence number allocation.

This didn't fix things. So I was back to square one.

I started looking at all the places where the frames were being queued to the software queue and .. well, let's just say I spent Sunday swearing _at myself_ for all the weird and wonderful stupid mistakes I had made when writing/porting this code over.

The short version follows (the long version is "read the sys/dev/ath/ commit logs and the PR history"):
  1. When I was queueing frames to the software queue, I'd check how deep the hardware queue was. If the hardware queue was shallow/empty, I'd direct dispatch up to two frames to the hardware to get things 'busy'. That will (hopefully) let further frames come along in the meantime and be aggregated. However, I was queueing the new frame to the hardware rather than queueing the new frame to the tail of the queue, and queueing the head frame of the queue to the hardware. That led to some out of order behaviour.
  2. ath_tx_xmit_aggr() would check if the sequence number was within the block-ack window and if it wasn't, it'd queue the frame to the tail of the queue. This meant that a frame which had just been dequeued from the head of the queue could end up at the end of the queue, behind newer frames. This led to frames on the software queue being out of order.
    1. Frames on the software queue don't have to be in-sequence (as retries are prepended to the beginning of the list, and new frames are appended to the end) however they have to be in-order. If they end up being out of order, the BAW logic fails.
So, now that I allocate sequence numbers at packet queue time, I have to be triply sure that what ends up on the software queue is correctly in order, or the BAW logic will cause traffic stalls and potentially duplicate sequence number issues. Yes, this means that the old behaviour, whilst it now works right with all the right locking, requires me to correctly handle putting frames on the software queue. (Or, as I like to say, "keeping the bastard (me) honest.")
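The first mistake in the list above can be sketched as follows. This is an illustration only: `struct swq`, `swq_push_tail`, `swq_pop_head` and `tx_enqueue` are hypothetical names, and `hwq_min` stands in for whatever "shallow" threshold the driver uses. The new frame always goes to the tail, and if the hardware queue needs feeding it is the head of the software queue that gets dispatched, never the frame that just arrived.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SWQ_MAX 64	/* arbitrary size for this sketch */

struct swq {
	uint16_t frames[SWQ_MAX];	/* ring buffer of seqnos */
	size_t	 head;
	size_t	 count;
};

void
swq_push_tail(struct swq *q, uint16_t seqno)
{
	assert(q->count < SWQ_MAX);
	q->frames[(q->head + q->count) % SWQ_MAX] = seqno;
	q->count++;
}

uint16_t
swq_pop_head(struct swq *q)
{
	uint16_t seqno;

	assert(q->count > 0);
	seqno = q->frames[q->head];
	q->head = (q->head + 1) % SWQ_MAX;
	q->count--;
	return seqno;
}

/*
 * Queue a new frame. Returns 1 and stores the frame to hand to the
 * hardware in *to_hw if the hardware queue is shallow, else returns 0.
 */
int
tx_enqueue(struct swq *q, uint16_t new_seqno, size_t hwq_depth,
    size_t hwq_min, uint16_t *to_hw)
{
	swq_push_tail(q, new_seqno);	/* new frames always join the tail */
	if (hwq_depth >= hwq_min)
		return 0;		/* hardware is busy enough */
	*to_hw = swq_pop_head(q);	/* dispatch the oldest frame */
	return 1;
}
```

With 5 and 6 already waiting and 7 arriving while the hardware queue is empty, this hands 5 to the hardware and leaves (6, 7) queued; the buggy version would have sent 7 first.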

TL;DR - 802.11n aggregation works again. Now, to fix those pesky "queue full and I want to send a BAR frame so I can unblock the full queue and transmit" problems. At least that one is more tractable and easier to solve. Or is it?
