Friday, May 31, 2013

Trying to implement A-MSDU support, or "what happens when things aren't done in the right order."

I've been looking at implementing A-MSDU support in net80211. This has cropped up for a few reasons:


  • It means you can do basic MSDU aggregation without needing the full block-ack window mechanics;
  • It allows you to do aggregation of very small frames into one larger MPDU that you can then stuff into an A-MPDU;
  • I want to leverage it for TDMA.
So, the background.

A-MSDU is where you take a bunch of MSDUs and shovel them into a single, larger MPDU. They're all destined to the same end-node and they all are transmitted/retransmitted together. There's a single sequence number assigned to the A-MSDU, so the hardware will have to retransmit it as a single unit.

Which is great for things like TCP ACKs, which are tiny and waste a lot of airtime. If you can shovel a bunch of them into a larger A-MSDU and then transmit that inside an A-MPDU, you get a double bonus - your A-MPDU sub-frame sizes are large, so you won't hit any "minimum frame size" limits that various chips may have.

And for TDMA it's an instant win - I can just use A-MSDU with no ACK for now and achieve 11n throughput with minimal effort. All I have to do is write the A-MSDU support and I'll get 11n throughput for free with TDMA.

So all I have to do is write A-MSDU support, right? ... right?

It turns out that from an architectural standpoint, it's a pain in the ass to write.

For A-MPDU it's easy. You can just do it in the driver. It looks like a series of individual MPDUs that are already 802.11 encapsulated. So you can just buffer those in the driver and transmit/retransmit them. For net80211 (FreeBSD) and mac80211 (Linux) this is great - both stacks pass an already-encapsulated 802.11 MPDU to the driver.

But for A-MSDU it's a bit hairier. We're not aggregating already encapsulated 802.11 frames - we're encapsulating multiple 802.3 frames together. Which means the stack itself has to glue together a bunch of 802.3 frames into an A-MSDU, then pass that as an MPDU to the driver.
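To give a feel for the gluing step, here's a minimal sketch of how the length of an A-MSDU body adds up. In the 802.11n format, each subframe carries a 14 byte header (destination, source, length) and every subframe except the last is padded out to a four byte boundary. The function names are made up for illustration; this isn't net80211 code.

```c
#include <assert.h>
#include <stddef.h>

#define AMSDU_SUBFRAME_HDR_LEN 14	/* DA (6) + SA (6) + length (2) */

/* Round len up to the next multiple of 4. */
static size_t
amsdu_roundup4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Total A-MSDU body length for n MSDUs (hypothetical helper).
 * payload_len[i] is the length of MSDU i as it will appear after the
 * subframe header. All subframes except the last are padded to a
 * four byte boundary.
 */
static size_t
amsdu_total_len(const size_t *payload_len, size_t n)
{
	size_t total = 0;

	assert(payload_len != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		size_t sub = AMSDU_SUBFRAME_HDR_LEN + payload_len[i];

		total += (i + 1 < n) ? amsdu_roundup4(sub) : sub;
	}
	return total;
}
```

Three 40 byte payloads (roughly TCP ACK sized) come to 56 + 56 + 54 = 166 bytes, which shows how many tiny frames you could pack in before getting anywhere near the size limit.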

That bit isn't hard.

The first hard bit is figuring out what the maximum A-MSDU size is. Now, the naive solution is just to aggregate as much as you can up to the 7935 byte maximum A-MSDU size, then transmit that. Great - except that there are regulatory limits and QoS limits on how long an individual frame can take to transmit. So when you create an A-MSDU, you actually want to limit the size.

Now, what do you limit it to? It has to be based on how long it'll take to transmit, so here's the tricky bit - you already need to have made your transmit rate decision first. If you haven't made that decision, you can't calculate how long it'll take to transmit the frame.
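Here's a rough sketch of why the rate decision has to come first. It turns a rate and a maximum transmit duration into a byte limit, capped at the 7935 byte A-MSDU maximum. It's deliberately simplified: it ignores the preamble, PLCP and MAC headers, FCS and inter-frame spacing, so a real limit would be somewhat lower. The function names and the kbit/s unit are my own assumptions for the example.

```c
#include <assert.h>
#include <stdint.h>

#define AMSDU_MAX_LEN 7935	/* largest A-MSDU size in bytes */

/*
 * Bytes that fit into max_dur_us microseconds at rate_kbps.
 * kbit/s * us gives thousandths of a bit, so divide by 8000 for bytes.
 * All framing overhead is ignored.
 */
static uint64_t
amsdu_max_bytes(uint32_t rate_kbps, uint32_t max_dur_us)
{
	assert(rate_kbps > 0);
	return ((uint64_t)rate_kbps * max_dur_us) / 8000;
}

/* The limit actually used: airtime limit, capped at the A-MSDU maximum. */
static uint32_t
amsdu_size_limit(uint32_t rate_kbps, uint32_t max_dur_us)
{
	uint64_t by_time = amsdu_max_bytes(rate_kbps, max_dur_us);

	return (uint32_t)(by_time < AMSDU_MAX_LEN ? by_time : AMSDU_MAX_LEN);
}
```

With a 4ms limit, 65 Mbit/s gives 32500 bytes of airtime, so the 7935 byte cap is what bites. At 6.5 Mbit/s the same 4ms only covers 3250 bytes, so the airtime limit is the one that matters. Same frames, same window, very different answer depending on the rate.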

For the short term I'm going to just ignore this and write the A-MSDU support for FreeBSD in net80211 and aggregate up to the maximum limit. It's good enough to do basic testing of the feature itself. But I do need to add that maximum frame limit for another reason: QoS.

With QoS, I have a specific transmit opportunity (TXOP) time limit. I have to do two things:
  • Not schedule a frame so large that it exceeds the transmit opportunity limit, and
  • Schedule any other subsequent frames in order to try and "fill out" the rest of the transmit opportunity window.
This is really important for TDMA. Say I have an 8ms slot window but can only transmit 4ms at a time due to regulatory concerns. If I can schedule a 4ms long A-MSDU then great, that's what I'll do. But say this happens:
  • I have an 8ms long window;
  • My first frame is a long one, and it takes 3ms;
  • I then have five 0.5ms long frames afterwards.
What I don't want to do is create a single A-MSDU with all of those frames. I want to either create a 4ms frame by adding two of the 0.5ms frames to that 3ms frame, or transmit the 3ms frame followed by four 0.5ms long frames. I don't want to aggregate the second set of smaller frames into a 4ms long frame and have it be "too long" to fit into the rest of the transmit opportunity.
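The first of those options is basically a greedy fill. Here's a minimal sketch of it, assuming each queued frame already has an estimated duration in microseconds at the chosen rate. The function name is hypothetical, and treating the aggregate's duration as the sum of the individual durations is a simplification, since aggregation changes the per-frame overhead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return how many frames from the head of the queue fit into limit_us.
 * dur_us[i] is the estimated airtime of frame i. Frames are taken in
 * order; the fill stops at the first frame that would overflow.
 */
static size_t
txop_fill(const uint32_t *dur_us, size_t n, uint32_t limit_us)
{
	uint32_t used = 0;
	size_t i;

	assert(dur_us != NULL || n == 0);
	for (i = 0; i < n; i++) {
		/* used never exceeds limit_us, so this can't underflow. */
		if (dur_us[i] > limit_us - used)
			break;
		used += dur_us[i];
	}
	return i;
}
```

For a queue of one 3000us frame followed by five 500us frames and a 4000us limit, the first call takes three frames (3000 + 500 + 500). Calling it again on the remaining three frames takes all of them, for another 1500us, and both aggregates fit inside the 8ms window.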

So, things get hairy.

But wait, there's more.

What about handling frame re-transmission? If you use the Atheros (or others, for that matter) hardware frame re-transmission, you can have the hardware re-transmit the frame for you at various rates, starting with the highest one and then trying slower ones. You now have a similar issue - what if the frame is within your transmit opportunity at the fastest rate, but not at the slowest rate? What the FreeBSD and Linux Atheros drivers do for A-MPDU is pretty stupid - they use hardware re-transmission at slower rates, but limit the A-MPDU size to not exceed the maximum transmit length (4ms) at the slowest rate.

I'd rather have it never use hardware multi-rate retransmission and just step down to a slower rate in software. Each time it steps down, it'll re-calculate the maximum length and re-aggregate frames. It's fine, it'll be slightly less efficient but it'll work.
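To make the difference between the two approaches concrete, here's a small sketch comparing the size limit each one ends up with. The function names, the kbit/s unit and the four entry rate list are assumptions for the example, not what the drivers actually look like, and framing overhead is ignored again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes that fit in dur_us at rate_kbps, ignoring all framing overhead. */
static uint64_t
bytes_in(uint32_t rate_kbps, uint32_t dur_us)
{
	return ((uint64_t)rate_kbps * dur_us) / 8000;
}

/*
 * Hardware multi-rate retry: the aggregate is built once, so it has
 * to fit at the slowest rate in the series.
 */
static uint64_t
limit_hw_series(const uint32_t *rate_kbps, size_t n, uint32_t dur_us)
{
	uint32_t slowest;

	assert(rate_kbps != NULL && n > 0);
	slowest = rate_kbps[0];
	for (size_t i = 1; i < n; i++)
		if (rate_kbps[i] < slowest)
			slowest = rate_kbps[i];
	return bytes_in(slowest, dur_us);
}

/*
 * Software step-down: the aggregate is rebuilt for each attempt, so
 * only the rate used for that attempt matters.
 */
static uint64_t
limit_sw_attempt(const uint32_t *rate_kbps, size_t n, size_t attempt,
    uint32_t dur_us)
{
	assert(rate_kbps != NULL && attempt < n);
	return bytes_in(rate_kbps[attempt], dur_us);
}
```

With rates of 65, 39, 13 and 6.5 Mbit/s and a 4ms limit, the hardware series approach is stuck at 3250 bytes for every attempt, while the software approach can try 32500 bytes on the first attempt and only shrinks when it actually has to step down.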

But for A-MSDU, it is done in the stack rather than the driver. So imagine this:
  • You've buffered a bunch of 802.3 frames into a staging area, to put into an A-MSDU;
  • You make a transmit rate selection and that limits how big your A-MSDU is;
  • You assemble the A-MSDU and pass that MPDU down to the driver;
  • The driver tries transmitting it and fails, so you should retransmit it at a lower rate;
  • ... except now you really don't know whether it made it to the remote end. Did the frame get lost, or did you just fail to hear the ACK?
Now comes the tricky bit. All you know at this point is that it didn't ACK. You don't know whether the frame never made it, or whether the receiver did actually receive it, ACKed it and pushed it up to the network layer, and you just didn't hear the ACK.

If you retransmit the A-MSDU at a lower rate with the same sequence number, the receiver can eliminate the duplicate by just looking at the sequence number and seeing that it has already received it.

But if you retransmit it at a lower rate that exceeds the TXOP window size, you will be breaking QoS requirements. Your hardware (eg Atheros with the right bit enabled!) may even just flat out refuse to transmit the frame, returning it as failed because it automatically failed to fit into the transmit opportunity window. So, what do you do?

If you retransmit it at a lower rate, it's going to automatically fail. What about pulling apart the A-MSDU into two sets of MSDUs, then treating it as two A-MSDUs to transmit? That way both will fit into the maximum transmit duration at the given transmit rate.

The problem here is you don't know that the receiver actually failed to hear the A-MSDU. All you do know is you didn't hear the ACK. So if you do this, and assign new sequence numbers to the two new A-MSDUs, it's quite possible that the receiver will have heard the old A-MSDU as well as the two new A-MSDUs, and will pass duplicated frames up to the network layer.

So, the TL;DR version here - you either form A-MSDUs for software retransmit that are small enough to be retransmitted at the lower rates (biting some inefficiency but allowing for retransmission), or you just fail the transmit outright and don't retry.

So, it's complicated. Complicated and annoyingly messy.