Wednesday, July 15, 2020

I'm back into the grind of FreeBSD's wireless stack and 802.11ac

hi!

Yes, it's been a while since I posted here and yes, it's been a while since I was actively working on FreeBSD's wireless stack. Life's been .. well, life. I started the ath10k port in 2015. I wasn't expecting it to take 5 years, but here we are. My life has changed quite a lot since 2015 and a lot of the things I was doing in 2015 just stopped being fun for a while.

But the stars have aligned and it's fun again, so here I am.

Here's where things are right now.

First up - if_run. This is the Ralink (now mediatek) 11abgn USB driver for stuff that they made before Mediatek acquired them. A contributor named Ashish Gupta showed up on the #freebsd-wifi IRC channel on efnet to start working on 11n support to if_run and he got it to the point where the basics worked - and I took it and ran with it enough to land 20MHz 11n support. It turns out I had a couple of suitable NICs to test with and, well, it just happened. I'm super happy Ashish came along to get 11n working on another NIC.

The if_run TODO list (which anyone is welcome to contribute to):

  • Ashish is looking at 40MHz wide channel support right now;
  • Short and long-GI support would be good to have;
  • we need to get 11n TX aggregation working via the firmware interface - it looks like the Linux driver has all the bits we need and it doesn't need retransmission support in net80211. The firmware will do it all if we set up the descriptors correctly.

net80211 work


Next up - net80211. So, net80211 has basic 11ac bits, even if people think it's not there. It doesn't know about MU-MIMO streams yet but it'll be a basic 11ac AP and STA if the driver and regulatory domain supports it.

However, as I implement more of the ath10k port, I find more and more missing bits that really need to be in net80211.

A-MPDU / A-MSDU de-encapsulation


The hardware does A-MPDU and A-MSDU de-encapsulation in hardware/firmware, pushing up individual decrypted and de-encapsulated frames to the driver. It supports native wifi and 802.3 (ethernet) encapsulation, and right now we only support native wifi. (Note - net80211 supports 802.3 as well; I'll try to get that going once the driver lands.)

I added support to handle decryption offload with the ath10k supplied A-MPDU/A-MSDU frames (where there's no PN/MIC at all, it's all done in firmware/hardware!) so we could get SOME traffic. However, receive throughput just plainly sucked when I last poked at this. I also added A-MSDU offload support where we wouldn't drop the A-MSDU frames with the same receive 802.11 sequence number. However...

It turns out that my mac was doing A-MSDU in A-MPDU in 11ac, and the net80211 receive A-MPDU reordering was faithfully dropping all A-MSDU frames with the same receive 802.11 sequence number. So TCP would just see massive packet loss and drop the throughput in a huge way. Implementing this feature requires buffering all A-MSDU frames in an A-MPDU sub-frame in the reordering queue rather than tossing them, and then reordering them as if they were a single frame.

So I modified the receive reordering logic to reorder queues of mbufs instead of mbufs, and patched things to allow queuing multiple mbufs as long as they were appropriately stamped as being A-MSDUs in a single A-MPDU subframe .. and now the receive traffic rate is where it should be (> 300mbit UDP/TCP.) Phew.


U-APSD support


I didn't want to implement full U-APSD support in the Atheros 11abgn driver because it requires a lot of driver work to get it right, but the actual U-APSD negotiation support in net80211 is significantly easier. If the NIC supports U-APSD offload (like ath10k does) then I just have to populate the WME QoS fields appropriately and call into the driver to notify them about U-APSD changes.

Right now net80211 doesn't support the ADD-TS / DEL-TS methods for clients requesting explicit QoS requirements.

Migrating more options to per-VAP state


There are a bunch of net80211 state which was still global rather than per-VAP. It makes sense in the old world - NICs that do things in the driver or net80211 side are driven in software, not in firmware, so things like "the current channel", "short/long preamble", etc are global state. However the later NICs that offload various things into firmware can now begin to do interesting things like background channel switching for scan, background channel switching between STA and P2P-AP / P2P-STA. So a lot of state should be kept per-VAP rather than globally so the "right" flags and IEs are set for a given VAP.

I've started migrating this state into per-VAP fields rather than global, but it showed a second shortcoming - because it was global, we weren't explicitly tracking these things per-channel. Ok, this needs a bit more explanation.

Say you're on a 2GHz channel and you need to determine whether you care about 11n, 11g or 11b clients. If you're only seeing and servicing 11n clients then you should be using the short slot time, short preamble and not require RTS/CTS protection to interoperate with pre-11n clients.

But then an 11g client shows up.

The 11g client doesn't need to interoperate with 11b, only 11n - so it doesn't need RTS/CTS. It can use short preamble and short slot time still. But the 11n client need to interoperate, so it needs to switch protection mode into legacy - and it will do RTS/CTS protection.

But then, an 11b client shows up.

At this point the 11g protection kicks in; everyone does RTS/CTS protection and long preamble/slot time kicks in.

Now - is this a property of a VAP, or of a channel? Technically speaking, it's the property of a channel. If any VAP on that channel sees an 11b or 11g client, ALL VAPs need to transition to update protection mode.

I migrated all of this to be per-VAP, but I kept the global state for literally all the drivers that currently consume it. The ath10k driver now uses the per-VAP state for the above, greatly simplifying things (and finishing TODO items in the driver!)


ath10k changes


And yes, I've been hacking on ath10k too.

Locking issues


I've had a bunch of feedback and pull requests from Bjorn and Geramy pointing out lock ordering / deadlock issues in ath10k. I'm slowly working through them; the straight conversion from Linux to FreeBSD showed the differences in our locking and how/when driver threads run. I will rant about this another day.

Encryption key programming


The encryption key programming is programmed using firmware calls, but net80211 currently expects them to be done synchronously. We can't sleep in the net80211 crypto key updates without changing net80211's locks to all be SX locks (and I honestly think that's a bad solution that papers over non-asynchronous code that honestly should just be made asynchronous.) Anyway, so it and the node updates are done using deferred calls - but this required me to take complete copies of the encryption key contents. It turns out net80211 can pretty quickly recycle the key contents - including the key that is hiding inside the ieee80211_node. This fixed up the key reprogramming and deletion - it was sometimes sending garbage to the firmware. Whoops.


What's next?


So what's next? Well, I want to land the ath10k driver! There are still a whole bunch of things to do in both net80211 and the driver before I can do this.

Add 802.11ac channel entries to regdomain.xml


Yes, I added it - but only for FCC. I didn't add them for all the other regulatory domain codes. It's a lot of work because of how this file is implemented and I'd love help here.


Add MU-MIMO group notification


I'd like to make sure that we can at least support associating to a MU-MIMO AP. I think ath10k does it in firmware but we need to support the IE notifications.

Block traffic from being transmitted during a node creation or key update


Right now net80211 will transmit frames right after adding a node or sending a key update - it assumes the driver is completing it before returning. For software driven NICs like the pre-11ac Atheros chips this holds true, but for everything USB and newer firmware based devices this definitely doesn't hold.

For ath10k in particular if you try transmitting a frame without a node in firmware the whole transmit path just hangs. Whoops. So I've fixed that so we can't queue a frame if the firmware doesn't know about the node but ...

... net80211 will send the association responses in hostap mode once the node is created. This means the first association response doesn't make it to the associating client. Since net80211 doesn't yet do this traffic buffering, I'll do it in ath10k- I'll buffer frames during a key update and during node addition/deletion to make sure that nothing is sent OR dropped.

Clean up the Linux-y bits


There's a bunch of dead code which we don't need or don't use; as well as some compatibility bits that define Linux mac80211/nl80211 bits that should live in net80211. I'm going to turn these into net80211 methods and remove the Linux-y bits from ath10k. Bjorn's work to make linuxkpi wifi shims can then just translate the calls to the net80211 API bits I'll add, rather than having to roll full wifi methods inside linuxkpi.


To wrap up ..


.. job changes, relationship changes, having kids, getting a green card, buying a house and paying off old debts from your old hosting company can throw a spanner in the life machine. On the plus side, hacking on FreeBSD and wifi support are fun again and I'm actually able to sleep through the night once more, so ... here goes!

If you're interested in helping out, I've been updating the net80211/driver TODO list here: https://wiki.freebsd.org/WiFi/TodoStuff . I'd love some help, even on the small things!


13 comments:

  1. Welcome back! I love your coding style !

    ReplyDelete
  2. Much, much thanks for the work you're putting into this.

    ReplyDelete

  3. I would like to help you out. Let me know where to start off.

    ReplyDelete
    Replies
    1. I replied to someone else here who asked - there's a wiki page on the freebsd wifi stack that we're trying to get into better shape with a todo list. But you should just grab freebsd-head, grab a supported wifi device and try using it! Find what's not quite right and let's fix it!

      Delete
  4. Just wanted to give you words of encouragement! Thanks for all the hard work you do to make open source firmware awesome. I recently delved deeper into some firmware code and realized just how much work it can be to get things to a semi working state.

    ReplyDelete
  5. This seems really interesting and I'd like to help! Please let me know how to get started

    ReplyDelete
    Replies
    1. hi! We have a list of things in our wiki - https://wiki.freebsd.org/WiFi - there's a todo section that I'm trying to sort into more useful tasks.

      But basically - grab something that's supported and try to make it better. :-)

      Delete
  6. Could you explain Linux-y code by example?

    ReplyDelete
    Replies
    1. So there are a couple of areas.

      first up - the nl80211/mac80211 bits are just different than the ye olde net80211/madwifi layout. mac80211 has a lot more stuff pushed into a deferred path, like crypto key management - but net80211 doesn't. it makes writing stuff for firmware nics hard.

      next up is the structure of locking and wakeups. on linux, you can grab a lock and disable your software irq at the same time - ensuring that you're not going to contend locks with the irq path. FreeBSD doesn't do this (outside of linuxkpi); so you end up with a bunch of lock contention that kills performance. It means you can't "just" port a Linux driver to BSD and hope the performance is the same; you need to sit down and make sure the locking is efficient.

      Next - the tasklet abstraction. On freebsd (again outside of linuxkpi) waking up a tasklet wakes it up on the same CPU you're currently on, so you know it won't be scheduled to run until after you finish your current work. On FreeBSD a taskqueue can run on any CPU by default, so if you schedule a taskqueue entry it'll immediately run if there's a free CPU. This can also lead to fun lock contention and also there's some cases where you need to add/modify locking.

      Then there's the increasing use of the lock-free algorithms which require that not only you try to do the same thing, but all the consumers do the same thing. mac80211 makes a lot of use of their lock-free bits, and net80211/freebsd doesn't. they kinda have to line up or you end up with behavioural differences that can be difficult to debug.

      I can keep going.. :-)

      Delete
  7. why make an effort to improve FreeBSD, vs linux, vs dfly, vs netBSD? Why have you decided that FreeBSD is the fork to follow?
    I've been struggling with deciding which way to to focus my efforts. Why FreeBSD? DragonflyBSD has a lot of features that seem to address your linux-y considerations?

    Please keep going, and Thanks :-)

    ReplyDelete