Showing posts with label usb. Show all posts
Showing posts with label usb. Show all posts

Tuesday, October 13, 2015

Fixing up the RTL8188SU/RTL8192SU 802.11bgn driver (rsu)

I recently figured out most of the missing pieces for 11n support and stability with the rsu driver in FreeBSD. This handles support for the RTL8188SU and RTL8192SU chipsets. I'll cover what I found and fixed in this post.

First off - the driver was in reasonably poor shape. It sometimes paniced when the NIC was removed, it didn't support 802.11n at all and it wouldn't associate reliably. I was pestered enough by one of the original users behind getting the driver ported (Idwer! Hi!) and decided I'd give fixing it up a go.

Importantly - it's a mostly real fullmac device. "fullmac" here means that the firmware on the device does almost everything interesting - it handles association, it can do encryption/decryption for you if you want, it'll handle retransmission and transmit rate control. There are some important things it doesn't do - I'll cover those shortly.

Here's a fun bit of trivia - this firmware outputs text debugging via a magic firmware notification, and it's on by default. This made all of the debugging much, much easier as I didn't have to guess so much about what was going on in the firmware. All firmware developers - please do this. Please!

I first looked at the association issue. The device does full scan offload - you send it a firmware command to start scanning and it'll return scan results as they come in. Plenty of firmware devices do this. Then you send it an association message, then a join_bss message. For those looking at the source - rsu_site_survey() starts the scan, and rsu_join_bss() attempts an association. Now, I noticed that it was sending a join message before the site survey finished. I also noticed that I never really received any management frames, and when I used a sniffer to see what was going on, I saw double-associations sometimes occuring.

I then checked OpenBSD. Their driver just stubbed out the management frame transmit routine. This wasn't done in FreeBSD, so I added it. It turns out the firmware here does all management frame transmit and receive, so I just plainly have to do none of it. This tidied things up a bit but it didn't fix association.

Next up - the whole way scan results were pushed into net80211 was wrong. Sometimes scan results ended up on the wrong channel. The driver was doing dirty things to the current channel state directly and then faking a beacon to the net80211 stack. I replaced that with some code I wrote for the 7260 wifi driver - the stack now accepts a channel (and other things!) as part of the receive frame, so you can do proper off-channel frame reception. This tidied up the scan results so they were now consistent.

Then I thought about an evil hack - how about delaying the call to rsu_join_bss() until after the survey finished. That worked - associations were now very reliable.


Now the device associated reliably and worked okay. There were some missing bits for the firmware setup path for doing things like power saving, saying how many transmit/receive streams are available, etc, but those were easy to add. Next up - 802.11n.

On the receive side, 802.11n requires you to do A-MPDU reordering as the transmitter is free to retransmit failed frames out of order. But the net80211 stack only handled the case where it saw the management frames and it itself drove the A-MPDU negotiation. Here, the firmware drives the negotiation and just tells us what's just happened. So, I had to extend net80211 to be told what the A-MPDU parameters are. It turned out that yes, the firmware sends a notification about A-MPDU going up, but it doesn't tell you how big the block-ack window is. Sigh. So, I needed to add that.

But the access point still wasn't negotiating it. Here was the next fun bit - join rsu_join_bss() it lets the stack assemble optional IEs to send to the access point and, the more interesting part, it looks at said IEs for an idea of what its own configuration should be. I added the HTINFO IE and voila! It started negotiating 802.11n.

(Oh, and I had to add M_AMPDU to each RX'ed frame from an 802.11n node before I called net80211, or the receive code would never do A-MPDU reorder processing.)

The final hack - I stubbed out the A-MPDU TX negotiation so we would never attempt to do it. So yes, there's no TX aggregation support, but that's fine for now.

Then Idwer told me it wasn't working for him. After much digging with the Linux driver authors (Thanks Christian and Larry!) we found that the OpenBSD driver tried to program the chip directly for 40MHz mode and that's wrong - instead, I just missed one of the 802.11n IEs. The firmware looked into that to see what the channel setup should've been. Two lines of diff later and I was on at 40MHz wide modes.

Finally - stability. It turns out that the USB drivers do inconsistent things when it comes to the detach path. They're supposed to stop transmit/receive, then flush buffers which flushes the net80211 node references, and then tear down the net80211 interface. Some, eg if_rsu, were doing it the other way. I fixed if_rsu and if_urtwn - they're now both stable.

Thursday, October 1, 2015

As requested: progress of AR9170

Hi!

The progress of the AR9170 FreeBSD-ification can be found here:

https://github.com/erikarn/otus

Yes, I did actually keep the history of the driver bring-up here.

Wednesday, September 30, 2015

Porting a wifi driver from openbsd - AR9170

I told myself a long, long time ago that I really don't want to be working on USB wireless. It's not that I dislike USB or wireless; it's just that the hoops required to get it all working in a stable way is quite a bit to keep in your mind. But, I decided recently that it's about time I learnt how it worked and I was very sad that we still didn't have any working USB wifi devices that also operated with 802.11n.

So, I picked a NIC and dove in.

I picked if_rsu(4) - it's the RTL8188SU / RTL8192SU series hardware from Realtek. It turned out I chose reasonably well.

First off - it's a "fullmac" device - meaning that outside of a handful of things, the device firmware offloads a lot of the 802.11 complications. The driver does hardware initialisation and the wireless stack speaks WPA/WPA2/etc for negotiating encryption, but the hardware handles scanning, authentication, 802.11n aggregation negotiation and most management frame work.

Secondly - it's ported from OpenBSD. The OpenBSD folk do a good job of getting drivers up and running, but there tend to be some sharp edges and the 802.11n bits just don't work.

So, besides currently doing encryption in software, the rsu(4) driver behaves rather well. I'll write a separate article about that. This article is about the AR9170, or otus(4) driver in FreeBSD/OpenBSD parlance.

Now, the AR9170 is a ZyDAS device with an Atheros 802.11n PHY and radio. It's quite a hybrid beast. It's also buggy - there are issues with QoS frames and 802.11n aggregation that make it impossible to behave well. So, for now I'm treating it like a 11abg device and I'll worry about 802.11n when someone gives me patches to make it work.

The OpenBSD driver is based on the initial otus driver that Atheros provided to the Linux developers circa 2009. The firmware blob is closed and very old - the ar9170fw project is still out there on the internet (and I have a mirror at https://github.com/erikarn/ar9170-fw) but I can't get it to build on a recent FreeBSD install so a firmware update will take time. But, it does seem to work.

There are a few pieces to think about when porting a USB driver. The biggest piece is that it's not memory mapped IO or IO port based - everything is a message. There are USB device control commands you can send which will sleep until they're done, but the majority of stuff is done using bulk transmit and receive endpoints and that's all conveniently asynchronous. But it complicates things in the driver world.

Memory mapped and IO port drivers treat device IO as this magical "I do it, then the next instruction executes when it's done" mostly serialised paradigm. It's a lie, of course - the intel x86 CPUs will pretend things are occuring in a specific order, but a lot of platforms require you to mark memory as uncached or use memory / cache flush operations to ensure things go out to the device in any particularly controlled manner. But USB doesn't - outside of USB control transfers, USB devices tend to look like remote network devices and this includes register accesses. Now, the RTL8188SU driver (rsu(4)) implements the firmware upload and register accesses using control transfers, so it's all pretty easy to get the driver initialisation and attaching working before you care about the asynchronous parts. But the AR9170 driver implements register accesses as firmware commands - and so I have to get a lot more of the stack up and working first.

So, here's what I did.

First up - I commented out almost all of the device driver, and focused on getting the probe, attach and detach methods working. That wasn't too hard. But yes, almost all the code was commented out.

Next up was firmware loading. This was done using control transfers, so I didn't have to worry about implementing the bulk transmit and receive endpoint handling. I had to convert the firmware load path to the FreeBSD firmware API rather than the OpenBSD API, but that was mostly trivial.

Then I realised I wasn't doing any driver locking - so yes, I ensured I did the bare minimum of driver locking required to stop the kernel panicing. OpenBSD doesn't use locks, they use old style BSD spl() levels.

Next up was command transmit and receive. Now, I needed to setup the USB endpoints - which FreeBSD makes really easy to do using a structure to define what endpoints are what. It was pretty clean. The complicated bit is the bulk callback - it handles transfer statuses and transfer initiation. This is the bit that took me a little time to wrap my head around.

The USB stuff handles things in-sequence. Everything going to an endpoint here gets handled in the sequence you queue it. It also will process the bulk callback in a single worker thread taskqueue, rather than the driver author having to worry about creating their own worker threads. So, this is what
you end up doing:

  • The bulk callback has three states: USB_ST_TRANSFERRED, USB_ST_SETUP, and everything else (error.)
  • USB_ST_TRANSFERRED says "I've finished a transfer".
  • USB_ST_SETUP says "I've been asked to initiate a transfer."
  • Any driver thread starts a transmit by calling usbd_transfer_start() on the usb_xfer struct, which will kick off a call into the bulk callback with USB_ST_SETUP.
  • So, the driver has to maintain its own queues of "pending", "active" and "waiting" transactions. "pending" is the queue to put outbound transmit messages on. "active" is the queue you put messages that you've submitted when USB_ST_SETUP is called. When USB_ST_TRANSFERRED or an error is called, you pop off the top entry from "active" and you finish with it, then you fall through to USB_ST_SETUP to start a new transfer.
It's a little complicated because you have to maintain your own submission queues in/out of the USB stack, but in practice it's just a linked list.

So, I stole the framework from rsu(4) for buffer management, transmit submission and completion. It submitted things fine. I also registered buffers for receive, and .. nothing happened. I would send a PING message to the firmware to see if it was awake, and I'd get nothing from the receive pipe.

Then I remembered an interesting bug from when I tried this in 2012 - the AR9170 firmware required the IRQ endpoint to be setup, even though no interrupt messages were ever posted. So, I set the endpoint up, started reception on it.. and now I started to see receive messages. My PING messages were being PONG'ed.

But here's the first complication - although everything is asynchronous here, a lot of places want to send a command and wait for a response. For the PING command it's waiting for a matching PONG response. For setting frequency, starting calibration, etc, you get back interesting status from the firmware. But for things like register read commands, you have to wait until you get the register value back before you can continue. We need to be able to put the caller to sleep until the response comes back, or some timeout occurs.

So, cmd_otus() submits a transfer buffer and then will msleep() on it for up to a second, waiting for a response. When a command is transmitted, a couple of things can occur:
  • Once the transfer succeeds, if the command needs no response then we just send a wakeup to notify the sender that we've sent it, and we free the buffer.
  • If the transfer succeeds but the command needs a response, then we put it on the "waiting" queue.
Then in the receive path we pull out firmware notifications and if they're responses we copy the response into the callers provided buffer, call wakeup() to wake up the caller, and free the buffer.

OpenBSD cheats - it only has one single outstanding command buffer for all threads to use.

The tricky, unimplemented bit here is error handling - if I yank out a NIC during active commands then the driver will sleep for a second, wakeup with an error and pass an error back. But, the rest of the driver doesn't know anything was sleeping, so state gets freed from underneath it. I need to go and add what OpenBSD does - refcount when the driver is entered from say, the transmit and ioctl paths, and then upon detach just wait for pending things to finish before freeing.

Ok, so that got command transmit/receive and sleep/wake notification working okay. Next up is packet reception and basic initialization. That was mostly the same - the same hardware bits are needed, the same 802.11 packet format is needed for the stack. The main differences here were in the OpenBSD versus FreeBSD net80211 interface layout - FreeBSD has vaps (virtual access points, etc) but OpenBSD does not. It's still pre-vap work, so there's only one interface. This required a little bit of splitting to put the vap bits in vap routines, and driver bits in the driver. The notable exceptions are vap_create, vap_destroy and newstate.

Next up was realising OpenBSD is also still driving 802.11 state from the driver, not from the net80211 stack. FreeBSD drives the state changes and tells the driver what to do. That required me undoing some manual state transitions (eg otus_init() setting the state to SCAN or RUN depending upon the interface mode) and just letting net80211 do it.

So, net80211 created a vap, called otus_init(), then brought up the interface, set the initial vap state to SCAN via a call to newstate and started changing channels. This worked fine. I had some locking concerns - check the driver to see what I did. It was pretty straightforward.

And then yes - because the receive path was pretty simple and I got straight 802.11 frames back, yes, I started seeing beacons in a tcpdump session. This was great.

Then I ripped up a bunch of callback code that isn't needed. A few years ago FreeBSD's USB drivers maintained their own taskqueue to defer things like crypto key setting, state changes and such. Now net80211 has a per-device taskqueue that it runs these things on, and a lot of the driver calls are done as deferred tasks. OpenBSD doesn't have this so the drivers create their own deferred task and async callback framework to schedule these. It's duplicated work and I removed all of that from the driver.

Next up is transmit. This is trickier for a few reasons.

First, FreeBSD doesn't use if_start() anymore, with network stack provided queues. I have to maintain my own queue and free net80211 node references as appropriate. It took a while to craft up a correctly behaving transmit side when I fixed rsu(4), so I just stole it for the AR9170. I'll describe that in a subsequent article about rsu.

FreeBSD's net80211 stack handles 802.11 encapsulation itself; we're not handed ethernet frames unless we ask for them. So, I don't call ieee80211_encap(). Yes, I do call for software encryption as required, and that was done.

The biggest sticking point is the rate control. FreeBSD's net80211 stack has a reasonable implementation of transmit rate control modules and it's per vap and per associated node. I don't have to do anything too manual for it. OpenBSD did a bunch of manual work to do the AMRR setup/teardown/updating, so I had to rip it out and call the ratectl init/destroy methods in the vap create/destroy methods.

Next up was what ni->ni_txrate represented. In OpenBSD it seems like an index into the rate control table. In FreeBSD it's the 802.11 rate to use! So, I ripped out a bunch of rate table stuff in the driver and replaced it with a couple of mapping functions to go 802.11 rate to AR9170 hardware rate. That worked like a charm, and transmit works fine.

The last annoying thing with transmit is how the firmware tells us about failed frames. We don't get a completion message upon each frame - the later firmware does this, but the original blob doesn't. We only get told upon retries and errors. So, I hacked up something where the transmit path counts outbound packets, the RX command path counts retries/errors, and each time I transmit a packet I update net80211 with the transmit/retry/error counts. This works pretty well.

Finally - teardown. The correct order for teardown is:
  • Shut down the MAC - eg, disable TX/RX DMA, etc
  • Disable the USB transfers, wait until they're done
  • Free the transmit/receive buffers and any net80211 node references they may have; and
  • then call ieee80211_ifdetach() to ensure vaps and the top level interface is destroyed.
The initial port called ieee80211_ifdetach() too early and the subsequent node references would refer to now-freed nodes and vaps, causing lots of hilarity.

And that's that. I haven't made 802.11n work; I haven't fixed up the radiotap support so received 802.11 packets in tcpdump actually provide the right rate/channel/etc. That's all details that I'll do when i feel like it. But, the driver is stable, there aren't any lock ordering issues that I've seen so far, and it actually behaves remarkably well.

Tuesday, April 30, 2013

A FAQ about today's FSF release

I've had a few people ask me some questions. There's also been a few questions on slashdot. I'll update this article as more questions come in.

Was it all me?

No, I didn't do the bulk of the work. Luis did the bulk of the legal hoop-jumping and review process at work. I grabbed it near the end of this process (so he could move onto other things) and shepherded the process of getting things ready for open sourcing.

I encouraged some external developers from the community to come on board and help in the initial effort to get it to compile and work correctly under the open source Tensilica toolchain rather than the internal toolchain.

I've fixed a few bugs here and there - eg the RX path TSF bugs that stopped the NICs from working in Mesh mode, along with some other fallout issues from the toolchain migration.

I wanted the bulk of the work to come from the community rather than me. I don't want to be the only person working on this. Thankfully I'm not! There's an active community now!

I'll likely do a bunch more development in the firmware code once I get it working on FreeBSD!

Why is it only one device? Why is it so expensive? Why that device?

You'll have to ask the FSF that.

How different is this to the non-USB stuff?

Like a lot of manufacturers, Atheros reuses its CPU and Wifi cores everywhere they can.

The AR7010 designs have an external AR9280 or AR9287 NIC. This is exactly the same as a mini-PCIe design - the same chip, speaking PCIe, etc.

The AR9271 design is a single chip solution (see below) with an AR9285 NIC internally. I don't know whether internally it speaks PCIe or whether they just glued the NIC onto the AHB like they do for other integrated CPU+SoC designs (eg the AR913x, AR933x, AR934x, etc.)

But once you get past the USB and CPU parts, it looks exactly the same as the PCIe devices Atheros driver developers know and love.

Just keep in mind the main difference - the wifi part doesn't DMA directly to/from your computer memory. It has to go via buffer RAM on the AR7010 core in order to then send or receive it via USB endpoints.

What about the other NICs? The AR7010 based ones?

The AR7010 based ones are precursors to the single-chip solution that the FSF is selling a NIC for (the AR9271.) The AR7010 has USB on one side and PCIe on the other. It runs effectively the same firmware as the AR9271 NIC, save for some different ROM addresses, memory map and some other little differences.

The AR7010 based devices are thus "just as free" as the AR9271 NIC the FSF is selling.

I'm not sure if the FSF is going to certify an AR7010 design. I hope they can find a dual-band AR7010+AR9280 ath9k_htc NIC and sell that as part of their open hardware programme.

What is this AR9271 anyway? Why is it only 1x1 and 2GHz only?

The AR9271 is a single chip solution containing:
  • An AR7010 style core, with minor differences
  • Some RAM and ROM (but less RAM than the AR7010; no I don't know why.)
  • An AR9285 derivative (which is the 2GHz, 1x1 chip.)
Like a lot of things that manufacturers do, it's a "cost savings" design for a specific market. Even now, laptop and tablet manufacturers want to skimp on 5GHz NIC designs in order to save some cash. No, I don't know why. No, I can't quote costs.

How can I help?

Download the firmware, download a linux-next or compat-wireless tarball - or, run OpenBSD + athn for now - compile stuff up and hack away.