Now that I'm trying to use more userland services, there are some obvious shortcomings which need addressing.
The first is a lack of real service management. FreeBSD doesn't have a service management daemon - the framework assumes that daemons implement their own background and monitoring. It would be much nicer if init or something similar to init could manage services and start/restart them where appropriate. Yes, this is one of those arguments for systemd. Eg, maybe I want to only start telnetd or dropbear/sshd whenever a connection comes in. But I'd also like to be able to add services for monitoring, such as dnsmasq and hostapd.
The next is a lack of suitable syslog daemon. Yes, I'd like to be able to log some messages locally - even if it's only a couple hundred kilobytes of messages. I'd also like to be able to push messages to a remote service. Unfortunately the FreeBSD syslog daemon doesn't do log rotation or maximum log file sizes itself - it's done by "newsyslog" which runs out of cron. This isn't any good for real embedded systems with limited storage.
Then yes, there's a lack of a cron service. It'd be nice to have that integrated into the service management framework so things could be easily added/removed. I may just use cron, but that means cron is also always running which adds memory footprint (~1.3 megabytes) for something that is almost never actually active. When you have 32MB of RAM, that's quite a bit of wasted memory.
Finally, there's a lack of some message bus and notification mechanism for device changes. For example, openvpn-client creates a tunnel device - ok, so what should then check to see if a NAT configuration needs updating? Or updated firewall rules? It can be done with shell scripts (which I'll write tomorrow) but ideally there'd be something like dbus (a dirty word, I know) where these systems could push updates to and events could be triggered from them. I'd like to be able to run ntpdate whenever an interface comes up, because yes, there is no RTC on this hardware.
With all of the above in mind, I'll start working on some of it tomorrow. Hopefully I can automate the openvpn NAT configuration a little bit more so I can optionally have wifi NAT or openvpn NAT, depending upon the current requirement. Fixing ntpdate to run out of dhclient as part of the 'up' script may be helpful. I'll see what else I can do to tidy things up before I start the process of merging all of this back into freebsd-wifi-build.
At least this year I can now use the defcon wireless with all of my devices.
Thursday, August 4, 2016
Wednesday, August 3, 2016
Musings on bringing up services on freebsd-wifi MIPS devices, or "why cross compiling is important"
FreeBSD has run on these MIPS routers for quite some time, but it was limited to what ships in base. There's not been any cross-built packages as part of the image building, which means we can't easily have third-party functionality.
Now, some of this third-party functionality is pretty important these days. Relying on telnet sucks; I'd like to have dropbear as an SSH server so we at least have SSH. Not having a DNS relay or DHCP server also sucks; dnsmasq would solve this problem. I'd also like some VPN services, so openvpn would be nice.
So, I eventually snapped a few months ago and started integrating some external toolchain compiler use with the freebsd-wifi-build scripts. bapt@freebsd did a whole lot of work to build ports of cross-compilers to be used by the FreeBSD ports and base system so I'm leveraging that for doing MIPS cross compiling. A bit of hacking later, and I'm cross compiling dnsmasq, dropbear, openvpn and lua.
Then I needed to integrate things. I wrote up a bunch of simple startup script wrappers to generate suitable config files for these services. Everything except the openvpn server/client configuration is in the rc.conf file, which will eventually make it much easier to turn into a configuration database.
OpenVPN was the most amusing. It cross compiled fine, save needing liblzo for compression (so that's disabled for now.) However, FreeBSD's openvpn package is version 2.3 but the easyrsa component is actually from 3.0 - which means all the documentation is out of date.
I used this for the OpenVPN config:
https://airvpn.org/topic/15096-verify-error-depth1-errorcertificate-is-not-yet-valid-using-router-with-tomato-shibby-firmware/
And this for easyvpn:
https://community.openvpn.net/openvpn/wiki/EasyRSA3-OpenVPN-Howto
And digitalocean have a writeup for how to convert the config file into a combined config file and certification bundle:
https://www.digitalocean.com/community/tutorials/how-to-configure-and-connect-to-a-private-openvpn-server-on-freebsd-10-1
A few things tripped me up:
Now, some of this third-party functionality is pretty important these days. Relying on telnet sucks; I'd like to have dropbear as an SSH server so we at least have SSH. Not having a DNS relay or DHCP server also sucks; dnsmasq would solve this problem. I'd also like some VPN services, so openvpn would be nice.
So, I eventually snapped a few months ago and started integrating some external toolchain compiler use with the freebsd-wifi-build scripts. bapt@freebsd did a whole lot of work to build ports of cross-compilers to be used by the FreeBSD ports and base system so I'm leveraging that for doing MIPS cross compiling. A bit of hacking later, and I'm cross compiling dnsmasq, dropbear, openvpn and lua.
Then I needed to integrate things. I wrote up a bunch of simple startup script wrappers to generate suitable config files for these services. Everything except the openvpn server/client configuration is in the rc.conf file, which will eventually make it much easier to turn into a configuration database.
OpenVPN was the most amusing. It cross compiled fine, save needing liblzo for compression (so that's disabled for now.) However, FreeBSD's openvpn package is version 2.3 but the easyrsa component is actually from 3.0 - which means all the documentation is out of date.
I used this for the OpenVPN config:
https://airvpn.org/topic/15096-verify-error-depth1-errorcertificate-is-not-yet-valid-using-router-with-tomato-shibby-firmware/
And this for easyvpn:
https://community.openvpn.net/openvpn/wiki/EasyRSA3-OpenVPN-Howto
And digitalocean have a writeup for how to convert the config file into a combined config file and certification bundle:
https://www.digitalocean.com/community/tutorials/how-to-configure-and-connect-to-a-private-openvpn-server-on-freebsd-10-1
A few things tripped me up:
- as mentioned before - the lack of freebsd openvpn documentation that works with easyvpn 3.0 made things frustrating;
- openvpn really wants valid system time, so I am going to have to run ntpdate when the WAN comes up;
- there's no RTC on many of these router boards, making time keeping very much reliant on NTP;
- kernel NAT works pretty well, but it needs interfaces to be up before you can add them. I'll have to add some scripts to openvpn-client to setup the NAT state once the link comes up so this stops being a problem;
Saturday, June 18, 2016
Debugging TDMA on the AR9380
So, it turns out that TDMA didn't work on the AR9380. I started digging into it a bit more with AR9380's in 5GHz mode and found that indeed no, it was just transmitting whenever the heck it wanted to.
The first thing I looked at was the transmit packet timing. Yes, they were going out at arbitrary times, rather than after the beacon. So I dug into the AR9380 HAL code and found the TX queue setup code just didn't know how to setup arbitrary TX queues to be beacon-gated. The CABQ does this by default, and the HAL just hard-codes that for the CAB queue, but it wasn't generic for all queues. So, I fixed that and tried again. Now, packets were exchanged, but I couldn't get more than around 1mbit of transmit throughput. The packets were correctly being beacon gated, but they were going out at very long intervals (one every 25ms or so.)
After a whole lot of digging and asking around, I found out what's going on. It turns out that the new TX DMA engine in the AR9380 treats queue gating slightly different versus previous chips. In previous chips you would see it transmit whatever it could, and then be gated until the next time it could transmit. As long as you kept poking the AR_TXE bit to re-start queue DMA it would indeed continue along transmitting whenver it could. But, the AR9380 TX DMA FIFO works differently.
Each queue has 8 TX FIFO descriptors, which can contain a list of frames or a single frame. For the CABQ I just added the whole list of frames in one hit and that works fine. But for the normal data paths it would push one frame into a TX DMA FIFO slot. If it's an A-MPDU aggregate then yes, it'd be a whole list of frames, but still a single PPDU. But for non-aggregate traffic it'd push a single frame in.
With this in mind, the TX DMA gating now works on FIFO slots, not just descriptor lists. That is, if you have the queue setup to gate on something (say a timer firing, like the beacon timer) then that un-gating is for a single FIFO slot only. If that FIFO slot has one PPDU in it then indeed it'll only burst out a single frame and then the rest of the channel burst time is ignored. It won't go to the next FIFO slot until the burst time expires and the queue is re-gated again. This is why I was only seeing one frame every 25ms - that's the beacon interval for two devices in a TDMA setup. It didn't matter that the queue had more data available - it ran out of data servicing a single TX FIFO slot and that was that.
So I did some local hacks to push more data into each TX FIFO slot. When I buffered things and only leaked out 32 frames at a time (which is roughly the whole slot time worth of large frames) then it indeed behaved roughly at the expected throughput. But there are bugs and it broke non-TDMA traffic. I won't commit it all to FreeBSD-HEAD until I figure out what's going on
There's also something else I noticed - there was some situation where it would push in a new frame and that would cause the next frame to go out immediately. I think it's actually just scheduling for the next gated burst (ie, it isn't doing multiple frames in a single burst window, but one every beacon interval) but I need to dig into it a bit more to see what's going on.
In any case, I'm getting closer to working TDMA on the AR9380 and later chips.
Oh, and it turns out that TDMA mode doesn't add some of the IEs to the beacon announcements - notably, no atheros fast-frames announcement. This means A-MSDUs or fast-frames aren't sent. I was hoping to leverage A-MSDU aggregation in its present state to improve things, even if it's just two frames at a time. Hopefully that'd double the throughput - I'm currently seeing 30mbit TX and 30mbit RX without it, so hopefully 60mbit with it.)
The first thing I looked at was the transmit packet timing. Yes, they were going out at arbitrary times, rather than after the beacon. So I dug into the AR9380 HAL code and found the TX queue setup code just didn't know how to setup arbitrary TX queues to be beacon-gated. The CABQ does this by default, and the HAL just hard-codes that for the CAB queue, but it wasn't generic for all queues. So, I fixed that and tried again. Now, packets were exchanged, but I couldn't get more than around 1mbit of transmit throughput. The packets were correctly being beacon gated, but they were going out at very long intervals (one every 25ms or so.)
After a whole lot of digging and asking around, I found out what's going on. It turns out that the new TX DMA engine in the AR9380 treats queue gating slightly different versus previous chips. In previous chips you would see it transmit whatever it could, and then be gated until the next time it could transmit. As long as you kept poking the AR_TXE bit to re-start queue DMA it would indeed continue along transmitting whenver it could. But, the AR9380 TX DMA FIFO works differently.
Each queue has 8 TX FIFO descriptors, which can contain a list of frames or a single frame. For the CABQ I just added the whole list of frames in one hit and that works fine. But for the normal data paths it would push one frame into a TX DMA FIFO slot. If it's an A-MPDU aggregate then yes, it'd be a whole list of frames, but still a single PPDU. But for non-aggregate traffic it'd push a single frame in.
With this in mind, the TX DMA gating now works on FIFO slots, not just descriptor lists. That is, if you have the queue setup to gate on something (say a timer firing, like the beacon timer) then that un-gating is for a single FIFO slot only. If that FIFO slot has one PPDU in it then indeed it'll only burst out a single frame and then the rest of the channel burst time is ignored. It won't go to the next FIFO slot until the burst time expires and the queue is re-gated again. This is why I was only seeing one frame every 25ms - that's the beacon interval for two devices in a TDMA setup. It didn't matter that the queue had more data available - it ran out of data servicing a single TX FIFO slot and that was that.
So I did some local hacks to push more data into each TX FIFO slot. When I buffered things and only leaked out 32 frames at a time (which is roughly the whole slot time worth of large frames) then it indeed behaved roughly at the expected throughput. But there are bugs and it broke non-TDMA traffic. I won't commit it all to FreeBSD-HEAD until I figure out what's going on
There's also something else I noticed - there was some situation where it would push in a new frame and that would cause the next frame to go out immediately. I think it's actually just scheduling for the next gated burst (ie, it isn't doing multiple frames in a single burst window, but one every beacon interval) but I need to dig into it a bit more to see what's going on.
In any case, I'm getting closer to working TDMA on the AR9380 and later chips.
Oh, and it turns out that TDMA mode doesn't add some of the IEs to the beacon announcements - notably, no atheros fast-frames announcement. This means A-MSDUs or fast-frames aren't sent. I was hoping to leverage A-MSDU aggregation in its present state to improve things, even if it's just two frames at a time. Hopefully that'd double the throughput - I'm currently seeing 30mbit TX and 30mbit RX without it, so hopefully 60mbit with it.)
Friday, May 27, 2016
Updating the broadcom driver part #2
In Part 1, I described updating the FreeBSD bwn(4) driver and adding some support for the PHY-N driver from b43. It's GPL, but it works, and it gets me over the initial hump of getting support for updated NICs and initial 5GHz operation.
In this part, I'll describe what I did to tidy up RSSI handling and bring up the BCM4322 support.
To recap - I ported over PHY-N support from b43, updated the SPROM handling in the bus glue (siba(4)), and made 11a OFDM transmission work. I was lucky - I chose the first 11n, non-MIMO NIC that Broadcom made which behaved sufficiently similarly to the previous 11abg generation. It was non-MIMO and I could run non-MIMO microcode, which already shipped with the existing firmware FreeBSD builds. But, the BCM4322 is a 2x2 MIMO device, and requires updated firmware, which brought over a whole new firmware API.
Now, bwn(4) handles the earlier two firmware interfaces, but not the newer one that b43 also supports. I chose BCM4321 because it didn't require firmware API changes and anything in the Broadcom siba(4) bus layer, so I could focus on porting the PHY-N code and updating the MAC driver to work. This neatly compartmentalised the problem so I wouldn't be trying to make a completely changed thing work and spending days chasing down obscure bugs.
The BCM4322 is a bit of a different beast. It uses PHY-N, which is good. It requires the transmit path setup the PLCP header bits for OFDM to work (ie, 11a, 11g) which I had to do for BCM4321, so that's good. But, it required firmware API changes, and it required siba(4) changes. I decided to tackle the firmware changes first, so I could at least get the NIC loaded and ready.
So, I first fixed up the RX descriptor handling, and found that we were missing a whole lot of RSSI calculation math. I dutifully wrote it down on paper and reimplemented it from b43. That provided some much better looking RSSI values, which made the NIC behave much better. The existing bwn(4) driver just didn't decode the RSSI values in any sensible way and so some Very Poor Decisions were made about which AP to associate to.
Next up, the firmware API. I finished adding the new structure definitions and updating the descriptor sizes/offsets. There were a couple of new things I had to handle for later chip revision devices, and the transmit/receive descriptor layout changed. That took most of a weekend in Palm Springs (my first non-working holiday in .. well, since Atheros, really) and I had the thing up and doing DMA. But, I wasn't seeing any packets.
So, I next decided to finish implementing the siba(4) bus pieces. The 4322 uses a newer generation power management unit (PMU) with some changes in how clocking is configured. I did that, verified I was mostly doing the right thing, and fired that up - but it didn't show anything in the scan list. Now, I was wondering whether the PMU/clock configuration was wrong and not enabling the PHY, so I found some PHY reset code that bwn(4) was doing wrong, and I fixed that. Nope, still no scan results. I wondered if the thing was set up to clock right (since if we fed the PHY the wrong clock, I bet it wouldn't configure the radio with the right clock, and we'd tune to the wrong frequency) which was complete conjecture on my part - but, I couldn't see anything there I was missing.
Next up, I decided to debug the PHY-N code. It's a different PHY revision and chip revision - and the PHY code does check these to do different things. I first found that some of the PHY table programming was completely wrong, so after some digging I found I had used the wrong SPROM offsets in the siba(4) code I had added. It didn't matter for the BCM4321 because the PHY-N revision was early enough that these SPROM values weren't used. But they were used on the BCM4322. But, it didn't come up.
Then I decided to check the init path in more detail. I added some debug prints to the various radio programming functions to see what's being called in what order, and I found that none of them were being called. That sounded a bit odd, so I went digging to see what was supposed to call them.
The first thing it does when it changes channel is to call the rfkill method with the "on" flag set on, so it should program on the RF side of things. It turns out that, hilariously, the BCM4322 PHY revision has a slightly different code path, which checks the value of 'rfon' in the driver state. And, for reasons I don't yet understand, it's set to '1' in the PHY init path and never set to '0' before we start calling PHY code. So, the PHY-N code thought the radio was already up and didn't need reprogramming.
Oops.
I commented out that check, and just had it program the radio each time. Voila! It came up.
So, next on the list (as I do it) is adding PHY-HT support, and starting the path of supporting the newer bus (bhnd(4)) NICs. Landon Fuller is writing the bhnd(4) support and we're targeting the BCM943225 as the first bcma bus device. I'll write something once that's up and working!
In this part, I'll describe what I did to tidy up RSSI handling and bring up the BCM4322 support.
To recap - I ported over PHY-N support from b43, updated the SPROM handling in the bus glue (siba(4)), and made 11a OFDM transmission work. I was lucky - I chose the first 11n, non-MIMO NIC that Broadcom made which behaved sufficiently similarly to the previous 11abg generation. It was non-MIMO and I could run non-MIMO microcode, which already shipped with the existing firmware FreeBSD builds. But, the BCM4322 is a 2x2 MIMO device, and requires updated firmware, which brought over a whole new firmware API.
Now, bwn(4) handles the earlier two firmware interfaces, but not the newer one that b43 also supports. I chose BCM4321 because it didn't require firmware API changes and anything in the Broadcom siba(4) bus layer, so I could focus on porting the PHY-N code and updating the MAC driver to work. This neatly compartmentalised the problem so I wouldn't be trying to make a completely changed thing work and spending days chasing down obscure bugs.
The BCM4322 is a bit of a different beast. It uses PHY-N, which is good. It requires the transmit path setup the PLCP header bits for OFDM to work (ie, 11a, 11g) which I had to do for BCM4321, so that's good. But, it required firmware API changes, and it required siba(4) changes. I decided to tackle the firmware changes first, so I could at least get the NIC loaded and ready.
So, I first fixed up the RX descriptor handling, and found that we were missing a whole lot of RSSI calculation math. I dutifully wrote it down on paper and reimplemented it from b43. That provided some much better looking RSSI values, which made the NIC behave much better. The existing bwn(4) driver just didn't decode the RSSI values in any sensible way and so some Very Poor Decisions were made about which AP to associate to.
Next up, the firmware API. I finished adding the new structure definitions and updating the descriptor sizes/offsets. There were a couple of new things I had to handle for later chip revision devices, and the transmit/receive descriptor layout changed. That took most of a weekend in Palm Springs (my first non-working holiday in .. well, since Atheros, really) and I had the thing up and doing DMA. But, I wasn't seeing any packets.
So, I next decided to finish implementing the siba(4) bus pieces. The 4322 uses a newer generation power management unit (PMU) with some changes in how clocking is configured. I did that, verified I was mostly doing the right thing, and fired that up - but it didn't show anything in the scan list. Now, I was wondering whether the PMU/clock configuration was wrong and not enabling the PHY, so I found some PHY reset code that bwn(4) was doing wrong, and I fixed that. Nope, still no scan results. I wondered if the thing was set up to clock right (since if we fed the PHY the wrong clock, I bet it wouldn't configure the radio with the right clock, and we'd tune to the wrong frequency) which was complete conjecture on my part - but, I couldn't see anything there I was missing.
Next up, I decided to debug the PHY-N code. It's a different PHY revision and chip revision - and the PHY code does check these to do different things. I first found that some of the PHY table programming was completely wrong, so after some digging I found I had used the wrong SPROM offsets in the siba(4) code I had added. It didn't matter for the BCM4321 because the PHY-N revision was early enough that these SPROM values weren't used. But they were used on the BCM4322. But, it didn't come up.
Then I decided to check the init path in more detail. I added some debug prints to the various radio programming functions to see what's being called in what order, and I found that none of them were being called. That sounded a bit odd, so I went digging to see what was supposed to call them.
The first thing it does when it changes channel is to call the rfkill method with the "on" flag set on, so it should program on the RF side of things. It turns out that, hilariously, the BCM4322 PHY revision has a slightly different code path, which checks the value of 'rfon' in the driver state. And, for reasons I don't yet understand, it's set to '1' in the PHY init path and never set to '0' before we start calling PHY code. So, the PHY-N code thought the radio was already up and didn't need reprogramming.
Oops.
I commented out that check, and just had it program the radio each time. Voila! It came up.
So, next on the list (as I do it) is adding PHY-HT support, and starting the path of supporting the newer bus (bhnd(4)) NICs. Landon Fuller is writing the bhnd(4) support and we're targeting the BCM943225 as the first bcma bus device. I'll write something once that's up and working!
Thursday, May 19, 2016
Updating the broadcom softmac driver (bwn), or "damnit, I said I'd never do this!"
If you're watching the FreeBSD commit lists, you may have noticed that I .. kinda sprayed a lot of changes into the broadcom softmac driver.
Firstly, I swore I'd never touch this stuff. But, we use Broadcom (fullmac!) parts at work, so in order to get a peek under the hood to see how they work, I decided fixing up bwn(4) was vaguely work related. Yes, I did the work outside of work; no, it's not sponsored by my employer.
I found a small cache of broadcom 43xx cards that I have and I plugged one in. Nope, didn't work. Tried another. Nope, didn't work. Oh wait - I need to make sure the right firmware module is loaded for it to continue. That was the first hiccup.
Then I set up the interface and connected it to my home AP. It worked .. for about 30 seconds. Then, 100% packet loss. It only worked when I was right up against my AP. I could receive packets fine, but transmits were failing. So, off I went to read the transmit completion path code.
Here's the first fun bit - there's no TX completion descriptor that's checked. There is in the v3 firmware driver (bwi), but not in the v4 firmware. Instead, it reads a pair shared memory registers to get completion status for each packet. This is where I learnt my first fun bits about the hardware API - it's a mix of PIO/DMA, firmware, descriptors and shared memory mailboxes. Those completion registers? Reading them advances the internal firmware state to read the next descriptor completion. You can't just read them for fun, or you'll miss transmit completions.
So, yes, we were transmitting, and we were failing them. The retry count was 7, and the ACK bit was 0. Ok, so it failed. It's using the net80211 rate control code, so I turned on rate control debugging (wlandebug +rate) and watched the hilarity.
The rate control code was never seeing any failures, so it just thought everything was hunky dory and kept pushing the rate up to 54mbit. Which was the exact wrong thing to do. It turns out the rate control code was only called if ack=1, which meant it was only notified if packets succeeded. I fixed up (through some revisions) the rate control notification path to be called always, error and success, and it began behaving better.
Now, bwn(4) was useful. But, it needs updating to support any of the 11n chipsets, and it certainly didn't do 5GHz operation on anything. So, off I went to investigate that.
There are, thankfully, three major sources of broadcom softmac information:
Now, there's some architectural things to know about these chips. Firstly, the broadcom hardware is structured (like all chips, really) with a bunch of cores on-die with an interconnect, and then some host bus glue. So, the hardware design can just reuse the same internals but a different host bus (USB, PCI, SDIO, etc) and reuse 90% of the chip design. That's a huge win. But, most of the other chips out there lie to you about the internal layout so you don't care - they map the internal cores into one big register window space so it looks like one device.
The broadcom parts don't. They expose each of the cores internally on a bus, and then you need to switch the cores on/off and also map them into the MMIO register window to access them.
Yes, that's right. There's not one big register window that it maps things to, PCI style. If you want to speak to a core, you have to unmap the existing core, map in the core you want, and do register access.
Secondly, the 802.11 core exposes MAC and PHY registers, but you can't have them both on at once. You switch on/off the MAC register window before you poke at the PHY.
Armed with this, I now understand why you need 'sys/dev/siba' (siba(4)) before you can use bwn(4). The siba driver provides the interface to PCI (and MIPS for an older Broadcom part) to probe/attach a SIBA bus, then enumerate all of the cores, then attach drivers to each. There's typically a PCI/PCIe core, then an 802.11 core, then a chipcommon core for the clock/power management, and then other things as needed (memory, USB, PCMCIA, etc.) bwn(4) doesn't attach to the PCI device, it sits on the siba bus as a child device.
So, to add support for a new chip, I needed to do a few things.
This meant that I would be adding GPLv2'ed code to bwn(4). So, I decided to dump it in sys/gnu/dev/bwn so it's away from the main driver, and make compiling it in non-standard. At some point yes, I'd like to port the brcmfmac PHYs to FreeBSD, but I wanted to get familiar with the chips and make sure the driver worked fine. Debugging /all/ broken and new pieces didn't sound like fun to me.
So after a few days, I got PHY-N compiling and I fired it up. I needed to add SPROM field parsing too, so I did that too. Then, the moment of truth - I fired it up, and it connected. It scanned on both 2G and 5G, and it worked almost first time! But, two things were broken:
There were two. Well, three, but two that broke everything.
Firstly, there's a "I'm 5GHz!" flag in the tx descriptor. I set that for 5GHz operation - but nothing.
Secondly, the driver tries a fallback rate if the primary rate fails. Those are hardcoded, same as the RTS/CTS rates. It turns out the fallback rate for 6MB OFDM is 11MB CCK, which is invalid for 5GHz. I fixed that, but I haven't yet fixed the 1MB CCK RTS/CTS rates. I'll go do that soon. (I also submitted a patch to Linux b43 to fix that!)
Thirdly, and this was the kicker - the PHY-N and later PHYs require more detailed TX setup. We were completely missing initializing some descriptor fields. It turns out it's also required for PHY-LP (which we support) but somehow the PHY was okay with that. Once I added those fields in, OFDM transmit worked fine.
So, a week after I started, I had a stable driver on 11bg chips, as well as 5GHz operation on the PHY-N BCM4321 NIC. No 11n yet, obviously, that'll have to wait.
In the next post I'll cover fixing up the RX RSSI calculations and then what I needed to do for the BCM94322MC, which is also a PHY-N chip, but is a much later core, and required new microcode with a new descriptor interface.
Firstly, I swore I'd never touch this stuff. But, we use Broadcom (fullmac!) parts at work, so in order to get a peek under the hood to see how they work, I decided fixing up bwn(4) was vaguely work related. Yes, I did the work outside of work; no, it's not sponsored by my employer.
I found a small cache of broadcom 43xx cards that I have and I plugged one in. Nope, didn't work. Tried another. Nope, didn't work. Oh wait - I need to make sure the right firmware module is loaded for it to continue. That was the first hiccup.
Then I set up the interface and connected it to my home AP. It worked .. for about 30 seconds. Then, 100% packet loss. It only worked when I was right up against my AP. I could receive packets fine, but transmits were failing. So, off I went to read the transmit completion path code.
Here's the first fun bit - there's no TX completion descriptor that's checked. There is in the v3 firmware driver (bwi), but not in the v4 firmware. Instead, it reads a pair shared memory registers to get completion status for each packet. This is where I learnt my first fun bits about the hardware API - it's a mix of PIO/DMA, firmware, descriptors and shared memory mailboxes. Those completion registers? Reading them advances the internal firmware state to read the next descriptor completion. You can't just read them for fun, or you'll miss transmit completions.
So, yes, we were transmitting, and we were failing them. The retry count was 7, and the ACK bit was 0. Ok, so it failed. It's using the net80211 rate control code, so I turned on rate control debugging (wlandebug +rate) and watched the hilarity.
The rate control code was never seeing any failures, so it just thought everything was hunky dory and kept pushing the rate up to 54mbit. Which was the exact wrong thing to do. It turns out the rate control code was only called if ack=1, which meant it was only notified if packets succeeded. I fixed up (through some revisions) the rate control notification path to be called always, error and success, and it began behaving better.
Now, bwn(4) was useful. But, it needs updating to support any of the 11n chipsets, and it certainly didn't do 5GHz operation on anything. So, off I went to investigate that.
There are, thankfully, three major sources of broadcom softmac information:
- Linux b43
- Linux brcmsmac
- http://bcm-v4.sipsolutions.net/
Now, there's some architectural things to know about these chips. Firstly, the broadcom hardware is structured (like all chips, really) with a bunch of cores on-die with an interconnect, and then some host bus glue. So, the hardware design can just reuse the same internals but a different host bus (USB, PCI, SDIO, etc) and reuse 90% of the chip design. That's a huge win. But, most of the other chips out there lie to you about the internal layout so you don't care - they map the internal cores into one big register window space so it looks like one device.
The broadcom parts don't. They expose each of the cores internally on a bus, and then you need to switch the cores on/off and also map them into the MMIO register window to access them.
Yes, that's right. There's not one big register window that it maps things to, PCI style. If you want to speak to a core, you have to unmap the existing core, map in the core you want, and do register access.
Secondly, the 802.11 core exposes MAC and PHY registers, but you can't have them both on at once. You switch on/off the MAC register window before you poke at the PHY.
Armed with this, I now understand why you need 'sys/dev/siba' (siba(4)) before you can use bwn(4). The siba driver provides the interface to PCI (and MIPS for an older Broadcom part) to probe/attach a SIBA bus, then enumerate all of the cores, then attach drivers to each. There's typically a PCI/PCIe core, then an 802.11 core, then a chipcommon core for the clock/power management, and then other things as needed (memory, USB, PCMCIA, etc.) bwn(4) doesn't attach to the PCI device, it sits on the siba bus as a child device.
So, to add support for a new chip, I needed to do a few things.
- The device needs to probe/attach to siba(4);
- The SPROM parsing is done by siba(4), so new fields have to be added there;
- The 802.11 core revision is what's probe/attached by bwn(4), so add it there;
- Then I needed to ensure the right microcode and radio initvals are added in bwn(4);
- Then, new PHY code is needed. For the BCM4321, it's PHY-N.
This meant that I would be adding GPLv2'ed code to bwn(4). So, I decided to dump it in sys/gnu/dev/bwn so it's away from the main driver, and make compiling it in non-standard. At some point yes, I'd like to port the brcmfmac PHYs to FreeBSD, but I wanted to get familiar with the chips and make sure the driver worked fine. Debugging /all/ broken and new pieces didn't sound like fun to me.
So after a few days, I got PHY-N compiling and I fired it up. I needed to add SPROM field parsing too, so I did that too. Then, the moment of truth - I fired it up, and it connected. It scanned on both 2G and 5G, and it worked almost first time! But, two things were broken:
- 5GHz operation just failed entirely for transmit, and
- 2GHz operation failed transmitting all OFDM frames, but CCK was fine.
There were two. Well, three, but two that broke everything.
Firstly, there's a "I'm 5GHz!" flag in the tx descriptor. I set that for 5GHz operation - but nothing.
Secondly, the driver tries a fallback rate if the primary rate fails. Those are hardcoded, same as the RTS/CTS rates. It turns out the fallback rate for 6MB OFDM is 11MB CCK, which is invalid for 5GHz. I fixed that, but I haven't yet fixed the 1MB CCK RTS/CTS rates. I'll go do that soon. (I also submitted a patch to Linux b43 to fix that!)
Thirdly, and this was the kicker - the PHY-N and later PHYs require more detailed TX setup. We were completely missing initializing some descriptor fields. It turns out it's also required for PHY-LP (which we support) but somehow the PHY was okay with that. Once I added those fields in, OFDM transmit worked fine.
So, a week after I started, I had a stable driver on 11bg chips, as well as 5GHz operation on the PHY-N BCM4321 NIC. No 11n yet, obviously, that'll have to wait.
In the next post I'll cover fixing up the RX RSSI calculations and then what I needed to do for the BCM94322MC, which is also a PHY-N chip, but is a much later core, and required new microcode with a new descriptor interface.
Monday, February 22, 2016
Why's my laptop running so hot? Or Firefox, pandora, and 1 million syscalls a second.
My FreeBSD-HEAD laptop runs very warm when running firefox, but even warmer when it's doing something simple - like say, streaming from pandora.
So, I decided to take a bit of a look.
Firstly, 'vmstat -a' - a good top level peek.
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 cd0 in sy cs us sy id
3 0 0 21G 180M 30403 0 2 0 21520 976 27 0 598 462567 10061 25 31 44
1 0 0 21G 174M 10389 0 0 0 3320 980 0 0 913 1203071 11892 15 24 61
3 0 0 21G 192M 4028 0 0 0 4763 983 14 0 563 1246314 8166 15 23 62
1 0 0 21G 192M 2305 0 0 0 334 988 1 0 390 1165396 10784 18 20 62
2 0 0 21G 202M 30493 0 0 0 3154 983 4 0 340 1072100 13287 28 23 49
2 0 0 21G 202M 8440 0 0 0 646 979 1 0 391 1071166 8802 32 20 48
1 0 0 21G 204M 3608 0 0 0 1841 1954 31 0 516 1041635 11319 33 21 46
3 0 0 20G 212M 67782 0 0 0 2895 973 1 0 387 1053575 10995 28 26 46
2 0 0 21G 187M 25368 0 0 0 2483 989 7 0 475 1047031 12056 29 23 48
.. ok, a million syscalls a second. Fine.Let's ask dtrace what's going on:
root@victoria:/home/adrian # dtrace -n 'syscall:::entry { @[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 1082 probes
^C
gettimeofday 305
lstat 336
kevent 598
recvfrom 1018
__sysctl 1384
getpid 2158
sigprocmask 5189
select 5443
writev 6215
madvise 6606
setitimer 6729
recvmsg 17556
_umtx_op 40740
ppoll 853940
read 1152896
poll 1158669
write 2159990
ioctl 2170830
root@victoria:/home/adrian # dtrace -n 'syscall::read:return /execname == "firefox"/ { @["rval (bytes)"] =
quantize(arg1); }'
dtrace: description 'syscall::read:return ' matched 2 probes
^C
rval (bytes)
value ------------- Distribution ------------- count
-2 | 0
-1 | 1
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 496294
2 | 5
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 | 0
256 | 0
512 | 0
1024 | 0
2048 | 0
4096 | 6
8192 | 0
16384 | 0
32768 | 36
65536 | 0
root@victoria:/home/adrian # dtrace -n 'syscall::write:return /execname == "firefox"/ { @["rval (bytes)"] = quantize(arg1); }'
dtrace: description 'syscall::write:return ' matched 2 probes
^C
rval (bytes)
value ------------- Distribution ------------- count
-2 | 0
-1 |@@@@@@@@@@@@@@@@@@@@ 875025
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@ 876075
2 | 0
4 | 0
8 | 0
16 | 14
32 | 1
64 | 0
128 | 15
256 | 8
512 | 8
1024 | 29
2048 | 563
4096 | 0
8192 | 0
16384 | 0
32768 | 14
65536 | 0
... ok, so read and write is doing single byte transactions, and write is actually failing as often as it's succeeding.
so, what's actually going on? I decided to run truss briefly, and I get a lot of this:
_umtx_op(0x82efa2e80,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
ioctl(68,SNDCTL_DSP_GETOPTR,0xce0c5ac0) = 0 (0x0)
_umtx_op(0x82b16fb70,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8d6a57e58,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
write(158,"x",1) ERR#35 'Resource temporarily unavailable'
_umtx_op(0x8006bd4b8,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x7fffbde44c88) = 0 (0x0)
So I'm guessing there's a lot of inefficient single byte read/write syscalls to wake up a remote thread, along with a lot of inefficient use of the sound API.
For sound ioctls:
ioctl(68,SNDCTL_DSP_GETOSPACE,0xd7d54e10) = 0 (0x0)
ioctl(68,SNDCTL_DSP_GETOPTR,0xd7d54e00) = 0 (0x0)
ioctl(68,SNDCTL_DSP_GETOSPACE,0xd7d54df0) = 0 (0x0)
ioctl(68,SNDCTL_DSP_GETOPTR,0xd7d54d50) = 0 (0x0)
.. so i'm guessing it's doing it every thread wakeup or something stupid, even if it doesn't need to yet.
So, I decided to take a bit of a look.
Firstly, 'vmstat -a' - a good top level peek.
procs memory page disks faults cpu
r b w avm fre flt re pi po fr sr ad0 cd0 in sy cs us sy id
3 0 0 21G 180M 30403 0 2 0 21520 976 27 0 598 462567 10061 25 31 44
1 0 0 21G 174M 10389 0 0 0 3320 980 0 0 913 1203071 11892 15 24 61
3 0 0 21G 192M 4028 0 0 0 4763 983 14 0 563 1246314 8166 15 23 62
1 0 0 21G 192M 2305 0 0 0 334 988 1 0 390 1165396 10784 18 20 62
2 0 0 21G 202M 30493 0 0 0 3154 983 4 0 340 1072100 13287 28 23 49
2 0 0 21G 202M 8440 0 0 0 646 979 1 0 391 1071166 8802 32 20 48
1 0 0 21G 204M 3608 0 0 0 1841 1954 31 0 516 1041635 11319 33 21 46
3 0 0 20G 212M 67782 0 0 0 2895 973 1 0 387 1053575 10995 28 26 46
2 0 0 21G 187M 25368 0 0 0 2483 989 7 0 475 1047031 12056 29 23 48
.. ok, a million syscalls a second. Fine.Let's ask dtrace what's going on:
root@victoria:/home/adrian # dtrace -n 'syscall:::entry { @[probefunc] = count(); }'
dtrace: description 'syscall:::entry ' matched 1082 probes
^C
gettimeofday 305
lstat 336
kevent 598
recvfrom 1018
__sysctl 1384
getpid 2158
sigprocmask 5189
select 5443
writev 6215
madvise 6606
setitimer 6729
recvmsg 17556
_umtx_op 40740
ppoll 853940
read 1152896
poll 1158669
write 2159990
ioctl 2170830
root@victoria:/home/adrian # dtrace -n 'syscall::read:return /execname == "firefox"/ { @["rval (bytes)"] =
quantize(arg1); }'
dtrace: description 'syscall::read:return ' matched 2 probes
^C
rval (bytes)
value ------------- Distribution ------------- count
-2 | 0
-1 | 1
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 496294
2 | 5
4 | 0
8 | 0
16 | 0
32 | 0
64 | 0
128 | 0
256 | 0
512 | 0
1024 | 0
2048 | 0
4096 | 6
8192 | 0
16384 | 0
32768 | 36
65536 | 0
root@victoria:/home/adrian # dtrace -n 'syscall::write:return /execname == "firefox"/ { @["rval (bytes)"] = quantize(arg1); }'
dtrace: description 'syscall::write:return ' matched 2 probes
^C
rval (bytes)
value ------------- Distribution ------------- count
-2 | 0
-1 |@@@@@@@@@@@@@@@@@@@@ 875025
0 | 0
1 |@@@@@@@@@@@@@@@@@@@@ 876075
2 | 0
4 | 0
8 | 0
16 | 14
32 | 1
64 | 0
128 | 15
256 | 8
512 | 8
1024 | 29
2048 | 563
4096 | 0
8192 | 0
16384 | 0
32768 | 14
65536 | 0
... ok, so read and write is doing single byte transactions, and write is actually failing as often as it's succeeding.
so, what's actually going on? I decided to run truss briefly, and I get a lot of this:
_umtx_op(0x82efa2e80,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
ioctl(68,SNDCTL_DSP_GETOPTR,0xce0c5ac0) = 0 (0x0)
_umtx_op(0x82b16fb70,UMTX_OP_MUTEX_WAIT,0x0,0x0,0x0) = 0 (0x0)
_umtx_op(0x8d6a57e58,UMTX_OP_NWAKE_PRIVATE,0x1,0x0,0x0) = 0 (0x0)
write(158,"x",1) ERR#35 'Resource temporarily unavailable'
_umtx_op(0x8006bd4b8,UMTX_OP_WAIT_UINT_PRIVATE,0x0,0x18,0x7fffbde44c88) = 0 (0x0)
So I'm guessing there's a lot of inefficient single byte read/write syscalls to wake up a remote thread, along with a lot of inefficient use of the sound API.
For sound ioctls:
ioctl(68,SNDCTL_DSP_GETOSPACE,0xd7d54e10) = 0 (0x0)
ioctl(68,SNDCTL_DSP_GETOPTR,0xd7d54e00) = 0 (0x0)
ioctl(68,SNDCTL_DSP_GETOSPACE,0xd7d54df0) = 0 (0x0)
ioctl(68,SNDCTL_DSP_GETOPTR,0xd7d54d50) = 0 (0x0)
.. so i'm guessing it's doing it every thread wakeup or something stupid, even if it doesn't need to yet.
Wednesday, February 17, 2016
On being able to reflash your own devices, or "wow, millions of devices are potentially vulnerable."
If you work in software, you've likely heard of the latest hilarious bug - Linux glibc getaddrinfo() stack buffer overflow (https://isc.sans.edu/diary/CVE-2015-7547%3A+Critical+Vulnerability+in+glibc+getaddrinfo/20737). It was jointly found by redhat and google (https://googleonlinesecurity.blogspot.ca/2016/02/cve-2015-7547-glibc-getaddrinfo-stack.html), and it's been under investigation for a while. There are also some proof of concepts out there (eg https://github.com/fjserna/CVE-2015-7547).
I'm not sure if Android or OpenWRT devices are vulnerable - they don't use glibc out of the box, but they may use the relevant pieces of the NSS resolver library. But anything based on a linux distribution (centos, debian, ubuntu, redhat, etc) - ie, web services, docker installs, virtual machines, a heck of a lot of firewall/email/web gateway appliances, even some router management planes (hi Cisco?) may be vulnerable to this attack.
This means, well, most of the internet is likely vulnerable. I'm glad it's not an obvious bug in openwrt/android, as that'd also mean tens/hundreds of millions of devices are vulnerable. But it's a good study case - if you own something that has this bug, but there's no longer software updates available, you're short of luck. You may have working software on a perfectly working hardware, but since you (or some third party) can't fix it, it's effectively a paperweight.
But there may be devices which use glibc that I haven't covered. There may be set top boxes, televisions, cable modem / DSL gateways that are affected by this. There's likely a whole bunch of medical kit and control systems out there with this bug. Millions of potential consumer and industrial devices are impacted by this bug and it's likely never going to be patched. And since it's DNS, it's totally unencrypted/unauthorized, so anyone can hijack/spoof DNS to control what's going on.
So this is why I'm a big fan of open source software and being able to reflash your own devices. There's likely millions (or more!) devices this affects that is perfectly fine hardware but will never get software updates. This exposes a lot of people, with no easy fix besides "buy a replacement" and hope that it also isn't impacted. Heck, look at your home, office, workspace, outdoors - look at all those little electronic devices and think that at least some of them run Linux with this vulnerability and will be network connected. Any of them could be vulnerable to this and any of them may be owned by someone now.
This is "Hollywood" level of exploit. This is like, watching an episode of "Person of Interest" and realising all of those drive-by hacks are actually possible. This is like, anyone everywhere can do this - not just governments, but anyone with the minimum technical ability needed to run the exploit. Yes, this includes your internet connected fridge and your Internet-Of-Things lightbulbs.
Oh, and FreeBSD isn't vulnerable. Heh.
I'm not sure if Android or OpenWRT devices are vulnerable - they don't use glibc out of the box, but they may use the relevant pieces of the NSS resolver library. But anything based on a linux distribution (centos, debian, ubuntu, redhat, etc) - ie, web services, docker installs, virtual machines, a heck of a lot of firewall/email/web gateway appliances, even some router management planes (hi Cisco?) may be vulnerable to this attack.
This means, well, most of the internet is likely vulnerable. I'm glad it's not an obvious bug in openwrt/android, as that'd also mean tens/hundreds of millions of devices are vulnerable. But it's a good study case - if you own something that has this bug, but there's no longer software updates available, you're short of luck. You may have working software on a perfectly working hardware, but since you (or some third party) can't fix it, it's effectively a paperweight.
But there may be devices which use glibc that I haven't covered. There may be set top boxes, televisions, cable modem / DSL gateways that are affected by this. There's likely a whole bunch of medical kit and control systems out there with this bug. Millions of potential consumer and industrial devices are impacted by this bug and it's likely never going to be patched. And since it's DNS, it's totally unencrypted/unauthorized, so anyone can hijack/spoof DNS to control what's going on.
So this is why I'm a big fan of open source software and being able to reflash your own devices. There's likely millions (or more!) devices this affects that is perfectly fine hardware but will never get software updates. This exposes a lot of people, with no easy fix besides "buy a replacement" and hope that it also isn't impacted. Heck, look at your home, office, workspace, outdoors - look at all those little electronic devices and think that at least some of them run Linux with this vulnerability and will be network connected. Any of them could be vulnerable to this and any of them may be owned by someone now.
This is "Hollywood" level of exploit. This is like, watching an episode of "Person of Interest" and realising all of those drive-by hacks are actually possible. This is like, anyone everywhere can do this - not just governments, but anyone with the minimum technical ability needed to run the exploit. Yes, this includes your internet connected fridge and your Internet-Of-Things lightbulbs.
Oh, and FreeBSD isn't vulnerable. Heh.
Subscribe to:
Posts (Atom)