Adrian Chadd's Ramblings: 2020

Monday, December 28, 2020

Repairing and bootstrapping an IBM PC/AT 5170, Part 3

So! In Parts 1 and 2 I covered getting this old thing cleaned up, getting it booting, hacking together a boot floppy disc and getting a working file transfer onto a work floppy disc.

Today I'm going to cover what it took to get it going off of a "hard disk", which in 2020 can look quite a bit different than 1990.

First up - what's the hard disk? Well, in the 90s we could still get plenty of MFM and early IDE hard disks to throw into systems like this. In 2020, well, we can get Very Large IDE disks from NOS (like multi hundred gigabyte parallel ATA interface devices), but BIOSes tend to not like them. The MFM disks are .. well, kinda dead. It's understandable - they didn't exactly build them for a 40 year shelf life.

The IBM PC, like most computers at the time, allows for peripherals to include software support in ROM for the hardware you're trying to use. For example, my PC/AT BIOS doesn't know anything about IDE hardware - it only knows about ye olde ST-412/ST-506 Winchester/MFM drive controllers. But contemporary IDE hardware would include the driver code for the BIOS in an on-board ROM, which the BIOS would enumerate and use. Other drive interconnects such as SCSI did the same thing.
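To make the option ROM idea a bit more concrete, here's a rough Python sketch of the kind of scan a BIOS does over the adapter ROM area. The convention itself is standard: a ROM starts with the bytes 0x55 0xAA, the third byte is its length in 512 byte blocks, and all of its bytes sum to zero modulo 256. The 0xC8000 base address, the 2KB step and the find_option_roms name are assumptions for illustration - this is not lifted from the PC/AT BIOS listing.

```python
def find_option_roms(memory: bytes, base: int = 0xC8000, step: int = 2048):
    """Return a list of (address, size) for each valid option ROM in `memory`.

    `memory` is assumed to be a snapshot of the adapter ROM area that
    starts at physical address `base` (an assumption for this sketch).
    """
    found = []
    offset = 0
    while offset + 3 <= len(memory):
        if memory[offset] == 0x55 and memory[offset + 1] == 0xAA:
            size = memory[offset + 2] * 512  # length byte counts 512 byte blocks
            image = memory[offset:offset + size]
            # A valid ROM is fully present and its bytes sum to 0 mod 256.
            if size and len(image) == size and sum(image) % 256 == 0:
                found.append((base + offset, size))
                # Skip past this ROM, rounding up to the next `step` boundary.
                offset += -(-size // step) * step
                continue
        offset += step
    return found
```

A real BIOS would then far-call the entry point at offset 3 of each ROM it finds, which is how an IDE BIOS gets the chance to hook the disk services.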

By the time the 80386's were out, basic dumb IDE was pretty well supported in BIOSes as, well, IDE is really code for "let's expose some of the 16 bit ISA bus on a 40 pin ribbon cable to the drives". But, more about that later.

Luckily some electronics minded folk have gone and implemented alternatives that we can use. Notably:

There's now an open-source "Universal IDE BIOS" available for computers that don't have IDE support in their BIOS - notably PC/XT and PC/AT, and
There are plenty of projects out there which break out the IDE bus on an XT or AT ISA bus - I'm using XT-IDE.

Now, I bought a little XT-IDE + compact flash card board off of ebay. They're cheap, it comes with the universal IDE bios on a flash device, and ...

... well, I plugged it in and it didn't work. So, I wondered if I broke it. I bought a second one, as I don't have other ISA bus computers yet, and ...

IT didn't work. Ok, so I know that there's something up with my system, not these cards. I did the 90s thing of "remove all IO cards until it works" in case there was an IO port conflict and ...

.. wham! The ethernet card. Both wanted 0x300. I'd have to reflash the Universal IDE BIOS to get it to look at any other address, so off I went to get the Intel EtherExpress 8/16 card configuration utility.

Here's an inside shot of the PC/AT with the XT-IDE installed, and a big gaping hole where the Intel EtherExpress 8/16 NIC should be.

No wait. What I SHOULD do first is get the XT-IDE CF card booting and running.

Ok, so - first things first. I had to configure the BIOS drive as NONE, because the BIOS isn't servicing the drive - the IDE BIOS is. Unfortunately, the IDE BIOS is coming in AFTER the system BIOS disks, so I currently can't run MFM + IDE whilst booting from IDE. I'm sure I can figure out how at some point, but that point is not today.

Success! It boots!

To DOS 6.22!

And only the boot sector, and COMMAND.COM! Nooooo!

Ok so - I don't have a working 3.5" drive installed, I don't have DOS 6.22 media on 1.2MB, but I can copy my transfer program (DSZ) - and Alley Cat - onto the CF card. But - now I need the DOS 6.22 install media.

On the plus side - it's 2020 and this install media is everywhere. On the minus side - it's disk images that I can't easily use. On the double minus side - the common DOS raw disk read/write tools - RAWREAD/RAWRITE - don't know about 5.25" drives! Ugh!

However! Here's where a bit of hilarious old knowledge is helpful - although the normal DOS installers want to be run from floppy, there's a thing called the "DOS 6.22 Upgrade" - and this CAN be run from the hard disk. However! You need a blank floppy for it to write the "uninstallation" data to, so keep one of those handy.

I extracted the files from the disk images using MTOOLS - "MCOPY -i DISK.IMG ::*.* ." to get everything out - used PKZIP and DSZ to get it over to the CF card, and then ran the upgrader.

Hello DOS 6.22 Upgrade Setup!

Ah yes! Here's the uninstall disc step! Which indeed I had on hand for this very moment!

I wonder if I should fill out the registration card for this install and send it to Microsoft.

Ok, so that's done and now I have a working full DOS 6.22 installation. Now I can do all the fun things like making a DOS boot disk and recovery things. (For reference - you do that using FORMAT /S A: to format a SYSTEM disk that you can boot from; then you add things to it using COPY.)

Finally, I made a boot disk with the Intel EtherExpress 8/16 config program on it, and reconfigured my NIC somewhere other than 0x300. Now, I had to open up the PC/AT, remove the XT-IDE and install the EtherExpress NIC to do this - so yes, I had to boot from floppy disc.

Once that was done, I added a bunch of basic things like Turbo C 2.0, Turbo Assembler and mTCP. Now, mTCP is a package that really needed to exist in the 90s. However, this and the RAM upgrade (which I don't think I've talked about yet!) will come in the next installment of "Adrian's remembering old knowledge from his teenage years!".

Thursday, December 17, 2020

Repairing and bootstrapping an IBM 5170 PC/AT, part 2

Ok, so now it runs. But, what did it take to get here?

First up - I'm chasing down a replacement fusible PROM and I'll likely have to build a programmer for it. The programmer will need to run a bit at a time, which is very different to what the EPROM programmers available today support. It works for now, but I don't like it.

I've uploaded a dump of the PROM here - https://erikarn.github.io/pcat/notes.html .

Here's how the repair looks so far:

Next - getting files onto the device. Now, remember the hard disk is unstable, and even when it does work it only has DOS 5.0 on it, which didn't really ship with any useful file transfer stuff. Everyone expected you'd have floppies available. But, no, I don't have DOS available on floppy media! And, amusingly, I don't have a second 1.2MB drive installed anywhere to transfer files.

I have some USB 3.5" drives that work, and I have a 3.5" drive and Gotek drive to install in the PC/AT. However, until yesterday I didn't have a suitable floppy cable - the 3.5" drive and Gotek USB floppy thingy both use IDC pin connectors, and this PC/AT uses 34 pin edge connectors. So, whatever I had to do, I had to do with what I had.

There are a few options available:

You can write files in DOS COMMAND.COM shell using COPY CON <file> - it either has to be all ascii, or you use ALT-<3 numbers> to write ALT CODES. For MS-DOS, this would just input that value into the keyboard buffer. For more information, Wikipedia has a nice write-up here: https://en.wikipedia.org/wiki/Alt_code .
You can use an ASCII only assembly as above: a popular one was TCOM.COM, which I kept here: https://erikarn.github.io/pcat/tcomtxt.asm
If you have MODE.COM, you could try setting up the serial port (COM1, COM2, etc) to a useful baud rate, turn on flow control, etc - and then COPY COM1 <file>. I didn't try this because I couldn't figure out how to enable hardware flow control, but now that I have it all (mostly) working I may give it a go.
If you have QBASIC, you can write some QBASIC!

I tried TCOM.COM, both at 300 and 2400 baud. Neither was reliable, and there's a reason for that - writing to the floppy is too slow! Far, far too slow! And, it wasn't enforcing hardware flow control, which was very problematic for reliable transfers.

So, I wrote some QBASIC. It's pretty easy to open a serial port and read/write to it, but it's not AS easy to have it work for binary file transfer. There are a few fun issues:

Remember, DOS (and Windows too, yay!) has a difference between files open for text reading/writing and files open for binary reading/writing.
QBASIC has sequential file access or random file access. For sequential, you use INPUT/PRINT, for random you use GET and PUT.
There's no byte type - you define it as a STRING type of a certain size.
This is an 8MHz 80286, and .. well, let's just say QBASIC isn't the fastest thing on the planet here.

I could do some basic IO fine, but I couldn't actually transfer and write out the file contents quickly and reliably. Even going from 1200 to 4800 and 9600 baud didn't increase the transfer rate! So even given an inner loop of reading/writing a single byte at a time with nothing else, it still can't keep up.

The other amusingly annoying thing is what to use on the remote side to send binary files. Now, you can use minicom and such on FreeBSD/Linux, but it doesn't have a "raw" transfer type - it has xmodem, ymodem, zmodem and ascii transfers. I wanted to transfer a ~ 50KB binary to let me do ZMODEM transfers, and .. well, this presents a bootstrapping problem.

After a LOT of trial and error, I ended up with the following:

I used tip on FreeBSD to talk to the serial port
I had to put "hf=true" into .tiprc to force hardware handshaking; it didn't seem to work when I set it after I started tip (~s to set a variable)
On the QBASIC side I had to open it up with hardware flow control to get reliable transfers;
And I had to use 128 byte records - not 1 byte records - to get decent transfer performance!
On tip to send the file I would ask it to fork 'dd' to do the transfer (using ~C) and ask it to pad to the 128 byte boundary:

dd if=file bs=128 conv=sync

The binary I chose (DSZ.COM) didn't mind the extra padding, it wasn't checksumming itself.
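For reference, here's a small Python sketch of what that dd invocation does to the data, and how the block count falls out of the file size. The helper names are made up for illustration, and the 52800 byte size in the example is a hypothetical value that happens to land on 413 blocks - it is not the actual size of DSZ.COM.

```python
RECORD_SIZE = 128  # record length used for both the file and the serial port


def pad_to_records(data: bytes, record_size: int = RECORD_SIZE) -> bytes:
    """Pad with NUL bytes up to the next record boundary, like dd conv=sync."""
    remainder = len(data) % record_size
    if remainder == 0:
        return data
    return data + b"\x00" * (record_size - remainder)


def record_count(size: int, record_size: int = RECORD_SIZE) -> int:
    """Number of whole records needed to carry `size` bytes (rounds up)."""
    return -(-size // record_size)


# Hypothetical example: a 52800 byte file needs 413 records of 128 bytes.
blocks = record_count(52800)
```

The receiving side has to be told the block count up front, because with raw fixed-size records there's no end-of-file marker on the wire.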

Here's the hacky QBASIC program I hacked up to do the transfer:

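' Output file, opened for random access with 128 byte records.
OPEN "RB", #2, "MYFILE.TXT", 128

' Serial port: 9600 8N1, CTS/DSR timeouts for hardware handshaking, binary mode, 32K receive buffer.
' Note: LEN = 128 is part of the OPEN line, not a separate line!
OPEN "COM1:9600,N,8,1,CD0,CS500,DS500,OP0,BIN,TB2048,RB32768" FOR RANDOM AS #1 LEN = 128

size# = 413 ' 413 * 128 byte transfer
DIM c AS STRING * 128 ' 128 byte record
FOR i = 1 TO size#
    GET #1, , c ' read one record from the serial port
    PUT #2, , c ' append it to the output file
NEXT i
CLOSE #2 ' without this DOS won't finalise the file
CLOSE #1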

Now, this is hackish, but specifically:

9600 baud, 8N1, hardware flow control, 32K receive buffer.
128 byte record size for both the file and UART transfers.
the DSZ.COM file size, padded to 128 bytes, was 413 blocks. So, 413 block transfers.
Don't forget to CLOSE the file once you've written, or DOS won't finalise the file and you'll end up with a 0 byte file.

This plus tip configured for 9600 and hardware flow control did the right thing. I then used DSZ to use ZMODEM to transfer a fresh copy of itself, and CAT.EXE (Alley Cat!)

Ok, so that bootstrapped enough of things to get a ZMODEM transfer binary onto a bootable floppy disc containing a half-baked DOS 5.0 installation. I can write software with QBASIC and I can transfer files on/off using ZMODEM.

Next up, getting XT-IDE going in this PC/AT and why it isn't ... well, complete.

Wednesday, December 16, 2020

Repairing and bootstrapping an IBM 5170 PC/AT, part 1

I bought an IBM PC/AT 5170 a few years ago for a Hackerdojo project that didn't end up going anywhere.

So, I have a PC/AT with:

8MHz 80286 (type 3 board)
512K on board
128K expansion board (with space for 512K extended RAM, 41256 RAM chip style)
ST4038 30MB MFM drive with some gunk on head or platter 3 (random head 3 read failures, sigh)
1.2MB floppy drive
CGA card
Intel 8/16 ethernet card

Ok, so the bad disk was a pain in the ass. It's 2020, DOS on 1.2MB floppy disks isn't exactly the easiest thing to come across. But, it DOES occasionally boot.

But, first up - the BIOS battery replacement had leaked. Everywhere. So I replaced that, and had to type a BASIC program into ROM BASIC to reprogram the BIOS NVRAM area with a default enough configuration to boot from floppy or hard disk.

Luckily someone had done that:

http://www.minuszerodegrees.net/5170/setup/5170_setup.htm

So, I got through that.

Then, I had to buy some double-sided high density 5.25" discs. Ok, well, that took a bit, but they're still available as new old stock (no one's making floppy discs anymore, sigh.) I booted the hard disk and after enough attempts at it, it booted to the command prompt. At which point I promptly created a bootable system disc and copied as much of DOS 5.0 off of it as I could.

Then, since I am a child of the 80s and remember floppy discs, I promptly DISKCOPY'ed it to a second disc that I'm leaving as a backup.

And, for funsies, DOSSHELL.

Ok, so what's next?

I decided to buy an alternate BIOS - the Quadtel 286 image that's floating about - because quite frankly having to type in that BASIC program into ROM BASIC every time was a pain in the ass. So, in it went. Which was good, because...

Well, then it stopped working. It turns out that my clean-up of the battery leakage wasn't enough. The system booted with three short beeps and "0E" on the screen.

Now we get into deep, deep PC history.

Luckily, the Quadtel BIOS codes are available here:

http://www.bioscentral.com/postcodes/quadtelbios.htm

.. but with the IBM BIOS, it didn't beep, it didn't do anything. Just a black screen.

What gives?

So, starting with PC/AT and clone machines, the BIOS would write status updates during boot to a fixed IO port. Then if you have a diagnostic card that monitors that IO port, you'd get updates on where the system got to during boot before it hit a problem. These are called POST (power on self test) codes.

Here's a write-up of it and some POST cards:

http://www.minuszerodegrees.net/misc/post_cards.htm

Luckily, the Quadtel BIOS just spat it out on the screen for me. Phew.

So! 0xE says the 8254 interval timer wasn't working. I looked on the board and ... voila! It definitely had a lot of rusty looking crap on it. U115, a 32 byte PROM used for some address line decoding also was unhappy.

Here's how it looked before I had cleaned it up - this is circa July:

I had cleaned all of this out and used some vinegar on a Q-tip to neutralise the leaked battery gunk, but I didn't get underneath the ICs.

So, out they both came. I cleaned up the board, repaired some track damage and whacked in sockets.

Then in went the chips - same issue. Then I was sad.

Then! Into the boxes of ICs I went - where I found an 8254-2 that was spare! I lifted it from a dead PC clone controller board a while ago. In IT went, and the PC/AT came alive again.

(At this point I'd like to note that I was super afraid that the motherboard was really dead, as repairing PC/AT motherboards is not something I really wanted to do. Well, it's done and it works.)

Rightio! So, the PC boots, CGA monitor and all, from floppy disc. Now comes the fun stuff - how do I bootstrap said PC/AT with software, given no software on physical media? Aha, that's in part 2.

Friday, July 24, 2020

RFI from crappy electronics, or "how's this sold in the US?"

I picked up a cheap charging cable for my Baofeng UV-9S. (https://www.amazon.com/gp/product/B07TSDSQ4Z/). It .. well, it works.

But it messes up operating my radios! I heard super strong interference on my HF receiver and my VHF receivers.

So, let's take a look. I set up a little antenna in my shack. The Baofeng was about 6ft away.

Here's DC to 120MHz. Those peaks to the right? Broadcast FM. The marker is at 28.5MHz.

Ok, let's plug in the Baofeng and charger.

Ok, look at that noise. Ugh. That's unfun.

What about VHF? Let's look at that. 100-300MHz.

Ok, that's expected too. I think that's digital TV or something in there. Ok, now, let's plug in the charger, without it charging..

Whaaaaaaaaattttttt oh wait. Yeah, this is likely an unshielded DC-DC converter and it's unloaded. Ok, let's load it up.

Whaaaaaa oh ok. Well that explains everything.

Let's pull it open:

Yup. A boost converter going from 5V to 9V; no shielding, no shielded power cable and no ground plane on the PCB. This is just amazing. The 3ft charge cable is basically an antenna. "Unintentional radiator" indeed.

So - even with a ferrite on the cable, it isn't quiet.

It's quiet at 28MHz now so I can operate on the 10m band with it charging, but this doesn't help at all at VHF.

Ew.

Wednesday, July 15, 2020

Fixing up ath_rate_sample to actually work well with 11n

Way back in 2011 when I was working on FreeBSD's Atheros 802.11n support I needed to go and teach some rate control code about 802.11n MCS rates. (As a side note, the other FreeBSD wifi hackers and I at the time taught wlan_amrr - the AMRR rate control in net80211 - about basic MCS support too, and fixing that will be the subject of a later post.)

The initial hacks I did to ath_rate_sample made it kind of do MCS rates OK, but it certainly wasn't great. To understand why then and what I've done now, it's best to go for a little trip down journey lane - the initial sample rate control algorithm by John Bicket. You can find a copy of the paper he wrote here - https://pdos.csail.mit.edu/papers/jbicket-ms.pdf .

Now, sample didn't try to optimise maximum throughput. Instead, it attempted to optimise for minimum airtime to get the job done, and also attempted to minimise the time spent sampling rates that had a low probability of working. Note this was all done circa 2005 - at the time the other popular rate control methods tried to maintain the highest PHY rate that met some basic success rate (eg packet loss, bit error rate, etc, etc.) The initial implementation in FreeBSD also included multiple packet size bins - 250 and 1600 bytes - to allow rate selection based on packet length.

However, it made some assumptions about rates that don't quite hold in the 802.11n MCS world. Notably, it didn't take the PHY bitrate into account when comparing rates. It mostly assumed that going up in rate code - except between CCK and OFDM rates - meant it was faster. Now, this is true for 11b, 11g and 11a rates - again except when you transition between 11b and 11g rates - but this definitely doesn't hold true in the 802.11n MCS rate world. Yes, between MCS0 to MCS7 the PHY bitrate goes up, but then MCS8 is MCS0 times two streams, and MCS16 is MCS0 times three streams.
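Here's a quick Python sketch of why "higher rate code means faster" breaks down for MCS rates. The per-stream numbers are the standard 802.11n PHY rates for a 20MHz channel with the long guard interval; the function names are just for illustration and aren't from the driver.

```python
# 802.11n PHY rates in Mbit/s for one stream, 20MHz channel, long guard interval.
SINGLE_STREAM_MBPS = [6.5, 13.0, 19.5, 26.0, 39.0, 52.0, 58.5, 65.0]


def mcs_streams(mcs: int) -> int:
    """MCS0-7 are one stream, MCS8-15 two streams, MCS16-23 three streams."""
    return mcs // 8 + 1


def mcs_phy_rate(mcs: int) -> float:
    """PHY bitrate in Mbit/s for MCS0..MCS23."""
    return SINGLE_STREAM_MBPS[mcs % 8] * mcs_streams(mcs)


# Going up in rate code from MCS7 to MCS8 makes the PHY rate drop sharply.
drop = mcs_phy_rate(7) - mcs_phy_rate(8)
```

So any comparison between rates has to look at the PHY bitrate (or better, the estimated airtime), not the position of the rate in the table.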

So my 2011/2012 work just did the minimum hacks to choose /some/ MCS rates. It didn't take the length of aggregates into account; it just used the length of the first packet in the aggregate. Very suboptimal, but it got MCS rates going.

Now fast-forward to 2020. This works fine if you're close to the other end, but it's very terrible if you're at the fringes of acceptable behaviour. My access points at home are not well located and thus I'm reproducing this behaviour very often - so I decided to fix it.

First up - packet length. I had to do some work to figure out how much data was in the transmit queue for a given node and TID. (Think "QoS category.") The amount of data in the queue wasn't good enough - chances are we couldn't transmit all of it because of 802.11 state (block-ack window, management traffic, sleep state, etc.) So I needed a quick way to query the amount of traffic in the queue taking into account 802.11 state. That .. ended up being a walk of each packet in the software queue for that node/TID list until we hit our limit, but for now that'll do.

So then I can call ath_rate_lookup() to get a rate control schedule knowing how long a packet may be. But depending upon the rate it returns, the amount of data that may be transmitted could be less - there's a 4ms limit on 802.11n aggregates, so at lower MCS rates you end up only sending much smaller frames (like 3KB at the slowest rate.) So I needed a way to return how many bytes to form an aggregate for as well as the rate. That informed the A-MPDU formation routine how much data it could queue in the aggregate for the given rate.
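The arithmetic behind that "3KB at the slowest rate" figure is easy to sketch in Python. This is a back-of-the-envelope version only: it ignores preamble and per-frame overhead, and the function name is an assumption rather than anything in the driver. The 65535 byte cap is the largest A-MPDU length 802.11n allows.

```python
MAX_AGGR_DURATION_US = 4000  # the 4ms aggregate limit
MAX_AMPDU_BYTES = 65535      # largest A-MPDU length in 802.11n


def max_aggregate_bytes(phy_rate_mbps: float) -> int:
    """Rough upper bound on aggregate size for a given PHY rate."""
    # Mbit/s multiplied by microseconds gives bits.
    bits = phy_rate_mbps * MAX_AGGR_DURATION_US
    return min(int(bits // 8), MAX_AMPDU_BYTES)


slowest = max_aggregate_bytes(6.5)    # MCS0, 20MHz, long GI
fastest = max_aggregate_bytes(195.0)  # MCS23, 20MHz, long GI
```

At MCS0 that works out to 3250 bytes, and at the fast rates the byte cap takes over from the time limit.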

I also stored that away to use when completing the transmit, just to line things up OK.

Ok, so now I'm able to make rate control decisions based on how much data needs to be sent. ath_rate_sample still only worked with 250 and 1600 byte packets. So, I extended that out to 65536 bytes in mostly-powers-of-two values. This worked pretty well right out of the box, but the rate control process was still making pretty trash decisions.
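Picking a size bin is just "find the smallest bin that fits". Here's a minimal Python sketch; the bin edges below are an assumption based on the "250, 1600, then mostly powers of two up to 65536" description above, and may not match the values in the driver exactly.

```python
import bisect

# Assumed bin edges, in bytes: the two original bins plus powers of two.
SIZE_BINS = [250, 1600, 4096, 8192, 16384, 32768, 65536]


def size_to_bin(length: int) -> int:
    """Return the index of the smallest bin that can hold `length` bytes."""
    idx = bisect.bisect_left(SIZE_BINS, length)
    # Anything larger than the last edge is clamped into the last bin.
    return min(idx, len(SIZE_BINS) - 1)
```

Statistics are then kept per (rate, bin) pair, so a rate that's great for short frames doesn't automatically get picked for 32KB aggregates.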

The next bit is all "statistics". The decisions that ath_rate_sample makes depend upon accurate estimations of how long packet transmissions took. I found that a lot of the logic was drastically over-compensating for failures by accounting a LOT more time for failures at each attempted rate, rather than only accounting how much time failed at that rate. Here's two examples:

If a rate failed, then all the other rates would get failure accounted for the whole length of the transmission to that point. I changed it to only account for failures for that rate - so if three out of four rates failed, each failed rate would only get their individual time accounted to that rate, rather than everything.
Short (RTS/CTS) and long (no-ACK) retries were being accounted incorrectly. If 10 short retries occurred, then the maximum failed transmission for that rate can't be 10 times the "it happened" long retry style packet accounting. It's a short retry; the only thing that could differ is the rate that RTS/CTS is being exchanged at. Penalising rates because of bursts of short failures was incorrect and I changed that accounting.

There are a few more, but you can look at the change log / change history for sys/dev/ath/ath_rate/sample/ to see.

By and large, I pretty accurately nailed making sure that failed transmit rates account for THEIR failures, not the failures of other rates in the schedule. It was super important for MCS rates because mis-accounting failures across the 24-odd rates you can choose in 3-stream transmit can have pretty disastrous effects on throughput - channel conditions change super frequently and you don't want to penalise a rate for far, far too long and then need a lot of subsequent successful samples just to try using that rate again.
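As a rough illustration of the "each rate only pays for its own attempts" idea, here's a Python sketch. The data layout and names are made up for the example - the real code lives in C under sys/dev/ath/ath_rate/sample/ and looks nothing like this - and the airtime numbers are arbitrary.

```python
from collections import defaultdict


def new_stats():
    """Per-rate counters; every rate starts from zero."""
    return defaultdict(lambda: {"tries": 0, "failures": 0, "airtime_us": 0})


def account_schedule(stats, attempts):
    """Charge each rate in a completed schedule only for its own attempts.

    `attempts` is a list of (rate, tries, usec_per_try, succeeded) tuples,
    in the order the hardware walked the schedule.
    """
    for rate, tries, usec_per_try, succeeded in attempts:
        entry = stats[rate]
        entry["tries"] += tries
        # If the rate eventually succeeded, its final try was not a failure.
        entry["failures"] += (tries - 1) if succeeded else tries
        entry["airtime_us"] += tries * usec_per_try
    return stats


stats = account_schedule(new_stats(), [
    ("MCS12", 2, 300, False),
    ("MCS11", 2, 350, False),
    ("MCS10", 1, 400, True),
])
```

Under the old style of accounting, MCS11 here would have been charged for the 600us that MCS12 burned as well as its own 700us; with per-rate accounting it's only charged 700us.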

So that was the statistics side done.

Next up - choices.

Choices was a bit less problematic to fix. My earlier hacks mostly just made it possible to choose MCS rates but it didn't really take into account their behaviour. When you're doing 11a/11g OFDM rates, you know that you go in lock-step from 6, 9, 12, 18, 24, 36, 48, 54Mbit/s, and if a rate starts failing the higher rate will likely also fail. However, MCS rates are different - the difference between MCS0 (1/2 BPSK, 1 stream) and MCS8 (1/2 BPSK, 2 streams) is only a couple dB of extra required signal strength. So given a rate, you want to sample at MCS rates around it but also ACROSS streams. So I mostly had to make sure that if I was at say MCS3, I'd also test MCS2 and MCS4, but I'd also test MCS10/11/12 (the 2-stream versions of MCS2/3/4) and maybe MCS18/19/20 for 3-stream. I also shouldn't really bother testing too high up the MCS chain if I'm at a lower MCS rate - there's no guarantee that MCS7 is going to work (5/6 QAM64 - fast but needs a pretty clean channel) if I'm doing ok at MCS2. So, I just went to make sure that the sampling logic wouldn't try all the MCS rates when operating at a given MCS rate. It works pretty well - sampling will try a couple MCS rates either side to see if the average transmit time for that rate is higher or lower, and then it'll bump it up or down to minimise said average transmit time.
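The "neighbours, but also across streams" selection can be sketched in a few lines of Python. This is only an illustration of the idea with made-up names and a fixed spread of one rate either side; the real sampling logic has more conditions than this.

```python
def sample_candidates(current_mcs: int, max_streams: int = 3, spread: int = 1):
    """MCS rates worth sampling when currently operating at `current_mcs`.

    Returns the rates within `spread` of the current modulation/coding
    index, for every stream count up to `max_streams`, excluding the
    current rate itself.
    """
    base = current_mcs % 8  # modulation/coding index within a stream group
    low = max(0, base - spread)
    high = min(7, base + spread)
    candidates = []
    for stream in range(max_streams):
        for idx in range(low, high + 1):
            mcs = stream * 8 + idx
            if mcs != current_mcs:
                candidates.append(mcs)
    return candidates


around_mcs3 = sample_candidates(3)
```

For MCS3 on a 3-stream NIC that gives MCS2 and MCS4, then MCS10/11/12 and MCS18/19/20, and it never wanders up to MCS7.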

However, the one gotcha - packet loss and A-MPDU.

ath_rate_sample was based on single frames, not aggregates. So the concept of average transmit time assumed that the data either got there or it didn't. But, with 802.11n A-MPDU aggregation we can have the higher rates succeed at transmitting SOMETHING - meaning that the average transmit time and long retry failure counts look great - but most of the frames in the A-MPDU are dropped. That means low throughput and more actual airtime being used.

When I did this initial work in 2011/2012 I noted this, so I kept an EWMA of the packet loss both of single frames and aggregates. I wouldn't choose higher rates whose EWMA was outside of a couple percent of the current best rate. It didn't matter how good it looked at the long retry view - if only 5% of sub-frames were ACKed, I needed a quick way to dismiss that. The EWMA logic worked pretty well there and only needed a bit of tweaking.
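Here's a minimal Python sketch of that EWMA filter. To keep the comparison simple it tracks the percentage of sub-frames that were ACKed (so higher is better). The 75% smoothing factor and the 2 percent margin are assumptions picked for the example, not the values used in the driver.

```python
def update_ewma(current_pct: float, sample_pct: float,
                smoothing: float = 0.75) -> float:
    """Blend a new success percentage into the running average."""
    return current_pct * smoothing + sample_pct * (1.0 - smoothing)


def acceptable(candidate_pct: float, best_pct: float,
               margin_pct: float = 2.0) -> bool:
    """Only consider a rate whose EWMA is within a margin of the best rate."""
    return candidate_pct >= best_pct - margin_pct


# One completely failed aggregate drags a perfect rate down to 75%.
after_bad_aggregate = update_ewma(100.0, 0.0)
```

With this in place a rate that only gets 5% of its sub-frames ACKed is dismissed straight away, no matter how good its long retry counts look.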

A few things stand out after testing:

For shorter packets, it doesn't matter if it chooses the one, two or three stream rate; the bulk of the airtime is overhead and not data. Ie, the difference between MCS4, MCS12 and MCS20 is any extra training symbols for 2/3 stream rates and a few dB extra signal strength required. So, typically it will alternate between them as they all behave roughly the same.
For longer packets, the bulk of the airtime starts becoming data, so it begins to choose rates that are obviously providing lower airtime and higher packet success EWMA. MCS12 is the choice for up to 4096 byte aggregates; the higher rates start rapidly dropping off in EWMA. This could be due to a variety of things, but importantly it's optimising things pretty well.

There's a bunch of future work to tidy this all up some more but it can wait.

I'm back into the grind of FreeBSD's wireless stack and 802.11ac

hi!

Yes, it's been a while since I posted here and yes, it's been a while since I was actively working on FreeBSD's wireless stack. Life's been .. well, life. I started the ath10k port in 2015. I wasn't expecting it to take 5 years, but here we are. My life has changed quite a lot since 2015 and a lot of the things I was doing in 2015 just stopped being fun for a while.

But the stars have aligned and it's fun again, so here I am.

Here's where things are right now.

First up - if_run. This is the Ralink (now MediaTek) 11abgn USB driver for stuff that they made before MediaTek acquired them. A contributor named Ashish Gupta showed up on the #freebsd-wifi IRC channel on EFnet to start working on 11n support for if_run and he got it to the point where the basics worked - and I took it and ran with it enough to land 20MHz 11n support. It turns out I had a couple of suitable NICs to test with and, well, it just happened. I'm super happy Ashish came along to get 11n working on another NIC.

The if_run TODO list (which anyone is welcome to contribute to):

Ashish is looking at 40MHz wide channel support right now;
Short and long-GI support would be good to have;
we need to get 11n TX aggregation working via the firmware interface - it looks like the Linux driver has all the bits we need and it doesn't need retransmission support in net80211. The firmware will do it all if we set up the descriptors correctly.

net80211 work

Next up - net80211. So, net80211 has basic 11ac bits, even if people think it's not there. It doesn't know about MU-MIMO streams yet but it'll be a basic 11ac AP and STA if the driver and regulatory domain support it.

However, as I implement more of the ath10k port, I find more and more missing bits that really need to be in net80211.

A-MPDU / A-MSDU de-encapsulation

The hardware does A-MPDU and A-MSDU de-encapsulation in hardware/firmware, pushing up individual decrypted and de-encapsulated frames to the driver. It supports native wifi and 802.3 (ethernet) encapsulation, and right now we only support native wifi. (Note - net80211 supports 802.3 as well; I'll try to get that going once the driver lands.)

I added support to handle decryption offload with the ath10k supplied A-MPDU/A-MSDU frames (where there's no PN/MIC at all, it's all done in firmware/hardware!) so we could get SOME traffic. However, receive throughput just plainly sucked when I last poked at this. I also added A-MSDU offload support where we wouldn't drop the A-MSDU frames with the same receive 802.11 sequence number. However...

It turns out that my mac was doing A-MSDU in A-MPDU in 11ac, and the net80211 receive A-MPDU reordering was faithfully dropping all A-MSDU frames with the same receive 802.11 sequence number. So TCP would just see massive packet loss and drop the throughput in a huge way. Implementing this feature requires buffering all A-MSDU frames in an A-MPDU sub-frame in the reordering queue rather than tossing them, and then reordering them as if they were a single frame.

So I modified the receive reordering logic to reorder queues of mbufs instead of mbufs, and patched things to allow queuing multiple mbufs as long as they were appropriately stamped as being A-MSDUs in a single A-MPDU subframe .. and now the receive traffic rate is where it should be (> 300mbit UDP/TCP.) Phew.
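The shape of that change is easier to see in a toy model. Here's a Python sketch of a reorder buffer where each sequence number holds a list of frames instead of a single frame. Everything here is illustrative: the class and the amsdu_more flag are made-up names, and it leaves out the block-ack window checks, timeouts and duplicate sub-frame detection that the real net80211 code has to deal with.

```python
class ReorderBuffer:
    """Toy receive reorder buffer that keeps a list of frames per sequence number."""

    def __init__(self, start_seq: int = 0):
        self.next_seq = start_seq
        self.pending = {}      # sequence number -> list of frames
        self.complete = set()  # sequence numbers whose last sub-frame arrived

    def receive(self, seq: int, frame, amsdu_more: bool = False):
        """Queue a frame and return whatever can now be delivered in order.

        `amsdu_more` marks an A-MSDU sub-frame that will be followed by more
        sub-frames carrying the same sequence number.
        """
        if seq in self.complete:
            return []  # a genuine duplicate: drop it
        self.pending.setdefault(seq, []).append(frame)
        if not amsdu_more:
            self.complete.add(seq)
        released = []
        while self.next_seq in self.complete:
            released.extend(self.pending.pop(self.next_seq))
            self.complete.discard(self.next_seq)
            self.next_seq = (self.next_seq + 1) % 4096  # 12 bit sequence space
        return released
```

The old behaviour is what you'd get if the second frame with a given sequence number was always treated as a duplicate, which is exactly what was throwing away most of each A-MSDU.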

U-APSD support

I didn't want to implement full U-APSD support in the Atheros 11abgn driver because it requires a lot of driver work to get it right, but the actual U-APSD negotiation support in net80211 is significantly easier. If the NIC supports U-APSD offload (like ath10k does) then I just have to populate the WME QoS fields appropriately and call into the driver to notify them about U-APSD changes.

Right now net80211 doesn't support the ADD-TS / DEL-TS methods for clients requesting explicit QoS requirements.

Migrating more options to per-VAP state

There's a bunch of net80211 state which was still global rather than per-VAP. It makes sense in the old world - NICs that do things in the driver or net80211 side are driven in software, not in firmware, so things like "the current channel", "short/long preamble", etc are global state. However the later NICs that offload various things into firmware can now begin to do interesting things like background channel switching for scan, background channel switching between STA and P2P-AP / P2P-STA. So a lot of state should be kept per-VAP rather than globally so the "right" flags and IEs are set for a given VAP.

I've started migrating this state into per-VAP fields rather than global, but it showed a second shortcoming - because it was global, we weren't explicitly tracking these things per-channel. Ok, this needs a bit more explanation.

Say you're on a 2GHz channel and you need to determine whether you care about 11n, 11g or 11b clients. If you're only seeing and servicing 11n clients then you should be using the short slot time, short preamble and not require RTS/CTS protection to interoperate with pre-11n clients.

But then an 11g client shows up.

The 11g client doesn't need to interoperate with 11b, only 11n - so it doesn't need RTS/CTS. It can use short preamble and short slot time still. But the 11n client needs to interoperate, so it needs to switch protection mode into legacy - and it will do RTS/CTS protection.

But then, an 11b client shows up.

At this point the 11g protection kicks in; everyone does RTS/CTS protection and long preamble/slot time kicks in.

Now - is this a property of a VAP, or of a channel? Technically speaking, it's the property of a channel. If any VAP on that channel sees an 11b or 11g client, ALL VAPs need to transition to update protection mode.
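Treating it as a property of the channel means the decision is a function of every client seen by every VAP on that channel. Here's a small Python sketch of the rules as described above; the function and field names are made up for illustration and don't correspond to net80211 flags.

```python
def channel_protection(client_modes):
    """Derive channel-wide settings from all clients seen on the channel.

    `client_modes` is an iterable of "11b", "11g" or "11n", gathered
    across every VAP operating on the channel.
    """
    modes = set(client_modes)
    has_11b = "11b" in modes
    has_11g = "11g" in modes
    return {
        "short_slot": not has_11b,
        "short_preamble": not has_11b,
        # 11g-style protection: everyone does RTS/CTS once 11b shows up.
        "erp_protection": has_11b,
        # 11n stations protect against any legacy (11g or 11b) client.
        "ht_protection": has_11b or has_11g,
    }


# Clients from two VAPs sharing a channel are simply combined.
settings = channel_protection(["11n", "11n"] + ["11g"])
```

Every VAP on the channel then gets the same answer, no matter which VAP the legacy client actually associated to.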

I migrated all of this to be per-VAP, but I kept the global state for literally all the drivers that currently consume it. The ath10k driver now uses the per-VAP state for the above, greatly simplifying things (and finishing TODO items in the driver!)

ath10k changes

And yes, I've been hacking on ath10k too.

Locking issues

I've had a bunch of feedback and pull requests from Bjorn and Geramy pointing out lock ordering / deadlock issues in ath10k. I'm slowly working through them; the straight conversion from Linux to FreeBSD showed the differences in our locking and how/when driver threads run. I will rant about this another day.

Encryption key programming

Encryption keys are programmed using firmware calls, but net80211 currently expects key updates to be done synchronously. We can't sleep in the net80211 crypto key updates without changing net80211's locks to all be SX locks (and I honestly think that's a bad solution that papers over non-asynchronous code that honestly should just be made asynchronous.) Anyway, so it and the node updates are done using deferred calls - but this required me to take complete copies of the encryption key contents. It turns out net80211 can pretty quickly recycle the key contents - including the key that is hiding inside the ieee80211_node. This fixed up the key reprogramming and deletion - it was sometimes sending garbage to the firmware. Whoops.
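The bug pattern here is a classic one with deferred work, and it's easy to show in a few lines of Python. This is only a sketch of the idea with invented names; the real thing is C code with taskqueues and net80211 key structures.

```python
from collections import deque


class DeferredKeyQueue:
    """Queue key programming requests to run later, outside the caller's locks."""

    def __init__(self, program_key):
        self._program_key = program_key  # callable(key_index, key_bytes)
        self._tasks = deque()

    def enqueue(self, key_index: int, key_buffer) -> None:
        # Snapshot the key now; the caller may recycle its buffer
        # before the deferred task gets a chance to run.
        self._tasks.append((key_index, bytes(key_buffer)))

    def run(self) -> None:
        """Run all queued requests, oldest first."""
        while self._tasks:
            self._program_key(*self._tasks.popleft())
```

If enqueue() stored a reference to key_buffer instead of a copy, run() would hand the firmware whatever the buffer happened to contain by then, which is the "garbage" described above.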

What's next?

So what's next? Well, I want to land the ath10k driver! There are still a whole bunch of things to do in both net80211 and the driver before I can do this.

Add 802.11ac channel entries to regdomain.xml

Yes, I added it - but only for FCC. I didn't add them for all the other regulatory domain codes. It's a lot of work because of how this file is implemented and I'd love help here.

Add MU-MIMO group notification

I'd like to make sure that we can at least support associating to a MU-MIMO AP. I think ath10k does it in firmware but we need to support the IE notifications.

Block traffic from being transmitted during a node creation or key update

Right now net80211 will transmit frames right after adding a node or sending a key update - it assumes the driver is completing it before returning. For software driven NICs like the pre-11ac Atheros chips this holds true, but for everything USB and newer firmware based devices this definitely doesn't hold.

For ath10k in particular if you try transmitting a frame without a node in firmware the whole transmit path just hangs. Whoops. So I've fixed that so we can't queue a frame if the firmware doesn't know about the node but ...

... net80211 will send the association responses in hostap mode once the node is created. This means the first association response doesn't make it to the associating client. Since net80211 doesn't yet do this traffic buffering, I'll do it in ath10k - I'll buffer frames during a key update and during node addition/deletion to make sure that nothing is sent OR dropped.
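The buffering itself doesn't need to be clever. Here's a minimal Python sketch of a per-node gate that holds frames while the firmware is busy and flushes them in order afterwards. The class and method names are hypothetical, and this isn't how the ath10k code is structured; it's just the idea.

```python
class NodeTxGate:
    """Hold frames for a node until the firmware is ready for them."""

    def __init__(self, send):
        self._send = send    # callable(frame) that hands a frame to the firmware
        self._ready = False  # nothing goes out until the node exists in firmware
        self._held = []

    def transmit(self, frame) -> None:
        if self._ready:
            self._send(frame)
        else:
            self._held.append(frame)  # buffered, not dropped

    def block(self) -> None:
        """Call when a key update or node add/delete is started."""
        self._ready = False

    def unblock(self) -> None:
        """Call when the firmware confirms the operation has completed."""
        self._ready = True
        held, self._held = self._held, []
        for frame in held:
            self._send(frame)
```

With a gate like this, that first association response just sits in the queue for a moment rather than being lost or hanging the transmit path.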

Clean up the Linux-y bits

There's a bunch of dead code which we don't need or don't use; as well as some compatibility bits that define Linux mac80211/nl80211 bits that should live in net80211. I'm going to turn these into net80211 methods and remove the Linux-y bits from ath10k. Bjorn's work to make linuxkpi wifi shims can then just translate the calls to the net80211 API bits I'll add, rather than having to roll full wifi methods inside linuxkpi.

To wrap up ..

.. job changes, relationship changes, having kids, getting a green card, buying a house and paying off old debts from your old hosting company can throw a spanner in the life machine. On the plus side, hacking on FreeBSD and wifi support are fun again and I'm actually able to sleep through the night once more, so ... here goes!

If you're interested in helping out, I've been updating the net80211/driver TODO list here: https://wiki.freebsd.org/WiFi/TodoStuff . I'd love some help, even on the small things!