I decided to bite the bullet and start hacking on bluetooth coexistence on these Atheros NICs. It's a bit of a rabbit hole.
I'll write up a bit more documentation on this when I'm not overly tired, but the general overview is pretty simple: "It's all done in software."
The bluetooth and wifi stacks need to speak to each other to know when to prefer wifi traffic and when to prefer bluetooth traffic. When pairing, bluetooth should be preferred. When scanning, associating, authenticating and rekeying, wifi should be preferred. When different profiles are active (eg A2DP audio), the bluetooth traffic should be periodically given preference so the A2DP frames can go out reliably. This has to be controlled in software.
So to make this work well on FreeBSD, I'll have to teach the wifi and bluetooth stacks to interface with each other somehow so this can be synchronised.
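To make the policy above concrete, here's a minimal sketch of the decision the two stacks would have to agree on. It is not FreeBSD code; the state names, the `Prefer` enum, and the tie-breaks (pairing wins over everything, wifi is the default) are assumptions for illustration only.

```python
from enum import Enum


class Prefer(Enum):
    WIFI = "wifi"
    BLUETOOTH = "bluetooth"
    BT_PERIODIC = "bt-periodic"  # wifi, with periodic bluetooth slots


# Wifi operations that should win over bluetooth traffic.
WIFI_CRITICAL = {"scanning", "associating", "authenticating", "rekeying"}


def coex_preference(wifi_state, bt_pairing=False, bt_profiles=()):
    """Pick which radio gets preference right now.

    Assumption: pairing beats everything else, and wifi is the default
    when nothing special is going on. The post doesn't specify either.
    """
    if bt_pairing:
        return Prefer.BLUETOOTH
    if wifi_state in WIFI_CRITICAL:
        return Prefer.WIFI
    if "a2dp" in bt_profiles:
        return Prefer.BT_PERIODIC
    return Prefer.WIFI


print(coex_preference("scanning", bt_profiles=("a2dp",)))
print(coex_preference("running", bt_profiles=("a2dp",)))
```

The point is only that both inputs (wifi state and bluetooth state) are needed in one place, which is why the stacks have to talk to each other.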
I have basic (static) coexistence working with the AR9285+AR3011 combo NIC. That's now in -HEAD.
I'm working on basic (static) coexistence on the AR9485+AR3012 combo NIC, however my NIC has an older BT part which requires quite a bit of dancing to make work. I'll have to teach ath3kfw how to load the config and firmware image for the required NIC. It's going to take some time but it'll be worth it.
I was hoping that FreeBSD would have basic A2DP support but it currently doesn't. I'd love to see that happen as it'd simplify a lot of my development/testing - as I can then do audio stream testing both playing and recording audio, then stream that over wifi.
Oh well. Another day of hacking!
Friday, June 14, 2013
Monday, June 10, 2013
So long, and thanks for all the fish!
After 18 months at Qualcomm Atheros, I decided I needed a bit of a change.
This is what I sent out to the open source community:
Hi all,
This Friday will be my last day at Qualcomm Atheros. I've enjoyed working with the extremely bright and driven engineers and designers that make the wireless chips and SoCs that people everywhere take for granted. I've achieved a bunch of goals both with their internal product development and open source. But now it's time to move onto different things.
I'd especially like to thank Luis Rodriguez for introducing me to the QCA folk and helping me get access to the Atheros open source project, as well as the follow-up discussions that led to me being hired. The open source wireless community has been driving innovation in a lot of areas for a number of years. I'd like to hope that I've had a small, positive effect on that. I wish you all the best of luck in pushing forward and continuing to innovate.
Now, I'm still NDA-enabled and I quite like hacking on this wireless stuff so I won't be quitting hacking on things. I will just have other things on my mind.
Good luck to you all!
Now, this generated a flurry of private emails asking me what happened and where I'm going to.
So, the summary - I accepted a job at Netflix, as part of their OpenConnect CDN team.
They've built a world-wide CDN using FreeBSD and they're looking to continue growing and improving it. They've committed to improving FreeBSD's network, storage and VM layer to facilitate moving tens of gigabits of Netflix video traffic per server. And, they're going to open source the bulk of it. They realise that the best benefit from open source comes from working with open source - and that's exactly what they've done. They've contributed back their improvements and fixes.
I've enjoyed my time at Qualcomm Atheros. The people are brilliant, the hardware is excellent and it was a great learning experience. I got to experience what it was like working at a silicon company during chip design, validation and bring-up - both the good and the bad bits. But when it came down to it, I couldn't contribute to and improve the process in any meaningful way. I was one engineer in a very large, diverse organisation - and like large organisations, things move slowly.
So, I hope to continue to maintain close ties with the hardware and software people at Qualcomm Atheros. I hope to continue hacking on the FreeBSD wireless stack in my spare time, as I have been to date. I wish I could've contributed more positively to their evolving hardware and software strategy. But there's only so much an engineer in an established company can do, and that engineer wasn't going to be me.
Sunday, June 2, 2013
The fitbit, or "making me aware of all the exercise I'm not doing"
A friend of mine (hi Sabrina!) uses a Fitbit to track her daily activities. It's a little device that tracks your movement and gives you a simple overview of how active you are (or aren't.)
Now, I don't really believe that its calorie counting, stair counting and step counting is entirely accurate. It's just doing it based on an accelerometer and I've seen it occasionally double count walks. That's fine.
But what it does do is pretty nifty: it's reminding me of exactly how freaking inactive I am being a salaried computer programmer. I'm not spending an hour or two a day walking. I'm not really doing any kind of strenuous activity outside of occasionally going to the gym.
This thing reminds me with one simple number (or flower, if you like that kind of thing) exactly how inactive you are. And that to me is worth more than millions of lines of cute looking websites to track your daily progress.
So, now I have no excuse.
Friday, May 31, 2013
Trying to implement A-MSDU support, or "what happens when things aren't done in the right order."
I've been looking at implementing A-MSDU support in net80211. This has crept up for a few reasons:
- It means you can do basic MSDU aggregation without needing the full block-ack window mechanics;
- It allows you to do aggregation of very small frames into one larger MPDU that you can then stuff into an A-MPDU;
- I want to leverage it for TDMA.
So, the background.
A-MSDU is where you take a bunch of MSDUs and shovel them into a single, larger MPDU. They're all destined to the same end-node and they all are transmitted/retransmitted together. There's a single sequence number assigned to the A-MSDU, so the hardware will have to retransmit it as a single unit.
Which is great for things like TCP ACKs, which are tiny and waste a lot of airtime. If you can shovel a bunch of them into a larger A-MSDU and then transmit that inside an A-MPDU, you get a double bonus - your A-MPDU sub-frame sizes are large, so you won't hit any "minimum frame size" limits that various chips may have.
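As a rough illustration of the size win, here's a sketch that adds up the bytes in an A-MSDU. It assumes the standard subframe layout: a 14 byte subframe header (destination address, source address, length), then the MSDU, padded so every subframe except the last is a multiple of 4 bytes. The function name and the 48 byte "TCP ACK" size (roughly LLC/SNAP plus a bare IPv4 TCP ACK) are assumptions for the example.

```python
AMSDU_SUBFRAME_HDR = 14  # DA (6) + SA (6) + length (2)


def amsdu_length(msdu_lengths):
    """Total A-MSDU body size in bytes for the given MSDU sizes."""
    total = 0
    last = len(msdu_lengths) - 1
    for i, length in enumerate(msdu_lengths):
        subframe = AMSDU_SUBFRAME_HDR + length
        if i != last:
            # Pad all but the final subframe to a 4 byte boundary.
            subframe += -subframe % 4
        total += subframe
    return total


# Ten small frames end up in one MPDU of a few hundred bytes.
print(amsdu_length([48] * 10))
```

So ten tiny frames cost one 802.11 header, one sequence number and one trip through the A-MPDU machinery instead of ten.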
And for TDMA it's an instant win - I can just use A-MSDU with no ACK for now and achieve 11n throughput with minimal effort. All I have to do is write the A-MSDU support and I'll get 11n throughput for free with TDMA.
So all I have to do is write A-MSDU support, right? ... right?
It turns out that from an architectural standpoint, it's a pain in the ass to write.
For A-MPDU it's easy. You can just do it in the driver. It looks like a series of individual MPDUs that are already 802.11 encapsulated. So you can just buffer those in the driver and transmit/retransmit them. For net80211 (FreeBSD) and mac80211 (Linux) this is great - both stacks pass an already-encapsulated 802.11 MPDU to the driver.
But for A-MSDU it's a bit hairier. We're not aggregating already encapsulated 802.11 frames - we're encapsulating multiple 802.3 frames together. Which means the stack itself has to glue together a bunch of 802.3 frames into an A-MSDU, then pass that as an MPDU to the driver.
That bit isn't hard.
The first hard bit is figuring out what the maximum A-MSDU size is. Now, the naive solution is just to aggregate as much as you can to the 7935 byte maximum A-MSDU size boundary, then transmit that. Great - except that there are regulatory limits and QoS limits on how long an individual frame can take to transmit. So when you create an A-MSDU, you actually want to limit the size.
Now, what do you limit it to? It has to be based on how long it'll take to transmit, so here's the tricky bit - you already need to have made your transmit rate decision first. If you haven't made that decision, you can't calculate how long it'll take to transmit the frame.
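The dependency on the rate decision is easy to see with some back-of-the-envelope arithmetic. This is a simplified sketch: it ignores the preamble, PLCP and MAC headers, and it assumes a 4ms duration limit (the example figure used later in this post). The function names are made up.

```python
AMSDU_MAX = 7935  # maximum A-MSDU size, in bytes


def max_bytes_for_duration(rate_mbps, max_duration_us):
    """Bytes that fit in max_duration_us at rate_mbps.

    1 Mbit/s is 1 bit per microsecond, so rate * duration gives bits.
    """
    return int(rate_mbps * max_duration_us) // 8


def amsdu_limit(rate_mbps, max_duration_us=4000):
    """A-MSDU size limit once the transmit rate is known."""
    return min(AMSDU_MAX, max_bytes_for_duration(rate_mbps, max_duration_us))


print(amsdu_limit(65))   # fast rate: the 7935 byte cap is the limit
print(amsdu_limit(6.5))  # slow rate: the duration is the limit
```

At 65 Mbit/s the 7935 byte cap applies; at 6.5 Mbit/s the same 4ms only holds 3250 bytes. Without the rate, there's no way to pick the right number.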
For the short term I'm going to just ignore this and write the A-MSDU support for FreeBSD in net80211 and aggregate up to the maximum limit. It's good enough to do basic testing of the feature itself. But I do need to add that maximum frame limit for another reason: QoS.
With QoS, I have a specific slot time transmit opportunity limit. I have to do two things:
- Not exceed the slot time entirely by scheduling a frame so large it will actually exceed the transmit opportunity limit, and
- Schedule any other subsequent frames in order to try and "fill out" the rest of the transmit opportunity window.
This is really important for TDMA. Say I have an 8ms slot window but can only transmit 4ms at a time due to regulatory concerns. If I can schedule a 4ms long A-MSDU then great, that's what I'll do. But say this happens:
- I have an 8ms long window;
- My first frame is a long one, and it takes 3ms;
- I then have five 0.5ms long frames afterwards.
What I don't want to do is create an A-MSDU with all of those frames. I want to create a 4ms frame by adding in two 0.5ms frames to that 3ms frame, or transmit the 3ms frame followed by four 0.5ms long frames. I don't want to aggregate the second set of smaller frames into a 4ms long frame and have it be "too long" to fit into the rest of the transmit opportunity.
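Here's a small sketch of that scheduling decision, using the 3ms frame plus five 0.5ms frames from the example. It greedily packs frames, in order, into bursts of at most 4ms inside the 8ms window. It's an illustration only: it ignores inter-frame spacing and ACK time, and `pack_bursts` is a made-up name, not anything in net80211.

```python
def pack_bursts(durations_ms, max_burst_ms, window_ms):
    """Greedily group frame durations into bursts, preserving order.

    Returns (bursts, leftover): the bursts that fit in the window and
    the frames that didn't fit at all.
    """
    bursts, current = [], []
    remaining = window_ms
    pending = list(durations_ms)
    while pending:
        d = pending[0]
        if d > max_burst_ms:
            break  # this frame can never be sent in one burst
        if sum(current) + d > min(max_burst_ms, remaining):
            if not current:
                break  # no room left in the window
            # Close the current burst and start a new one.
            bursts.append(current)
            remaining -= sum(current)
            current = []
            continue
        current.append(pending.pop(0))
    if current:
        bursts.append(current)
    return bursts, pending


bursts, leftover = pack_bursts([3, 0.5, 0.5, 0.5, 0.5, 0.5], 4, 8)
print(bursts)    # [[3, 0.5, 0.5], [0.5, 0.5, 0.5]]
print(leftover)  # []
```

The first burst is the 3ms frame topped up to 4ms; the rest go out as a 1.5ms burst that still fits in the window.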
So, things get hairy.
But wait, there's more.
What about handling frame re-transmission? If you use the Atheros (or others, for that matter) hardware frame re-transmission, you can have the hardware re-transmit the frame for you at various rates, starting with the highest one and then trying slower ones. You now have a similar issue - what if the frame is within your transmit opportunity at the fastest rate, but not at the slowest rate? What the FreeBSD and Linux Atheros drivers do for A-MPDU is pretty stupid - they use hardware re-transmission at slower rates, but limit the A-MPDU size so it doesn't exceed the maximum transmit length (4ms) at the slowest rate.
I'd rather have it never use hardware multi-rate retransmission and just step down to a slower rate. It'll re-calculate the maximum length and re-aggregate frames. It's fine, it'll be slightly less efficient but it'll work.
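The cost of the two approaches is easy to compare with the same simplified arithmetic (no preamble or header overhead). The rate series below is a made-up example, and the function names are hypothetical.

```python
def max_bytes(rate_mbps, max_duration_us=4000):
    """Bytes that fit in max_duration_us at rate_mbps (1 Mbit/s = 1 bit/us)."""
    return int(rate_mbps * max_duration_us) // 8


def limit_for_hw_retry(rates_mbps):
    """One size limit that is safe at every rate the hardware may try."""
    return min(max_bytes(r) for r in rates_mbps)


def limit_per_attempt(rates_mbps):
    """Size limit for each attempt if software re-aggregates per rate."""
    return [max_bytes(r) for r in rates_mbps]


series = [130, 65, 13]  # Mbit/s, fastest first
print(limit_for_hw_retry(series))  # 6500
print(limit_per_attempt(series))   # [65000, 32500, 6500]
```

With hardware multi-rate retry, every aggregate is capped at what fits at 13 Mbit/s, even when the first attempt goes out at 130 Mbit/s. Stepping down in software keeps the big aggregates for the fast attempts.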
But for A-MSDU, it is done in the stack rather than the driver. So imagine this:
- You've buffered a bunch of 802.3 frames into a staging area, to put into an A-MSDU;
- You make a transmit rate selection and that limits how big your A-MSDU is;
- You assemble the A-MSDU and pass that MPDU down to the driver;
- The driver tries transmitting it and fails, so you should retransmit it at a lower rate;
- .. except now you really don't know if it didn't make it to the remote end or not. Did you fail to hear the ACK, or?
Now comes the tricky bit. All you know at this point is that it didn't ACK. You don't know whether it didn't transmit or whether you didn't hear the ACK (and the receiver did actually receive it, ACK it and push it back up to the network layer.)
If you retransmit the A-MSDU at a lower rate, the receiver can eliminate a duplicate received frame just by looking at the sequence number and seeing that it has already received it.
But if you retransmit it at a lower rate that exceeds the TXOP window size, you will be breaking QoS requirements. Your hardware (eg Atheros with the right bit enabled!) may even just flat out refuse to transmit the frame, returning it as failed because it automatically failed to fit into the transmit opportunity window. So, what do you do?
If you retransmit it at a lower rate, it's going to automatically fail. What about pulling apart the A-MSDU into two sets of MSDUs, then treating it as two A-MSDUs to transmit? That way both will fit into the maximum transmit duration at the given transmit rate.
The problem here is you don't know that the receiver did actually not hear the A-MSDU. All you do know is you didn't hear the ACK. So if you do this, and assign new sequence numbers to the two new A-MSDUs, it's quite possible that the receiver will hear the old A-MSDU and the two new A-MSDUs, and pass duplicated frames up to the network layer.
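A toy receiver makes the failure mode obvious. This sketch keeps a set of sequence numbers it has seen; real receivers track this per station and per TID with a small cache, so treat the `Receiver` class and the sequence numbers as assumptions for illustration.

```python
class Receiver:
    """Toy receiver that drops frames whose sequence number it has seen."""

    def __init__(self):
        self.seen = set()
        self.delivered = []  # MSDUs passed up to the network layer

    def receive(self, seq, msdus):
        if seq in self.seen:
            return False  # duplicate, dropped
        self.seen.add(seq)
        self.delivered.extend(msdus)
        return True


rx = Receiver()

# The original A-MSDU arrives, but the sender never hears the ACK.
rx.receive(100, ["a", "b", "c", "d"])

# Retransmitting with the same sequence number is harmless.
rx.receive(100, ["a", "b", "c", "d"])
print(rx.delivered)  # ['a', 'b', 'c', 'd']

# Splitting into two new A-MSDUs with new sequence numbers is not.
rx.receive(101, ["a", "b"])
rx.receive(102, ["c", "d"])
print(rx.delivered)  # ['a', 'b', 'c', 'd', 'a', 'b', 'c', 'd']
```

The receiver has no way of knowing that 101 and 102 carry the same MSDUs as 100, so the duplicates go straight up the stack.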
So, the TL;DR version here - you either form A-MSDUs that are small enough to be software retransmitted at the lower rates (biting some inefficiency but allowing for retransmission), or you fail the transmit outright and don't retry.
So, it's complicated. Complicated and annoyingly messy.
Tuesday, April 30, 2013
A FAQ about today's FSF release
I've had a few people ask me some questions. There's also been a few questions on slashdot. I'll update this article as more questions come in.
Was it all me?
No, I didn't do the bulk of the work. Luis did the bulk of the legal hoop-jumping and review process at work. I grabbed it near the end of this process (so he could move onto other things) and shepherded the process of getting things ready for open sourcing.
I encouraged some external developers from the community to come on board and help in the initial effort to get it to compile and work correctly under the open source Tensilica toolchain rather than the internal toolchain.
I've fixed a few bugs here and there - eg the RX path TSF bugs that stopped the NICs from working in Mesh mode, along with some other fallout issues from the toolchain migration.
I wanted the bulk of the work to come from the community rather than me. I don't want to be the only person working on this. Thankfully I'm not! There's an active community now!
I'll likely do a bunch more development in the firmware code once I get it working on FreeBSD!
Why is it only one device? Why is it so expensive? Why that device?
You'll have to ask the FSF that.
How different is this to the non-USB stuff?
Like a lot of manufacturers, Atheros reuses its CPU and Wifi cores everywhere they can.
The AR7010 designs have an external AR9280 or AR9287 NIC. This is exactly the same as a mini-PCIe design - the same chip, speaking PCIe, etc.
The AR9271 design is a single chip solution (see below) with an AR9285 NIC internally. I don't know whether internally it speaks PCIe or whether they just glued the NIC onto the AHB like they do for other integrated CPU+SoC designs (eg the AR913x, AR933x, AR934x, etc.)
But once you get past the USB and CPU parts, it looks exactly the same as the PCIe devices Atheros driver developers know and love.
Just keep in mind the main difference - the wifi part doesn't DMA directly to/from your computer memory. It has to go via buffer RAM on the AR7010 core in order to then send or receive it via USB endpoints.
What about the other NICs? The AR7010 based ones?
The AR7010 based ones are precursors to the single-chip solution that the FSF is selling a NIC for (the AR9271.) The AR7010 has USB on one side and PCIe on the other. It runs effectively the same firmware as the AR9271 NIC, save for some different ROM addresses, memory map and some other little differences.
The AR7010 based devices are thus "just as free" as the AR9271 NIC the FSF is selling.
I'm not sure if the FSF is going to certify an AR7010 design. I hope they can find a dual-band AR7010+AR9280 ath9k_htc NIC and sell that as part of their open hardware programme.
What is this AR9271 anyway? Why is it only 1x1 and 2GHz only?
The AR9271 is a single chip solution containing:
- An AR7010 style core, with minor differences
- Some RAM and ROM (but less RAM than the AR7010; no I don't know why.)
- An AR9285 derivative (which is the 2GHz, 1x1 chip.)
Like a lot of things that manufacturers do, it's a "cost savings" design for a specific market. Even now, laptop and tablet manufacturers want to skimp on 5GHz NIC designs in order to save some cash. No, I don't know why. No, I can't quote costs.
How can I help?
Download the firmware, download a linux-next or compat-wireless tarball - or, run OpenBSD + athn for now - compile stuff up and hack away.
Slashdotted!
Hi to those from Slashdot!
http://hardware.slashdot.org/story/13/04/30/2251252/fsf-certifies-atheros-based-thinkpenguin-80211-n-usb-adapter
All I'd like to say is:
"Patches gratefully accepted."
Thursday, April 25, 2013
Today's Journey: Making AP mode power-save work better
I've been working on improving the net80211 and ath driver support for AP mode power save.
There's a few parts to it:
- A station can tell an access point it's going to sleep by setting the power mgmt bit to 1 in a TXed frame;
- The AP will then update the TIM entry in the beacon frames it sends out to reflect whether that station has any traffic queued;
- A station can signal an AP that it's awake by sending a data frame with the power mgmt bit set to 0;
- .. or it can request a frame at a time by using PS-POLL;
- There's also the uAPSD stuff which I haven't yet implemented and won't likely do so for a while.
Now, it shouldn't be that difficult. Except, that it is.
If an AP has a bunch of frames queued to a station that has gone to sleep, it will keep trying to transmit those frames. That wastes air-time and results in annoying levels of packet loss.
When you're doing 802.11n, there's a whole lot more traffic going on and a lot more room to cause massive traffic issues if you drop frames. But you don't want to keep failing to transmit those frames or you'll end up spending a lot of time transmitting BAR frames to the station.
If the driver maintains a queue of frames (for say, software retransmit) then it also needs to ensure that the TIM bit is set correctly. Otherwise the AP may set the TIM bit to 0 because the net80211 stack has no queued frames to that node; but the driver itself has some frames. Thus, the station won't wake up and you'll see increased packet latency.
When PS-POLL is received, frames need to first be leaked from the driver queue BEFORE it starts leaking frames from the net80211 power save queue. The last thing you want is the wrong set of frames to go out.
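The last two points can be sketched with two queues per station. This is not the ath or net80211 code; the `SleepingStation` class and its field names are made up to show the two rules: the TIM bit has to cover both queues, and PS-POLL drains the driver queue first.

```python
from collections import deque


class SleepingStation:
    """Toy model of the frames an AP holds for one sleeping station."""

    def __init__(self):
        self.driver_queue = deque()  # held by the driver (eg for software retransmit)
        self.stack_queue = deque()   # held in the stack's power save queue

    def tim_bit(self):
        # Set if EITHER queue has traffic, not just the stack's.
        return bool(self.driver_queue or self.stack_queue)

    def ps_poll(self):
        # Leak from the driver queue before the stack queue.
        if self.driver_queue:
            return self.driver_queue.popleft()
        if self.stack_queue:
            return self.stack_queue.popleft()
        return None


sta = SleepingStation()
sta.stack_queue.append("new frame")
sta.driver_queue.append("retry frame")
print(sta.tim_bit())  # True
print(sta.ps_poll())  # retry frame
print(sta.ps_poll())  # new frame
```

If `tim_bit()` only looked at `stack_queue`, a station with nothing but driver-queued retries would never be told to wake up.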
So, I've spent the last few months extending the driver and network stack to make this feasible. There's new net80211 driver methods for tying into the TIM update process, the node power save status and the PS-POLL handling. The filtered frames handling in the ath driver is another precursor to this - it means that frames can be failed out very quickly and retried when appropriate.
(No, I'm not implementing software retransmit for non-11n traffic just yet. I will eventually. Just not yet.)
The final bits that I've been working on have been tricky.
When a node goes to sleep, you want to pause the driver transmission to the node - otherwise it will keep trying to transmit whatever is in the driver queue. For 11n this is terrible; it means that frames will keep failing to be transmitted and with enough failures, the traffic will stop whilst a BAR frame is sent. Grr.
Next was figuring out how to send frames whilst the node is "paused". I introduced a per-node "leak" counter which tells the driver transmit path that even though the node is asleep, a single frame should be scheduled. If one isn't available, the next frame sent will be scheduled. This handles the PS-POLL "null" response - ie, if there's nothing in the queue, the net80211 stack will queue a null data response with the MORE bit clear. That way the station will know there's currently nothing to receive.
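The leak counter idea can be sketched as follows. This is a simplified model, not the driver code: `PausedNode`, its fields, and the choice to increment the counter on each PS-POLL are assumptions for illustration.

```python
from collections import deque


class PausedNode:
    """Toy per-node transmit queue with a power save "leak" counter."""

    def __init__(self):
        self.asleep = False
        self.leak = 0        # frames allowed out while asleep
        self.queue = deque()
        self.sent = []

    def schedule(self):
        """Transmit whatever the node's state currently allows."""
        while self.queue and (not self.asleep or self.leak > 0):
            if self.asleep:
                self.leak -= 1
            self.sent.append(self.queue.popleft())

    def enqueue(self, frame):
        self.queue.append(frame)
        self.schedule()

    def ps_poll(self):
        # Allow one frame out. If nothing is queued, the credit stays
        # and the next frame queued (eg a null data frame) goes out.
        self.leak += 1
        self.schedule()


node = PausedNode()
node.asleep = True
node.enqueue("data 1")
node.enqueue("data 2")
node.ps_poll()
print(node.sent)  # ['data 1']
node.ps_poll()
node.ps_poll()    # queue is empty, so the credit is kept
node.enqueue("null data")
print(node.sent)  # ['data 1', 'data 2', 'null data']
```

The third PS-POLL finds an empty queue, so the leak credit carries over to the null data frame that gets queued in response.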
But then, something odd started happening. Devices would disassociate and re-associate, but they'd still be marked as "asleep". So no traffic would occur. After digging into it a bit, I discovered that the only time a station is marked as awake again is when the AP receives a DATA frame from it with the power mgmt bit set to 0. Seeing management/control traffic from the station isn't enough. So for now, I just always transmit management/control frames regardless of whether the station is asleep or awake - except BAR frames. Those get software queued if the node is asleep. Now that management/control frames are transmitted directly, a station can re-associate and be marked as 'awake.'
Then I found that once a station re-associates, it should have all of its current association state reset. It may have had a bunch of aggregate frames queued to the hardware and those need to finish transmitting before we can start transmitting new data to the re-associated station. It may even have been in the middle of receiving a BAR frame! So, I have to gently (well, "gently") reset the association state to allow for currently queued frames to be cleaned up, but reset things like filtered frame state and BAR TX. Ew, but it needs to be done.
Also, if there's data queued to an asleep station and a BAR frame needs to go out, the BAR frame needs to go into the head of the software queue, not the tail. Otherwise it will have to wait for the queue to be transmitted - which, if there's a gap in the transmit block-ack window (hence needing the BAR), no further transmission will occur. Oops!
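In queue terms that's just a head insert instead of a tail insert. A minimal sketch, with a made-up `queue_frame` helper and a plain deque standing in for the per-node software queue:

```python
from collections import deque


def queue_frame(swq, frame, is_bar=False):
    """Queue a frame for a sleeping node; BAR frames jump the queue."""
    if is_bar:
        swq.appendleft(frame)  # head: goes out before any queued data
    else:
        swq.append(frame)      # tail: normal ordering


swq = deque()
queue_frame(swq, "data 1")
queue_frame(swq, "data 2")
queue_frame(swq, "BAR", is_bar=True)
print(list(swq))  # ['BAR', 'data 1', 'data 2']
```

With the BAR at the tail, the data frames ahead of it can't go out (the block-ack window is stuck), so the BAR would never be reached.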
I then found that a sufficiently chatty node could end up filling the software queue full of buffers destined to it. This is a general problem in the ath driver which I'll eventually fix, but it became a huge problem with power save enabled. So, I've introduced a per-node maximum queue depth when it's asleep. That should limit the amount of pain that a single sleeping node can cause. I'll eventually introduce a limit for how many buffers an individual node can consume whether it's awake or asleep but that's for another day.
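The limit itself is simple. In this sketch the depth of 64 is an arbitrary placeholder (the post doesn't give the real value) and `enqueue_for_node` is a hypothetical helper, not a driver function.

```python
from collections import deque

MAX_ASLEEP_DEPTH = 64  # placeholder value, not the driver's actual limit


def enqueue_for_node(node_queue, frame, asleep, max_asleep_depth=MAX_ASLEEP_DEPTH):
    """Queue a frame for a node, refusing once a sleeping node's queue is full.

    Returns True if the frame was queued, False if it was dropped.
    """
    if asleep and len(node_queue) >= max_asleep_depth:
        return False
    node_queue.append(frame)
    return True


q = deque()
results = [enqueue_for_node(q, i, asleep=True, max_asleep_depth=2) for i in range(4)]
print(results)  # [True, True, False, False]
print(len(q))   # 2
```

An awake node is not limited at all here, which matches the current state of things: the general per-node limit is still to come.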
There's likely lots more corner cases that need to be addressed before I can merge this into -HEAD. I'm still seeing my macbook pro occasionally disassociate and not automatically re-associate and I'm not sure why. But things are behaving much, much better with sleeping devices.