Friday, July 24, 2020

RFI from crappy electronics, or "how's this sold in the US?"

I picked up a cheap charging cable for my Baofeng UV-9S. (https://www.amazon.com/gp/product/B07TSDSQ4Z/). It .. well, it works.

But it messes up operating my radios! I heard super strong interference on my HF receiver and my VHF receivers.

So, let's take a look. I setup a little antenna in my shack. The baofeng was about 6ft away.



Here's DC to 120MHz. Those peaks to the right? Broadcast FM. The marker is at 28.5MHz.

Ok, let's plug in the baofeng and charger.

Ok, look at that noise. Ugh. That's unfun.

What about VHF? Let's look at that. 100-300MHz.

Ok, that's expected too. I think that's digital TV or something in there. Ok, now, let's plug in the charger, without it charging..


Whaaaaaaaaattttttt oh wait. Yeah, this is likely an unshielded buck converter and it's unloaded. Ok, let's load it up.


Whaaaaaa oh ok. Well that explains everything.

Let's pull it open:



Yup. A buck converter going from 5v to 9v; no shielding, no shielded power cable and no ground plane on the PCB. This is just amazing. The 3ft charge cable is basically an antenna. "Unintentional radiator" indeed.

So - even with a ferrite on the cable, it isn't quiet.


It's quiet at 28MHz now so I can operate on the 10m band with it charging, but this doesn't help at all at VHF.

Ew.




Wednesday, July 15, 2020

Fixing up ath_rate_sample to actually work well with 11n

Way back in 2011 when I was working on FreeBSD's Atheros 802.11n support I needed to go and teach some rate control code about 802.11n MCS rates. (As a side note, the other FreeBSD wifi hackers and I at the time taught wlan_amrr - the AMRR rate control in net80211 - about basic MCS support too, and fixing that will be the subject of a later post.)

The initial hacks I did to ath_rate_sample made it kind of do MCS rates OK, but it certainly wasn't great. To understand why then and what I've done now, it's best to go for a little trip down journey lane - the initial sample rate control algorithm by John Bicket. You can find a copy of the paper he wrote here - https://pdos.csail.mit.edu/papers/jbicket-ms.pdf .

Now, sample didn't try to optimise maximum throughput. Instead, it attempts to optimise for minimum airtime to get the job done, and also attempted to minimise the time spent sampling rates that had a low probability of working. Note this was all done circa 2005 - at the time the other popular rate control methods tried to maintain the highest PHY rate that met some basic success rate (eg packet loss, bit error rate, etc, etc.) The initial implementation in FreeBSD also included multiple packet size bins - 250 and 1600 bytes - to allow rate selection based on packet length.

However, it made some assumptions about rates that don't quite hold in the 802.11n MCS world. Notably, it didn't take the PHY bitrate into account when comparing rates. It mostly assumed that going up in rate code - except between CCK and OFDM rates - meant it was faster. Now, this is true for 11b, 11g and 11a rates - again except when you transition between 11b and 11g rates - but this definitely doesn't hold true in the 802.11n MCS rate world. Yes, between MCS0 to MCS7 the PHY bitrate goes up, but then MCS8 is MCS0 times two streams, and MCS16 is MCS0 times three streams.

So my 2011/2012 just did the minimum hacks to choose /some/ MCS rates. It didn't take the length of aggregates into account; it just used the length of the first packet in the aggregate. Very suboptimal, but it got MCS rates going.

Now fast-forward to 2020. This works fine if you're close to the other end, but it's very terrible if you're at the fringes of acceptable behaviour. My access points at home are not well located and thus I'm reproducing this behaviour very often - so I decided to fix it.

First up - packet length.  I had to do some work to figure out how much data was in the transmit queue for a given node and TID. (Think "QoS category.") The amount of data in the queue wasn't good enough - chances are we couldn't transmit all of it because of 802.11 state (block-ack window, management traffic, sleep state, etc.) So I needed a quick way to query the amount of traffic in the queue taking into account 802.11 state. That .. ended up being a walk of each packet in the software queue for that node/TID list until we hit our limit, but for now that'll do.

So then I can call ath_rate_lookup() to get a rate control schedule knowing how long a packet may be. But depending up on the rate it returns, the amount of data that may be transmitted could be less - there's a 4ms limit on 802.11n aggregates, so at lower MCS rates you end up only sending much smaller frames (like 3KB at the slowest rate.) So I needed a way to return how many bytes to form an aggregate for as well as the rate. That informed the A-MPDU formation routine how much data it could queue in the aggregate for the given rate.

I also stored that away to use when completing the transmit, just to line things up OK.

Ok, so now I'm able to make rate control decisions based on how much data needs to be sent. ath_rate_sample still only worked with 250 and 1600 byte packets. So, I extended that out to 65536 bytes in mostly-powers-of-two values.  This worked pretty well right out of the box, but the rate control process was still making pretty trash decisions.

The next bit is all "statistics". The decisions that ath_rate_sample makes depend upon accurate estimations of how long packet transmissions took. I found that a lot of the logic was drastically over-compensating for failures by accounting a LOT more time for failures at each attempted rate, rather than only accounting how much time failed at that rate. Here's two examples:
  • If a rate failed, then all the other rates would get failure accounted for the whole length of the transmission to that point. I changed it to only account for failures for that rate - so if three out of four rates failed, each failed rate would only get their individual time accounted to that rate, rather than everything.
  • Short (RTS/CTS) and long (no-ACK) retries were being accounted incorrectly. If 10 short retries occured, then the maximum failed transmission for that rate can't be 10 times the "it happened" long retry style packet accounting. It's a short retry; the only thing that could differ is the rate that RTS/CTS is being exchanged at. Penalising rates because of bursts of short failures was incorrect and I changed that accounting.
There are a few more, but you can look at the change log / change history for sys/dev/ath/ath_rate/sample/ to see.

By and large, I pretty accurately nailed making sure that failed transmit rates account for THEIR failures, not the failures of other rates in the schedule. It was super important for MCS rates because mis-accounting failures across the 24-odd rates you can choose in 3-stream transmit can have pretty disasterous effects on throughput - channel conditions change super frequently and you don't want to penalise things for far, far too long and it take a lot of subsequent successful samples just to try using that rate again.

So that was the statistics side done.

Next up - choices.

Choices was a bit less problematic to fix. My earlier hacks mostly just made it possible to choose MCS rates but it didn't really take into account their behaviour. When you're doing 11a/11g OFDM rates, you know that you go in lock-step from 6, 12, 18, 24, 36, 48, 54MB, and if a rate starts failing the higher rate will likely also fail. However, MCS rates are different - the difference between MCS0 (1/2 BPSK, 1 stream) and MCS8 (1/2 BPSK, 2 streams) is only a couple dB of extra required signal strength. So given a rate, you want to sample at MCS rates around it but also ACROSS streams. So I mostly had to make sure that if I was at say MCS3, I'd also test MCS2 and MCS4, but I'd also test MCS10/11/12 (the 2-stream versions of MCS2/3/4) and maybe MCS18/19/20 for 3-stream. I also shouldn't really bother testing too high up the MCS chain if I'm at a lower MCS rate - there's no guarantee that MCS7 is going to work (5/6 QAM64 - fast but needs a pretty clean channel) if I'm doing ok at MCS2. So, I just went to make sure that the sampling logic wouldn't try all the MCS rates when operating at a given MCS rate. It works pretty well - sampling will try a couple MCS rates either side to see if the average transmit time for that rate is higher or lower, and then it'll bump it up or down to minimise said average transmit time.

However, the one gotcha - packet loss and A-MPDU.

ath_rate_sample was based on single frames, not aggregates. So the concept of average transmit time assumed that the data either got there or it didn't. But, with 802.11n A-MPDU aggregation we can have the higher rates succeed at transmitting SOMETHING - meaning that the average transmit time and long retry failure counts look great - but most of the frames in the A-MPDU are dropped. That means low throughput and more actual airtime being used.

When I did this initial work in 2011/2012 I noted this, so I kept an EWMA of the packet loss both of single frames and aggregates. I wouldn't choose higher rates whose EWMA was outside of a couple percent of the current best rate. It didn't matter how good it looked at the long retry view - if only 5% of sub-frames were ACKed, I needed a quick way to dismiss that. The EWMA logic worked pretty well there and only needed a bit of tweaking.


A few things stand out after testing:

  • For shorter packets, it doesn't matter if it chooses the one, two or three stream rate; the bulk of the airtime is overhead and not data. Ie, the difference between MCS4, MCS12 and MCS20 is any extra training symbols for 2/3 stream rates and a few dB extra signal strength required. So, typically it will alternate between them as they all behave roughly the same.
  • For longer packets, the bulk of the airtime starts becoming data, so it begins to choose rates that are obviously providing lower airtime and higher packet success EWMA. MCS12 is the choice for up to 4096 byte aggregates; the higher rates start rapidly dropping off in EWMA. This could be due to a variety of things, but importantly it's optimising things pretty well.
There's a bunch of future work to tidy this all up some more but it can wait.

I'm back into the grind of FreeBSD's wireless stack and 802.11ac

hi!

Yes, it's been a while since I posted here and yes, it's been a while since I was actively working on FreeBSD's wireless stack. Life's been .. well, life. I started the ath10k port in 2015. I wasn't expecting it to take 5 years, but here we are. My life has changed quite a lot since 2015 and a lot of the things I was doing in 2015 just stopped being fun for a while.

But the stars have aligned and it's fun again, so here I am.

Here's where things are right now.

First up - if_run. This is the Ralink (now mediatek) 11abgn USB driver for stuff that they made before Mediatek acquired them. A contributor named Ashish Gupta showed up on the #freebsd-wifi IRC channel on efnet to start working on 11n support to if_run and he got it to the point where the basics worked - and I took it and ran with it enough to land 20MHz 11n support. It turns out I had a couple of suitable NICs to test with and, well, it just happened. I'm super happy Ashish came along to get 11n working on another NIC.

The if_run TODO list (which anyone is welcome to contribute to):

  • Ashish is looking at 40MHz wide channel support right now;
  • Short and long-GI support would be good to have;
  • we need to get 11n TX aggregation working via the firmware interface - it looks like the Linux driver has all the bits we need and it doesn't need retransmission support in net80211. The firmware will do it all if we set up the descriptors correctly.

net80211 work


Next up - net80211. So, net80211 has basic 11ac bits, even if people think it's not there. It doesn't know about MU-MIMO streams yet but it'll be a basic 11ac AP and STA if the driver and regulatory domain supports it.

However, as I implement more of the ath10k port, I find more and more missing bits that really need to be in net80211.

A-MPDU / A-MSDU de-encapsulation


The hardware does A-MPDU and A-MSDU de-encapsulation in hardware/firmware, pushing up individual decrypted and de-encapsulated frames to the driver. It supports native wifi and 802.3 (ethernet) encapsulation, and right now we only support native wifi. (Note - net80211 supports 802.3 as well; I'll try to get that going once the driver lands.)

I added support to handle decryption offload with the ath10k supplied A-MPDU/A-MSDU frames (where there's no PN/MIC at all, it's all done in firmware/hardware!) so we could get SOME traffic. However, receive throughput just plainly sucked when I last poked at this. I also added A-MSDU offload support where we wouldn't drop the A-MSDU frames with the same receive 802.11 sequence number. However...

It turns out that my mac was doing A-MSDU in A-MPDU in 11ac, and the net80211 receive A-MPDU reordering was faithfully dropping all A-MSDU frames with the same receive 802.11 sequence number. So TCP would just see massive packet loss and drop the throughput in a huge way. Implementing this feature requires buffering all A-MSDU frames in an A-MPDU sub-frame in the reordering queue rather than tossing them, and then reordering them as if they were a single frame.

So I modified the receive reordering logic to reorder queues of mbufs instead of mbufs, and patched things to allow queuing multiple mbufs as long as they were appropriately stamped as being A-MSDUs in a single A-MPDU subframe .. and now the receive traffic rate is where it should be (> 300mbit UDP/TCP.) Phew.


U-APSD support


I didn't want to implement full U-APSD support in the Atheros 11abgn driver because it requires a lot of driver work to get it right, but the actual U-APSD negotiation support in net80211 is significantly easier. If the NIC supports U-APSD offload (like ath10k does) then I just have to populate the WME QoS fields appropriately and call into the driver to notify them about U-APSD changes.

Right now net80211 doesn't support the ADD-TS / DEL-TS methods for clients requesting explicit QoS requirements.

Migrating more options to per-VAP state


There are a bunch of net80211 state which was still global rather than per-VAP. It makes sense in the old world - NICs that do things in the driver or net80211 side are driven in software, not in firmware, so things like "the current channel", "short/long preamble", etc are global state. However the later NICs that offload various things into firmware can now begin to do interesting things like background channel switching for scan, background channel switching between STA and P2P-AP / P2P-STA. So a lot of state should be kept per-VAP rather than globally so the "right" flags and IEs are set for a given VAP.

I've started migrating this state into per-VAP fields rather than global, but it showed a second shortcoming - because it was global, we weren't explicitly tracking these things per-channel. Ok, this needs a bit more explanation.

Say you're on a 2GHz channel and you need to determine whether you care about 11n, 11g or 11b clients. If you're only seeing and servicing 11n clients then you should be using the short slot time, short preamble and not require RTS/CTS protection to interoperate with pre-11n clients.

But then an 11g client shows up.

The 11g client doesn't need to interoperate with 11b, only 11n - so it doesn't need RTS/CTS. It can use short preamble and short slot time still. But the 11n client need to interoperate, so it needs to switch protection mode into legacy - and it will do RTS/CTS protection.

But then, an 11b client shows up.

At this point the 11g protection kicks in; everyone does RTS/CTS protection and long preamble/slot time kicks in.

Now - is this a property of a VAP, or of a channel? Technically speaking, it's the property of a channel. If any VAP on that channel sees an 11b or 11g client, ALL VAPs need to transition to update protection mode.

I migrated all of this to be per-VAP, but I kept the global state for literally all the drivers that currently consume it. The ath10k driver now uses the per-VAP state for the above, greatly simplifying things (and finishing TODO items in the driver!)


ath10k changes


And yes, I've been hacking on ath10k too.

Locking issues


I've had a bunch of feedback and pull requests from Bjorn and Geramy pointing out lock ordering / deadlock issues in ath10k. I'm slowly working through them; the straight conversion from Linux to FreeBSD showed the differences in our locking and how/when driver threads run. I will rant about this another day.

Encryption key programming


The encryption key programming is programmed using firmware calls, but net80211 currently expects them to be done synchronously. We can't sleep in the net80211 crypto key updates without changing net80211's locks to all be SX locks (and I honestly think that's a bad solution that papers over non-asynchronous code that honestly should just be made asynchronous.) Anyway, so it and the node updates are done using deferred calls - but this required me to take complete copies of the encryption key contents. It turns out net80211 can pretty quickly recycle the key contents - including the key that is hiding inside the ieee80211_node. This fixed up the key reprogramming and deletion - it was sometimes sending garbage to the firmware. Whoops.


What's next?


So what's next? Well, I want to land the ath10k driver! There are still a whole bunch of things to do in both net80211 and the driver before I can do this.

Add 802.11ac channel entries to regdomain.xml


Yes, I added it - but only for FCC. I didn't add them for all the other regulatory domain codes. It's a lot of work because of how this file is implemented and I'd love help here.


Add MU-MIMO group notification


I'd like to make sure that we can at least support associating to a MU-MIMO AP. I think ath10k does it in firmware but we need to support the IE notifications.

Block traffic from being transmitted during a node creation or key update


Right now net80211 will transmit frames right after adding a node or sending a key update - it assumes the driver is completing it before returning. For software driven NICs like the pre-11ac Atheros chips this holds true, but for everything USB and newer firmware based devices this definitely doesn't hold.

For ath10k in particular if you try transmitting a frame without a node in firmware the whole transmit path just hangs. Whoops. So I've fixed that so we can't queue a frame if the firmware doesn't know about the node but ...

... net80211 will send the association responses in hostap mode once the node is created. This means the first association response doesn't make it to the associating client. Since net80211 doesn't yet do this traffic buffering, I'll do it in ath10k- I'll buffer frames during a key update and during node addition/deletion to make sure that nothing is sent OR dropped.

Clean up the Linux-y bits


There's a bunch of dead code which we don't need or don't use; as well as some compatibility bits that define Linux mac80211/nl80211 bits that should live in net80211. I'm going to turn these into net80211 methods and remove the Linux-y bits from ath10k. Bjorn's work to make linuxkpi wifi shims can then just translate the calls to the net80211 API bits I'll add, rather than having to roll full wifi methods inside linuxkpi.


To wrap up ..


.. job changes, relationship changes, having kids, getting a green card, buying a house and paying off old debts from your old hosting company can throw a spanner in the life machine. On the plus side, hacking on FreeBSD and wifi support are fun again and I'm actually able to sleep through the night once more, so ... here goes!

If you're interested in helping out, I've been updating the net80211/driver TODO list here: https://wiki.freebsd.org/WiFi/TodoStuff . I'd love some help, even on the small things!


Sunday, November 3, 2019

HF phasing noise eliminators, or "wow ok this mostly works"

I have a long standing issue with noise from a neighbour down the road that generates super wideband noise on the HF and low VHF bands. It's leaking out into the powerlines. It's pretty amazing.

Ok, so whilst the FCC goes and figures this out with the house in question, what can I do?

Well, one thing is to use a HF phasing noise eliminator. It's basically a pair of amplifiers and a phase inverting mixer. The "noise" antenna is a low gain antenna that's picking up the noise you want to filter out, but not sensitive enough to actually act as a longer distance HF antenna. You adjust the gain of your main input antenna and the gain of the noise antenna to be mostly the same level and then adjust the phase delay to cancel out the local noise source.

I bought a kit from Russia (http://www.ra0sms.ru/) and .. well, it worked. Pretty well. I used a discone antenna with a vertical radiator for 25/50MHz operation as my "very deaf" noise reception antenna and mixed in the signal from my HF antennas and .. well, it worked fine on 7MHz and 3.5MHz. I think it's a bit too deaf for 3.5MHz operation though; I bet it'd work better on higher frequencies because my noise antenna can start picking up more of the local area noise.

What about 1.8MHz? The 160m band?

Well, this is where the fun begins. The short version is - all I heard was lots of AM radio stations where they shouldn't be. The long version is - I am surrounded by super loud (>10kW) AM radio stations that you definitely need a bandpass filter for. The HF radios I have are good enough to filter them out, but the noise eliminator? It's doing all its mixing before there's any band filtering, so it's also mixing in all the AM broadcast band crap.

So this points out another couple things I need to do - I need to add a little bandpass filter on the noise antenna, and I need a proper HF preselector (ie, a narrow adjustable bandpass filter) on the main radio receiver. That way the transistors doing amplification/mixing in this phaser doesn't get swamped by crap that you're not trying to receive.

(For those who are radio inclined - I'm a few miles from 1100KHz (KFAX) which is a 50KW AM radio station - which shows up as S9+30 or more here. I have a bunch of others that show up as s9+20; so I have a lot of AM noise here..)

Friday, September 27, 2019

Fixing up KA9Q-unix, or "neck deep in 30 year old codebases.."

I'll preface this by saying - yes, I'm still neck deep in FreeBSD's wifi stack and 802.11ac support, but it turns out it's slow work to fix 15 year old locking related issues that worked fine on 11abg cards, kinda worked ok on 11n cards, and are terrible for these 11ac cards. I'll .. get there.

Anyhoo, I've finally been mucking around with AX.25 packet radio. I've been wanting to do this since I was a teenager and found out about its existence, but back in high school and .. well, until a few years ago really .. I didn't have my amateur radio licence. But, now I do, and I've done a bunch of other stuff with a bunch of other radios. The main stumbling block? All my devices are either Apple products or run FreeBSD - and none of them have useful AX.25 stacks. The main stacks of choice these days run on Linux, Windows or are a full hardware TNC.

So yes, I was avoiding hacking on AX.25 stuff because there wasn't a BSD compatible AX.25 stack. I'm 40 now, leave me be.

But! A few weeks ago I found that someone was still running a packet BBS out of San Francisco. And amazingly, his local node ran on FreeBSD! It turns out Jeremy (KK6JJJ) ported both an old copy of KA9Q and N0ARY-BBS to run on FreeBSD! Cool!

I grabbed my 2m radio (which is already cabled up for digital modes), compiled up his KA9Q port, figured out how to get it to speak to Direwolf, and .. ok. Well, it worked. Kinda.

Here's my config:

ax25 mycall CALLSIGN-1
ax25 version 2
ax25 maxframe 7
attach asy 127.0.0.1:8001 kissui tnc0 65535 256 1200

.. and it worked. But it wasn't fast. I mean, sure, it's 1200 bps data, but after digging I found some very bad stack behaviour on both KA9Q and N0ARY. So, off I went to learn about AX.25.

And holy hell, there are some amusing bugs. I'll list the big showstoppers first and then what I think needs to happen next.

Let's look at the stack behaviour first. So, when doing LAPB over AX.25, there's a bunch of frames with sequence numbers that go out, and then the receiver ACKs the sequence numbers four ways:
  • RR - "roger roger" - yes, I ack everything up to N-1
  • RNR - I ack everything up to N-1 but I'm full; please stop sending until I send something back to start transmission up again
  • REJ - I received an invalid or missing sequence number, ACK everything to N-1 and retransmit the rest please
  • I - this is a data frame which includes both the send and receive sequence numbers. Thus, transmitted data can implicitly ACK received data.
I'd see bursts like this:
  • N0ARY would send 7 frames
  • I'd receive them, and send individual RR's for each of them
  • N0ARY would then send another 7 frames
  • I'd receive a few, maybe I'd get a CRC error on one or miss something, and send a REJ, followed by getting what I wanted or not and maybe send an RR
  • N0ARY would then start sending many, many more copies of the same frame window, in a loop
  • I'd individually ACK/REJ each of these appropriately
  • .. and it'd stay like this until things eventually caught up.


So, two things were going wrong here.

Firstly - KA9Q didn't implement the T2 timer in the AX.25 v2.0 spec. T2 is an optional timer which a TNC can use to delay sending responses until it expires, allowing it to batch up sending responses instead of responding (eg RR'ing) each individual frame. Now, since the KISS TNC only sends data and not signaling up to the applications, all the applications can do is queue frames in response to other frames or fire off timers to do things. The KA9Q stack doesn't know that the air is busy receiving more data - only that it received a frame. So, T2 could be used to buffer sending status updates until it expires.

N0ARY-BBS implements T2 for RR/RNR responses, but not for REJ responses.

Then, both KA9Q and N0ARY-BBS don't delay sending LAPB frames upon status notifications. Thus, every RR, RNR and REJ that is received may trigger sending whatever is left in the transmit window. Importantly, receiving a REJ will clear the "unack" (unacknowledged) window and force retransmission of everything. If you get a couple of REJ's in a row then it'll try to send multiple sets of the same window out, over and over. If you get an RR and REJ and RR, it may send more data, then the whole window, then more data. It's crazy.

Finally, there's T1. T1 is the retransmisison timer. Now, transmitting a burst of 7 frames of full length at 1200 baud again takes around 2.2 seconds a frame, so it's around 15.4 seconds for the full burst. If T1 is less than that, then because there's no feedback about when the frames went out - only that you sent them to the TNC - you'll start to try retransmitting things. Now, luckily one can poll the other end using a RR poll frame to ask the other end to respond with its current sequence number - that way each end can re-establish what each others send/receive sequence numbers are. However, these can also be batched up - so whilst you're sending your frames, T1 fires generating another batch of RR's to poke the other side. This in itself isn't such a bad thing, but it does mean the receiver sees a big, long burst of frames followed by a handful of RR polls. Strictly speaking this isn't ideal - you're only supposed to send a single poll and then not poll until you get a response or another timeout.

So what have I done?

I'm doing what JNOS2 (another KA9Q port) is doing - I am using T2 for data transmission, RR, RNR and REJ transmission. It's not a pretty solution, but it at least stops the completely pointless retransmission of a lot of wasted data. I've patched both N0ARY and KA9Q to do this, so once KE6JJJ gets a chance to update his BBS and the N0ARY BBS I am hoping that the bad behaviour stops and the BBS becomes useful again for multiple people.

Ok, so what needs to happen?

Firstly, we've learnt a lot about networking since the 80s and 90s. AX.25 is kinda part TCP, part IP, so you'd think it should be fine as a timer based protocol. But alas no - it's slow, it's mostly half duplex, and overflowing transmit queues or resending data incorrectly has a real cost. It's better to look at it as a reliable wireless protocol like what 802.11 does, and /not/ as TCP/IP. 802.11 has timers, 802.11 has sequence numbers, and 802.11 tries to provide as reliable a stream as it can. But it doesn't guarantee traffic; if traffic takes too long it'll just time it out and let the upper layer (like TCP) handle the actual guarantees. Now, you kind want the AX.25 LAPB sessions to be as reliable as possible, but this gets to the second point.

You need to figure out how to be fair between sessions. The KA9Q stacks right now don't schedule packets based on any kind of fairness between LAPB or AX.25 queues. The LAPB code will try to transmit stuff based only on its local window and I guess they treat retransmits as something that signals they need to back off. That's all fine and dandy in theory but in practice at 1200 bps a 7 packet window at 256 bytes a packet is 7*2.2 seconds, or 15.4 seconds. So after 15.4 seconds if the remote side immediately ACKs and you then send another 7 packet burst, noone is going to really get a chance to come in and talk on the BBS.

So, this needs a couple things.

Firstly, just because you can transmit a maximum window of 7 doesn't mean you should. If you see the air being busy, maybe you want to back that off a bit to let others get in and talk. Yes, it does mean the channel is being used less efficiently in total for a single session, but now you're allowing other sessions to get airtime and actually interact. Bugs aside, I've managed to keep the N0ARY BBS tied up for minutes at a time squeezing me tens of kilobytes of article content. That's all fine and dandy, but I don't mind if it takes a little longer when there are other users trying to also do stuff.

Next, the scheduling for LAPB shouldn't just be timers kicking off packet generation into a queue, and then best effort transmission at the KISS TNC. If the AX.25 stack was told about the data link status transitions - ie, between idle, receiving, transmitting and such - then when the air was free the TNC could actually schedule which LAPB session(s) to service next. I've watched T1 (retransmission) and T2 kick over multiple times during someone else downloading data, and when the air is eventually busy an AX.25 node sends multiple copies of both I data payload and S status frames (RR, RNR, REJ, probes, etc.) It's insane. The only reason it's doing this is because it doesn't know the TNC is busy receiving or transmitting and thus those timers don't need to run and traffic doesn't need to be generated. This is how the MAC and PHY layers in 802.11 interoperate. The MAC doesn't queue lots of packets to be sent out when the PHY is ready - the MAC has the work there, and when the PHY signals the air is free and the contention window timer is expired, the MAC signals to get the air and sends its frame. It sends what it can in its given time window and then it stops.

This means that yes, the KISS TNC popularity is part of the reason AX.25 is inefficient these days. KISS TNCs make it easy to do AX.25 packets, but they make it super easy to do it inefficiently. The direwolf author wrote a paper on this where he compared these techniques to just using the AX.25 stack (and AX.25 2.2 features) which have knowledge of the direwolf physical/radio layer. If these hooks were made available over the KISS TNC interface - and honestly, they'd just be a two byte status notification saying that the TNC is in the { idle, receiving, decoding, transmitting } states - then AX.25 stacks could make much, much smarter decisions about what to transmit and when.

Finally - wow this whole thing needs per packet compression already. AX.25 version 2.2 introduces a way of negotiating parameters with remote TNCs for supported extensions and so one of my medium term KA9Q/N0ARY goals is to introduce enough version 2.2 support to negotiate SREJ (selective rejection/retransmission) and maybe the window size options, but primarily to add compression. I think SREJ + per packet compression would be the biggest benefits over 1200 and 9600 bps links.

If you're interested, software repositories are located below. I encourage people to contribute to the KE6JJJ work; I'm just forked off of it (github username erikarn) and I'll be pushing improvements there.

Oh, and these compile on FreeBSD. KA9Q and direwolf both compile and run on MacOSX but N0ARY-BBS doesn't yet do so. Yes, this does mean you can now do packet radio on FreeBSD and MacOSX.



Wednesday, June 19, 2019

wow, it's been a while

Well, it has been a while. I've been busy with a new job (Facebook/Oculus) which has taken some adjustment to get to. My new baby girl Alice is now 1 and a half years old and a redhead handful of gremlin energy. Nora (who is almost 5 now) is now inquisitive, playful and talkative. And daddy needs some sleep.

But, hey, I have been playing with radios and slowly getting back into FreeBSD wifi hackery. So hopefully I will write a few posts here over the next few months to catch up on what I've been doing.

Field Day 2019 is this weekend. I hope to bring Nora upstairs for a few hours to work some stations with me from my home setup. I'm hoping we can work a handful of modes on different bands during the day.

Saturday, August 4, 2018

Aligning a TS-430S, or "wait, how am I supposed to check FM again?"

I'm fixing another (I know I know) TS-430S for a friend. Yes, this means I'm returning it back to them. After all of the repairs I had to do to get the thing up and going reliably I did an RX carrier calibration. It was a little bit off - a combination of using WWV at 10MHz and the scope to calibrate CW, USB and SSB.

However, the AM and FM carriers didn't at all meet the expectations of the service manual. Notably, the AM carrier is seemingly the same as the USB carrier on transmit and there isn't one on receive. The FM carrier just didn't appear during receive or transmit. But .. it's transmitting FM.

Now, I need to go get the TS-430Ses I've fixed and compare the carrier behaviour to the other rigs, but .. well, they work on AM/FM receive and transmit. So ok, let's figure it out.

The AM carrier matches the USB carrier. It's weird because the circuit has an AM/FM carrier crystal however.. yeah, AM carrier here is linked to the USB carrier. I need to figure that out. And the FM transmit has no power control - it's 100W carrier only. So the only way to do it without dumping 100W out into the finals whilst adjusting it is to remove the RF drive output on the RF board (which feeds the finals with RF), attach a 50 ohm resistor across it and check the final RF carrier signal on the scope. This worked mostly OK but since there's no ALC feedback, the output is .. very distorted. Now, I don't know if these rigs were supposed to output a clean sine wave at all carrier output settings but .. well, they're very loud signals on lower bands, sometimes more than 8V peak-to-peak, which is almost triple what you need to feed the finals to get 100W out. So I got it in the ballpark - because well, the thing is not outputting a true sine wave here because the carrier output is way too high - and then had to resort to checking using a directional coupler and the scope.

Now, this isn't too bad - I was in the rough right spot for the FM carrier frequency anyway, and I can key down for a few seconds at a time without making things sad. But, this step was delayed until I verified the finals were working and that took a lot of work to get right. It turns out it was on the nose anyway after all of that and FM modulation now works great.

So - if you're aligning a TS-430S, the AM/FM carrier bit in the service manual may not be entirely correct.