Althea Development Update #50: Putting out fires in production

Last update we created a ‘stable’ release channel and rolled out our latest and greatest to the 7 devices we had in production.

By production I mean on real radios, in other people’s homes, as their source of internet access. Despite performing admirably in testing, everything, of course, immediately started crashing.

Earlier this week we managed to clean up all the race conditions that might cause odd crashes, reduce the spam routes that leaked in from exits, and generally polish off a lot of rough edges. That’s all in Alpha 4.

But we still had a problem.

Once the crashes were out of the way, it became clear that radio performance had tanked. Your first instinct may be ‘well, the new software doesn’t perform well’. That’s what we thought too.

Figure: TCP dump showing dup ACKs trying to resolve out-of-order packets

But after a series of performance tests we were able to determine that the performance drop could be attributed directly to the traffic over the radios themselves.

It seems that sending an IPv6 peer discovery packet on some Ubiquiti firmware versions causes a sudden, dramatic drop in throughput, along with latency spikes and even packet reordering. Which does not a happy TCP session make. Definitely the sort of issue you only find in production.
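To see why reordering hurts so much, here is a minimal sketch of a simplified TCP receiver. It is only an illustration, not our code and not a real TCP stack: segments are numbered 0, 1, 2, ... instead of using byte sequence numbers, and the function names are made up for this example. Each arriving segment triggers a cumulative ACK for the next segment the receiver still needs, so every segment that arrives ahead of a missing one produces a duplicate ACK.

```python
def acks_for(arrival_order):
    """Return the cumulative ACK a simplified receiver sends per arriving segment.

    ACK n means "the next segment I still need is n".
    """
    received = set()
    next_expected = 0
    acks = []
    for segment in arrival_order:
        received.add(segment)
        # Advance past every segment that is now contiguous.
        while next_expected in received:
            next_expected += 1
        acks.append(next_expected)
    return acks


def count_dup_acks(acks):
    """Count ACKs that repeat the ACK sent immediately before them."""
    return sum(1 for prev, cur in zip(acks, acks[1:]) if cur == prev)


in_order = acks_for([0, 1, 2, 3, 4])    # [1, 2, 3, 4, 5]
reordered = acks_for([0, 2, 3, 4, 1])   # [1, 1, 1, 1, 5]
print(count_dup_acks(in_order), count_dup_acks(reordered))  # prints "0 3"
```

Nothing was lost in the reordered case, but the sender still sees three duplicate ACKs, which is the standard threshold for fast retransmit. It will resend a segment that was merely late and cut its sending rate.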

We’re still working out whether the solution is to refactor our neighbor discovery protocol to be more restricted, or whether simply updating the firmware on all the radios involved is sufficient.

Speaking of those performance tests, we gathered some very interesting data. It seems our current software is mostly memory bound in terms of networking speed. On embedded devices like routers this means you want a lot of cores, and the more cores and the higher their frequency the better, to handle all the copying. On more advanced platforms like x86 the clock speed advantage goes away, since there is dedicated hardware for memory copying, and it simply becomes a memory frequency war.
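If you want a rough feel for how much raw copying a machine can do, a sketch like the one below times repeated copies of a large buffer. This is not the benchmark we ran; the function name and default sizes are made up for illustration, and the number it prints is only an upper bound on copy bandwidth, not a prediction of encrypted throughput.

```python
import time


def copy_throughput_mbps(buffer_bytes=16 * 1024 * 1024, repeats=5):
    """Time repeated copies of a buffer and return megabits copied per second."""
    src = bytes(buffer_bytes)
    dst = bytearray(buffer_bytes)
    start = time.perf_counter()
    for _ in range(repeats):
        dst[:] = src  # same-size slice assignment copies the whole buffer
    # Guard against a zero reading on very coarse timers.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return buffer_bytes * repeats * 8 / elapsed / 1e6


if __name__ == "__main__":
    print(f"{copy_throughput_mbps():.0f} Mbps of raw copy bandwidth")
```

Every packet that is routed and encrypted gets copied several times along the way, so the usable networking speed ends up being some fraction of a figure like this.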

As a summary, a $15 router gets you 20 Mbps of encrypted throughput, $60 gets you 45 Mbps and $160 gets you 300 Mbps. I couldn’t really test the throughput of anything faster than that with the confidence I would like. I’m satisfied that we’ve reached a decent price to performance ratio now that we’ve got our general network structure finalized.
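To put the price to performance ratio in numbers, the short sketch below divides each throughput figure by its price. The prices and speeds are the ones quoted above; the function name is just for this example.

```python
# (router price in USD, encrypted throughput in Mbps), as quoted in this post
routers = [(15, 20), (60, 45), (160, 300)]


def mbps_per_dollar(price_usd, throughput_mbps):
    """Encrypted throughput bought by each dollar of hardware."""
    return throughput_mbps / price_usd


for price, mbps in routers:
    ratio = mbps_per_dollar(price, mbps)
    print(f"${price} router: {mbps} Mbps, {ratio:.3f} Mbps per dollar")
```

That works out to roughly 1.3, 0.75 and 1.9 Mbps per dollar, so on these numbers the $60 tier is the weakest value and the $160 tier the strongest.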

Deborah is busy on outreach, working on building an Althea community in Portland over the next couple of months.

They met up this weekend for a flashing party, playing around with our latest release and planning a network.

Coming up in the next couple of weeks is Alpha 5, which will include some new features in addition to more bugfixes. We also have an AMA hosted on Reddit’s mesh networking community this coming Sunday the 17th.