Althea Development Update #73: For want of a transaction
In my last blog post I focused on improved testing specifically for payment functions.
Most of my dev updates in the 40-60 range focused on stabilizing the routing software, and we're getting into the tail end of that same process with payments. Many of the fixes in our Beta 7 release focus on payment bugs that affect as little as half a percent of all payments.
We're down to about one payment instability event a week that requires human intervention, across the whole network. I expect that to drop to about one a month over the coming month.
After more than a year of this stabilization treatment, Althea's network stack is robust enough that networking failures caused by the point-to-point radio software are now more common than Althea device failures.
For want of a single transaction
How two HTTP call timeouts took down our test network for about 30 minutes.
You can see a graph of payments between the gateway and the exit here, with the debt in Wei graphed over time. You'll notice how the payments become more frequent as we move into internet prime time and more traffic is being moved and paid for.
Around midnight PST, several high-traffic activities were maxing out our backhaul connection. This wouldn't normally be a problem, except for a confluence of failures.
Transaction 0x744c7b7b... was part of a flurry of more than 3 transactions to keep up with the traffic spike. The gateway was successfully notified of the payment, but when it went to check if the payment was valid, the request to our Ethereum full node timed out.
This normally isn't a problem, since we load balance across two providers at random.
Except that a minute later, when the gateway tried to validate that same payment again, it chose the same failed node at random.
It wouldn't get a third chance.
Without that payment there was a very short window where the gateway determined the exit had exceeded its grace payment amount. The gateway began enforcement action by reducing the exit's bandwidth.
This set off a chain reaction, as a fully saturated backhaul connection was limited to a mere 1 Mbps. Because the gateway validates payments over that same throttled link, it had accidentally made itself unable to check any transactions and thus made recovery impossible.
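To make the failure mode concrete, here is a minimal Python sketch of that enforcement decision. It is an illustration under assumptions, not Althea's actual code: the `Neighbor` type, its field names, and the debt figures are hypothetical, and only the 1 Mbps enforcement limit comes from the incident described above.

```python
from dataclasses import dataclass

# The throttled rate from the incident; everything else here is hypothetical.
ENFORCED_LIMIT_MBPS = 1.0


@dataclass
class Neighbor:
    debt_wei: int   # what the neighbor currently owes us
    grace_wei: int  # debt we tolerate before enforcing (assumed name)

    def credit_validated_payment(self, amount_wei: int) -> None:
        # Only payments that passed validation reduce the debt.
        self.debt_wei -= amount_wei


def bandwidth_limit_mbps(neighbor: Neighbor, full_rate_mbps: float) -> float:
    """Return the allowed rate: throttle once debt exceeds the grace amount."""
    if neighbor.debt_wei > neighbor.grace_wei:
        return ENFORCED_LIMIT_MBPS
    return full_rate_mbps


exit_node = Neighbor(debt_wei=1_200, grace_wei=1_000)

# The payment was sent but never validated, so the debt stays above grace.
stuck_limit = bandwidth_limit_mbps(exit_node, full_rate_mbps=100.0)

# Had validation succeeded, the same payment would have cleared the debt.
exit_node.credit_validated_payment(500)
healthy_limit = bandwidth_limit_mbps(exit_node, full_rate_mbps=100.0)
```

In this toy model a single unvalidated payment is enough to flip the decision from full rate to the enforced limit, which is the situation the gateway found itself in.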
Manual intervention restored functionality within 30 minutes.
In response to this incident we started running the payment validation process every 5 seconds rather than every minute.
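A rough back-of-the-envelope calculation shows why the shorter interval helps. This sketch assumes that one of the two providers stays down for the whole window, that each attempt picks a provider independently and uniformly at random, and that the gateway has about 60 seconds before enforcement kicks in. The window length and the function names are assumptions for illustration only.

```python
def attempts_in_window(window_seconds: int, interval_seconds: int) -> int:
    """Validation attempts that fit in the window, counting one at t=0."""
    return window_seconds // interval_seconds + 1


def chance_all_attempts_fail(attempts: int, providers: int = 2,
                             failed_providers: int = 1) -> float:
    """Probability that every attempt randomly lands on a failed provider."""
    return (failed_providers / providers) ** attempts


WINDOW_SECONDS = 60  # assumed window before enforcement starts

old_attempts = attempts_in_window(WINDOW_SECONDS, 60)  # check every minute
new_attempts = attempts_in_window(WINDOW_SECONDS, 5)   # check every 5 seconds

old_risk = chance_all_attempts_fail(old_attempts)
new_risk = chance_all_attempts_fail(new_attempts)

print(f"every 60s: {old_attempts} attempts, risk {old_risk:.4%}")
print(f"every 5s:  {new_attempts} attempts, risk {new_risk:.4%}")
```

Under these assumptions the gateway goes from 2 chances to 13, and the odds of every one of them landing on the dead provider fall from 1 in 4 to 1 in 8192.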
In the longer term, this problem is part of the gateway client corner case: users expect a gateway to also act as a client, which causes no end of special-case requirements. We will probably move to doing things like verifying payments over the backhaul directly rather than via an exit.
Headline feature in Beta 8, stablecoin payments
We've finally merged the token bridge module. It takes Ethereum deposited into the router and converts it to the DAI stablecoin using Uniswap before passing it over to the Xdai sidechain.
This lets users take advantage of the relative ease of buying Ethereum without having to expose themselves to price changes while using their Althea device.
It also provides insulation from gas price changes and other network disruptions that may happen on Ethereum.
Transacting in native Ethereum will of course remain an option on the routers.
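The price exposure that the bridge removes can be illustrated with a little arithmetic. The sketch below is a simplification under stated assumptions: the ETH prices are made up, DAI is treated as worth exactly one US dollar, and swap fees, slippage, and gas costs are ignored. The function names are hypothetical and are not part of the bridge module.

```python
WEI_PER_ETH = 10**18


def value_in_cents(balance_wei: int, eth_price_cents: int) -> int:
    """US-cent value of an ETH balance at a given ETH price (integer math)."""
    return balance_wei * eth_price_cents // WEI_PER_ETH


def bridged_value_in_cents(deposit_wei: int, eth_price_cents_at_deposit: int) -> int:
    """Value after converting to DAI at deposit time.

    Assumes 1 DAI = 1 USD and no fees, so the value is locked in at the
    deposit-time price and no longer depends on later ETH prices.
    """
    return value_in_cents(deposit_wei, eth_price_cents_at_deposit)


deposit_wei = WEI_PER_ETH // 10  # 0.1 ETH deposited into the router
price_at_deposit = 200_00        # hypothetical: $200.00 per ETH
price_a_week_later = 150_00      # hypothetical: $150.00 per ETH

held_as_eth = value_in_cents(deposit_wei, price_a_week_later)
held_as_dai = bridged_value_in_cents(deposit_wei, price_at_deposit)
```

With these made-up prices, the balance left in ETH is worth $15.00 a week later, while the bridged balance still buys $20.00 of bandwidth.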
You may recognize that this is very similar to what we've been talking about with the Althea blockchain. The main difference is that the Xdai chain is 'proof of authority', meaning you must be approved to become a validator/miner.
While this is somewhat decentralized, it's not as open as we would like for a long-term solution. Our chain will be using a more open system where anyone can join as a validator.
What's new in Beta 7
- Upgraded to OpenWrt v18.06.4
- Experimental Xdai bridge support, routers in xdai mode will bridge over ETH automagically when deposited
- DAO address based pricing oracle more closely reflects eventual functionality
- Fixes to rare payment failures
- Fixes for importing backed up private keys
- Fixes for exits behaving improperly when private keys were imported
- Fixes for DAO payments sometimes causing the router to crash
- Long term memory for router debts to prevent restarts resulting in not paying other people
- Fix for the dashboard thinking routers had a password set when they did not
- Added a version toggle that allows devices to opt into early access updates
- Users can now visit althearouter.net instead of 192.168.10.1 to access their own router dashboard.