Put on your night time sunglasses because I have a tale from our cyberpunk future, or present as it may be.
How a poorly coded smart contract disrupted an ISP in rural Oregon.
Around 3pm this Sunday we started experiencing degraded service across the entire Clatskanie test network. An automated alert notified the on-call network operator and the investigation commenced.
A few minutes of fervent log searching later, a pattern emerged. Transactions were being published to the Ethereum full nodes successfully, but not a single Althea transaction had made it into the blockchain in the last 5 hours.
While the bandwidth billing system is very resilient to temporary payment failures, it does eventually have to cut off users who don't pay. In our case, the gateway determined it was being underpaid by the exit and started limiting the connection speed to the exit as an underpayment warning.
The bigger question, of course, was: why were the exit's payments not getting through?
The answer comes in two parts: a poorly coded airdrop script and someone who was exploiting it.
You see, this airdrop script was vulnerable to what we call a 'sybil attack'. It allowed anyone with a small amount of Ethereum in their account to mint a small amount of this new token for free.
An enterprising hacker simply wrote a script that would take a large amount of Ethereum, split it across many new addresses with smaller balances, and then call the contract from each one, allowing them to collect hundreds or thousands of times the amount of minted token they were supposed to be limited to.
This script interfered with Althea specifically because both Althea and the hacker selected the same method for determining the gas price on our transactions.
The script was producing thousands of transactions with a 1 Gwei gas fee, taken from the gas_price() endpoint of an Ethereum full node. That value is computed as the median of fees paid over the last several blocks.
Althea uses the same endpoint, and therein lies the flaw we were looking for. Since we chose the same gas fee as the contract attacker, our transactions had the same priority while being hundreds of times less prevalent, resulting in almost none of them getting through.
If we had increased our tx fee value by even 10%, Althea transactions would have made it into the blockchain in less than 3 minutes. But because we trusted the median gas price endpoint, our transactions were directly competing with the attacker's and would take over two hours to get into a block, if they got in at all.
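To make the priority problem concrete, here is a toy model of a transaction queue. It is only a sketch under loud assumptions: the `PendingTx` type, the `blocks_until_included` helper, the 5,000 spam transactions and the 100 transactions per block are all made up for illustration. It also assumes blocks are filled strictly by gas price with ties broken by arrival order, and that no new spam arrives while we wait. Real nodes and miners are messier than that.

```python
from dataclasses import dataclass

GWEI = 10**9  # wei per Gwei


@dataclass(frozen=True)
class PendingTx:
    sender: str
    gas_price: int  # wei per unit of gas
    arrival: int    # order in which the node first saw the transaction


def blocks_until_included(pool, target_sender, txs_per_block):
    """Return how many blocks pass before target_sender's transaction is mined.

    Toy model: every block takes the highest priced pending transactions,
    breaking ties by arrival order.
    """
    ordered = sorted(pool, key=lambda tx: (-tx.gas_price, tx.arrival))
    for position, tx in enumerate(ordered):
        if tx.sender == target_sender:
            return position // txs_per_block + 1
    raise ValueError("target transaction is not in the pool")


# 5,000 attacker transactions at 1 Gwei are already queued ahead of ours.
spam = [PendingTx("attacker", 1 * GWEI, i) for i in range(5000)]

same_price = spam + [PendingTx("althea", 1 * GWEI, 5000)]
bumped = spam + [PendingTx("althea", 11 * GWEI // 10, 5000)]  # 10% higher

print(blocks_until_included(same_price, "althea", txs_per_block=100))  # 51
print(blocks_until_included(bumped, "althea", txs_per_block=100))      # 1
```

At the same price we wait behind the entire spam queue. With a 10% bump we jump to the front, no matter how many spam transactions there are.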
We patched our nodes to pay 50% more for gas and had the network back up and fully operational within 15 minutes of the first service degradation.
The Althea routers' advanced traffic shaping kept the network slow but usable for the duration. We actually ended up with many more complaints about this weekend's Google Cloud outage.
But the problem here is not over. The fix required human intervention and a level of 'meta' thinking to determine what the solution would be.
Althea is unusual in that autonomous micropayments for bandwidth come with higher expectations for the level of automation. People shove networking gear into a cupboard and leave it there for years unattended. No one is going to come along and refresh the page or reboot the device.
In order to achieve that reliability we've slowly been building more and more awareness of all the failure modes of blockchain payments into our own software. Modern blockchains are simply not reliable enough to treat as an abstracted payment system where a transaction is valid, submitted, and done.
Hopefully proof of stake based systems can reduce the variability of the transaction process as well as help reduce scaling issues. In the meantime we'll need to update Althea to inspect transactions in the mempool and automatically make the gas price adjustment required to win the next duel.
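One way that automatic adjustment could look is sketched below. This is not Althea's actual implementation. The `choose_gas_price` function, its parameters, and the idea of a fixed `txs_per_block` are assumptions for illustration, and the list of pending gas prices is assumed to have already been fetched from a full node by some other means.

```python
GWEI = 10**9  # wei per Gwei


def choose_gas_price(node_estimate, pending_prices, txs_per_block,
                     max_price, bump_percent=10):
    """Pick a gas price (in wei) that should land in the next block.

    node_estimate:  price suggested by the node's gas price endpoint
    pending_prices: gas prices of transactions currently in the mempool
    txs_per_block:  assumed number of transactions a block can hold
    max_price:      hard cap so a fee spike cannot drain the wallet
    bump_percent:   how far to outbid the marginal transaction
    """
    ranked = sorted(pending_prices, reverse=True)
    if len(ranked) < txs_per_block:
        # The next block has spare room, so the node's estimate is enough.
        candidate = node_estimate
    else:
        # Price of the cheapest transaction that would still fit in the next block.
        cutoff = ranked[txs_per_block - 1]
        # Outbid it by bump_percent, rounding up so we land strictly above it.
        outbid = -(-cutoff * (100 + bump_percent) // 100)
        candidate = max(node_estimate, outbid)
    return min(candidate, max_price)


# Mempool flooded with 5,000 transactions at 1 Gwei: pay 1.1 Gwei.
print(choose_gas_price(1 * GWEI, [1 * GWEI] * 5000, 100, 20 * GWEI))
```

The `max_price` cap matters for unattended hardware: without it, a bidding war against another automated script could drain a router's balance with nobody watching.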
Beta 5 Released!
The Beta 5 release is out with some new features. The real headliner is that routers now estimate how long your existing balance will last based on previous usage.
- Client devices no longer come with WAN ports by default
- Even more fixes to payment stability
- Bandwidth history and payment history now stored and displayed on the dashboard
- Changes to client billing that make it resilient to packet loss
- Fix issue with binding to ports not on built-in switches
- More fixes to port selection for peer discovery
- Protection against payment replay attacks
- Fix to babel parsing issue that could cause strange billing behaviour
As you can see I need to top up my personal router
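For the curious, the balance estimate can be pictured as a simple division of the remaining balance by average recent spending. The sketch below is a guess at the general idea, not the router's actual code: the `estimate_hours_remaining` name, the hourly sampling, and the plain average are all assumptions.

```python
def estimate_hours_remaining(balance_wei, hourly_spend_wei):
    """Estimate how many hours the balance will last.

    balance_wei:      current balance in wei
    hourly_spend_wei: amounts spent in each recent hour, oldest first

    Returns None when there is no usage history or nothing was spent,
    since no meaningful estimate exists in that case.
    """
    if not hourly_spend_wei:
        return None
    average = sum(hourly_spend_wei) / len(hourly_spend_wei)
    if average <= 0:
        return None
    return balance_wei / average


# Spending 10, 30 and 20 wei over the last three hours averages 20 per hour.
print(estimate_hours_remaining(1000, [10, 30, 20]))  # 50.0
```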