Althea Development Update #66: New dashboard, exit redundancy
What's New In Beta 3?
- Dashboard overhaul!
- Local low balance warnings
- Automated failover at the exit level
- Phone registration for exits
- While the balance is very low (below 10c), routers throttle locally. This prevents disruption to downstream clients and avoids the tricky situation of entering and leaving the free tier every time the balance hits zero
- Temporary fix for peer discovery on particularly slow machines
This release mostly focused on improving exit resiliency. Althea now supports multihomed exits, meaning a single exit can be run across multiple datacenters and bridged into the mesh like any other alternate route.
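As a rough illustration of what "like any other alternate route" means for failover, here is a minimal sketch. It assumes each datacenter instance of an exit shows up as a route with a metric and a reachability flag; `ExitInstance`, `pick_exit_instance`, and the datacenter names are hypothetical and not taken from the Althea codebase.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExitInstance:
    """One datacenter instance of the same logical exit (hypothetical model)."""
    datacenter: str
    route_metric: int  # lower is better, as with any other mesh route
    reachable: bool


def pick_exit_instance(instances: list[ExitInstance]) -> Optional[ExitInstance]:
    """Return the reachable instance with the lowest metric, or None if all are down."""
    live = [inst for inst in instances if inst.reachable]
    return min(live, key=lambda inst: inst.route_metric, default=None)


# Example: two instances of one exit; traffic follows the cheaper live route.
instances = [
    ExitInstance("dc-a", route_metric=120, reachable=True),
    ExitInstance("dc-b", route_metric=300, reachable=True),
]
best = pick_exit_instance(instances)
```

If `dc-a` becomes unreachable, the same call returns `dc-b`, which is the failover behavior described above in miniature.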
The other major change is the creation of a 'low balance' mode. It's mostly a solution to the problem of a relay node running its balance to zero.
Because traffic is encrypted, the upstream node can't identify the non-paying relay's own traffic and apply the free tier only to that. Instead it must put everyone south of that relay onto the free tier.
Obviously this isn't good, so by having nodes voluntarily throttle when they reach a 10c balance we can at least keep this from happening by accident.
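The decision a node makes here can be sketched in a few lines. The 10c threshold comes from this post; the `Mode` names, the `local_mode` function, and the choice of "strictly below 10c" as the cutoff are assumptions made for illustration, not the actual Rita logic.

```python
from enum import Enum

# From the post: nodes throttle themselves at very low balance (below 10c).
LOW_BALANCE_THRESHOLD_CENTS = 10


class Mode(Enum):
    NORMAL = "normal"
    LOW_BALANCE = "low_balance"  # node throttles itself locally
    FREE_TIER = "free_tier"      # upstream limits everyone behind this node


def local_mode(balance_cents: float) -> Mode:
    """Classify a node's state from its balance (hypothetical helper)."""
    if balance_cents <= 0:
        # Out of funds: the upstream node cannot single out this node's
        # traffic, so everything south of it lands on the free tier.
        return Mode.FREE_TIER
    if balance_cents < LOW_BALANCE_THRESHOLD_CENTS:
        # Assumption: throttle strictly below the threshold.
        return Mode.LOW_BALANCE
    return Mode.NORMAL
```

The point of the middle state is that a node slows its own spending before it reaches zero, so downstream clients are less likely to be dragged onto the free tier by accident.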
What's coming next
Multihomed exits were sort of an existential issue. As our networks grow, downtime becomes even more critical to avoid, so they took priority over debugging some of the remaining underpayment issues.
Beta 4, if I have my way, will entirely focus on fixing two bugs.
1. Underpayment
The share of users underpaying is now about 20%, down from around 50% before the last release. It's also dramatically less severe, with the average underpaying node only underpaying by 15%.
The remaining problems will require a more subtle and research-focused touch to really get this down to zero.
2. Lack of recovery
When the gateway or the exit goes down or is restarted for whatever reason, there is about a 5% chance that a node will not reconnect properly.
I've been chasing this one for a long time. It's difficult to gather data on a bug when the bug is defined by a node not being online to tell you anything.
I think I've finally picked up the trail after debugging a strange issue brought on by removing the startup wait for Rita. I'm speculating that PeerListener hits some error code when sending its multicast packet, doesn't handle it, and drops the interface from its watch list.
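To make the suspected failure mode concrete, here is a minimal sketch of the handling that would avoid it, assuming the speculation above is right. It is not Rita's code: `PeerListenerSketch`, `send_hello`, and the failure counter are hypothetical stand-ins. A send error is recorded and retried on the next tick, and the interface stays on the watch list.

```python
from typing import Callable, Iterable


class PeerListenerSketch:
    """Hypothetical stand-in for a peer listener; not Rita's actual code."""

    def __init__(self, interfaces: Iterable[str],
                 send_hello: Callable[[str], None]) -> None:
        self.watch_list = set(interfaces)
        self.send_hello = send_hello  # sends one multicast hello on an interface
        self.failures: dict[str, int] = {}

    def tick(self) -> None:
        """Try to send one hello per watched interface."""
        for iface in sorted(self.watch_list):
            try:
                self.send_hello(iface)
            except OSError:
                # Keep the interface on the watch list and retry next tick
                # instead of silently dropping it.
                self.failures[iface] = self.failures.get(iface, 0) + 1
            else:
                # A successful send clears any recorded failures.
                self.failures.pop(iface, None)
```

Counting consecutive failures rather than dropping on the first one also leaves room for a deliberate policy later, such as removing an interface only after many failed sends.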
These are the sorts of issues you need a decent number of real users to encounter at all, so I'm glad we've gotten far enough to tackle fixing them.