4G Backup for Home Broadband
I've been thinking about how working from home means my wife and I are at the mercy of Virgin Media, and in 2022 we had a few outages. Nothing prolonged, but I wanted to add some automation and resiliency to ensure our internet connection is as stable as possible. I wanted to do it for the most reasonable cost without another contract for a landline connection, so 4G seemed the obvious choice.
During any outages we've tethered to our phones which is fine - but not great. Using O2 as our mobile provider means that we're still at the mercy of Virgin Media as, now they've merged, there's a potential that O2 mobile data will be sent over VM backhaul: https://community.virginmedia.com/t5/Tech-Chatter/Could-Virgin-Leverage-O2s-backhaul-to-add-new-POPs-for-broadband/td-p/4905489
This is just speculation though, I couldn't find anything concrete.
Both BT and Vodafone offer "unbreakable" wifi which consists of a 4G backup dongle which is plugged into the router they provide you with. From their respective websites:
I don't like how vague this is - what's a "short time" or a "total loss of service"? How will it deal with sporadic routing issues or test the quality of your connection?
My home network is modular - I prefer to have specific devices fulfilling their own roles, to make it easier to replace faulty equipment or upgrade one piece when new technologies come out. I use a Ubiquiti EdgeRouter ER3-Lite as my router, connected to the Virgin Media SuperHub in modem mode. Connected to the router is a TP-Link PoE switch, and connected to that is a Ubiquiti Unifi UAP-FlexHD. This setup has been working really well, and I get full Virgin Media line speed (500Mbps+) over WiFi.
As the EdgeRouter Lite has three ethernet interfaces, and I was only using two, I wondered how difficult it would be to add a 4G modem to one of the ports and use this as an automated failover connection. It turns out, not that difficult!
First of all I had to find a 4G modem which was suited to being on all the time and I found the Teltonika TRB140 - usually used for IoT (Internet-of-Things) applications. I found one on eBay for a reasonable price (~£50), added a power supply and LTE antenna, and put a £5 ASDA Mobile SIM card (which runs on the Vodafone network) with 3GB of data in it just to check everything was working. The web interface is very easy to navigate, and it Just Worked (after receiving the SMS code from ASDA Mobile via the utility in the web interface to activate the SIM card and setting the mobile network APN (Access Point Name) to the correct one).
Running a speed test showed around 12Mbps down, and 20Mbps up. This isn't brilliant, but it's plenty to have one or two concurrent calls and keep connected during a Virgin Media outage. Latency was good and consistent at around 30ms pinging 1.1.1.1.
Once this was tested and working connected directly to my laptop, I changed the local IP of the modem, and connected it to the eth2 interface on the EdgeRouter. I added a static route so that traffic from my local network to the internal IP of the modem went out of the correct interface, and did the same for the SuperHub local IP (192.168.100.1) so that I can still check the status of both devices or modify configuration if needed.
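As a rough sketch of what those two static routes could look like on EdgeOS (this is a guess at the shape, not a copy of the exact commands used here): interface routes pin each management address to the port the device hangs off. `<modem local IP>` is a placeholder for whatever address the TRB140 was given, and the /32 masks are an assumption.

```
configure
# SuperHub in modem mode answers on 192.168.100.1, reached via eth0
set protocols static interface-route 192.168.100.1/32 next-hop-interface eth0
# <modem local IP> is a placeholder - substitute the TRB140's own address
set protocols static interface-route <modem local IP>/32 next-hop-interface eth2
commit
save
```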
I set the TRB140 to Passthrough mode, which means the EdgeRouter sees the "external" IP address of the modem (which is actually a CG-NAT 10.x.x.x address, not a real public IP), while the modem itself is still accessible on its local IP where needed.
Once this was done, I needed to configure the EdgeRouter to understand what the two interfaces were for. I didn't want some traffic going out through the 4G modem and some going out through the Virgin Media connection, so I had to do some research. I found this page from Ubiquiti on WAN load-balancing, which makes a passing reference to failover. However, it expects you to use the wizard which will overwrite the existing configuration - I didn't want to do this. I ran the following commands through the CLI to set things up:
(eth0 is the VM SuperHub, eth1 is the switch (local network), eth2 is the 4G modem)
# enter configure mode
configure
# create a PRIVATE_NETS network group (note I didn't add 10.x/8 due to concerns with CG-NAT IPs and LAN addresses, I guess I could've been more specific)
set firewall group network-group PRIVATE_NETS network 192.168.0.0/16
set firewall group network-group PRIVATE_NETS network 172.16.0.0/12
# modify firewall rules for private traffic
set firewall modify balance rule 10 action modify
set firewall modify balance rule 10 destination group network-group PRIVATE_NETS
set firewall modify balance rule 10 modify table main
# modify firewall rules for WAN traffic
set firewall modify balance rule 20 action modify
set firewall modify balance rule 20 destination group address-group ADDRv4_eth0
set firewall modify balance rule 20 modify table main
set firewall modify balance rule 30 action modify
set firewall modify balance rule 30 destination group address-group ADDRv4_eth2
set firewall modify balance rule 30 modify table main
set firewall modify balance rule 110 action modify
set firewall modify balance rule 110 modify lb-group G
# local traffic
set interfaces ethernet eth1 firewall in modify balance
# WAN traffic
set load-balance group G interface eth0
set load-balance group G interface eth2
commit
save
Doing the above got the router using the 4G modem, and running curl https://ident.me returned a Vodafone public IP. However, this is only one part - any connections from my local network could use the 4G connection. Let's add some availability testing and failover:
# For VM, ping 8.8.8.8 every 20s, starting 20s after the interface comes up; 4 successful or 4 failed checks change the state (80s minimum failover time)
set load-balance group G interface eth0 route-test count success 4
set load-balance group G interface eth0 route-test count failure 4
set load-balance group G interface eth0 route-test initial-delay 20
set load-balance group G interface eth0 route-test interval 20
set load-balance group G interface eth0 route-test type ping target 8.8.8.8
# For 4G, ping 8.8.8.8 every 120s, 5s after the interface comes up, and count success as 4 checks (8 mins) and failure as 3 checks (6 mins)
# these are higher as they're not as important as the VM checks
set load-balance group G interface eth2 route-test count success 4
set load-balance group G interface eth2 route-test count failure 3
set load-balance group G interface eth2 route-test initial-delay 5
set load-balance group G interface eth2 route-test interval 120
set load-balance group G interface eth2 route-test type ping target 8.8.8.8
# Only use 4G as failover
set load-balance group G interface eth2 failover-only
# Also apply the load-balance group to traffic generated by the router itself
set load-balance group G lb-local enable
# When failing over, flush the connection tracking table
set load-balance group G flush-on-active enable
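To sanity-check the timings in the comments above, here is a small shell sketch. It is not EdgeOS code: `detection_seconds` and `replay_probes` are made-up helper names, and it assumes the success/failure thresholds count consecutive probe results, which is how I read the "run fails" counter in the watchdog output.

```shell
#!/bin/sh
# Rough time before a state change: probe interval x consecutive results needed.
detection_seconds() {
    interval="$1"
    count="$2"
    echo $((interval * count))
}

# Replay a sequence of probe results (1 = reply, 0 = no reply) and print the
# final link state. ASSUMPTION: thresholds apply to consecutive results only,
# so a single reply resets a run of failures (and vice versa).
replay_probes() {
    fail_needed="$1"
    ok_needed="$2"
    shift 2
    state="up"
    run=0
    for result in "$@"; do
        if [ "$state" = "up" ] && [ "$result" = "0" ]; then
            run=$((run + 1))
        elif [ "$state" = "down" ] && [ "$result" = "1" ]; then
            run=$((run + 1))
        else
            run=0
        fi
        if [ "$state" = "up" ] && [ "$run" -ge "$fail_needed" ]; then
            state="down"
            run=0
        elif [ "$state" = "down" ] && [ "$run" -ge "$ok_needed" ]; then
            state="up"
            run=0
        fi
    done
    echo "$state"
}

detection_seconds 20 4           # eth0: prints 80
detection_seconds 120 3          # eth2: prints 360
replay_probes 4 4 1 0 0 0 1 0    # three misses then a reply: prints up
replay_probes 4 4 0 0 0 0        # four misses in a row: prints down
```

The point of the replay is that a flaky link which drops three probes and then answers one never trips the failover under this assumption, so the thresholds trade detection speed against flapping.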
I had originally set these checks to run way too frequently and got ICMP traffic blocked by 1.1.1.1, which caused my connection to fail over. Oops. But, this highlighted an issue that needed solved - when failing back to the Virgin Media connection, a number of devices were still using the 4G connection. It turned out we need to flush the connection table on a fail back - failing over does this with the flush-on-active enable directive, but seemingly not the other way. I found this very helpful script which I set up at /config/scripts/notification.sh: https://github.com/dennisb1/edgerouter-load-balancing-notification - this gives me an email when the status changes and I also added a small function to flush the connection tracking table:
if [ "$INTF" = "eth0" ] && [ "$STATUS" = "active" ]
then
    # Virgin Media (eth0) is active again: flush conntrack so existing connections move off 4G
    /usr/sbin/conntrack -F
fi
This was added to the load-balance group by running the following commands:
configure
set load-balance group G transition-script /config/scripts/notification.sh
commit
save
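For anyone who only wants the fail-back flush and not the email part, a stripped-down transition script might look something like the sketch below. It is not the linked notification script. It assumes EdgeOS calls the script with three arguments (group, interface, new status), and `PRIMARY_INTF`, `should_flush`, `handle_transition` and the `wan-failover` log tag are names made up for illustration.

```shell
#!/bin/sh
# Minimal sketch of a load-balance transition script.
# ASSUMPTION: called as: script <group> <interface> <status>

PRIMARY_INTF="eth0"   # illustrative: the Virgin Media interface

# Succeeds (exit status 0) only when the primary WAN has just become active.
should_flush() {
    intf="$1"
    status="$2"
    [ "$intf" = "$PRIMARY_INTF" ] && [ "$status" = "active" ]
}

handle_transition() {
    group="$1"
    intf="$2"
    status="$3"
    # Record the transition in syslog when logger is available.
    if command -v logger >/dev/null 2>&1; then
        logger -t wan-failover "group=$group intf=$intf status=$status"
    fi
    if should_flush "$intf" "$status"; then
        # Drop tracked connections so clients reconnect via the primary WAN.
        /usr/sbin/conntrack -F
    fi
}

# Only act when called with the three expected arguments.
if [ "$#" -eq 3 ]; then
    handle_transition "$@"
fi
```

Keeping the decision in its own function means it can be checked on a laptop without touching conntrack, for example `should_flush eth0 active && echo "would flush"`.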
There are some useful commands when diagnosing failover problems:
user@edgerouter:~$ show load-balance status
Group G
Balance Local : true
Lock Local DNS : false
Conntrack Flush: true
Sticky Bits : 0x00000000
interface : eth0
reachable : true
status: active
gateway : <VM Gateway IP>
route table : 201
weight: 100%
fo_priority : 100
flows
WAN Out : 270K
WAN In : 1359
Local ICMP: 10914
Local DNS : 0
Local Data: 77376
interface : eth2
reachable : true
status: failover
gateway : <CG-NAT IP for 4G>
route table : 202
weight: 0%
fo_priority : 60
flows
WAN Out : 0
WAN In : 2
Local ICMP: 1821
Local DNS : 0
Local Data: 0
user@edgerouter:~$ show load-balance watchdog
Group G
eth0
status: OK
pings: 10924
fails: 1
run fails: 0/4
route drops: 0
ping gateway: 8.8.8.8 - REACHABLE
eth2
status: OK
pings: 1822
fails: 51
run fails: 0/3
route drops: 2
ping gateway: 8.8.8.8 - REACHABLE
last route drop : Thu Jan 5 11:16:34 2023
last route recover: Thu Jan 5 11:18:35 2023
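The pings and fails counters above make it easy to get a rough probe loss rate per interface. A small sketch, assuming awk is available wherever you run it (`probe_loss_pct` is a made-up helper name, and the numbers are just the ones from the output above):

```shell
#!/bin/sh
# Percentage of failed probes, to two decimal places.
# Arguments: total pings, failed pings (from "show load-balance watchdog").
probe_loss_pct() {
    pings="$1"
    fails="$2"
    awk -v p="$pings" -v f="$fails" 'BEGIN {
        if (p == 0) { print "n/a"; exit }
        printf "%.2f\n", (f / p) * 100
    }'
}

probe_loss_pct 10924 1    # eth0 counters: prints 0.01
probe_loss_pct 1822 51    # eth2 counters: prints 2.80
```

So the 4G link drops just under 3% of its probes here against practically none on Virgin Media, which fits with it having two route drops while eth0 has none.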
I'm happy enough with how it's running, and it has been very stable since setting it up. I'll need to do a proper failover test, maybe by pulling the power out of the SuperHub coax to fibre converter so the interfaces remain up. I have received a couple of email notifications of brief failover events, and these line up with the Broadband Quality Monitor I have running at ThinkBroadband.
The only change I've made to it since is to swap to a Lebara SIM card which is £6.95/month for 15GB data - this should be more than enough. If our VM connection wasn't as reliable, there are unlimited data SIMs out there for just a bit more money per month.
If I was to do it again, I would spend a bit more on the 4G modem - the TRB140 is great but it only has a single antenna. The RUT240 has two antennas presumably to help with MIMO capability and deliver higher speeds. If I ever need to in the future though, I can easily replace the 4G modem - maybe with a 5G modem!