I've been thinking about how working from home means my wife and I are at the mercy of Virgin Media, and in 2022 we had a few outages. Nothing prolonged, but I wanted to add some automation and resiliency to ensure our internet connection is as stable as possible. I wanted to do it for the most reasonable cost without another contract for a landline connection, so 4G seemed the obvious choice.
During any outages we've tethered to our phones, which is fine - but not great. Using O2 as our mobile provider may also leave us at the mercy of Virgin Media: now that the two have merged, there's a potential that O2 mobile data will be sent over VM backhaul: https://community.virginmedia.com/t5/Tech-Chatter/Could-Virgin-Leverage-O2s-backhaul-to-add-new-POPs-for-broadband/td-p/4905489
This is just speculation though, I couldn't find anything concrete.
Both BT and Vodafone offer "unbreakable" wifi which consists of a 4G backup dongle which is plugged into the router they provide you with. From their respective websites:
I don't like how vague this is - what's a "short time" or a "total loss of service"? How will it deal with sporadic routing issues or test the quality of your connection?
My home network is modular - I prefer to have specific devices fulfilling their own roles, to make it easier to replace faulty equipment or upgrade one piece when new technologies come out. I use a Ubiquiti EdgeRouter ER3-Lite as my router, connected to the Virgin Media SuperHub in modem mode. Connected to the router is a TP-Link PoE switch, and connected to that is a Ubiquiti Unifi UAP-FlexHD. This setup has been working really well, and I get full Virgin Media line speed (500Mbps+) over WiFi.
As the EdgeRouter Lite has three ethernet interfaces, and I was only using two, I wondered how difficult it would be to add a 4G modem to one of the ports and use this as an automated failover connection. It turns out, not that difficult!
First of all I had to find a 4G modem which was suited to being on all the time, and I found the Teltonika TRB140 - usually used for IoT (Internet-of-Things) applications. I found one on eBay for a reasonable price (~£50), added a power supply and LTE antenna, and put a £5 ASDA Mobile SIM card (which runs on the Vodafone network) with 3GB of data in it just to check everything was working. The web interface is very easy to navigate, and it Just Worked once I had activated the SIM card (using the SMS code from ASDA Mobile, received via the utility in the web interface) and set the mobile network APN (Access Point Name) to the correct one.
Running a speed test showed around 12Mbps down, and 20Mbps up. This isn't brilliant, but it's plenty to have one or two concurrent calls and keep connected during a Virgin Media outage. Latency was good and consistent at around 30ms pinging 188.8.131.52.
Once this was tested and working connected directly to my laptop, I changed the local IP of the modem, and connected it to the eth2 interface on the EdgeRouter. I added a static route so that traffic from my local network to the internal IP of the modem went out of the correct interface, and did the same for the SuperHub local IP (192.168.100.1) so that I can still check the status of both devices or modify configuration if needed.
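As a rough sketch, the two static routes described above might look something like this. It assumes the EdgeOS `interface-route` syntax, and `<modem-local-IP>` is a placeholder for whatever local address the TRB140 was given (not stated here). The interface names follow the layout used in the rest of this post (eth0 for the SuperHub, eth2 for the 4G modem).

```
configure
# reach the SuperHub status page (modem mode) via the VM-facing interface
set protocols static interface-route 192.168.100.1/32 next-hop-interface eth0
# reach the TRB140 web interface via the 4G-facing interface
set protocols static interface-route <modem-local-IP>/32 next-hop-interface eth2
commit
save
```

Depending on the NAT rules already in place, a masquerade rule on each outbound interface may also be needed for replies to find their way back.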
I set the TRB140 to Passthrough mode, which means the EdgeRouter sees the "external" IP address of the modem (which is actually a CG-NAT 10.x.x.x address, not a real public IP), while the modem itself is still accessible on its local IP where needed.
Once this was done, I needed to configure the EdgeRouter to understand what the two interfaces were for. I didn't want some traffic going out through the 4G modem and some going out through the Virgin Media connection, so I had to do some research. I found this page from Ubiquiti on WAN load-balancing, which makes a passing reference to failover. However, it expects you to use the wizard which will overwrite the existing configuration - I didn't want to do this. I ran the following commands through the CLI to set things up:
(eth0 is the VM SuperHub, eth1 is the switch (local network), eth2 is the 4G modem)
# enter configure mode
configure

# create a PRIVATE_NETS network group (note I didn't add 10.x/8 due to concerns with CG-NAT IPs and LAN addresses, I guess I could've been more specific)
set firewall group network-group PRIVATE_NETS network 192.168.0.0/16
set firewall group network-group PRIVATE_NETS network 172.16.0.0/12

# modify firewall rules for private traffic
set firewall modify balance rule 10 action modify
set firewall modify balance rule 10 destination group network-group PRIVATE_NETS
set firewall modify balance rule 10 modify table main

# modify firewall rules for WAN traffic
set firewall modify balance rule 20 action modify
set firewall modify balance rule 20 destination group address-group ADDRv4_eth0
set firewall modify balance rule 20 modify table main
set firewall modify balance rule 30 action modify
set firewall modify balance rule 30 destination group address-group ADDRv4_eth2
set firewall modify balance rule 30 modify table main
set firewall modify balance rule 110 action modify
set firewall modify balance rule 110 modify lb-group G

# local traffic
set interfaces ethernet eth1 firewall in modify balance

# WAN traffic
set load-balance group G interface eth0
set load-balance group G interface eth2

commit
save
Doing the above got the router using the 4G modem, and running curl https://ident.me returned a Vodafone public IP. However, this is only one part - any connections from my local network could use the 4G connection. Let's add some availability testing and failover:
# For VM, ping 184.108.40.206 every 20s, starting 20s after the interface comes up,
# and require 4 consecutive checks for success or failure (80s minimum failover time)
set load-balance group G interface eth0 route-test count success 4
set load-balance group G interface eth0 route-test count failure 4
set load-balance group G interface eth0 route-test initial-delay 20
set load-balance group G interface eth0 route-test interval 20
set load-balance group G interface eth0 route-test type ping target 220.127.116.11

# For 4G, ping 18.104.22.168 every 120s, starting 5s after the interface comes up,
# and count success as 4 checks (8 mins) and failure as 3 checks (6 mins)
# these are higher as they're not as important as the VM checks
set load-balance group G interface eth2 route-test count success 4
set load-balance group G interface eth2 route-test count failure 3
set load-balance group G interface eth2 route-test initial-delay 5
set load-balance group G interface eth2 route-test interval 120
set load-balance group G interface eth2 route-test type ping target 22.214.171.124

# Only use 4G as failover
set load-balance group G interface eth2 failover-only

# Load balance internal traffic
set load-balance group G lb-local enable

# When failing over, flush the connection tracking table
set load-balance group G flush-on-active enable
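The timings quoted in the comments above are just the check interval multiplied by the number of consecutive checks required. Here's a small sketch of that arithmetic; the function name `route_test_seconds` is my own, and the figures are a rough lower bound that ignores the initial delay and how long each ping takes to time out.

```shell
#!/bin/sh
# Minimum seconds before a state change:
# check interval (seconds) multiplied by consecutive checks required.
route_test_seconds() {
    interval=$1
    count=$2
    echo $(( interval * count ))
}

route_test_seconds 20 4     # VM (eth0): fail or recover
route_test_seconds 120 3    # 4G (eth2): mark as failed
route_test_seconds 120 4    # 4G (eth2): mark as recovered
```

Shortening the interval makes failover quicker, but it also sends more ICMP traffic to the target.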
I had originally set these checks to run way too frequently and got ICMP traffic blocked by 126.96.36.199, which caused my connection to fail over. Oops. But this highlighted an issue that needed solving - when failing back to the Virgin Media connection, a number of devices were still using the 4G connection. It turned out we need to flush the connection tracking table on a fail back - failing over does this with the flush-on-active enable directive, but seemingly not the other way. I found this very helpful script, which I set up at /config/scripts/notification.sh: https://github.com/dennisb1/edgerouter-load-balancing-notification - this gives me an email when the status changes, and I also added a small snippet to flush the connection tracking table:
# flush connection tracking when the VM interface becomes active again
if [ "$INTF" = "eth0" ] && [ "$STATUS" = "active" ]
then
    /usr/sbin/conntrack -F
fi
This was added to the load-balance group by running the following commands:
configure
set load-balance group G transition-script /config/scripts/notification.sh
commit
save
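For anyone who only wants the fail-back flush without the email notifications, a stripped-down transition script might look roughly like this. It's a sketch rather than the script linked above: the function names are hypothetical, and it assumes EdgeOS calls the script with the group, interface and new status as its three arguments.

```shell
#!/bin/sh
# Assumption: invoked by EdgeOS as: <script> <group> <interface> <status>

# Succeeds only when the primary (VM) interface has just become active again.
should_flush() {
    # $1 = interface, $2 = new status
    [ "$1" = "eth0" ] && [ "$2" = "active" ]
}

handle_transition() {
    group=$1
    intf=$2
    status=$3
    # Record every transition in syslog
    logger -t wan-failover "group=$group intf=$intf status=$status"
    if should_flush "$intf" "$status"; then
        # Drop tracked connections so devices stop using the 4G path
        /usr/sbin/conntrack -F
    fi
}

# Only act when called with the three expected arguments
if [ "$#" -eq 3 ]; then
    handle_transition "$@"
fi
```

Keeping the decision in its own function makes it easy to check by hand, for example by sourcing the file and running `should_flush eth0 active && echo flush`.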
There are some useful commands when diagnosing failover problems:
user@edgerouter:~$ show load-balance status
Group G
    Balance Local  : true
    Lock Local DNS : false
    Conntrack Flush: true
    Sticky Bits    : 0x00000000

  interface   : eth0
  reachable   : true
  status      : active
  gateway     : <VM Gateway IP>
  route table : 201
  weight      : 100%
  fo_priority : 100
  flows
      WAN Out   : 270K
      WAN In    : 1359
      Local ICMP: 10914
      Local DNS : 0
      Local Data: 77376

  interface   : eth2
  reachable   : true
  status      : failover
  gateway     : <CG-NAT IP for 4G>
  route table : 202
  weight      : 0%
  fo_priority : 60
  flows
      WAN Out   : 0
      WAN In    : 2
      Local ICMP: 1821
      Local DNS : 0
      Local Data: 0
admin@router:~$ show load-balance watchdog
Group G
  eth0
  status: OK
  pings: 10924
  fails: 1
  run fails: 0/4
  route drops: 0
  ping gateway: 188.8.131.52 - REACHABLE

  eth2
  status: OK
  pings: 1822
  fails: 51
  run fails: 0/3
  route drops: 2
  ping gateway: 184.108.40.206 - REACHABLE
  last route drop   : Thu Jan 5 11:16:34 2023
  last route recover: Thu Jan 5 11:18:35 2023
I'm happy enough with how it's running, and it has been very stable since setting it up. I'll need to do a proper failover test, maybe by pulling the power out of the SuperHub coax to fibre converter so the interfaces remain up. I have received a couple of email notifications of brief failover events, and these line up with the Broadband Quality Monitor I have running at ThinkBroadband.
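For that proper failover test, one simple way to see what's happening from a machine on the LAN would be to poll the public IP and print a line whenever it changes. This is only a sketch: the function names are made up, the 10 second poll and 5 second timeout are arbitrary, and it relies on the same https://ident.me lookup used earlier.

```shell
#!/bin/sh
# Print a message when the public IP differs from the previous reading.
describe_change() {
    # $1 = previous IP (may be empty), $2 = current IP (empty if lookup failed)
    if [ "$1" != "$2" ]; then
        echo "public IP changed: ${1:-none} -> ${2:-unreachable}"
    fi
}

# Poll forever; run this by hand while pulling the plug on the primary link.
watch_public_ip() {
    previous=""
    while true; do
        current=$(curl -s --max-time 5 https://ident.me || true)
        change=$(describe_change "$previous" "$current")
        if [ -n "$change" ]; then
            echo "$(date '+%H:%M:%S') $change"
        fi
        previous=$current
        sleep 10
    done
}
```

Calling `watch_public_ip` in a terminal and then cutting the Virgin Media connection should show the address flip to the mobile network's IP, and back again once the primary link recovers.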
The only change I've made to it since is to swap to a Lebara SIM card which is £6.95/month for 15GB data - this should be more than enough. If our VM connection wasn't as reliable, there are unlimited data SIMs out there for just a bit more money per month.
If I was to do it again, I would spend a bit more on the 4G modem - the TRB140 is great but it only has a single antenna. The RUT240 has two antennas presumably to help with MIMO capability and deliver higher speeds. If I ever need to in the future though, I can easily replace the 4G modem - maybe with a 5G modem!