ScottDotDot

Redundant VPN Tunnels via Different ISPs

Scott — Tue, 23 Apr 2019 18:57:12 +0000

Intro

My friends will tell you that I’m obsessed with redundancy, both in life and in I.T.

At home I have two main internet connections, via Altice Optimum (“cable”) and Verizon FiOS. They’re both relatively high bandwidth, and are connected to my two core routers that operate in an active/passive configuration. Basically this:

                                                                O------O
                                        +--------+             /        \
                                        |        |------------/          O
+------------------+--------------------|  Core  |           /          /
|  Optimum Router  |                    | Router |----------O          /
+------------------+\     ______________|   01   |           \        O
                     \   /              |        |------------O        \
                      \ /               +--------+           /          \
                       X          Keepalived |              /  Various   O
                      / \          Heartbeat |             O  Networks  /
                     /   \              +--------+          \          O
+------------------+/     \_____________|        |-----------\          \
|   FiOS Router    |                    |  Core  |            \          O
+------------------+--------------------| Router |-------------O        /
                                        |   02   |            /        /
                                        |        |-----------O        /
                                        +--------+            \      /
                                                               O----O

Hmmm.. I can’t tell if that thing on the right looks like a cloud or a turd. Probably the latter. I’ll skip the ASCII “art” next time.

But is that really enough? Ever since “hurricane” Sandy I’ve been worried about losing both FiOS and Optimum simultaneously. It’s never happened due to a coincidence of network failures on both providers, but it’s a different story if a tree takes out the lines.

Enter Sprint. Many years ago, I configured a Netgear 6100D from Sprint to act as an emergency failover (and backdoor) so some things would stay up and running in the event of a failure. But lately I started thinking about the scenario of a core router failure.

Now, I should point out that, aside from misconfiguration oopsies on my end, I’ve never had a complete failure of both core routers.

Nonetheless, wouldn’t it be better to have yet another router, sorta separate from the other two, in case they go down for whatever reason? And wouldn’t it be yet better if that new router wasn’t reliant on the Optimum and FiOS lines? And wouldn’t it be even superer betterer if the new router also had two independent internet connections?

Yes.

This isn’t as costly as it sounds, btw. My routers are just commodity hardware (right now they all happen to be Dell T110 II chassis with a bunch of NICs giving 12 ports per router).

The Sprint connection costs ~$15/mo (after taxes and fees) for 1GB per month (more than enough for the veritable trickle of pings that run through it on a regular basis).

And it was cheap enough for me to add a second cell connection via T-Mobile’s network, because I have Google Fi (aka Project Fi), which provides free “data only” SIMs that operate on TMo. (Note that a full Fi phone will choose the best connection amongst TMo, Sprint, and U.S. Cellular.) The “data only” SIM shares its allowance with my regular Fi user account, so the cost there is negligible. I did, however, purchase a Netgear LB1121, which is a very simple 4G LTE to Ethernet “adapter” (to call it a router would do a disservice to actual routers).

To be fair, I think the ASCII diagram was better.

The one thing that might be perplexing about this diagram is the External Backup VPN01 machine in the lower-right.

Perhaps needless to say, the Sprint and TMo connections won’t have static IPs. To make matters worse, they’ll only have one IP each. I did previously use dynamic DNS with the Sprint device, but the Netgear 6100D is a HUGE pile of shit.*

*The biggest embarrassment for the 6100D is that it comes with a telnet interface exposed. Which you can’t turn off. Which has no password. Which lets you view AND EDIT the config files for the entire device. Oh, and did I mention that a config file includes the admin password? IN PLAIN TEXT? Disgusting.

Besides, dynamic DNS would still only afford me one non-redundant IP per connection, and cellular network IPs can change very frequently.

Hence I spun up an Amazon EC2 instance and installed OpenVPN on it. The backup router at my house connects to it via two independent tunnels, such that if one internet connection/VPN tunnel goes down, traffic will still flow on the other one.

Network Interface Naming

It took me a shockingly long time to figure out that this was a good idea, but I change the udev rules on my systems to rename the network ports to something logical. Usually it’s the name of the network to which the port is connected. So, for example:

File: /etc/udev/rules.d/70-persistent-net.rules

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1f:29:5a:c5:d7", ATTR{type}=="1", KERNEL=="eth*", NAME="ethdev"

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:1f:29:5a:c5:d6", ATTR{type}=="1", KERNEL=="eth*", NAME="ethgst"

SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="90:e2:ba:69:bf:91", ATTR{type}=="1", KERNEL=="eth*", NAME="ethmgt"

That’s a snippet from one of my core routers. (Note that I’m using CentOS/Red Hat; the location and format of that file may differ on other distributions.) The interface names are the NAME= values at the end of each rule, and correlate this way:

ethdev = Development network
ethgst = Guest network
ethmgt = Management network
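If you’re not sure which MAC address belongs to which physical port, the kernel lists them under /sys/class/net. Here’s a minimal sketch (udev_rule and list_nics are hypothetical helper names) that prints a starting-point rule, in the same format as the file above, for each physical NIC; you’d still replace each NAME with something meaningful by hand. It assumes the usual Linux sysfs layout, where physical NICs have a device link and virtual interfaces (lo, tun*, bridges) don’t.

```bash
#!/bin/bash
# Sketch: print a starting-point udev rule for every physical NIC.
# Assumes the standard sysfs layout; pass another directory to test it.

# udev_rule MAC NAME -> one rule in the same format as 70-persistent-net.rules above
udev_rule() {
  printf 'SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="%s", ATTR{type}=="1", KERNEL=="eth*", NAME="%s"\n' "$1" "$2"
}

# list_nics [SYSFS_DIR] -> "ifname mac" per interface that has a device link,
# which skips virtual interfaces such as lo, tun* and bridges
list_nics() {
  local root="${1:-/sys/class/net}" dir
  for dir in "$root"/*; do
    [ -e "$dir/device" ] || continue
    printf '%s %s\n' "$(basename "$dir")" "$(cat "$dir/address")"
  done
}

# Usage: generate rules named after the current interfaces, then rename by hand.
# list_nics | while read -r ifname mac; do udev_rule "$mac" "$ifname"; done
```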

Of course, if you rename the interfaces here you’ll have to rename them anywhere else. grep -R eth0 /etc/* 2> /dev/null should find every existing use of eth0 if, for example, that were the name of the interface before the change. Particularly look at your network configuration scripts (/etc/sysconfig/network-scripts/ifcfg-* in my case) and your firewall rules which may or may not specify interface names.

Strictly speaking, it’s not necessary to start the interface name with “eth“, but I stick with that to distinguish, for example, hardline ethernet interfaces from VPN tunnel or WLAN interfaces.

And likewise I also name the VPN tunnels, usually based upon what’s on the opposite end of the tunnel. But in the case of this article, I named them based upon the ISP via which the traffic transits.

OpenVPN Server Configuration Files

I’m using OpenVPN 2.4.7. If you’re using a different version, the options presented here may differ. But this should be acceptable for many a version.

Per my poorly constructed diagram above, I want to connect a router at my house (rtr-backup01) to an Amazon EC2 instance in “the cloud” (ext-backup-vpn01).

The EC2 host is a nano instance, incidentally. One CPU core, 1GB RAM, and 8GB disk space. That’s actually more than what’s required for this purpose, so don’t go overboard in a similar circumstance.

There will be two VPN tunnels connecting those two hosts, which will be redundant to each other. One tunnel will be connected via Sprint, and the other via T-Mobile.

Here’s the tmobile server config:

(I show the sprint configs all together down below so you can see the differences, though they’re broadly similar.)

port 1199
proto tcp
dev tuntmobile
ca ext-backup-vpn01/ca.crt
cert ext-backup-vpn01/ext-backup-vpn01.crt
key ext-backup-vpn01/ext-backup-vpn01.key
dh ext-backup-vpn01/dh2048.pem
server 10.208.3.0 255.255.255.0
push "route 172.31.41.125 255.255.255.255"
push "route 172.31.41.126 255.255.255.255"
push "route 10.71.246.0 255.255.255.0"
client-connect ext-backup-vpn01/ccd/client-connect-tmobile.bsh
client-disconnect ext-backup-vpn01/ccd/client-disconnect-tmobile.bsh
route-metric 10
client-config-dir ext-backup-vpn01/ccd
topology p2p
cipher AES-128-CBC
comp-lzo
tcp-nodelay
persist-key
#persist-tun
keepalive 5 30
status /var/log/openvpn/ext-backup-vpn01-tmobile.status
log /var/log/openvpn/ext-backup-vpn01-tmobile.log
verb 3
mute 20

The port, protocol, and dev fields are pretty standard and self-explanatory.

Same goes for the ca, cert, key and dh fields. I won’t get into the generation of certificates (etc.) here, but there are plenty of good tutorials on the subject.

server must be different between the two tunnels, otherwise it’ll lead to confusion when trying to route traffic. This essentially defines the network that will be used within the VPN tunnel, between the server and client. (In this case there’s only ever going to be one client, but all clients would be allocated an address in this space.)

The push commands tell the clients which networks are accessible via the tunnel, on the server side. In this example, the two addresses beginning with 172.31.41 are the private network addresses of the EC2 instance, as assigned by Amazon. The network 10.71.246.0 is used by a different VPN instance, allowing me to connect to ext-backup-vpn01 from anywhere.

These are the two most important configuration items, at least as far as making these redundant tunnels function properly:

client-connect and client-disconnect specify shell scripts that are run when the client connects and then disconnects, respectively. In my case, the purpose of those scripts is to establish routes to the networks behind each client when they connect, and to tear down those routes when they disconnect. I’ll post the full code for those below.

route-metric is essentially ignored, as the two scripts mentioned above set the routes and their metrics. Usually this setting would be used to establish the metric for routes created by OpenVPN, e.g. with the route configuration option. I left it in the config as a reminder: The tmobile routes have a metric of 10 whereas the sprint routes have a metric of 20.

client-config-dir points to a directory that contains various configuration options specific to each client. I’ll also show that below.

topology p2p specifies a point-to-point topology, which allocates a single IP address to each client. (Not valid when any connecting clients run Windows.)

cipher, comp-lzo, and persist-key are pretty standard options. See the OpenVPN reference manual for more info on these and all other options.

persist-tun may be essential for other use cases, as it causes the tunnel interface (i.e. tuntmobile) to remain even when there’s no connectivity between server and client. You may have some scripts or programs that rely on finding your tunnel’s interface, or it may be referenced elsewhere. For example, I’m not sure what would happen if you referenced a transient network interface in your iptables config. In my case, I want the tunnel interface to be torn down when the tunnel isn’t established.

Another important option: keepalive [interval] [timeout]. The interval parameter is the frequency at which the client “pings” the server to determine if the tunnel is still up. The timeout parameter is the amount of time without a successful ping that would elapse before OpenVPN decides the tunnel is actually down. Importantly, when it decides the tunnel is down, the client-disconnect script is run.

You may need to fine-tune keepalive to suit your needs, but remember that the timeout is the minimum amount of time that the primary tunnel will be down before its routes disappear, thereby allowing the secondary tunnel to take over traffic.

Due to the routing metric of the tmobile tunnel being lower (10) than that of the sprint tunnel (20), tmobile is the primary tunnel. So when that connection goes down, it will take at least 30 seconds (but probably no more than 40-ish) for sprint to take over.
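To put rough numbers on that, here’s a small sketch of the arithmetic (keepalive_timeouts is a hypothetical helper). One thing worth checking against your own logs: the OpenVPN reference manual says that in server mode keepalive pushes the interval and timeout to the client but uses twice the timeout on the server side, so that the client notices a dead link first. Since client-disconnect (and therefore route removal) runs on the server, the EC2 side’s routes may linger for closer to 60 seconds than 30, while the home router’s own routes toward EC2 could fail over sooner. The sketch ignores timer granularity and TCP behavior.

```bash
#!/bin/bash
# Rough keepalive arithmetic, assuming (per the OpenVPN manual) that the
# server side uses twice the configured timeout. Ignores timer granularity.
# keepalive_timeouts INTERVAL TIMEOUT  (hypothetical helper)
keepalive_timeouts() {
  local interval="$1" timeout="$2"
  printf 'ping=%ss client-restart=%ss server-drop=%ss\n' \
    "$interval" "$timeout" "$(( timeout * 2 ))"
}

keepalive_timeouts 5 30    # tmobile server: ping=5s client-restart=30s server-drop=60s
keepalive_timeouts 10 30   # sprint server: ping=10s client-restart=30s server-drop=60s
```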

status, log, verb, and mute all relate to logging (and status, natch), and can be set as desired.

Client [Dis]connect Scripts

Incidentally, these scripts don’t need to live in the client-config-dir (named ccd), but that’s where I felt like putting them.

Note that they do need to be readable and executable by the OpenVPN process. So if, for example, openvpn runs in the user:group context of openvpn:openvpn, then you’ll want to chown openvpn:openvpn * and chmod ug+rx * for your scripts (where * would only reference the applicable scripts).

Also, your OpenVPN process must have the ability to create routes in the kernel routing table (though you can use tables other than the main/default table). It can be useful, when troubleshooting, to run the OpenVPN process as root:root. Once everything is working, you can manipulate the user/group context.

Here’s what I have in the script referenced by client-connect ext-backup-vpn01/ccd/client-connect-tmobile.bsh (one also exists for sprint, and is shown much farther down on this page):

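#!/bin/bash
# client-connect hook: add a kernel route for every network behind the client.
# OpenVPN exports $ifconfig_local (the server's own tunnel address).
while read -r ROUTE || [ -n "$ROUTE" ]; do   # also handles a missing final newline
  [ -z "$ROUTE" ] && continue                 # skip blank lines
  ip route add "$ROUTE" via "$ifconfig_local" metric 10 >> /var/log/openvpn/client-connect-tmobile.log 2>&1
done < /etc/openvpn/ext-backup-vpn01/ccd/client-connection-routes
# Always succeed: a non-zero exit from client-connect makes OpenVPN reject the client.
exit 0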

And here’s client-disconnect ext-backup-vpn01/ccd/client-disconnect-tmobile.bsh (one also exists for sprint):

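#!/bin/bash
# client-disconnect hook: delete the route for every network behind the client.
while read -r ROUTE || [ -n "$ROUTE" ]; do   # also handles a missing final newline
  [ -z "$ROUTE" ] && continue                 # skip blank lines
  ip route del "$ROUTE" via "$ifconfig_local" metric 10 >> /var/log/openvpn/client-disconnect-tmobile.log 2>&1
done < /etc/openvpn/ext-backup-vpn01/ccd/client-connection-routes
exit 0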

Both of those files reference the file /etc/openvpn/ext-backup-vpn01/ccd/client-connection-routes, which in my case contains:

10.201.0.0/16
10.253.0.0/16
10.1.1.0/24
10.1.2.0/24
192.168.0.0/21
192.168.10.0/24
10.250.0.0/16
10.101.0.0/16
10.121.0.0/16
192.168.90.0/24
192.168.81.0/24

Each of the networks above is accessible on the client end of the tunnels.

The scripts iterate through each line of /etc/openvpn/ext-backup-vpn01/ccd/client-connection-routes, calling ip route add or ip route del to either establish or remove the routes when the client-connect or client-disconnect scripts are called.

The only difference between the client-connect and client-disconnect scripts above is that one contains add and the other contains del.

The only difference between the tmobile version of the scripts shown above and the sprint versions is the metric. (And, as you can see, the name of the log file.. which is not required, but may help with debugging.)

The astute viewers amongst you will say “WTF? That could all be done with one script!”

Kinda.

Because I’m running two separate instances of OpenVPN servers, each one needs both a connect and disconnect script. (That’s 4 total.) Those scripts could then call a single script which would do all the route manipulation. I dunno, what I have is pretty functional, but yes, it could be a bit more streamlined.
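For what it’s worth, here’s a sketch of that single script, which all four client-connect/client-disconnect lines could point at. It leans on variables OpenVPN exports to its scripts: script_type (client-connect or client-disconnect), dev (the tunnel interface, tuntmobile or tunsprint here), and ifconfig_local. ROUTES_FILE, DRY_RUN, and the log file naming are hypothetical knobs for this sketch, and the dev-to-metric mapping simply mirrors the 10/20 metrics above. Also, if OpenVPN refuses to run any of these hooks, check for script-security 2 in the server config: the default level only allows built-in executables such as ip, not user-defined scripts.

```bash
#!/bin/bash
# One hook for both tunnels and both events (a sketch, not the setup above).
# From OpenVPN's environment: script_type, dev, ifconfig_local.
# Hypothetical knobs: ROUTES_FILE and DRY_RUN (DRY_RUN=1 prints instead).
ROUTES_FILE="${ROUTES_FILE:-/etc/openvpn/ext-backup-vpn01/ccd/client-connection-routes}"

route_hook() {
  local action metric route
  case "$script_type" in
    client-connect)    action=add ;;
    client-disconnect) action=del ;;
    *) echo "unexpected script_type '$script_type'" >&2; return 0 ;;
  esac
  case "$dev" in
    tuntmobile) metric=10 ;;   # primary tunnel
    tunsprint)  metric=20 ;;   # backup tunnel
    *) echo "unexpected dev '$dev'" >&2; return 0 ;;
  esac
  while read -r route || [ -n "$route" ]; do
    [ -z "$route" ] && continue
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "ip route $action $route via $ifconfig_local metric $metric"
    else
      ip route "$action" "$route" via "$ifconfig_local" metric "$metric"
    fi
  done < "$ROUTES_FILE"
  return 0
}

# When OpenVPN runs this file, script_type is set: do the work and always
# exit 0, since a failing client-connect script makes OpenVPN reject the client.
if [ -n "${script_type:-}" ]; then
  route_hook >> "/var/log/openvpn/${script_type}-${dev}.log" 2>&1
  exit 0
fi
```

The trade-off is one place to maintain versus four tiny scripts that are each trivially readable; the case statements are where per-tunnel differences would live.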

Note that OpenVPN sets a whole bunch of environment variables in the context of each script when calling it. See the OpenVPN reference manual for a full list. (The document doesn’t appear to have anchor tags, but search the page for “bytes_received”. That’s the first variable in the list.)

So you could have all sorts of caveats (if/then) and other functionality within those scripts. If you had multiple clients connecting to the same server instance, those variables would tell you who that client is, and as such you could take different actions for different clients. It’s actually a pretty robust arrangement.
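When working out which variables you can branch on, it can help to have a hook simply record its environment. A minimal sketch, assuming a writable log directory (DUMP_DIR and the file naming are hypothetical): point client-connect at it temporarily, connect once, and read the file. common_name and script_type are real OpenVPN variables; what else shows up depends on your configuration.

```bash
#!/bin/bash
# Debugging sketch: save every variable OpenVPN exported to this hook.
# DUMP_DIR and the output file naming are hypothetical choices.
DUMP_DIR="${DUMP_DIR:-/var/log/openvpn}"

dump_env() {
  local out="$DUMP_DIR/env-${script_type:-unknown}-${common_name:-unknown}.txt"
  env | sort > "$out"     # one NAME=value per line, sorted for easy diffing
  printf '%s\n' "$out"    # report where it went
}

# Run the dump when invoked by OpenVPN; exit 0 so the client is not rejected.
if [ -n "${script_type:-}" ]; then
  dump_env > /dev/null
  exit 0
fi
```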

The only environment variable I’m using is $ifconfig_local, which is the IP address of the server on its end of the VPN tunnel. So, in the examples above, 10.208.3.0 255.255.255.0 is the VPN’s network (defined by the server option in the config file), and so 10.208.3.1 is the server’s IP. Thusly, $ifconfig_local is 10.208.3.1.

The last bit of the configs are the client config directory files.

Here’s the contents of ext-backup-vpn01/ccd/client-tmobile01.

BTW, that directory is defined in the main OpenVPN config file by the parameter client-config-dir, and the file name (client-tmobile01) is the common name (CN) of the client’s X.509 certificate (defined when you created the certificate).

ifconfig-push 10.208.3.100 10.208.3.1
iroute 10.201.0.0 255.255.0.0
iroute 10.253.0.0 255.255.0.0
iroute 10.1.1.0 255.255.255.0
iroute 10.1.2.0 255.255.255.0
iroute 192.168.0.0 255.255.248.0
iroute 192.168.10.0 255.255.255.0
iroute 10.250.0.0 255.255.0.0
iroute 10.101.0.0 255.255.0.0
iroute 10.121.0.0 255.255.0.0
iroute 192.168.90.0 255.255.255.0
iroute 192.168.81.0 255.255.255.0

There is something important to note here: iroute does NOT create routes in the kernel routing table. That’s what the scripts above do.

iroute tells OpenVPN itself that it is capable of transiting traffic to that network. Hence every single one of those iroute commands correlates to a network in the file /etc/openvpn/ext-backup-vpn01/ccd/client-connection-routes, above. The routes need to be enumerated in both places.

(I only thought of this just now, but to avoid maintaining two different lists the client-[dis]connect scripts could iterate through the client config file and create a route in the kernel routing table for each of the iroute lines.)
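Here’s a minimal sketch of that idea, assuming the iroute lines always use dotted-quad netmasks like the ones above. It converts each mask to a prefix length and drops duplicates, so the output comes out in the same network/prefix form as client-connection-routes. mask_to_prefix and iroutes_to_cidrs are hypothetical names.

```bash
#!/bin/bash
# mask_to_prefix 255.255.248.0 -> 21 (counts the one-bits in each octet)
mask_to_prefix() {
  local IFS=. bits=0 octet
  for octet in $1; do
    while (( octet > 0 )); do
      bits=$(( bits + (octet & 1) ))
      octet=$(( octet >> 1 ))
    done
  done
  echo "$bits"
}

# iroutes_to_cidrs CCD_FILE -> one "network/prefix" per iroute line;
# other directives (ifconfig-push etc.) are ignored
iroutes_to_cidrs() {
  local kw net mask rest
  while read -r kw net mask rest || [ -n "$kw" ]; do
    [ "$kw" = iroute ] || continue
    echo "$net/$(mask_to_prefix "$mask")"
  done < "$1" | awk '!seen[$0]++'   # drop duplicate networks, keep order
}

# e.g. inside a hook, instead of reading client-connection-routes:
# while read -r ROUTE; do ...; done < <(iroutes_to_cidrs "/etc/openvpn/ext-backup-vpn01/ccd/$common_name")
```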

OpenVPN Client Config Files

Here’s the OpenVPN conf file for the tmobile client:

client
dev tuntmobile
proto tcp
port 1199
local 10.222.3.5
remote 50.60.70.80
route-metric 10
resolv-retry infinite
persist-key
dh client-tmobile01/dh2048.pem
ca client-tmobile01/ca.crt
cert client-tmobile01/client-tmobile01.crt
key client-tmobile01/client-tmobile01.key
topology p2p
up-delay
cipher AES-128-CBC
comp-lzo
verb 3
status /var/log/openvpn/client-tmobile01.status
log /var/log/openvpn/client-tmobile01.log

There’s nothing too crazy on the client side, but there are a few things to discuss:

local 10.222.3.5 is the address of the ethernet interface which connects to the T-Mobile cell modem / “router” (the Netgear LB1121).

I’ve changed remote to a nonsense address to protect the innocent, but it’s the public (Elastic) IP of my EC2 instance on which the tmobile OpenVPN server runs.

up-delay is probably best defined by the OpenVPN reference manual:

Delay TUN/TAP open and possible --up script execution until after TCP/UDP connection establishment with peer. In --proto udp mode, this option normally requires the use of --ping to allow connection initiation to be sensed in the absence of tunnel data, since UDP is a “connectionless” protocol.

On Windows, this option will delay the TAP-Win32 media state transitioning to “connected” until connection establishment, i.e. the receipt of the first authenticated packet from the peer.

Needless to say, the client configuration for the sprint connection is nearly identical, and is shown below.

The Sprint-Related Files

Just for completeness, here are the full readouts of the files on the sprint server.

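In the server config below, the lines that differ from the tmobile version are port (1198), dev (tunsprint), server (10.208.2.0), the client-connect and client-disconnect script names, route-metric (20), keepalive (10 30), and the status and log file names. The push lines are the same, just in a different order.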

port 1198
proto tcp
dev tunsprint
ca ext-backup-vpn01/ca.crt
cert ext-backup-vpn01/ext-backup-vpn01.crt
key ext-backup-vpn01/ext-backup-vpn01.key
dh ext-backup-vpn01/dh2048.pem
server 10.208.2.0 255.255.255.0
push "route 10.71.246.0 255.255.255.0"
push "route 172.31.41.125 255.255.255.255"
push "route 172.31.41.126 255.255.255.255"
client-connect ext-backup-vpn01/ccd/client-connect-sprint.bsh
client-disconnect ext-backup-vpn01/ccd/client-disconnect-sprint.bsh
route-metric 20
client-config-dir ext-backup-vpn01/ccd
topology p2p
cipher AES-128-CBC
comp-lzo
tcp-nodelay
persist-key
#persist-tun
keepalive 10 30
status /var/log/openvpn/ext-backup-vpn01-sprint.status
log /var/log/openvpn/ext-backup-vpn01-sprint.log
verb 3
mute 20

Note that I used the same certificate authority, certificate, and key file for both servers. It’s perhaps not best practice, but honestly, what does it matter… if someone compromises one tunnel’s encryption, then they compromise both. But they’re redundant connections serving the same purpose, so the risk is minimal. You may, of course, use completely different certificates for both.

client-connect-sprint.bsh:

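#!/bin/bash
# client-connect hook: add a kernel route for every network behind the client.
# OpenVPN exports $ifconfig_local (the server's own tunnel address).
while read -r ROUTE || [ -n "$ROUTE" ]; do   # also handles a missing final newline
  [ -z "$ROUTE" ] && continue                 # skip blank lines
  ip route add "$ROUTE" via "$ifconfig_local" metric 20 >> /var/log/openvpn/client-connect-sprint.log 2>&1
done < /etc/openvpn/ext-backup-vpn01/ccd/client-connection-routes
# Always succeed: a non-zero exit from client-connect makes OpenVPN reject the client.
exit 0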

client-disconnect-sprint.bsh:

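#!/bin/bash
# client-disconnect hook: delete the route for every network behind the client.
while read -r ROUTE || [ -n "$ROUTE" ]; do   # also handles a missing final newline
  [ -z "$ROUTE" ] && continue                 # skip blank lines
  ip route del "$ROUTE" via "$ifconfig_local" metric 20 >> /var/log/openvpn/client-disconnect-sprint.log 2>&1
done < /etc/openvpn/ext-backup-vpn01/ccd/client-connection-routes
exit 0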

ext-backup-vpn01/ccd/client-sprint01:

ifconfig-push 10.208.2.100 10.208.2.1
iroute 10.201.0.0 255.255.0.0
iroute 10.253.0.0 255.255.0.0
iroute 10.1.1.0 255.255.255.0
iroute 10.1.2.0 255.255.255.0
iroute 192.168.0.0 255.255.248.0
iroute 192.168.10.0 255.255.255.0
iroute 10.250.0.0 255.255.0.0
iroute 10.101.0.0 255.255.0.0
iroute 10.121.0.0 255.255.0.0
iroute 192.168.90.0 255.255.255.0
iroute 192.168.81.0 255.255.255.0

Here’s the configuration file on the sprint client.

client
dev tunsprint
proto tcp
port 1198
local 10.222.2.5
remote 10.20.30.40
route-metric 20
resolv-retry infinite
persist-key
#persist-tun
dh client-sprint01/dh2048.pem
ca client-sprint01/ca.crt
cert client-sprint01/client-sprint01.crt
key client-sprint01/client-sprint01.key
cipher AES-128-CBC
topology p2p
up-delay
comp-lzo
verb 3
status /var/log/openvpn/client-sprint01.status
log /var/log/openvpn/client-sprint01.log

In Conclusion

With both tunnels providing routes to my home infrastructure via Amazon’s network and my EC2 instance, I can have as many static, public IPs as I need for the Sprint and T-Mobile connections (within Amazon’s Elastic IP limits).

Using iptables’ DNAT manipulation, I can reverse NAT those public IPs to any internal IP addresses I desire.
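A rough sketch of what that can look like on the EC2 instance. Amazon maps each Elastic IP 1:1 to a private address on the instance (such as 172.31.41.126), so the DNAT rule matches the private address. The hypothetical dnat_rules helper below only prints iptables commands for review, and the home address in the example is made up. It assumes IP forwarding is enabled (net.ipv4.ip_forward=1), and it masquerades traffic entering the tunnels so replies come back the same way, which in turn assumes hosts at home can route back to the 10.208.x.0 tunnel addresses via the backup router. The trade-off is that the internal host sees the tunnel address rather than the real client IP; policy routing at home would avoid that but is more involved.

```bash
#!/bin/bash
# Sketch: print (not apply) iptables rules that reverse-NAT one EC2 private
# address to a host at home, reached through whichever tun interface
# currently carries the route. Review, then run the output as root.
# dnat_rules EC2_PRIVATE_IP HOME_IP   (hypothetical helper)
dnat_rules() {
  local ec2_ip="$1" home_ip="$2"
  # rewrite the destination of new inbound connections
  echo "iptables -t nat -A PREROUTING -d $ec2_ip -j DNAT --to-destination $home_ip"
  # allow the rewritten traffic out through either tunnel (tuntmobile, tunsprint)
  echo "iptables -A FORWARD -d $home_ip -o tun+ -j ACCEPT"
  echo "iptables -A FORWARD -s $home_ip -i tun+ -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
  # source-NAT into the tunnel so the home host replies via the tunnel
  echo "iptables -t nat -A POSTROUTING -d $home_ip -o tun+ -j MASQUERADE"
}

# Example: 10.201.0.50 is a made-up internal address.
# dnat_rules 172.31.41.126 10.201.0.50 | sh
```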

Moreover, I have a separate VPN server running on the EC2 instance which will allow me to connect to it, and therefore my entire infrastructure, using an OpenVPN client on one of my laptops, tablets, or phones. That’s particularly useful when I’m traveling and my network goes dark. Up until now, if my cable and FiOS connections went down or my core routers went down, I’d have no visibility as to what happened. This was also true if I became subject to a DDoS attack.

Finally, by having the backup router and backup internet connections, I can route outgoing mail through them as a redundant path. That means that my Zabbix servers (for system monitoring) and other scripts can communicate issues to me even during a widespread outage.

Overkill?

Definitely.

Fun?

Definitely.

Though your mileage may vary ;)

Gratuitous Pics

This is the backup router which maintains the VPN tunnels via Sprint and T-Mobile to Amazon’s network (and hence my EC2 instance). In addition to the connections for those two cellular ISPs, it also connects to my FiOS line for direct VPN access. The other connections are for various in-house networks.

This is the Netgear LB1121 which provides connectivity to T-Mobile’s network. It’s not exactly feature rich, but it serves the purpose of providing an ethernet port routed to T-Mobile. It does have PoE, though, which is pretty awesome. Here I’m just using the internal antennae, and as you can see I get mediocre service in the basement. (I may put this upstairs eventually… hmmm.)

This is the Netgear 6100D, providing connectivity via Sprint’s network. Even though the software of this device is terrible, it’s pretty good hardware-wise. It even has PoE! (But only on the WAN port for some bizarre reason. That’s why it has 2 ethernet cables running to it; one is just for power.) There’s also a coax cable attached to it, connecting to a directional antenna in my attic! I did a whole video about that install, which you can check out if you’re bored. :)

Netgear LG 6100D Sprint LTE Gateway – Advanced Configuration

Scott — Tue, 26 Aug 2014 22:07:19 +0000

Man, oh man! I was getting frustrated with my new 6100D LTE gateway from Sprint. In fact, I posted a very long rant about it yesterday.

This post is all about solutions.

Really, it’s about one very big solution:

http://[Netgear 6100D Address]/index.asp

What is that? Oh, not much, just the native Netgear configuration GUI.

It has about ten times the feature set of Sprint’s half-baked GUI. Seriously.

Already have a problem?

This didn’t happen to me at first, but I must have triggered some state within the 6100D that causes this screen to appear when returning to the Netgear GUI after having used the Sprint GUI:

If you find yourself redirected to this utterly pointless landing page, just change the path of the URL to /adv_index.asp (I assume you want the advanced config page).

Clicking “Take me to the Internet” uselessly takes you to Netgear’s site.

The good…

What can it do that the Sprint branded GUI can’t?

Static routes
Ability to turn the DLNA server off (Sprint doesn’t even mention it, but it’s enabled by default)
Multicast settings
UPnP advertisement settings (as opposed to just on or off)
Better port forwarding settings with port triggering
Wireless repeater settings
~~The ability to disable the WiFi radios~~ Update: Though this option exists, hitting the “Apply” button on the page does nothing.
FTP server settings
Email notification settings (for alerts and logs)
A DMZ server setting that lets you change all four octets
VPN passthrough settings
RIP settings
QoS settings
The menu system is generally organized in a logical fashion and it’s easy to navigate
The ability to send and receive SMS messages (It doesn’t work for me, but that’s probably because my plan doesn’t include SMS)

…and that’s just what I found on my first quick look.

Sprint completely crippled this device.

…the bad and the ugly

Sometimes the Netgear GUI redirects you to a page that asks if you want to use a wizard to configure the router or configure it manually. A minor annoyance.
The interface has a very 90s look and feel (as opposed to the Sprint-branded interface which is cleaner)
There’s a link to “documentation”, which opens up a window for the N600 Wireless Dual Band Gigabit Router*
I still can’t find a place to turn off the telnet console
Strangely the date and time settings only list “AU 2011-2012” as a daylight savings time option.
They really can’t get their timezone knowledge together. On the Sprint GUI it lists “EST (Central Standard Time)”, and in the Netgear GUI the timezone options are “EST”, “CST”, and “WST” (which should be PST).
After using the Netgear web GUI and going back to the Sprint-branded GUI, it requires that you agree to the EULA again. This makes me think that at least one flag is getting wiped out when the Netgear GUI re-writes the config.

*Check out the N600 User Manual. It describes a lot of the 6100D settings in more detail than the Sprint documentation.

Major flaws

I don’t mean to harp on this, but it’s so significant that I can’t help it:

There is an unprotected telnet server that cannot be turned off, requires no authentication, and lets anyone view and MODIFY the router’s config. This includes VIEWING THE ADMIN PASSWORD IN PLAINTEXT!

I’ve hit another bug twice now:

For no consistent reason that I can discern, the device will start flooding the LAN with IGMP (multicast) messages. For example:

15:25:09.027136 IP (tos 0xc0, ttl 1, id 0, offset 0, flags [none], proto IGMP (2), length 36, options (RA))
    10.222.2.1 > 224.0.0.1: igmp query v3

It’s creating these messages as fast as it can; when this is happening, igmpproxy uses around 75% CPU, with the remainder used for IO. The GUI also becomes unresponsive. Fortunately BusyBox (via telnet) does not, so a remote reboot is possible.

This IGMP activity lasts a few minutes, but then refreshing the GUI causes (?) it to start again. I haven’t spent a lot of time testing this issue, but it is a PITA.

Then we have a nice one where the router seems to use THE MAC ADDRESS for port forwarding regardless of the actual IP setting.

For example, let’s look at this composite of screencaps:

What’s going on here?

On the top is the active port forwarding configuration of the 6100D, after a save. On the bottom is a tcpdump of the traffic between the 6100D and my core router.

The 6100D is sending traffic to 10.222.2.3 even though I have it set to send traffic to 10.222.1.1.

Where is it getting the IP address 10.222.2.3? Well, it’s right there in the lower right of the device settings. But that option is NOT selected.

Why is it showing that IP? Without getting into too much detail, I have two core routers running in a master/backup configuration. They each have a “real” IP on the 10.222.0.0/16 network (last octets 3 and 4 respectively), as well as a VIP (last octet 1).

The routers are also my VPN servers, so I want VPN traffic (in this example) sent to the VIP, which is 10.222.1.1. This way it doesn’t matter if the backup router takes over; The VIP will be reassigned to it and traffic will continue to flow.

The address of 10.222.2.3 came from a misconfiguration (my fault). I forgot to change that when I changed the VIP. That is no big deal in this case, because this is a /16 network (the size of an old class B), and so 10.222.2.3 and 10.222.1.1 can coexist on it just fine.

My misconfiguration is not the cause of the problem, because even after I changed the “real” IP on the router to 10.222.1.3, it still sent traffic to that IP instead of the VIP!

However, both the “real” IP and the VIP have the same MAC address. This shouldn’t be a problem either, because we only need to use the ARP table to find the MAC address for the IP, and not the other way around. Here’s the ARP table on the 6100D:

For some reason it’s picking the first MAC address and forwarding traffic there; I have no idea why they designed it like that!

Let’s dig into the config file (located at /WFIO/current.cfg in the 6100D’s unsecured BusyBox environment):

table=FWPortRedirectionConfig;
columns=Enable;Nickname;Protocol;WANPortStart;WANPortEnd;LANIPAddress;LANPortStart;INSTNUM;isPredefined;isMore;portMapIndex;HostName;Permissions;Le
0;Westell Modem Service VoIP SIP;udp;5060;5060;MODEMREDIRECT;5060;1;1;0;0;;0;0;0;;
1;Westell Modem Service Envoy;tcp;6363;6363;MODEMREDIRECT;6363;2;1;0;0;;0;0;0;;
1;Westell Modem Service Rip;udp;520;520;MODEMREDIRECT;520;3;1;0;0;;0;0;0;;
1;VPN (SMR);tcp;1199;1199;10.222.1.3;1199;4;1;0;3;10.222.1.1;GUI, TR069;0;0;d4:ae:52:d4:62:02;

First of all, why does it have two services enabled by default and not listed in the GUI?

Secondly, the last line is my entry. You’ll see that it has 10.222.1.3 listed as well as 10.222.1.1. Well, looking at the column headers it decided to stick in 10.222.1.3 as the LANIPAddress, with a HostName of 10.222.1.1.

So the setting in the GUI for “Internal IP address” is actually the setting for the host name!?!?
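If you copy that file off the device (for example by cat-ing it in the telnet session and saving the output), a short awk sketch can pair each value with its column name, which makes mix-ups like this easier to spot. It assumes the layout shown above: a table= line, a columns= line, then one semicolon-separated row per line until the next table= line. show_table is a hypothetical name.

```bash
#!/bin/bash
# show_table TABLE FILE -> "column=value" lines for each row of TABLE, with a
# "--" line after each row. Assumes the table=/columns=/rows layout above.
show_table() {
  awk -v want="table=$1;" '
    /^table=/   { intable = ($0 == want); next }   # start of a new table
    !intable    { next }                           # skip other tables
    /^columns=/ { sub(/^columns=/, ""); n = split($0, col, ";"); next }
    NF {
      m = split($0, val, ";")
      for (i = 1; i <= m && i <= n; i++)
        if (col[i] != "") printf "%s=%s\n", col[i], val[i]
      print "--"
    }
  ' "$2"
}

# e.g. show_table FWPortRedirectionConfig current.cfg | grep -E '^(LANIPAddress|HostName)='
```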

The worst part is that if I go into the BusyBox environment and manually change the LANIPAddress field to the correct IP, upon reboot it changes it right back. There’s no way to win with this thing!

The problem is that my two core routers have different MAC addresses. So if this thing is basing its decisions on the MAC address, what’s going to happen when the master fails and the backup takes over? The master’s MAC address will be offline. The VIP will still be online, but this thing may just ignore it.

(By the way, this is a testing environment. That’s why you don’t see an entry in the ARP table for the backup router’s IP.)

I could remove the “real” IP address from the routers and just use a VIP, but that is irritating from an administrative perspective because the backup router will be unaddressable on this network. Also, it may not solve the problem because the MAC address of the VIP will of course change in the event of a failure.

I also can’t give the master and backup the same MAC address, because that would confuse any device connected to this network.

Sigh. This will require more testing.

SOLVED!

The solution is simple, obvious, and of course it took a couple of hours to think of it:

Put the 6100D on its own /24 and give the core routers a VIP in that /24.

In other words, the LAN configuration on the 6100D is now:

IP: 10.222.2.11
Mask: 255.255.255.0

And the core routers now share:

VIP: 10.222.2.1 with a mask of 255.255.255.0

For administrative purposes (and expansion, etc.) the master core router still holds its “real” IP of 10.222.1.3, but it’s now masked as a /24, and it still has a VIP of 10.222.1.1/24.

Despite having yet another IP on that physical network, it’s fenced from the others by its subnet mask (so the 6100D isn’t just basing its decisions on the MAC address). Here we see the correct IP is detected for the core router’s MAC address:

Apparently the 6100D is a real slave to subnets.
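The subnet arithmetic behind that “fence” is easy to check. This Python sketch is only my illustration of the math, not anything running on the 6100D. With the 6100D at 10.222.2.11/24, the new VIP is the only router address that falls inside its subnet. That is consistent with it now picking the right IP.

```python
import ipaddress

# The 6100D's LAN interface after the change.
lan = ipaddress.ip_interface("10.222.2.11/24").network  # 10.222.2.0/24

# Router addresses that share the same physical segment.
candidates = {
    "core VIP (new /24)": "10.222.2.1",
    "master real IP": "10.222.1.3",
    "old VIP": "10.222.1.1",
}

for label, addr in candidates.items():
    on_link = ipaddress.ip_address(addr) in lan
    print(f"{label:20} {addr:12} on-link: {on_link}")
```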

Config diffs

Before fiddling with port forwarding and various other settings, I saved one setting in the Netgear GUI: I added a static route. That resulted in this snippet being added to the config file:

table=StaticNetworkConfig;
columns=Enabled;Nickname;InterfaceTable;InterfaceReference;RouteType;IPDestination;IPNetmask;IPGateway;Metric;RIPAdvertised;SaveToFlash;INSTNUM;
1;Test Route;;;Network;10.78.1.0;255.255.255.0;10.222.1.1;10;1;1;2;

The following entry was also added, even though I didn’t modify the NTP settings:

table=NTPConfig;
columns=Enabled;NTPServer;NTPServerSec;Interval;DayLightSavingsUsed;LocalTimeZone;BackoffIntervalMin;BackoffIntervalMax;TimeZoneName;DayLightSavingsStart;DayLightSavingsEnd;
0;time-b.netgear.com;time-a.netgear.com;3600;1;GMT+5;5;60;;M4.1.0/02:00:00;M10.5.0/02:00:00;

Otherwise it doesn’t look like anything else was altered, aside from some timestamps (phew). More importantly, the router still works!

That’s important because I was concerned that the Netgear GUI might wipe out or otherwise alter important settings that the Sprint GUI had added in.

I keep talking about the GUIs because I don’t know whether they differ in how they manage the configuration on the back end. They may well use the same configuration management scheme, in which case they won’t conflict! But it’s possible that they manage the config differently and could kill each other’s settings.

Disclaimer

I just found out about this roughly 30 minutes ago. I have no idea what undesirable consequences might arise from changing settings in the Netgear GUI. I don’t even know if all of them will work as intended. So use this information at your own risk!

Netgear LG 6100D LTE Gateway for Sprint Review – Bad Device, or the Worst Device?

Scott — Mon, 25 Aug 2014 20:41:47 +0000

I recently obtained a Netgear LG6100D LTE Gateway from Sprint as a backup for my hard internet connections. The device seemed perfect on paper: Cellular connectivity for the home or business network!

I’ve used some bad consumer routers in my day, but this is one of the worst I’ve encountered. Or maybe it’s that it looked so promising at first and then let me down so hard.

Update (2014-08-26): I found that you can access the native Netgear web GUI. It has a heck of a lot more features, and solves many of the complaints I have with the “correct” way of configuring this device.

Upon logging in, I found the user interface clean and fairly informative, and I noticed that the values for Status and Data Usage were updating automatically. Some AJAX is a nice touch on this kind of device.

The very first thing I decided to do upon seeing the Wi-Fi networks listed in the lower-left was to disable WiFi. I’m going to be integrating this with my existing network, and I already have multiple access points.

Complaint 1: There is no place to turn off WiFi. You can turn off the “Guest Wi-Fi”, but can’t disable the 2.4GHz and 5GHz regular WiFi access points.

OK, fine. Not a huge deal. I set the passphrases to something ridiculously long and random, set the “Wi-Fi Range” to “Short”, hid the SSID and changed the connection rate to the lowest (narrowest) possible. The device is in my basement, so hopefully that’ll be enough to prevent any Nosy Nellies from racking up charges on my data plan.

The next thing I did was to set up my LAN. Here’s what the setup page looks like (I’m using some fake values for these screenshots):

I did actually RTFM for the DMZ setting so that I was sure what it did: All unsolicited traffic from public networks (the internet) will be forwarded to this address.

That’s perfect for me, because the only network devices downstream from the 6100D will be my routers. They’ll handle all the firewalling and NATing.

Complaint 2: Although the LAN settings allow you to specify any netmask you want (I went with a /16, or 255.255.0.0), you can only change the last octet of the DMZ IP. In other words, the DMZ device has to be on the same /24 as the 6100D.

Again, not a big deal, but for organizational purposes I would have liked to have had them on different /24s.

Because of that limitation I ended up messing with my settings. Here’s how I had the router configured at one point:

LAN: 10.222.2.1/16
DMZ: 10.222.2.3

I then realized that those settings wouldn’t be ideal, and so I changed the LAN IP:

LAN: 10.222.1.11/16

I saved it, and the third octet of the DMZ changed to “1” to match the third octet of the LAN IP. Then I changed the last octet of the DMZ to “1”, saved, and wound up with these settings:

LAN: 10.222.1.11/16
DMZ: 10.222.1.3

But wait, the DMZ should be 10.222.1.1. I tried to change it again. It remained stuck (even across reboots) at 10.222.1.3.

Then I looked at the actual network traffic going from the 6100D to my router. DMZ traffic was going to 10.222.2.3 — the old setting.

Complaint 3: The DMZ IP address can become “stuck” on a value that doesn’t match what’s displayed in the GUI, and there is no way to change it.

I tried re-IPing the router back to 10.222.2.1 and then changing it and the DMZ value back in various different sequences. No dice.

I was afraid to do a factory or settings reset, as I worried that might wipe out some cellular data settings that were preloaded by Sprint. (In theory all it should need is the SIM, but you never know!)

I used the 6100D’s “Download / Backup” feature to download my config. It was base64 encoded plaintext. I decoded it and found this setting:

table=StaticNatConfig;
columns=Enabled;LocalHostIPAddr;LocalHostMACAddr;
1;10.222.2.3;d4:ae:52:xx:xx:xx;

Great! That’s the setting!
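For anyone who wants to poke at their own backup, here's a minimal Python sketch of the decode-and-search step. It assumes the whole download is a single base64 blob whose plaintext is split into `table=` blocks like the one above. The function names and the sample are my own. A real backup may carry extra framing or a checksum, which this doesn't handle.

```python
import base64


def decode_backup(blob: bytes) -> str:
    """Decode a base64 backup blob into its plaintext config."""
    return base64.b64decode(blob).decode("utf-8", errors="replace")


def find_table(config: str, name: str) -> list:
    """Return the non-empty lines of the 'table=<name>;' block.

    A block starts at a line beginning with 'table=' and runs until the
    next 'table=' line or the end of the text.
    """
    lines, inside = [], False
    for line in config.splitlines():
        if line.startswith("table="):
            # "table=AdminInfo;rev=2;" -> "table=AdminInfo"
            inside = line.split(";")[0] == f"table={name}"
        if inside and line.strip():
            lines.append(line)
    return lines


# Stand-in for a real backup: the excerpt above, re-encoded.
sample = (
    "table=StaticNatConfig;\n"
    "columns=Enabled;LocalHostIPAddr;LocalHostMACAddr;\n"
    "1;10.222.2.3;d4:ae:52:xx:xx:xx;\n"
)
blob = base64.b64encode(sample.encode("utf-8"))

for line in find_table(decode_backup(blob), "StaticNatConfig"):
    print(line)
```

The same function pulls any other table out of the file, which is also how easy it is for anyone else to do it.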

I changed it, re-encoded the text to base64, and uploaded it to the device. A JavaScript alert dialog warned me that the router was going to reboot… and nothing happened.

I did notice that the URL now had the suffix ErrorNum=3, so I suppose that the upload failed.

Complaint 4: However, no error was given in the GUI. There was no indication that the upload had failed, and certainly not a reason for the failure. (I want to be clear that I don’t blame the upload failure on Netgear; I probably didn’t notice/update some CRC or other information. My objection is the lack of reasonable error reporting.)

Lack of error reporting brings me to the system log. I went there to see if there was an explanation for the failed upload. There was no mention of the upload, but…

Complaint 5: All the dates were in 1970. Clearly this thing hadn’t synchronized with an NTP server or some such (even though it had been connected to the Sprint network for some time).

That brings me to the “Date & Time” settings:

I decided to set the date and time manually. I unchecked “Automatic Time Update” -> “Enabled” and hit “Submit”. I got alerted that the settings were saved successfully and, uhhh…

Complaint 6: There is no way to set the date and time manually. Look:

The “Local Time” field is static. There is no way to set the date and time. So why even be able to disable NTP in the first place? (Oh, there’s a reason — we’ll get to it.)

Incidentally, it says “EST(Central Standard Time)” in the time zone dropdown. I’ll not make that a separate complaint, but it gives you an idea of the amount of quality control that went into this thing.

While I was poking around in the plaintext config file, I found this little doozy:

table=AdminInfo;rev=2;
columns=AdminUserID;AdminPassword;PWNotAllow;RemoteAccessEnable;SessionTimeoutEnable;SessionTimeoutInterval;SessionTimeoutTimeLeft;EnableGUIAuth;Preffered_Proto;RemoteHttpsEnable;EnableRecovery;SecQ1ID;SecAns1;SecQ2ID;SecAns2;UserRole;TimeStamp;
admin;MyActualPassword;password;;;;;1;;;0;0;;0;;0;;
admin;password;;0;0;20;16;1;0;1;0;;;;;;;
;;;;;;;;;;;;;;;;;
support;password;;;;;;1;;;;;;;;1;;
user;password;;;;;;1;;;;;;;;2;;

Complaint 7: The admin password is stored in plaintext in the backup file.

OK, it’s base64-encoded, which will put off the average user-level pair of prying eyes. But base64 is an encoding, not encryption, and I wouldn’t exactly feel comfortable leaving my router’s backup settings unencrypted on a network drive.

Complaint 8: What are those “user” and “support” accounts? I tried logging in as both from the GUI and could not. But is there some back door that I’m not aware of? They’re not mentioned in the GUI, and there’s no way to change those passwords that I can see (well, there is, but we’ll get to that).

I don’t know about you, but I find superfluous and immutable user accounts to be sketchy at best.

Let’s talk about why I’m using a DMZ host in this scenario.

I already have redundant router/firewalls that are directly connected to the internet using my two hard line connections. They both “own” public IPs and firewall/NAT traffic to and from my internal networks. They’re simply PCs running CentOS with nine Ethernet ports each, and they work great.

For me, this DMZ setting will result in “double-NATing”. In other words, all traffic coming into the 6100D will be DNATed to my router, and the router will DNAT it to my server. That’s sub-optimal for a variety of reasons.

(Of course if this “router” actually let me add “routes” I could use its port forwarding feature and obviate the need for the second level of NATing. We’ll get to that topic later on.)

The 6100D does offer a setting that’s extremely sexy on first glance: IP Passthrough.

From the documentation:

You can designate a computer behind the gateway to receive unsolicited traffic from the public network.

Note: The public WAN IP will be assigned to this computer.

That sounds perfect! Let’s look at the settings page for this feature:

Hmm.. “Device Name” is a drop-down with nothing in it. And what’s the DHCP lease time?

The documentation says:

In the Device Name drop-down list, select a computer.
[..]
In the DHCP Lease Time fields, enter the days, hours, minutes that you want to assign the public IP to this computer.

That was not very helpful, which is typical of the documentation.

Complaint 9: This feature does not work. There is never a computer listed in the “Device Name” dropdown. In fact, I tried this with the 6100D connected directly to my laptop back when it was fresh from the factory and it still didn’t work.

Besides that complaint, a whole host of questions are raised:

Does the computer to which the IP is “passed through” use the upstream DHCP server on Sprint’s network, or does it use the DHCP server on the 6100D?
If my cellular WAN IP address changes before the end of the lease time I’ve set, will it still update my computer’s address? What exactly does that lease time mean? And why is it there? Why not just use the upstream lease settings?
Based upon what’s written in the documentation (“[time] that you want to assign the public IP to this computer”) is this really a DHCP setting, or does it mean that after that time period the IP will simply revert back to the 6100D instead of the downstream computer?
Is it accomplishing the passthrough by bridging the WAN connection to the LAN connection? Or does it use some kind of internal double-NATing? In other words, by what mechanism does it “pass through” the IP?

Complaint 10: Even if I did see my computer listed in the “Device Name” dropdown, this feature would be completely useless to me as it’s documented to the point of obscurity.

Speaking of obscurity:

Complaint 11: The “Custom” setting on the firewall is useless. You cannot make custom firewall settings of any use.

Let me show you:

Looks simple enough. There are some default rules. ICMP is allowed, and a variety of TCP and UDP services are blocked. (Note that there is no “remove” button, but I suppose you could override these rules by putting another rule prior in the chain. It’s not really “custom”, but whatever.)

The page does state:

Control outbound traffic initiated from within the local network.
Inbound traffic may be controlled by configuring Port Forwarding.

Wonderful. So it’s more like half of a firewall. Port forwarding is not firewalling. I’m using the DMZ feature, not the port forwarding feature, yet I’d still like to block ports at the edge. This is not only for security, but to avoid unnecessary data usage charges (more on that later).

But OK, let’s press the “Add” button:

BTW – The “Rule Name” is actually the “Service Name”. You set up services in another section of the GUI. They are basically just named definitions of a port range and protocol. In this case I have already configured “VPN (SMR)” in services to match my OpenVPN server settings.

“Action” allows you to set either “Allow Always” or “Block Always”. I want allow.

Here’s why the custom firewall is meaningless:

THERE IS NO NETMASK FIELD! A firewall wherein you’d have to black- or white-list every individual IP is useless.

If I leave the “Lan Users” or “Wan Users” field blank, I get an error that the IP addresses are required. If I set either to “0.0.0.0” (figuring maybe it would accept that as a wildcard), it gives an error that the IP is not valid. So neither of the “intuitive” ways of entering 0.0.0.0/0 is accepted, let alone a more nuanced netmask.
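For comparison, here's what a netmask buys you. This is a minimal Python sketch of CIDR matching. The rule format is my own illustration, not the 6100D's. A single 0.0.0.0/0 rule matches every address and a /24 covers a whole subnet. Without a mask you're stuck with the equivalent of /32 rules, one per host.

```python
import ipaddress


def rule_matches(rule_cidr: str, addr: str) -> bool:
    """True if addr falls inside the rule's network (CIDR notation)."""
    return ipaddress.ip_address(addr) in ipaddress.ip_network(rule_cidr)


# With a netmask, one rule can cover everything, a subnet, or one host.
print(rule_matches("0.0.0.0/0", "203.0.113.7"))     # True: any address
print(rule_matches("10.10.1.0/24", "10.10.1.50"))   # True: whole subnet
print(rule_matches("10.10.1.50/32", "10.10.1.51"))  # False: single host only
```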

Complaint 12: The documentation is terribly vague about a lot of things, and the custom firewall in particular. This is literally all it has to say on the matter:

Custom is an advanced configuration option that allows you to edit the firewall configuration directly. Only expert users should attempt this

I’ll grant that I’m not an expert on the Netgear 6100D custom firewall. So maybe there’s something I’m missing. But if there is, I can’t find it.

Oh, and those two screenshots that I’ve shown you? That’s all there is to the “custom” firewall. The rule editing dialog is the same as the add dialog. That’s it.

Speaking of custom things:

Complaint 13: There is no ability to edit the routing table(s).

Leaving a heck of a lot out, here’s how I have the 6100D connected to my workstation:

WORKSTATION <-----> CORE ROUTER <-----> NETGEAR 6100D

My network is similar to this:

Workstation: 10.10.1.50
Core router: 10.10.1.1, 10.222.1.1
Netgear 6100D: 10.222.1.11

But since I can’t add a custom route to the 6100D, it tries to route all packets destined for my workstation over the public internet! Hence in order for me to even administer the device, I had to SNAT all traffic destined for 10.222.1.11/32 with a source of 10.222.1.1. It’s pretty stupid that I have to do that, but it works.

It also means that (even if I wanted to) I couldn’t use the Netgear’s “port forwarding” (DNATing) in my environment — none of my servers are on the 10.222.0.0/16 network.
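The 6100D doesn't expose its routing table, so the following is an assumed model rather than a dump from the device. It contains just the connected LAN (10.222.1.11/16 at the time) plus a default route out the cellular WAN. A longest-prefix lookup in Python shows why replies to my workstation headed for the internet, and why SNATing to 10.222.1.1 fixed it.

```python
import ipaddress

# Hypothetical model of the 6100D's routes: its connected LAN plus a
# default route via the cellular WAN. No way to add anything else.
ROUTES = [
    (ipaddress.ip_network("10.222.0.0/16"), "LAN (connected)"),
    (ipaddress.ip_network("0.0.0.0/0"), "WAN (default)"),
]


def lookup(dst: str) -> str:
    """Pick the most specific (longest-prefix) route that contains dst."""
    addr = ipaddress.ip_address(dst)
    matches = [(net, iface) for net, iface in ROUTES if addr in net]
    # The default route always matches, so matches is never empty.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(lookup("10.10.1.50"))  # WAN (default): replies to my workstation leak out
print(lookup("10.222.1.1"))  # LAN (connected): after SNAT, replies stay local
```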

Complaint 14: Dynamic DNS: Paid or Chinese.

The only two options for Dynamic DNS are DynDNS.org or 3322.org.

DynDNS.org no longer offers free DDNS services. 3322.org is apparently Pubyun, a Chinese company. I have no problem with it being a Chinese company in general, and it looks like they’ve been doing DDNS since 2001. However their website is in Chinese, and I can only assume that their servers are in China and that they may not provide support in English.

My problem is not with the two services on offer, it’s that they are the only two services on offer and that there is no custom option.

Fortunately I happen to know of a relatively new (and believe me, very unknown) DDNS service from Kisolabs that is both free and will let you spoof your device’s DNS so that it thinks it’s hitting DynDNS.org.

Let’s get to what is probably my biggest complaint of all.

In trying to resolve the DMZ IP address issue that I had, I said to myself, “hey Scott, this appears to be running some kind of *nix because the config file shows snippets of iptables commands. Maybe you can SSH in.”

So I issued a nc -z 10.222.1.11 1-1023 with the following results:

Connection to 10.222.1.11 23 port [tcp/telnet] succeeded!
Connection to 10.222.1.11 80 port [tcp/http] succeeded!
Connection to 10.222.1.11 179 port [tcp/bgp] succeeded!
Connection to 10.222.1.11 443 port [tcp/https] succeeded!

Oh-kay. No SSH, but telnet is open!?
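If you don't have nc handy, a plain TCP connect scan is easy to reproduce. Here's a minimal Python sketch of the same idea. Like nc -z, it just attempts a full connect on each port and closes it without sending data. The function name and timeout are my own choices.

```python
import socket


def scan(host: str, ports, timeout: float = 0.5) -> list:
    """Return the ports on host that accept a TCP connection.

    Each port gets a full TCP connect attempt that is closed right away;
    no data is sent. connect_ex() returns 0 on success instead of raising.
    """
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            if s.connect_ex((host, port)) == 0:
                open_ports.append(port)
    return open_ports
```

Calling scan("10.222.1.11", range(1, 1024)) covers the same range as the nc command above. On an unreachable host every port waits out the timeout, so expect it to be slow.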

# telnet 10.222.1.11
Trying 10.222.1.11...
Connected to 10.222.1.11.
Escape character is '^]'.

BusyBox v1.1.3 (2014.01.02-13:26+0000) Built-in shell (ash)
Enter 'help' for a list of built-in commands.

# ls /WFIO/current.cfg  
/WFIO/current.cfg (<- That's the configuration file that's manipulated by the GUI, and that's read on boot.)
#

Wait. No authentication?

Complaint 15: NO AUTHENTICATION.

You’d think that maybe this would be an environment with some very strong deny permissions, but no.

Complaint 16: Not only are the configuration and even the GUI HTML files readable, THE CONFIG FILES ARE WRITABLE!

Complaint 17: And THE CONFIGURATION FILE CONTAINS THE ADMIN PASSWORD IN PLAIN TEXT!

Complaint 18: And THERE IS NO WAY TO DISABLE TELNET ACCESS from the GUI!

And remember: There is no way to turn off WiFi.

Who designed this thing???? It’s a security nightmare.

The only thing I can guess is that maybe Netgear charged its engineers with creating a honeypot, and they accidentally released that codebase to production for this device.

Oh, and search the user guide for “telnet”. You’ll find that it’s mentioned three times. Once in regards to services that could be permitted by the firewall, once in the index, and once on page 109:

Are Terminal Sessions Supported?
Terminal sessions (for example, via telnet or ssh) are not supported.*

*Documentation written by Kafka.

ARE YOU KIDDING ME?

The craziest part is that I haven’t even tried playing with most of the other settings. I can’t imagine how many complaints I’d have if I actually delved into this!

Even just trying to navigate between the settings that I do need is counter-intuitive. Let’s look at the left navigation bar:

You want to change the password to the router. Quick, which menu item do you click on!? Nope, I would’ve thought it was Security as well. But it’s under Settings.

And what about dynamic DNS settings? NOPE! That’s in Security.

When you do go into Settings there are four tabs from which to choose:

General is fair enough, but Network actually means “WAN / Cellular” and Router actually means “Basically Whatever”. Manage VPN is refreshingly self-explanatory.

Here’s the sub-menu under the Router tab:

Most of this is just… wrong.

Even though there’s a section for “Port Forwarding”, the DMZ port forwarding setting is under “Basic”.
Port filtering is here instead of under the Security menu.
“MAC Address Cloning” only specifies that it’s the “Router MAC Address”. BUT THIS ROUTER HAS SEVEN INTERFACES! Does this apply to any one of the four WiFi interfaces, the LAN interface, or one of the two WAN interfaces? (The documentation makes it seem like it applies to the hardline WAN port — but all the other settings for the hard WAN interface are under the Network tab. So why is this here??)
“File Sharing” should not be under the Router tab. It shouldn’t even be a feature on this device.

Hence, Complaint 19: Poor organization of the menus.

Thanks for hanging in there! I know it’s been a long ride, but let’s round this out to an even twenty:

Complaint 20: By default this thing suckles at your data plan.

It’s constantly in communication with various servers in the sprint.com and netgear.com domains. I can see this in the system logs. The requests appear to be for data usage information and NTP synchronization respectively. By default it also checks for system updates (I am up to date, BTW).

I haven’t done any “scientific” testing, but look at this:

In the last 50 minutes it’s used 0.12MB of data. My router is not pushing traffic to the 6100D, and I currently have no publicly addressable services served by it. I’m administering it over the LAN port.

That’s just “idle” utilization.

So, that’s 0.0024MB per minute. Assuming that’s an average level of utilization, it uses about 104MB per (30-day) month at idle.
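The back-of-the-envelope math, spelled out in Python. It assumes a 30-day month, which is what the 104MB figure works out to. It also assumes the 50-minute sample is representative, which is a big if.

```python
# Idle usage observed on the 6100D's status page.
observed_mb = 0.12
observed_minutes = 50

rate_mb_per_min = observed_mb / observed_minutes  # MB per minute at idle
minutes_per_month = 60 * 24 * 30                  # assuming a 30-day month
idle_mb_per_month = rate_mb_per_min * minutes_per_month

print(f"{rate_mb_per_min:.4f} MB/min")          # 0.0024 MB/min
print(f"{idle_mb_per_month:.0f} MB per month")  # 104 MB per month
```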

That’s over 10% of my data cap just gone! Sprint actually sells a 100MB/month plan for this device. Imagine your face when it uses up your entire data cap (and then some) on trivial, unwanted, unnecessary data!

OK, to be fair I can disable NTP or point it to local (LAN-connected) servers. And maybe (maaayyybe) Sprint doesn’t actually bill for the data going to/from sprint.com. But are most users going to know this?

And remember that even (possibly) unbilled data to/from sprint.com will spawn DNS requests. Are you using Sprint’s DNS servers? Do they charge for that traffic?

But what about unsolicited requests on blocked ports? Does every TCP SYN count against my data plan? Does every UDP packet destined for my IP count against it?

Moreover, since I can’t set up custom firewall rules and don’t want to use “port forwarding”, the 6100D is going to happily accept any unsolicited TCP connection attempt and forward the packets right along to my DMZ host! So someone could rack up huge charges on my connection just by spamming my IP with large packets, even if I don’t reply!

In conclusion: The Netgear 6100D LTE Gateway is not ready for prime time. I couldn’t even recommend it for home use due to gaping security holes, let alone in a business environment as Sprint suggests:

What about me? I’m going to keep it. It’s my only reasonable option. Sprint has the most competitive pricing of any of the major providers, and this hardware appears to be the best available (at prices I’m willing to pay). I have found workarounds for all of the complaints that are strictly relevant to my environment. The security holes are acceptable to me because I’m using a one-off password and my LAN interface is firewalled off from being accessed by all but my own workstation.

It’s still the worst networking device I’ve seen since the “Cisco” (Linksys) RV042.

I’ve been doing this long enough to know that rants about a device that I’ve only owned for a few days may contain some inaccuracies. I may even be dead wrong about some of my overarching complaints and assumptions.

As of today (August 25, 2014) comments are closed on this site due to an extraordinary number of spammers. But please contact me by email if you have any comments or corrections: scott[at]s.co.tt

Update (2014-08-26)

Complaint 21: The 6100D runs a DLNA server, and there’s no way to turn it off. (Well, there is.)

Complaint 22: The 6100D listens on port 3457 on all interfaces and port 9000 on the LAN interface. They both appear to be HTTP servers, but I have no idea what they do or why they’re there. The documentation doesn’t mention them.

This isn’t a complaint about the device itself, so I’m not going to number it:

I posted a link to this review on both Netgear’s and Sprint’s timelines. Netgear hasn’t replied, but, even worse, Sprint did reply:

That’s just disgraceful. Am I talking to a bot?

I don’t expect Sprint’s social media rep to read the entire 3,500 word blog post. I’m not that self-important. But look at that last comment:

Sprint: What kind of device are you using? Have you tried to call the manufacturer of the device? Are you eligible for an upgrade? Let me know. – Brenda

The manufacturer and model number are both right in the title! And no, I’m not eligible for an upgrade. It says right in the blurb that “I recently obtained a Netgear LG6100D LTE Gateway from Sprint”. Not two years ago.

But that is completely irrelevant in the first place, because this device is not sold with a contract. You simply buy it. Now that is something their reps should know.

And this sounds like a classic ELIZA response from the 1960s:

Sorry that you feel this way. What’s going on to have you feel like this? – Fernendez

Get it together, Sprint.

Update (2014-08-28)

Man, I just keep finding more and more stuff about this device that is just stupid or downright buggy.

Complaint 23: The device seems to, without obvious reason, occasionally flood the LAN with multicast messages from the igmpproxy. That process uses about 75% CPU, with the remaining available CPU going to IO. It freezes the GUI, but it stops after a few minutes.

Complaint 24: IP forwarding seems to be based upon MAC address in some convoluted way, rather than the IP address you actually enter. This may actually be the cause of the DMZ problems I was having (discussed above), but in this case I’m specifically talking about “Port Forwarding”, not the DMZ setting.

Complaints 23 & 24 are discussed in more detail in my post on the native Netgear GUI, and some of the problems it solves.

Complaint 25: Another simple example of stupid design. Let’s look at yet another screencap:


The 6100D tracks data usage both by billing cycle and by session.

(I think “month” in this interface means “billing month”, but now that I look at it I’m not sure. It may mean “calendar month”. Who knows?)

Tracking data usage right in the router is a big plus! The billing usage data comes from Sprint’s servers (I can see that in the logs), so even though it says that it’s “approximate and may vary”, it should be a pretty good indication of billable usage. Hopefully.

So, what do you think that “Reset” button does? It’s right there tucked to the lower-right of the session data usage.

It should reset the usage counter for the session, wouldn’t you think? That would be really useful if, let’s say, you were playing Call of Duty and were curious as to how much traffic that game was pushing through the WWAN. You could reset it, play away, and then take a look.

It would be absolutely stupid and pointless if that button reset the statistics for your billing cycle. I mean, it wouldn’t actually reset your billing, right? It wouldn’t turn back time and start the month over, right?

WELL THE RESET BUTTON RESETS THE BILLING CYCLE USAGE STATISTICS, NOT THE SESSION STATISTICS.

You can see this in the screencap, wherein I have 24 days left in my billing cycle and yet have supposedly used no data, even though I have used 0.57MB in my current session!

This boggles my mind more than even the unsecured telnet interface. The telnet thing was clearly an accident. I’m giving them the benefit of the doubt that they probably just forgot to comment out the telnet daemon start command in the init script(s) before releasing to manufacturing. (Though it should have been caught in QA, but what do I know?)

But this reset button seems to be part of an intentional design decision. It’s so vastly illogical and pointless that I can’t imagine how it made it into this device. Unless the device is wholly under-planned, under-engineered, and under-tested. And it does indeed seem to be all those things at once.

Update (2014-08-28) – I’m just about done with Sprint

Today’s complaint is a bit of a tangent, as it doesn’t pertain to the device. It’s about Sprint’s website.

Specifically the bill payment section of their website. You know, the one that has to do with my hard-earned money and their revenue (something for which shareholders have a great concern — I’m glad I’m not one). This is the section of the site that should be absolutely reliable and well designed.

This is what I was treated to when I paid my bill:

Looks perfectly normal, right? But see that grey button in the lower-right? The one that says “Authorize”? The one I’m only supposed to click once to avoid duplicate charges? Well, I’ve already clicked it. It was yellow before, and now it’s grey.

It’s been grey like that for 75 minutes and nothing has happened. No “payment successful”, no “sorry, payment unsuccessful”, no timeout. No response at all.

And I’m sure that Sprint would be happy to tell me that it was the fault of my internet connection, except that I seriously doubt it because I wasn’t using their horrible device. Plus I paid four other bills while waiting for them to process my payment. Then I went to lunch.

So now here I am. I decided to go back to view my payment history, and there was nothing there. I checked my credit card online and there was no charge from Sprint. Fine. I’ll try again.

On my second try the payment went through in about 10 seconds. (Which in this day and age is an eternity.) Success!

But I don’t really trust these guys, and so I wanted to make sure it actually did go through properly. I went into the “Payment activity” tab and found this:

That’s right, no payments scheduled even though I did get a confirmation screen saying that my payment was scheduled successfully.

The one item in my “Payment history” is dated 11 days ago; that had to do with my account activation and so forth.

It’s now been about 15 minutes since my payment was “successful”. I still don’t see it as scheduled or processed in my account on Sprint’s website. I haven’t received a confirmation email, either.

But I guess all is well for them: They got their money, as evidenced by my bank’s website. It would just be nice if they would let me know that it was applied to my account.

30 minutes later…

“Don’t worry guise! Teh sights are now down complerply!”

… My first time using it, and the entire f**king customer portion of their site is down. What a pile of s**t.

And how dare they say “We are enhancing this section of our site”. What nerve.

Hey Sprint: Your site broke and you lie to your customers about it? Unless you consider “basic f**king functionality” to be an “enhancement”. If that’s the case, I ask you to please post that opinion publicly. I dare you.

“We here at Sprint believe that a functional site is an enhancement over a non-functional site. That’s why we do our best to keep our site functional most of the time. Because we at Sprint care about our customers and their occasional access to such great features as online bill pay, viewing usage history, and letting them sometimes buy, you know, phones and stuff.”

I’ve been using Verizon Wireless for almost 15 years (since they were Bell Atlantic). And though they’re by no means perfect I’ve yet to see a catastrophic failure of their ability to process payments.

Fix for: Keepalived router enters fault state on link down

Scott — Fri, 06 Jun 2014 19:01:49 +0000

TL;DR: This is the configuration option you want: dont_track_primary

At work and at home I have pairs of redundant “core” routers in an active-passive (or master-backup as you like) configuration. They consist of commodity hardware, a few 4-port gigabit NICs, and CentOS. All of these machines had been running flawlessly for anywhere from two to six years (as they were put into service or upgraded).

That is until yesterday when my primary router at home had an SSD failure which completely stopped it in its tracks. The backup router took over, and in less than a second traffic was being routed. All of my point-to-point VPNs reconnected within about 20 seconds. In other words, it worked exactly as it should.

Until I turned off power to the broken router. Then everything stopped.

I had made a minor change to my router pair a few months ago, and didn’t think anything of it. Instead of running VRRP traffic through the switch, I had dedicated a NIC port on each machine and connected them directly using a crossover cable. I had only tested by bringing the primary router down gracefully, and did not pull the plug.

When the plug was pulled on the broken router, the now-master saw the link go down on the VRRP port and keepalived went into the FAULT state. It gave up its VIPs and basically stopped keeping anything alive.

That behavior can make sense in certain scenarios. For example, if only the NIC port used for VRRP went down on the master router, the backup would stop hearing advertisements and promote itself while the master was still up. I wouldn’t want both routers holding the VIPs (and certain routes, etc.) at the same time. Likewise, if I had VRRP going through one switch and production traffic going through another, I wouldn’t want a failure of the less important switch to cause that same VIP conflict.

In my case, I find it much (much, much, much) more likely that a downed link means one of the machines has died completely. In my experience, power supplies and HDDs (or SSDs) fail far more often than a NIC or a single NIC port. That’s not to say the latter is impossible, but I have to plan for the most likely worst-case scenario.
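To make the trade-off concrete, here is a toy Python model of one router’s decision once its peer goes quiet. It is a simplified sketch, not keepalived’s actual state machine: the function vrrp_state and its parameters are hypothetical, and it assumes this router has the lower priority (so it starts as BACKUP) and that VRRP advertisements only travel over the tracked link.

```python
from enum import Enum


class State(Enum):
    BACKUP = "BACKUP"
    MASTER = "MASTER"
    FAULT = "FAULT"


def vrrp_state(link_up: bool, track_primary: bool, peer_advert_seen: bool) -> State:
    """Toy model (hypothetical, not keepalived's code) of a lower-priority
    router's VRRP state once the master-down timer has had a chance to fire."""
    if not link_up and track_primary:
        # Default behavior: a dead VRRP interface means FAULT, which drops
        # the VIPs no matter what the peer is doing.
        return State.FAULT
    if peer_advert_seen:
        # A healthy higher-priority peer is still advertising: stay BACKUP.
        return State.BACKUP
    # No advertisements heard (peer dead, or link down and ignored): take over.
    return State.MASTER


# Peer loses power and the crossover link drops (the scenario in this post).
print(vrrp_state(link_up=False, track_primary=True, peer_advert_seen=False).value)   # FAULT
print(vrrp_state(link_up=False, track_primary=False, peer_advert_seen=False).value)  # MASTER
```

The flip side is the case described above where only the VRRP link fails while the peer is still alive: the untracked branch still returns MASTER, so both routers can end up holding the VIPs.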

All that being said, there is one setting for your keepalived.conf to obviate this issue: dont_track_primary

That’s it. It doesn’t have options or qualifiers. From the man page:

# Ignore VRRP interface faults (default unset)
dont_track_primary

From the keepalived changelog:

VRRP : Chris Caputo added "dont_track_primary" vrrp_instance keyword which tells keepalived to ignore VRRP interface faults. Can be useful on setup where two routers are connected directly to each other on the interface used for VRRP. Without this feature the link down caused by one router crashing would also inspire the other router to lose (or not gain) MASTER state, since it was also tracking link status.

Perfect, right?

Here’s my keepalived configuration, sanitized and edited for brevity:

global_defs {
    notification_email {
        me@mydomain.corn
    }
    notification_email_from rtr-core02@int.meagain.net
    smtp_server 10.80.1.41
    smtp_connect_timeout 30
    router_id RTR-CORE-A
}

vrrp_instance VI_0 {
    state BACKUP
    interface p4p1
    smtp_alert
    virtual_router_id 50
    priority 50
    advert_int 1
    dont_track_primary
    notify_master /etc/keepalived/promotemaster
    notify_backup /etc/keepalived/promotebackup
    authentication {
        auth_type PASS
        auth_pass sanitizedpassword
    }
    virtual_ipaddress {
        192.168.1.1/24 brd 192.168.1.255 dev p3p1 label p3p1:100
        192.168.1.2/24 brd 192.168.1.255 dev p3p1 label p3p1:101
        10.1.1.1/24 brd 10.1.1.255 dev p3p2 label p3p2:100
        10.1.1.2/24 brd 10.1.1.255 dev p3p2 label p3p2:101
        # Many VIPs omitted here for brevity
    }
    virtual_routes {
        158.209.0.99/32 via 78.123.265.1 dev p1p1 table main
        0.0.0.0/0 via 91.59.24.131 dev p1p2 table 50
        193.266.0.0/16 via 91.59.24.131 dev p1p2 table main
        # Many routes omitted here for brevity.  IPs are sanitized/randomized
    }
}
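If you manage several router pairs, it’s easy to forget the keyword in one of them. The following Python sketch scans a keepalived.conf and lists every vrrp_instance that lacks a top-level dont_track_primary line. The helper instances_missing_dont_track is hypothetical (not part of keepalived). It assumes the opening brace sits on the same line as the vrrp_instance keyword, as in the config above, and it treats # and ! as starting a comment anywhere on a line, so it can misread values that happen to contain those characters.

```python
import re

# Assumes "vrrp_instance NAME {" with the brace on the same line.
INSTANCE_RE = re.compile(r"^\s*vrrp_instance\s+(\S+)\s*\{")


def strip_comment(line: str) -> str:
    """Drop anything after '#' or '!' (keepalived's comment characters)."""
    return re.split(r"[#!]", line, maxsplit=1)[0]


def instances_missing_dont_track(conf_text: str) -> list[str]:
    """Hypothetical helper: names of vrrp_instance blocks that lack a
    top-level dont_track_primary keyword."""
    missing = []
    name = None   # instance currently being scanned, if any
    depth = 0     # brace depth relative to that instance
    found = False
    for raw in conf_text.splitlines():
        line = strip_comment(raw)
        if name is None:
            match = INSTANCE_RE.match(line)
            if match:
                name, depth, found = match.group(1), 1, False
            continue
        # Only count the keyword directly inside the instance, not in sub-blocks.
        if depth == 1 and line.strip() == "dont_track_primary":
            found = True
        depth += line.count("{") - line.count("}")
        if depth <= 0:
            if not found:
                missing.append(name)
            name = None
    return missing


sample = """
vrrp_instance VI_0 {
    interface p4p1
    dont_track_primary
    authentication {
        auth_type PASS
    }
}
vrrp_instance VI_1 {
    interface p4p2   # forgot the keyword here
}
"""
print(instances_missing_dont_track(sample))  # ['VI_1']
```

To check a live router, pass it the contents of the real file instead of the sample string, e.g. the text read from /etc/keepalived/keepalived.conf.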

I’m hoping I put enough keywords in this article that you found it easily. The whole point of this post is to counter the drought of discussion on this topic.