The hackspace's Internet connection is provided by Bristol Wireless (BW).

PoP = point of presence

Hardware

BW's OpenMesh box (an OM2P-HSv2 “NEW1”, mounted on the partition near the front door in G11) provides a wired connection to their wireless mesh. It has a second port through which it gets power (PoE) and perhaps also traffic (unknown).

Configuration

Access is either wired, via the OpenMesh box (managed by our Roles), or wireless, via the building's many WiFi access points named BVStudio.

The OpenMesh box (or something upstream of it?) gives us a five-minute dynamic lease in the range 10.255.148.0/22, naming itself as the default route and DNS provider. Observations:

  • This lease is pretty short. That may be unavoidable given how dynamic BW's own upstream configuration is, but the box ought to be isolating us from at least DNS and routing details, in which case it's not clear why (or whether) the lease needs to be this short.
  • We don't necessarily get the same lease every time (we've certainly been switched between 10.255.150.96 and 10.255.150.97 in April 2016).
  • There have been issues with DHCP on this unit.
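
A quick way to check what we've actually been given (assuming the router runs OpenWrt with the default "wan" interface name):

    # Show the current WAN lease as the router sees it: address, DNS server and
    # routes handed to us by the OpenMesh box (or whatever is upstream of it).
    ifstatus wan

    # Watch renewals arrive as they happen; udhcpc logs each one to the system log.
    logread -f | grep -i dhcp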

BW's own upstream appears to be variable and can change as often as several times an hour; we've seen two different Spectrum Internet routes and a Zen route in the span of an hour or two during April 2016. There doesn't seem to be any relationship between BW's PoP and our lease.
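
A crude way to watch this happening (hop counts and addresses will vary) is to trace the route out a couple of times an hour or so apart and compare where it exits:

    # -n skips DNS lookups, so BW's private hops show up as raw 10.x/172.x/192.168.x
    # addresses; the exit ISP (Spectrum, Zen, ...) appears once public addresses start.
    traceroute -n 8.8.8.8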

Issues

Gleaned from outages, router logs and anecdotes; in roughly worst-first order:

  • We have recently seen outages of tens of hours during which we got no DHCP lease. DHCP/DNS on our OpenMesh unit certainly seems to cope badly with upstream routing changes. This may be a problem with just this one unit, since there were anecdotal reports that service was still available at least some of the time via BVStudio access points during both outages.
  • BW's service is not hospitable to long-lived connections without extra measures:
    • BW appears to dynamically choose its own upstream PoP, and every time that changes it'll break all existing sessions. (There are workarounds like keepalived for this sort of problem, but BW doesn't appear to use them.)
    • It appears there are several layers of NAT between us and the public Internet, and we share them with probably several hundred other clients, meaning connections may need frequent keepalives to maintain their slots in upstream's fixed-capacity, most-recently-used NAT tables (a sysctl sketch follows this list):
      • To exercise an acceptable level of centralised control over our internal network we have to use our own range behind the single IP we get from BW, so we must do NAT ourselves.
      • BW itself is obviously NATting us (10.x.x.x is not a publicly routable range).
      • Even when we have a long period under the same PoP, there are often several hops through other private IP ranges before our packets reach the Internet.
    • For connections passing through our router, these problems could be worked around by running our own VPN between the router and some other point (our Mythic Beasts VPS, or a commercial VPN provider, and/or an IPv6 tunnel broker supporting AYIYA or similar).
  • BW's service is not hospitable to inbound connections (including those negotiated using UPnP, often used by peer-to-peer and video calling) without extra measures:
    • every unit in our route that performs source NAT (see above) would need an exception, which BW may or may not support
    • our PoP is not stable (see above)
    • we'd need a static lease, which BW may or may not support
    • again, as above we could work around this with a VPN
  • BW's service doesn't give us a predictable source IP on outbound packets (which would be useful for anti-mischief authentication for things like Tarim's “anyone in” script), although we currently use dynamic DNS to work around this.
  • BW does not appear to support IPv6 yet (the OpenMesh box appears to ignore router solicitations as of April 2016).
  • The PoE cable supplying the OpenMesh box has a damaged end at the wall opposite the front door.
  • There have been some issues with packet corruption in larger downloads (hopefully fixed by cable repairs in March/April 2016).
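
For the NAT-timeout problem above, one partial mitigation on individual Linux machines (it does nothing for BW's side, and only affects connections that enable SO_KEEPALIVE) is to send TCP keepalives much more often than the kernel default of two hours. The intervals here are guesses, not measured against BW's actual NAT timeouts:

    # Probe after 2 minutes of idleness, then every 30 seconds, giving up after
    # 5 unanswered probes. Persist in /etc/sysctl.conf if it turns out to help.
    sysctl -w net.ipv4.tcp_keepalive_time=120
    sysctl -w net.ipv4.tcp_keepalive_intvl=30
    sysctl -w net.ipv4.tcp_keepalive_probes=5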

Plans

  • Get more solid diagnostics:
    • Make router buttons export a problem report to persistent storage.
      • Mostly written (/etc/export-problem-report.sh), but entirely untested; should probably add a distinction between button-generated and monitoring-generated exports; not yet attached to the buttons; needs to be publicised once done.
    • Monitor more things, so we can generate problem reports automatically:
      • we already have an RRD full of ping stats, just need to query it periodically
      • should gather stats on success/failure to get a DHCP lease (logread -f and a bit of glue to put that in an RRD; a rough sketch follows this list)
    • Can't really say a VPN improved things unless we have some stats on TCP connection reliability before and after. To do this properly (accounting for NAT drops) we should probably hold connections to the VPS open at a range of keepalive intervals, and log stats on their uptimes.
  • Verify and establish a fallback uplink: give the existing router the means to route via wireless. This lets us verify that BVStudio remains connected when our G11 mesh unit is denying us DHCP, and if so, lets us bypass the mesh unit should it misbehave again.
    • Grabbed a spare unit from the networking box, installed OpenWrt on it, and will install it (2016-04-28) as a client of some selected BVStudio access point other than ours. (Could probably also have used the existing router's 802.11bgn radio in AP+STA mode, but don't really want to introduce that complexity unless it's confirmed to be worth it.)
    • Some criteria and a mechanism for switching routes will be needed after that, so it relies on “monitor more things” above.
  • Establish an experimental VPN and get some volunteers to try it: permits inbound connections, and lets us confirm whether we can maintain stable/reliable connections despite our dynamic PoP.
    • OpenVPN needs configuring at both ends.
    • Add monitoring for comparison to non-VPN routing.
    • A mechanism for selectively routing volunteers via the VPN needs to be tested (probably Linux policy routing: rt_tables and ip rule add, plus custom dnsmasq leases for the volunteers; see the sketch after this list).
    • Get some folks to use it.
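
For the "stats on success/failure to get a DHCP lease" item above, a rough sketch of the glue (RRD path, step and data-source name are made up for illustration):

    # Create a small RRD counting lease acquisitions per 5-minute step.
    rrdtool create /root/dhcp-leases.rrd --step 300 \
        DS:leases:ABSOLUTE:600:0:U RRA:AVERAGE:0.5:1:8640

    # Feed it from the system log: busybox udhcpc logs a line containing
    # "lease of <address> obtained" each time the WAN lease is (re)acquired.
    logread -f | while read -r line; do
        case "$line" in
            *"lease of"*"obtained"*) rrdtool update /root/dhcp-leases.rrd N:1 ;;
        esac
    done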
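
And for the selective-VPN-routing item, a sketch of the policy routing involved (table number, tunnel interface and addresses are illustrative, and it assumes the VPN is already up as tun0):

    # Declare a routing table for VPN'd clients and default-route it via the tunnel.
    echo "100 vpn" >> /etc/iproute2/rt_tables
    ip route add default dev tun0 table vpn

    # Route one volunteer via that table...
    ip rule add from 192.168.1.50 lookup vpn

    # ...and pin them to that address with a static dnsmasq lease, e.g.:
    #   dhcp-host=aa:bb:cc:dd:ee:ff,192.168.1.50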

Unknowns

  • An idea of our typical traffic, in case we want to go down the VPN route and need to compare traffic to allowances (being monitored).
  • What the network arrangement is in the workshop, since we should be managing that too (just waiting on higher priorities like stability issues to settle down).
  • What our upstream network neighbourhood looks like, since it might inform other decisions (waiting for an opportunity when nobody is relying on the connection).
    • Where the OpenMesh box's other cable goes. Does it supply our neighbour too, or provide redundancy? Assuming for now that the OM is in client mode and talking directly to BW wirelessly, not through that cable.
    • Whether any BV Studios infrastructure lies between us and BW, and if so, what.
    • Whether we can reliably see clients on other BVStudio mesh units (might avoid cabling crossing the corridor).
  • Whether our DHCP lease comes from the NEW1 OpenMesh box or beyond, so that we can confidently report which hardware we suspect is unreliable (just waiting on higher priorities like stability issues to settle down; a tcpdump sketch for checking this follows this list).
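
A couple of the above could probably be answered passively from the router (tcpdump is an opkg install away on OpenWrt; the WAN interface name here is illustrative):

    # Which box actually answers our DHCP requests: watch the exchange on the WAN
    # port and note the Server-ID option in the offers/ACKs.
    tcpdump -n -vv -i eth0.2 port 67 or port 68

    # Who else shares our BW-side broadcast domain.
    tcpdump -n -e -i eth0.2 broadcast or multicast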

Given the generally inhospitable nature of BW's configuration, it's probably immaterial whether we can get a stable subnet, but in case VPNs turn out to be unworkable, we might avoid all the NAT in other ways:

  • Maybe we get granted a private IPv4 subnet through some exotic DHCP options?
  • Maybe by convention/arrangement we actually get granted the /24 within which our single IP falls?
  • Maybe BW intended us to bridge the mesh with the rest of our network? Traffic dumps seem to show several laptops in our broadcast domain (directly plugged in, or bridged, to our segment).