network_outage_logbook [2016/04/27 13:31]
iwilcox [2016-04-26] Graphs
network_outage_logbook [2016/04/29 02:58]
iwilcox 2016-04-28 update
 BW=Bristol Wireless.  PoP=point of presence.
 +==== 2016-04-28 ====
 +Not really a new outage so much as the old one continuing, but we have an explanation and a workaround:
 +Every time we send an ARP "who-has ''''" request, two OpenMesh boxes reply claiming to have it: ''AC:86:74:57:3C:92'' (the correct one) and ''AC:86:74:13:A6:F2'' (the wrong one).  Typically the fastest reply wins, and from the router's position in the network the fastest is more often than not the correct one (though it varies a lot, without much obvious explanation).  The winner dictates our outbound route until the ARP entry is revalidated.
 +If we make no ARP requests and instead add a private permanent entry to our table, mapping '''' to ''AC:86:74:57:3C:92'', we are reliably routed out via Spectrum, and that's our workaround for now.  Tarim noted that the same effect can be achieved by waiting for the correct ARP entry, then leaving an ''arping'' running so that the entry is constantly refreshed without ever being revalidated.
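The double-reply symptom can be checked mechanically. What follows is only a minimal sketch, assuming the ARP replies have already been captured somehow and reduced to (IP, MAC) pairs; the function name ''find_conflicting_claims'' and the ''192.0.2.x'' addresses are placeholders invented for illustration, not values from our network.

```python
from collections import defaultdict


def find_conflicting_claims(replies):
    """Return {ip: sorted list of MACs} for every IP claimed by more than one MAC.

    `replies` is an iterable of (ip, mac) pairs taken from observed ARP replies.
    """
    claims = defaultdict(set)
    for ip, mac in replies:
        # Normalise case so the same MAC written two ways counts once.
        claims[ip].add(mac.upper())
    return {ip: sorted(macs) for ip, macs in claims.items() if len(macs) > 1}


# Hypothetical capture: 192.0.2.1 stands in for the real gateway address.
GATEWAY_IP = "192.0.2.1"
replies = [
    (GATEWAY_IP, "AC:86:74:57:3C:92"),   # the correct OpenMesh box
    (GATEWAY_IP, "ac:86:74:13:a6:f2"),   # the wrong one, also answering
    ("192.0.2.7", "00:00:5E:00:53:01"),  # unrelated host with a single claimant
]

conflicts = find_conflicting_claims(replies)
print(conflicts)
```

Any IP that shows up in the result has more than one box answering for it, which is the situation described above.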
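For illustration only, here is a sketch of how the permanent entry might be pinned, assuming the router runs Linux with iproute2 (the logbook does not record which tool was actually used). The address ''192.0.2.1'' and the interface ''eth0'' are placeholders; the sketch only builds and prints the command rather than running it, since applying it needs root.

```python
# The MAC of the box that routes us out correctly, as recorded above.
CORRECT_MAC = "AC:86:74:57:3C:92"


def permanent_neigh_command(ip, mac, interface):
    """Build an iproute2 command that pins `ip` to `mac` as a permanent neighbour entry.

    A permanent entry is never revalidated, so a competing ARP reply
    cannot displace it.
    """
    return ["ip", "neigh", "replace", ip,
            "lladdr", mac, "dev", interface, "nud", "permanent"]


# Placeholder gateway address and interface name (assumptions, not real values).
cmd = permanent_neigh_command("192.0.2.1", CORRECT_MAC, "eth0")
print(" ".join(cmd))
```

Using ''replace'' rather than ''add'' means the command works whether or not an entry (possibly the wrong one) is already present.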
 +This explains:
 +  * long-lived (tens of seconds and above) connections dropping
 +  * the shorter delays observed in replies to our DHCP requests (the first request is sent unicast, so it might use the wrong MAC; since ''AC:86:74:13:A6:F2'' is not always reachable, that first request may go unanswered; in that case the second always goes to the limited broadcast address, so it will always be answered by ''AC:86:74:57:3C:92'')
 +It does not explain the [[#2016-04-19|total DHCP outage of 2016-04-19]].
 ==== 2016-04-26 ====
 {{:network:2016-04-26-dnsdrops1.png?direct&200|}}
 {{:network:2016-04-26-load.png?direct&200|}}
 ==== 2016-04-22 ====
 {{:network:2016-04-22-drop.png?direct&200|}}
 {{:network:2016-04-22-rtt.png?direct&200|}}
 ==== 2016-04-19 ====
