Currently away at the NANOG meeting in Dallas. Got an alert from the RIPE Atlas system that my Atlas probe had become unreachable.
A bit of testing from the outside world showed serious packet loss, and nothing on the home network was reachable with anything other than very small pings. I’d guessed the line had got into one of its seriously errored modes again, but thought I’d try leaving it overnight to see if it cleared itself up. Which it didn’t.
So, how did I get around this, and reset the line, given that by now my tolerant girlfriend would be at work, and couldn’t go into the “internet cupboard” and unplug some wires?
Well, it turns out that you can get BT to do an invasive test on a line using this tool on bt.com. This has the effect of dropping any calls on the line and resetting it.
The line re-negotiated, and came back up with the same speed as before, 3Mb/sec down, 0.45Mb/sec up, no interleave.
Looking at the router log, the VirtualAccess interface state was bouncing up and down during the errored period, so the errors were bad enough to make the PPP session fail and restart (again and again), but the physical layer wasn’t picking this up and renegotiating.
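For anyone wanting to spot the same pattern in their own logs, here is a rough Python sketch that counts how often a Virtual-Access interface drops. It assumes Cisco IOS-style "changed state to up/down" messages; the log format, the `count_flaps` helper and the sample lines are all made up for illustration and are not taken from my router.

```python
import re
from collections import Counter

# Assumption: Cisco IOS-style state-change messages. Other routers will
# log this differently, so the pattern would need adjusting.
STATE_CHANGE = re.compile(
    r"Interface (?P<iface>Virtual-Access\d+), changed state to (?P<state>up|down)"
)


def count_flaps(log_lines):
    """Count 'down' transitions per Virtual-Access interface."""
    flaps = Counter()
    for line in log_lines:
        match = STATE_CHANGE.search(line)
        if match and match.group("state") == "down":
            flaps[match.group("iface")] += 1
    return flaps


# Hypothetical sample lines, invented for illustration only.
SAMPLE_LOG = [
    "%LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access2, changed state to down",
    "%LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access2, changed state to up",
    "%LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access2, changed state to down",
    "%LINEPROTO-5-UPDOWN: Line protocol on Interface Virtual-Access2, changed state to up",
    "%SYS-5-CONFIG_I: Configured from console by vty0",
]

if __name__ == "__main__":
    for iface, drops in count_flaps(SAMPLE_LOG).items():
        print(f"{iface}: {drops} down transitions")
```

A high count of down transitions on the PPP side, with no matching DSL retrain over the same period, would point to the same situation as above: the session layer failing repeatedly while the physical layer stays "up".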
Of course, BT’s test says “No fault found”. In terms of the weather in London, it has been damp and foggy, further fuelling the dry joint theory.
I’ve also had a chat with Mirjam Kuehne from RIPE Labs about seeing if it’s possible to make the Atlas probe’s hardware uptime visible, as well as the “reachability-based” uptime metric. They are looking into it.