Contrary to popular belief – the internet is not magic.
It’s okay to think of it as a series of black boxes – with a given input and a given output – but there are services and connections that layer up into what you see as the end user. For want of a better word – it is transparent.
…until something goes wrong and then it’s all assumptions, leaps of faith, and pointy fingers.
Yesterday morning, a number of you will have noticed that we appeared to be down… all of us, in entirety, but other places appeared to work.
This, despite how it may have appeared, was “not just us” – moreover, it was “not just Virgin Internet”, as it seemed from the support tickets and calls we were getting. However, they did appear to be affected more… it will become apparent why in a bit – stick with me.
With the caveat of “I Am Not a Network Engineer But” (IANANEB) – here is an explanation of how traffic gets from A to B.
Each company will operate their own network.
They will have an allocation of addresses which are grouped together into handy bundles referred to by their AS number. Ours, for this office for example, is AS33968.
Traffic within our network, and arriving at the edges of our network, we manage ourselves.
So this raises the question: how does the traffic get from one network to another?
The simplistic answer is in two ways: transit and peering.
Transit first. Connectivity is expensive. Unlike your contended and shared home and business DSL lines, these are the real deal, the plumbing of the internet… like having – or in some cases “having” – a stretch of copper or fibre from one location to another. Transit is fixed pipes, if you like, out to other providers further up the tree, who in turn have connectivity and peering, up to a point where you have three or four giants that just peer between each other. Transit can be narrowed down into two types: full and partial. Full will allow you to reach the entire internet; partial, only parts of it, or other specific networks.
Peering is where providers get together on a single network and say “Hi, I am your neighbour, I have the following AS here – if you have traffic for them, let’s trade”. The members of these cooperatives just pay membership for a port, and traffic passes directly between networks. The mechanics of this mean a number of things:
– Fewer hops on the route;
– Faster from A to B.
The peering model has fewer points of failure and is cheaper; however, it also relies on everyone having the right configuration (which it is in their interest to have). Transit, partial or otherwise, is expensive and limited in capacity, but less prone to other parties breaking things.
Another way to describe these three is to imagine that you want to deliver a package to your neighbour:
Transit – Full routes – you take your package to the post office;
Transit – Partial routes – you pick a local courier who will make use of other couriers for the long haul;
Peering – you leave the house, knock on their door, and give them the package.
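For the curious, the package analogy can be sketched in a few lines of Python. This is only an illustration: the AS numbers come from the private-use range, and the `PEERS` and `PARTIAL_TRANSIT` sets and the `pick_exit` function are made up for the example, not a description of our real configuration.

```python
# Illustrative only: every AS number here is from the private-use range,
# and the sets below are invented for the example.
PEERS = {64512, 64513}             # networks we meet directly at an exchange
PARTIAL_TRANSIT = {64520, 64521}   # networks a partial-transit provider covers


def pick_exit(dest_as: int) -> str:
    """Return the kind of connection that would carry traffic to dest_as."""
    if dest_as in PEERS:
        return "peering"           # knock on the neighbour's door
    if dest_as in PARTIAL_TRANSIT:
        return "partial transit"   # the local courier
    return "full transit"          # the post office: reaches everyone


for asn in (64512, 64520, 64999):
    print(asn, "->", pick_exit(asn))
```

Real networks decide this with routing policy rather than a hard-coded list, but the order of preference (neighbour first, post office last) is the point.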
Routing is the dark art of smoke and mirrors that makes decisions on where the traffic needs to go. Again, generally speaking, while many mechanisms are at play, the routes are aggregated. That is to say, when in London you will see signs to The North; you will then start to see signs to larger cities, then cities, towns, villages, and so on down to street signs. You hand your traffic off in the general direction and it is routed from there. This is exceptionally resilient (as there will normally be MANY ways presented to you to get from A to B), and robust.
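The road-sign idea is roughly what routers call longest-prefix matching: of all the signs that apply, follow the most specific one. Here is a minimal sketch using Python's standard `ipaddress` module. The prefixes and labels in `ROUTES` are invented for illustration, and `lookup` is a hypothetical helper, not how any real router is implemented.

```python
import ipaddress

# Invented routing table: broad "signs" first, street-level detail last.
ROUTES = {
    "0.0.0.0/0": "transit (the sign to The North)",
    "10.0.0.0/8": "regional provider (the city)",
    "10.1.0.0/16": "local provider (the town)",
    "10.1.2.0/24": "customer network (the street)",
}

TABLE = [(ipaddress.ip_network(prefix), hop) for prefix, hop in ROUTES.items()]


def lookup(address: str) -> str:
    """Return the next hop for the most specific prefix containing address."""
    addr = ipaddress.ip_address(address)
    candidates = [(net, hop) for net, hop in TABLE if addr in net]
    # The longest prefix (largest prefixlen) is the most specific sign.
    _, hop = max(candidates, key=lambda entry: entry[0].prefixlen)
    return hop


print(lookup("10.1.2.3"))
print(lookup("10.1.9.9"))
print(lookup("8.8.8.8"))
```

The default route `0.0.0.0/0` matches everything, so there is always at least one sign to follow. It only wins when nothing more specific applies.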
Yesterday’s event was related to exactly that.
We peer at LINX – the UK’s premier peering point. It transpired that an unnamed African ISP leaked some routes here that it should not have. In short: “Hello, do you need to get to [provider or network]? I am your best choice.”
Try not to think of peering as a cheap workaround – LINX delivers 2TB connections… go have a look.
Much to their surprise, and that of the rest of us, the internet starts pouring its traffic onto their network… and in doing so, things do not get to where they are intended… and to all intents and purposes, networks appear off or broken from some locations.
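Why would anyone believe “I am your best choice”? Here is a heavily simplified sketch. It assumes (this is a guess for illustration, not a confirmed detail of yesterday's incident) that a network prefers routes learned over peering to routes learned over transit, and then prefers the shorter AS path. Real BGP route selection has more steps than this, and every AS number, preference value, and name below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    learned_from: str   # "peering" or "transit"
    next_hop_as: int    # the neighbour we would hand traffic to
    as_path: tuple      # networks the traffic would cross, in order


# Hypothetical policy: peering is cheaper, so it gets the higher preference.
LOCAL_PREF = {"peering": 200, "transit": 100}


def best_route(routes):
    """Pick the route with the highest preference, then the shortest AS path."""
    return max(
        routes,
        key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)),
    )


# The legitimate way to reach AS64700, via a transit provider.
legit = Route("transit", 64600, (64600, 64601, 64700))
# The leaked announcement, heard directly at the peering point.
leaked = Route("peering", 64666, (64666, 64700))

print("before the leak:", best_route([legit]).next_hop_as)
print("after the leak: ", best_route([legit, leaked]).next_hop_as)
```

Under these assumptions, the more a network leans on peering, the more readily it follows the leaked route.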
Below are some examples of how this presented across the internet on third-party sites, showing delay as a graph. The spike towards the right-hand side is the incident in question…
As providers change their routing to work around the issue, service returns.
The networks that made the most use of peering (or indeed had traffic that had the most need of peering) were hit the hardest – which from this side of the fence appeared to be Virgin – good for them.
So – if we disappeared yesterday, and you use Virgin, or in some cases BT: sorry, it was not us. If you use neither, then hey – we can finish our morning coffee having learned something : )
As a member of the iomart group plc, we are connected around the UK by our own fibre ring linking over 10 data centres (and thus exit points and peering points) in the UK, and now Europe and America. We also deliver fibre circuits to, amongst others, the Police. If you have any questions regarding resilience, networks, security – or indeed connectivity, peering, fibre – let us know : )