The plenary session commenced at 9 a.m. as follows...
CHAIR: Good morning, all. Let's get started since the programme is a little bit crowded. We have two different topics, the first one a panel headed by Michael Behringer, followed by a talk by Markus and his colleague. Just one note I'd like to mention, which the RIPE NCC has asked me to bring to your attention: later today at 6 p.m., just as the last session of the day ends, there will be a PGP key signing in a room somewhere around here. If you want to be there and sign or get your key signed, there is information about the keyring, and how to take part, on the meeting site. I hand over to Michael.
MICHAEL BEHRINGER: Thanks. Good morning, everybody. Thanks for coming, thanks for your interest. I hope you are interested. Network complexity is a subject that has kept me on my toes for the last year or so. I am coming from the security angle — not security in the traditional sense where you put in firewalls, but keeping the network up and running; when I talk about security it's making sure the network is available, not attackable, security on routers and stuff like that. If you do that, we have a number of best practices we recommend, what you should do, and frankly, if I look at our own recommendations that we give as a company, and look at what we make you do, it makes me cringe: it's very, very complicated, lots of stuff to do, every platform is different, and guess what, that sort of leads automatically to the topic of network complexity. Security is only one of a number of technology areas where complexity is starting to kill us. And so this is what this panel is about. We had organised two workshops, small-scale invitation-only workshops, not publicly announced, where we tried to get people in the room to discuss the topic: what is network complexity? There is no definition upfront, I should say that; we still have none — it's sort of the elephant thing, you know it when you see it, but it's hard to describe in words.
In those two workshops that we had, one in London and one in LA, we had one common problem: we didn't get operators in the room. We had academia, we had people from vendors, but we didn't have the people that are faced with the problem — you. So guess what, we need to change strategy; that is why we are here. I hope this is of interest, and what I am really looking forward to is the corridor discussions afterwards, so please come to me if you have ideas or thoughts on the subject.
What is complexity? Again, it's this elephant thing: you know it when you see it, but it's hard to describe. There is complexity already in the meeting room, right, at 9 in the morning. Whatever curve you look at essentially has this sort of behaviour: without even knowing what it is in detail, you sort of see things are getting bigger. The left side is a tier 2 European ISP that helps me out by giving me data, so this is the number of routers and the number of config lines per router on this network; if you do that for your network I am pretty sure we'd see similar trends. The right top graph is the image size, the binary image size, and it doesn't really matter which image that is, but again we see this trend of getting bigger, because we put in more features, and that is linked to the lines of config on the other side: we put more stuff into your network. The bottom one is our PSIRT vulnerability announcements over the years, and those are also going up and up. Now, I have tried to do a correlation between the top and bottom — like, image size going up, vulnerabilities going up — and this is doomed to failure; I have given up on that, because PSIRT announcements are for all Cisco products, and they changed the way they did it at some point, so there is a dip in there. There is no scientific correlation possible, in my opinion, between those two graphs. But in any case, you look at that, and you have similar graphs for your network, and we can make the scientific observation: things tend to grow. And that is about as scientific as it gets, I am afraid; we don't know much more. We all know intuitively what that means: we need to, at some point, do something about it, because if we put the complexity of the operating system together with the complexity of the config, then for the human that is behind it, it's getting harder and harder to understand.
The key question I am asking you — and you don't have to answer right now — is: how many people in your organisation actually understand a router configuration top to bottom? You see what I mean? It's getting very difficult, because there are so many bits where, oh, the security guys are doing that, or the routing guys are doing that, and top to bottom there are not many people any more who understand the config. This doesn't give me the warm fuzzy feeling I would like to have about networking. So what is this network complexity?
So first of all, the tendency and the knee-jerk reaction that we all have: complexity bad, keep it simple. So everything that is complex, we tend to say, no, we don't want that. Wrong. Let's take a step back, let's leave emotions out — this is going to be hard for the next 90 minutes, but let's try to keep emotions out of this. We need complexity and we want complexity. Let me give you a simple example: if you construct any network, you typically have dual redundancy, and that is what we all do. If you had only a single way of doing things, a single row of routers, you would of course not have the robustness that you want, so you put in two of each. Is that more complex than one of each? Of course it is, but it buys you something — robustness — and robustness is good. So we need a certain level of complexity to achieve maximum robustness. Complexity: good. Now, in my 12 years at Cisco I had one customer that wanted triple redundancy, where — I should say — every normal network has two paths, two routers interconnected; they wanted three. I have only had that once. And they thought, if two is good, three must be better. I think there you are on the tail end of this curve: you have added complexity, but actually it is probably getting less robust.
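The diminishing return from redundancy described above can be sketched with a little arithmetic. This is a toy model, assuming every path fails independently with the same probability — an assumption real networks rarely satisfy — and the numbers are illustrative, not from the talk:

```python
# Toy model: availability of n parallel paths, each unavailable
# independently with probability p.
def availability(p: float, n: int) -> float:
    """Probability that at least one of n independent paths is up."""
    return 1 - p ** n

p = 0.01  # assume each path is down 1% of the time
for n in (1, 2, 3):
    print(f"{n} path(s): {availability(p, n):.6f}")
```

Going from one path to two buys roughly two extra nines of availability; going from two to three adds almost nothing measurable, while the operational complexity keeps growing — which is the sense in which the "if two is good, three must be better" customer probably ended up less robust.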
So what we are talking about here, and what we are looking for, is not complexity per se; what we are looking for is gratuitous complexity: complexity that has no benefit to the network, complexity that actually works against our goal.
Now, let's look at network solutions and trade-offs. We are all making lots of trade-offs in networks; one is cost versus scalability, and you can design a solution to be at a certain price point and at a certain scalability point. So say you have done that and your solution is here on this axis. You can now optimise the solution: you can, for example, make it more scalable in one direction, or make it less expensive in the other direction. So you can work and shift your solution within this space.
On the other hand, you can also do things that do the opposite, and that is what we are looking for: you can do things that don't make it more scalable — in fact less scalable — and may make it even more expensive. So you are shifting the solution around in this space. Now, this is not from me; this is from John Doyle, who is, I would say, the leading scientist on network complexity at the moment. He came up with this curve of optimal complexity, and he says we don't exactly know where this curve is, but somewhere there is a curve that we can't go beyond. So if you look at the right end of this space, you can make a solution very, very cheap — it's probably not very scalable then, but you can make it very, very cheap — but there is a certain limit, and you can't go beyond this limit. The same on the left-hand side of this graph: you can probably make it extremely scalable; it's not going to be cheap, it's going to be expensive, but you can move it up or down. Now, the point here is, this is your design decision, and any point on this curve is a good point as long as you chose it consciously. So there is no point in saying networks have to be absolutely 100 percent scalable. In my home network, guess what? I don't care. I make a very conscious choice: it's not scalable, but it's dirt cheap, and I think that is the correct choice for my home network; it wouldn't be for yours, probably. Everybody has to make their own choice. The point is, you can't go beyond this curve. And the way this gratuitous complexity comes in is when your solution is not on this curve but somewhere further out. Now, what we ideally want is metrics: ideally, we would like to say, in terms of scalability, my network is number 7 on a scale from 0 to 10, and if I invest so much more I'll get to 8, and then we would be able to make scientific, scientifically based business decisions on how we do network engineering.
We do not have this possibility right now, and I get this question from you, from my customers, all the time: so, should we go down this architecture trend or that one? And complexity always comes up in these discussions. And you know what our decision is? Thumb in the air: that feels more complex, that feels less. We don't have metrics, and that is what I really would like to achieve. We don't have that. As for elements to consider in this game when we look at complexity, if you want to start measuring at some point: we have the physical network first of all — routers, lines and so on — and we have the operator. Now, in most complexity discussions the operator does not come in. 70 to 80% of our outages, though, are operator-caused, so to leave the operator out of complexity discussions is ridiculous, in my opinion. We have to have the human element in it, even if we don't know how to handle it at this moment. And third, we have the network management and third-party software that runs your network around it. So what I have come up with is: probably the complexity of the network has a component on each of these axes, and you could draw a cube out of that, and the volume of the cube sort of says how complex your network is.
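The cube is only a thought model, but it is easy to make concrete as a sketch. Everything here — the 0-to-10 scoring and the way the axes are combined — is an assumption added for illustration, not a metric from the talk:

```python
# Hypothetical "complexity cube": one score per axis (physical network,
# human operator, management/third-party software), volume as indicator.
def complexity_volume(physical: float, operator: float, management: float) -> float:
    """Multiply the three axis scores (each 0-10) into a rough volume."""
    for score in (physical, operator, management):
        if not 0 <= score <= 10:
            raise ValueError("each axis is scored on a 0-10 scale")
    return physical * operator * management

# A network with complex gear but simple operations scores lower overall
# than one that is moderately complex on every axis.
print(complexity_volume(9, 2, 2))  # 36
print(complexity_volume(5, 5, 5))  # 125
```

One property a product-style metric captures well: simplifying any single axis shrinks the whole volume, which matches the intuition that the human and tooling dimensions matter as much as the boxes and lines.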
This is a thought model; it hasn't got us any further so far. So this is all up for discussion, and I am looking for input on how you see complexity.
Of course, rate of change is an important one here as well. If you have a frequent rate of change, you probably want a function to be in the network. If it is a rate of change of once per year, maybe it's better done by the operator, and you avoid an additional protocol on the network. So there are these trade-offs we have to make in this thing.
Now, to the survey. A week ago I sent out this e-mail with a link to the survey, and 64 of you have responded — thank you very much, that was a pretty good turnout. And here are the results:
So these are the questions that I asked you, on what makes your network complex and so on. In terms of profile, you see on the slide here that most of you have been more than 10 years in the business. I will let you look at the rest. So a relatively high grade of experience among the respondents.
So, first one — probably not too surprising: have you experienced catastrophic failure in your network? Oops. 62 have, two have not. I'd like to meet those two.
Of course, we haven't precisely defined what catastrophic failure means. That is deliberate, because we don't know either. Catastrophic failure is subjective at this point, but again it's this elephant thing: you know it when you see it. And why am I asking about catastrophic failure? Because one of the key properties of complexity is this emergent behaviour, and that means you have a number of input variables, but even with all the calculation power, all the CPU power that you have available, you cannot actually logically deduce the outcome. That is the emergent property, and that is what leads to catastrophic failure. So if you experience catastrophic failure in your network, the cause of it is invariably complexity, and that is why we keep coming back to this topic in this panel.
Other points: what was the cause of this catastrophic failure? Complexity, yes, of course. A bug from the vendor — I can't imagine, we have never had that. So I don't know where that comes from. Operational mistakes, yes — I mean, there are lots of statistics around that, and depending on the statistics you get different numbers, so the result here is not surprising either.
So what I am doing at the moment is I am gathering examples for catastrophic failure and try to analyse the root cause. And invariably, we are actually looking at a combination of causes that led to catastrophic failure so we are never actually looking at a single one; it is, typically, a combination of things that make a network fail in a catastrophic way.
How likely do you think it is that over the next five years your network is going to experience catastrophic failure? 'Unlikely' is only a quarter of the respondents, so three-quarters of the respondents think it's possible or even very likely. That is something I would also guess: if I ran a network today, it's very, very hard to predict whether you are safe against this thing or not.
How often do you encounter network problems that can only be resolved by the top experts in your organisation? That is another indicator of complexity, because ideally you want the frontline operations people to be able to resolve issues. Look at the results here: that is actually more positive than I thought. I expected a little bit more in the 'frequently' range. It actually didn't turn out to be that high, so, still, the majority of networks do not need frequent intervention by the top-level experts, which is good; that is where we want to be.
Where is the origin of the complexity located? The configurations have been pointed out. I think we need to drill a bit more into detail here, because the configurations only mirror — well, mirror to some extent — what is in the operating system, so that is a source of complexity, no doubt. The architecture and design is a big source; of course you can have more or less complex designs, no surprise here. And then there is the 'other' category, and for 'other' I made a slide with all the answers. Since I only got this off the site last night, I didn't have the time to actually go through it in detail and try to find an overall trend. I will do that over the next few days. So I leave that here for now and go straight to the last one.
Your definitions of network complexity. Since you wanted to see them all: good luck. I have actually gone through those responses and tried to figure out what the trends and key words in there are, and I picked those out and put them separately on the slide; none of them is too surprising — predictability and so on. So again, this requires more analysis; I will do that over the next days and weeks and post back on the RIPE list for potentially some more discussion. With that, I would like to open the panel. With me on the panel are Nico, Gert from SpaceNet and Geoff from APNIC; I think you all know them, I hope. What I was suggesting is that each makes a quick statement on how they see complexity, and then we discuss a little bit between us.
Geoff, do you want to kick off?
GEOFF HUSTON: I am hoping my slides will appear here. Thank you. So these are some personal thoughts about complexity, taking it from a more general perspective, because when you think about the question of what complexity is, it's not clear what we are talking about. A neuron is a remarkably simple thing: sense, transmit, fire. On/off. When you put a few billion of them together, you get a human brain. Remarkably simple component, remarkably complex outcome. Those images are all what we see as complexity of one sort or another in our world, but the underlying question inside all of that is: truly, what is it? When I look at this, I must admit I get this view that complexity, for me — without it being a value judgement of good or bad — is actually the fact that you can take a small thing, a simple thing, and just whack it together a few hundred million times: many parts in an astonishingly intricate arrangement. At the time of the original industrial revolution, the spinning loom was considered to be complex. Easy now. The steam engine was considered to be complex. Computer systems: complex. Networks: complex. And that complexity is actually all about, if you will, putting things together in ways that interact with themselves and with the environment in ways that are not obvious. There are a number of ways of looking at this, and the first thing is: if you want to engage in meaningful interaction, you need complexity. If you want to have a network that understands how to heal certain kinds of outages all by itself, you need complexity. If you want this ability for a system to understand its own operating state and adapt to it, you need complexity. If you want a system to have a huge number of state outcomes and respond in different ways to different inputs, you are going to have to have a certain amount of complexity; without that, you might as well bang the rocks together.
In other words, a certain amount of complexity is the only way you are going to address meaningful problems.
But complexity has extraordinary cost, and the cost equations are often nonlinear. At some point, the system becomes remarkably difficult to operate, requires astonishing specialisation, and actually reduces the amount of flexibility. Don't you find it amazing — I do — that we build networks today with a tool that is only slightly different from the job control language of IBM computers of three decades ago, and we have this optimistic outlook that we can individually configure each one of, say, 1,000 or 2,000 elements by hand, each router with its own individual config, and somehow have some faith — which in this town I will even call religious — that the outcome will be coherent? None of you configures a network; all of you configure routers. But what you want is a network-based outcome; what you want is an interaction that is systematic and coherent. It always strikes me as amazing that that ever happens. It's almost despite our efforts. And some of this issue is that complexity is really difficult and we don't have the tools that actually manage it, because programming individual elements of a network by hand, with, if you will, their own JCL, is not a way to actually make a network. And at some point complexity works against you, and if you happen to be gratuitous it can get really, really bad, because at some point, when you create systems that are that intricate, that intermeshed, that are, if you will, that reliant on every individual component operating within certain parameters, while the system itself we don't quite understand, then all of a sudden you can get catastrophic failure from individual vulnerabilities in there. So the ugly part of complexity is that at times you create systems where you cannot limit or bound the failure.
So from the abstract, let's go to the specific. IP was never meant to be what it is today. It really wasn't. IP was actually meant to be the dream of simplicity. What you were really meant to have is a remarkably simple network. In other words, you were meant to push things out to the edge; all that was inside the network was forwarding: I look at the packet and the destination, I figure out from my individual forwarding table where to chuck it — out to interface 1, 2 or 3 — and send it out. It was meant to be astonishingly cheap and flexible, because everything else was not the network's problem. Who was meant to actually do the flow control? Who was meant to actually ration the resource, to figure out that I can send a packet and you can't yet? It wasn't the network. In the original IP dream, that wasn't part of it. Everything was off at the edge. This whole idea of where do I send the packets, this DNS rendezvous, was not an attribute of the network. How many packets do I send, and how long do I wait before I send more? Not an attribute of the network. What about traffic integrity, and figuring out what happens when datagrams get lost? Not the network's problem. All of that was in the end systems and applications. Complexity, such as it was, was just pushed off the network. Great dream. We haven't got there. We have got to some other place. Because everybody, everybody, gets paid by adornment. You are not doing your job properly unless you add things, and whether you are a router vendor or a hardware vendor of end systems or a browser person, no matter what you do, you adorn. We all adorn. And what actually happens is that we start to create astonishingly complex systems.
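The "simple core" described above — the router's entire job being a longest-prefix match of the destination against a forwarding table — fits in a few lines. This is a minimal sketch using Python's standard ipaddress module; the prefixes and interface names are made up for illustration:

```python
import ipaddress

# A toy forwarding table: prefix -> outgoing interface.
TABLE = {
    ipaddress.ip_network("10.0.0.0/8"): "eth1",
    ipaddress.ip_network("10.1.0.0/16"): "eth2",
    ipaddress.ip_network("0.0.0.0/0"): "eth0",  # default route
}

def forward(dst: str) -> str:
    """Return the outgoing interface for a destination address."""
    addr = ipaddress.ip_address(dst)
    # The longest (most specific) matching prefix wins.
    best = max((net for net in TABLE if addr in net),
               key=lambda net: net.prefixlen)
    return TABLE[best]

print(forward("10.1.2.3"))   # most specific match wins: eth2
print(forward("192.0.2.1"))  # falls through to the default route: eth0
```

The point of the original design is visible here: flow control, retransmission and rendezvous appear nowhere — the forwarding function has no state beyond the table.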
MPLS, I still personally think, was not the world's best idea. That degree of isolation — taking the underlying forwarding elements and putting another layer of forwarding between that and the IP level — was a remarkably weird model, and it is hard to understand its operational parameters; when MPLS starts to get self-healing, you get very strange outcomes and some remarkably strange failures. The whole idea of doing fast reswitching and rerouting, where everyone is trying to heal an outage — the network level, the application level, the applications themselves and all the bits in between, all trying to attack the same problem simultaneously — often creates more failure rather than faster healing. And then, when you add into that things like QoS and resource control, dynamic networks and NGNs, what you get are remarkably complex networks that are astonishingly insecure, but, more to the point, astonishingly unpredictable. Well, that is bad. And then you add to that what actually happens at the edges today. Look at what is going on now with even the simple things like the DNS: issues of how rendezvous happens, what we do about flow control and traffic security and traffic integrity, multi-protocol and so on. Life is astonishingly complex out there, and I am not sure that it needs to be. It's just where we are.
As far as I can see, as an industry, we have actually taken the old Telco model and reproduced it with accuracy: our business is complexity, because that way we charge more — you guys, you amateurs, can't do it; we are professionals, we do this for a living. It's not necessary, but it's what we do. And it's the way we, if you will, protect our territory. So it's an astonishingly good business model, and we charge a lot because you need to have clever people to understand this. Oddly enough, there are very few of the businesses that used to be around 20 years ago — businesses based on simplicity and minimalism: all I do is buy some carriage, put a router at either end and flick the packets. Because, quite frankly, it's really, really hard to do a business model based around minimalism and simplicity; it doesn't sell. And I am not sure there is any one business today that is able to do that. So complexity, I think, is actually part of why we are where we are. And the outcome of all that is, we tend to get networks that are now remarkably difficult to operate, and, oddly enough, for users the failures get remarkably hard to debug. So that was my sort of personal view of where we are. Thank you.
NICO: I am going to keep this short, because I think we also want to open the discussion. One thing to say: you are impressive — I mean the transcript; I was just looking at it. Unbelievable how you manage that. It's complex, different types of English and accents, and you handle that pretty well. Quite impressive.
I think we have touched on most of the topics. I think all of us in the service provider industry or Telco space have had some form of catastrophic failure. It's often triggered by a bug; sometimes we know about the bugs and sometimes we don't — how much does the vendor know or not know? There is always a kind of right mix of conditions that will trigger the bug: is it triggered by you, by the human, by the customer, by traffic? Who knows what. And then I was discussing with Michael, and we also thought: what is behind that? We know the network is complex; we know that we need some sort of complexity, which we try to lower by using tools; but at the same time the tools abstract the network from the human, which is not really good. But what we found out is: let's go beyond catastrophic failure, because we know it's going to happen — what does it take to resolve it? How quickly can you find the root cause? All of us at some point, after 10 to 12 years in the industry, have had a layer 2 broadcast domain with packets looping around, at least in a local network, a country, or even Europe-wide. How quickly can you resolve it, and what can you build into your network to resolve such situations? We had one which was some form of — I am not going to name the vendor — an iceberg we didn't know about; the vendor knew about it but didn't think it was a security issue. What happens then, when you have a layer 2 broadcast domain which covers Europe, and where do you find the packet that is triggering it? So I think complexity is also about building in safety mechanisms: what is your capacity to sniff in the network, to find where the thing is starting, and how do you stop it? It's not just the plan of the network and trying to model the network complexity, but at the same time, do you have mitigation solutions in place? Like with DDoS: when you look at a solution for 10 gig sniffing, it's one thing to be able to detect, another to respond.
That has to be kind of written down at some point before the incident happens. So this is also something that is part of network complexity.
But there is the issue we see, going back to the example of the aviation industry: we all know a plane is complex, looking at the system as a whole, not just the engines. The system is evolving — the passengers have USB connectors in the seats that can inject some crap — but even so, it's kind of a closed system; you know the system is finite. The Internet, on the other hand, is evolving every nanosecond, something is always happening, so how do you manage these types of things? Those are some of my views I wanted to drop on you, to maybe use later on in the discussion. Gert, over to you.
GERT DORING: Hi, everybody. I was wondering whether I should stick to network complexity or attack it from a different angle. I am Address Policy Working Group chair, and we have complexity there as well. We make changes to policy; we think just a small tweak here is going to do us good, and something over there explodes. So that is complexity as well: interdependent things. Coming back to the network thing, I have been running the network at the company I work for for, I don't know, 13 years or so, starting from two routers up to a sort of medium-sized network today. I still understand all the router configs, all the lines in there, but I have trained new people on the job, and some of them have been working on this network for four or five years and still don't understand all the lines in there, because it's not part of what they need to do every day. And if something there breaks, they may not be able to fix it, because there are too many protocols going on and too many interdependent things, and if you don't have the specific experience with all of these bits, fixing a problem is hard. So our network might be a bit too complex to get new people up to speed, and this is something I am a bit worried about.
Job security is nice, but being able to go on vacation — and leave your telephone at home — is even better.
MICHAEL BEHRINGER: So, I don't quite know how best to do this now. I have a few questions on the list over there, and the first thought was to let the panellists answer those questions and see what they make of them. They have already partly answered them. Maybe if you look at the questions and pick one where you have a clear answer, that is probably the best way to go about it, and then let's open the floor to questions from you guys. Do you want to start?
NICO: I think I answered all three in my previous intervention. Catastrophic failure: we had two. One was the bug-triggered broadcast domain loop across Europe — bad packets at the right point in time, or the wrong point in time; the other was human mistake, somebody doing a nice 'no IS-IS' which blew up the backbone. Those are the two things we had. What makes it complex? For us, I think it's the size of the network, because it spans 14 countries, and the fact that we have everything in it, all the technologies and layers going up, you know, to voice over IP services and applications, and having the ability to understand all the cross-dependencies is very difficult. How do you contain a single failure? As you said, Gert, it's about education and making sure the engineers understand what is going on. I think today, the way most organisations are organised, there are some technical silos between the groups, and you can't have one person who understands end-to-end; you have to find the experts in the company to fix it. And I think quite a number of telcos or CDN providers are using service activation and service assurance tools to try to abstract the complexity of the network. Maybe it turns the human into a dumb user or dumb operations person, but at the same time it helps make sure that some of the mistakes we have seen in the past, that triggered some of the failures, don't happen any more.
GERT DORING: Me again. Regarding the example of a catastrophic failure, I have thought of something that I think many of you might have seen in your network at some point, and that is Cisco VTP, the VLAN Trunking Protocol. You have a network with lots of VLANs on it; you have configured a lot of things on a lab switch, and you clean it up and remove all the VLANs from the lab switch. Unless you have taken care to disable VTP first, if you take this switch and connect it back to your production network, it will tell everybody else, 'I have the most recent configuration revision,' and it will remove all the VLANs from your network, and then you are dead. If you are lucky you have a laptop, because you won't be able to connect to your switches over the network to reconfigure the VLANs, and you have the VLANs saved somewhere so you know which ones to add back. So the root cause of that is basically an operator error: connecting a VTP-enabled switch to a VTP-enabled network. But that's only half of it. The other half is unexpected protocol behaviour — sort of like, you have a very powerful protocol, and you plug in one single cable and it changes the configuration of lots of other devices. Usually, if you notice, OK, I should not have plugged in this cable, you unplug it and undo the damage, but in that case the damage sticks. So this is the sort of unexpected behaviour which amplifies operator errors, so you cannot easily undo the problem. What we learned from that was: disable VTP, because it's too easy to make that operator error, and it happens to many networks again and again. VTP is convenient in that you don't have to go to every switch to actually create the VLANs, but the danger is higher than the convenience, so we decided that we don't want to do that.
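For reference, on Cisco IOS switches the mitigation Gert describes is usually a one-line change per switch. This is a hedged sketch, not a complete hardening recipe; command availability varies by platform and software version, so verify against your own platform's documentation:

```
! Stop the switch from originating or applying VTP advertisements.
! "transparent" mode still forwards VTP frames on trunks; newer IOS
! versions also offer "vtp mode off", which drops them entirely.
Switch(config)# vtp mode transparent
```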
What makes our network complex? The problem with complexity in our network is that we run about every protocol that somebody found useful in places. So v4, v6, MPLS, layer 3 and all of it. The way we try to stop this complexity from breaking things all over the place is isolation. We try to keep protocols from depending too much on each other, so we don't run IPv6 over MPLS but native and in parallel; if MPLS breaks, IPv6 won't be affected. I hear people telling me I shouldn't do this because MPLS is so great, but I know I can teach people how to debug, how to figure out where the route points, without needing to explain to them how MPLS works. So the amount of knowledge that people need to have to debug something goes down if you keep the dependencies low. And then there is, of course, isolation in another sense, as in: we don't run dynamic routing protocols with our customer routers, so if a new engineer starts, OK, go configure the customer router; if it breaks, that is bad luck, but he won't affect the backbone. So errors are isolated. As that person gets more experience, he is permitted to configure at our network edge, and only the really experienced configure the core routers, so the amount of damage he can do grows with the amount of experience he has, and that has proven to be quite useful.
Geoff Huston: Catastrophic failure. I worked for ten years in a telco and, obviously, instances of catastrophic failure were probably daily. It's hard to pick out a single instance rather than a litany of them, but I must admit there was one outage that lasted for three days that I thought was particularly pleasant. A catastrophic failure, because it was in an MPLS-based network where the label distribution protocol failed in such a way that everything else worked in terms of the operators' diagnostics. Ping worked, traceroute worked, but for certain customers under certain conditions the labels weren't working. So the customer rings up, "I have a problem"; the operators say, you ignorant bastard, the network is just fine, watch me ping, aren't I clever. For three days. What was the failure? Obviously there was a failure in the distribution protocol, but also in the operational procedure. There was a failure in the way the operators were looking at the network. There was a failure in the way they dealt with customers. They were being a mite too Australian about it. And you sort of look at this and think, well, what is really happening? What was going on to cause this kind of behaviour? The first thing is: catastrophic failure is, by definition, catastrophic. You can't plan for it. It just happens at you, because that is what catastrophic failure is all about. And it happens because the system is pushed way outside its design parameters, and it starts to operate in a mode that was unplanned. There is no design manual left. You hadn't anticipated that mode of operation. So, why do we get there?
Does anyone truly run a network? I actually think we don't. If you are ever lucky enough to go and visit Cape Canaveral, you will see a Saturn V rocket lying on its side. The bottom stage was made by one contractor and the next by another; every part of that rocket was made by the cheapest contractor for that bit. No one built the rocket. There were a whole bunch of individual vendors building particular components. Doesn't that sound like your network? Do you have a single homogeneous network designed from scratch? Rubbish. You actually have multiple vendors giving you lots of components, you have multiple operating groups each operating what they think is the network, multiple product managers placing their product on this same sort of system, and because they are all cheap bastards you actually have the one network for multiple customers; each doesn't get their own individual network. And the last thing is, you don't have a network; you have an evolved mess. Some things were bought years ago. You are constantly picking up bits of legacy, shovelling the bits around and changing it, so what you actually have is this remarkable glob of materials from many different folk and no single point of understanding its behaviour as a system.
What is the root cause of that? We are cheap bastards. Every single time we get a new requirement we bolt it on to the old network, change a few components and press on. What you get is this continuous layering of functionality on to the same sort of core, and things become horrendously complex, and at some point you change out the core at the bottom, operating this evolved top on something completely different underneath, going to something else. So when you look at a network like that, you think, well, what is it? The answer is, well, it's a strange beast. We actually have no single design. We actually have no coherent architecture. We actually have something that seems to work most of the time, and sometimes it fails. How do you contain it? By employing people like you and me. We are professionals at complexity. We sit at it and go, we are comfortable with this. Yes, things fail, and there is very little you can do about the really bad failures, but for the rest of it we try and work as best we can and try and put up with it.
MICHAEL BEHRINGER: So, I wanted to say now we understand where complexity comes from; probably we don't, but we have talked a lot about it. Where do we go from there? I would like to ask you guys one more question in this direction: What do we do now? We have dug ourselves a hole. How do we get out of it, or at least make the hole more comfortable? After that, I will open the questions to the floor, so if you have any questions, then please ask. But first this question, before we get to the open mic: who has an answer, how do we make our life more predictable?
Geoff Huston: The last time we got out of a hole of this level of complexity was 20 years ago, using IP and getting rid of that stateful circuit-switched scheme. Sometimes you can't add more weight, bulk and complexity to what you have. You actually have to go to a different and revolutionary new architecture. You kind of wonder whether at some point networking technologies actually do have a generational span and, at the end of it, something else comes along. And I suspect that we might be looking at that here.
MICHAEL BEHRINGER: You are looking at things like clean slate as a solution to current problems? Which is essentially what we are doing in our networks as well, right? Every now and then a company sort of says, OK, we build our next generation network now, let's do away with the 15 different bits and pieces we had before and let's make one consistent thing.
AUDIENCE SPEAKER: One comment, talking about being cheap bastards. Most of this talk has been about technical issues. Sometimes there are constraints coming from the business side: you buy a network, you want to connect them, you can't afford to pay for the lines, you end up tunnelling MPLS over IP, and you have sold your soul to the devil, basically. So it's not only a technical problem.
The second one is about complexity. Look at router configurations: one thing I am still amazed at, seeing how networks have evolved, is when I compare it to software development. Once programs started being too complicated to keep in your mind how everything worked, functional programming was invented; the models you use to attack the problems in software development have been evolving, and there is research dedicated to how to tackle the problem. In network design and operations that doesn't seem to be there. People still look at the actual configurations in routers, and some of them are particularly unpleasant; in some of them, when you define a neighbour, the chunks that affect how that relationship between the routers works are spread out in little bits throughout four or five different places in the configuration, so it's not even visually helpful, much less conceptually helpful. Why is there no work in that sense?
MICHAEL BEHRINGER: So basically, abstracting a network, going one layer up and treating it as a block, yes. And that goes along with what Geoff said earlier: we are configuring routers and not networks.
AUDIENCE SPEAKER: Anand from the RIPE NCC. I have a question on behalf of a university colleague. Complex systems are always managed by splitting them into modules. The software engineering community has a lot of experience in doing this. They measure complexity and ask for high cohesion and low coupling. They also clearly define interfaces among modules. Can this be useful in networking, and can this be adopted in network design?
MICHAEL BEHRINGER: Does anybody want to take that? I think we are trying to do that, we are just not there, but if you look at modern network designs, you are clearly trying to do modules of networks. Now, they are still interconnected and interlinked, probably in too many dynamic ways, but the fundamental desire to do network engineering in modules and blocks is there; we probably need to get better at it.
GERT DORING: I think that is pretty much what I tried to say about isolation of things: run protocols in parallel, not stacked on top of each other. Try to keep failures isolated in blocks in your network, like, if one POP blows up it doesn't take the backbone with it, and things like that.
Geoff Huston: It's fine in theory, but in practice it doesn't work like that. You never get the opportunity to build a clean network very often; you get given a legacy that someone else has handed you and a tiny budget, and you try and do the best you can with the pieces you have, and often the results are really weird and sometimes the problems are really weird. Whoever thought that running dual stack v4 and v6 was going to be so damn difficult? But when you get down to it, it is really difficult, and the outcomes for poor old customers are incredibly complex and hard, and that is a simple problem. So functional decomposition and modularity: I will say it again, we are such cheap bastards that we can't afford that, and, instead, if you will, we do the best we can within the limited resources, and the results are chaotic and less than clean, and the rules are often horrendously complex as a result, but we are happy, oddly enough, to accept that cost because none of us can afford to build it again from scratch.
AUDIENCE SPEAKER: Deutsche Telekom. I wanted to comment a little bit on what makes our networks complex and how to control the damn thing. Well, one factor that contributes, which hasn't been named this way, is that, well, OK, these days people are asking for many services, and, well, OK, that is kind of the easiest and cleanest way to describe it: additional commercial requests are to be satisfied, and, well, OK, not in a single way. That has a direct bearing and kind of cannot be completely avoided if you don't question the business model.
The other thing that, in my opinion, contributes to a very large extent is that, well, OK, there are so damn few people around who remember that IP is extremely simple and, in this simplicity, actually extremely powerful, and can do almost everything in somewhat reasonable ways. It's not what ATM was claimed to be, the universal solution to every problem on earth. And looking at how people are dealing with stuff, well, OK, actually going for simplicity is not really appreciated any more; it is much more interesting for people, and people are much more used to going into sessions with huge PowerPoints that show they are extremely clever. Unfortunately, operating the stuff that is on a PowerPoint, if it actually works, requires extremely clever people in operations, and operations usually are not paid high enough salaries to actually do that. So, well, OK, actually, kind of a turn in culture would be needed to change that. As for how to control the thing: in the ancient capital of Rome, of course, the sentence there is...
Unfortunately, unfortunately, the wise words from academia that tell us along what lines to actually divide are not really that directly applicable. There are many ways in which the thing can be sliced and diced, and there are many approaches that need to be employed, like introducing higher levels of abstraction to deal with stuff. So, indeed, I do not understand the definitions that come out of my policy tools, and no one in operations is supposed to understand what comes out. We are supposed to understand what goes in and to check that, actually, the translation is right, just as one example.
Niko: Picking up on abstraction and looking at the application side: when you look at a number of consumer applications, they have tried to abstract the network completely from the user. I do FaceTime with my kid, this is the greatest thing on earth. How does it do it? It abuses and bypasses firewalls and NAT and so on. When I look at what ? did with IPv6, or Facebook, you actually start to use a layer at very, very high layers to completely abstract the complexity of the network, but at the same time you add another layer of complexity. What if FaceTime breaks? Do I know what is broken? Is it my local link, the IT server, my NAT at home, my wi-fi? We have tried to simplify things a lot. When it works it's cool, so user-friendly; when it's broken, no one can debug it. So all those things that bypass, all those abuses, also contribute to complexity, not at the level we are talking about, because we tend to focus on the network, but really at the application layer too.
James Blessing: It's not just as simple as throwing away the network and starting again, because I guarantee the minute you do that, your marketing department says, oh, we can't stop selling this service, and our vendors come up with this other fantastic thing we want to introduce which will add even more complexity than we had five minutes beforehand. What we have got to do is keep going on it. It's human nature to get comfortable with things. It's very difficult to throw away stuff that has worked for years, and it's one of the problems: to get to a system that is less complex, you have to give things up.
MICHAEL BEHRINGER: Good point. So when we speak about clean slate, I think there is absolutely no illusion that it will stay simple; even if the first design is simple, it will grow again from there. These curves that I showed at the beginning will take off again on a clean slate design.
Niko: Isn't MPLS the answer to that? We will have one single MPLS across all those networks. I said it.
Geoff Huston: Would a simple 'no' suffice, or do we have to go into this? I often think it's kind of bizarre that every layer of the protocol stack and every sub-layer and component thinks it has to solve everybody's problem. It's kind of, forget it, you know. They were datagrams, it's OK. And I don't need really, really fast healing; applications are clever. I have got 60 gigs of memory and this astonishingly fast processor; I don't need a really smart network. The network can be crap, it's OK, my application can understand that. So part of this issue with, oh, I need to do everything in MPLS, I need it faster, I need this, assumes, bizarrely, that the computers using the network were built in 1950 and have no bloody clue about networking. We are stuck, if you will, in this environment where everyone wants to optimise and possess their part of the answer so they possess the greatest amount of money. Relax. You know, let other folk solve part of the problem and you will get a more coherent and, I think, healthier system. So no, I disagree. Don't solve my problem in MPLS, thank you.
MICHAEL BEHRINGER: Having said that, though, one of the designs I personally quite like is a large transport infrastructure, MPLS-based, where essentially you put the complexity at the edge, so at least the core of the network can be simple. You still have the complexity in the network, of course, but you push it to a place where you understand it better and you control it better. Frankly, I sympathise, Geoff, it would be nice to have simple networks, but that is history and they don't exist any more.
Geoff Huston: You and I could have had that conversation 20 years ago about the glories of the circuit-switched network, and both of us would have been completely ignorant of the fact that at some point some research guys were doing this stateless packet switching thing that reduced the cost of networking by a factor of more than 1,000, and that was going to wipe out your job and mine. I don't think we should ever be complacent about complexity, and we should never be complacent about cost, because somewhere out there someone is saying, I think I know how to do it cheaper and simpler.
MICHAEL BEHRINGER: And that would be a clean slate, right, and that is OK. That is part of the system. What I am talking about is the networks as they are now. If a clean slate comes along and a new model arises, that might well pan out; that would be a clean slate in that sense. We are running pretty much out of time now. Was there another question? Then let's finish with a positive spin on that. If you look at airplanes and cars, those are extremely complex systems, and I don't think anybody would argue with that. Guess what? Somebody made those things work. And while we hear about the catastrophic failure of an engine on an A380, guess what, the thing landed and everybody survived. That is what matters. You could argue that was within the design spec. Overall the system survived, and I think we have managed to make complex systems like airplanes fly as reliably as we do.
Mike Hughes: Just one point on your aeroplane analogy: you don't rebuild an A380 and re-fly it, we...
MICHAEL BEHRINGER: Absolutely. What I mean is we need to learn from what others are doing; I think it works both ways.
Niko: Again on the aeroplane: one thing is, the aeroplane has two or three redundant systems, built by two different teams, in different languages, with different constructions, usually coming from the software industry. In the network, that doesn't mean we have to introduce dual vendor everywhere, except for the engines, where we are going to stay single vendor. I mean, dual vendor in a network is sometimes what we need, but at the same time it increases the complexity a lot. I didn't want to open that up in my introduction, but I think it's also something we need to keep in mind going forward.
MICHAEL BEHRINGER: With that, let's finish. I see this as a starting point, not as an end point. Please keep the discussion up. If you have thoughts, send them to us, to the mailing list, I don't know. We need to keep this discussion up and running. I actually opened a wiki. There is an animation thing here; this is where your technology defeats you, too complex, you know, and it doesn't come up for some reason. So there is a wiki that I opened to discuss the problem, networkcomplexity.org. It's a wiki, you need to create a login, but then you can add your cases and your definitions, your problem statements, that is sort of the idea of it. So if you have views: mailing list, keep the discussion up. Thanks.
CHAIR: Thank you very much. And now we have a battle of the optics.
MARKUS ARNOLD: So, Markus and Thomas from Flexoptix. We are the founders of Flexoptix, and I wanted to take the chance to answer a question from yesterday evening; I think I got it 20 times, so I don't have to answer it the whole week. Yes, those caps, they are made by this man, and no, you can't get them. Yes, you can get those necklaces.
So, having said that, we want to come to simplicity. We want to show you a solution that adds some simplicity to the artificial complexity we saw in the past: complexity around a very, very simple small thing, which is really these optics. We came across this problem, or this complexity, about two-and-a-half or three years ago. It was like: we have those cool standards there for pluggables, all defined, the different MSAs for all those kinds of pluggables, SFP, XFP and so on. In the end it was a nice idea, it was simple, it was standardised, but then when you really dig deeper into it, it gets more complex, because the thing just didn't work. So, coming back maybe to the title of the presentation, which is the battle of the optics: I think this goes back about two years, when we had the first slides on this at DENOG, which is the German network operators' group meeting. I think I have to say thanks to Stephen for the title. It is really kind of weird: we have the standards there, everyone that is interested can get in, and we are ending up with a disaster for a really, really simple part of the network.
So, again, when should you listen to us here? When you can identify with one of those problems (we have to run the slides, to the right), which is really interoperability or compatibility of optics; or when you ever saw some cool standards out there that you wanted to plug into your system, and it didn't work, or at least didn't work as you expected. Or you have some limited-functionality optic out there for 30 bucks, but you have to pay 300 for what is pretty much the same thing, just with a different label. So that is kind of what we stumbled upon, and we thought there must be a simple solution for that, coming back to the idea that you can have a pluggable optic working in every system out there.
We both started in informatics, so the core thing for us was: this is a problem you can solve with software. It wasn't software only in the end, it was a bit more, but more on that later. When we had these slides a few weeks ago in Edinburgh, we got a good comment: show something working first and have all the slides later on. So I will do that, or Thomas will do that.
Just a few words again. What we did there, in the end, is some kind of hardware solution called the Flexbox, which is in the end the interface for the user to interact or communicate with the optic, and a kind of distributed software database, or software system, where all the intelligence is, to make this optic be identified and, well, changed.
Thomas: So, yes, the simple idea: it's a web page, just a normal web page. What we did there is we developed a sort of module to connect the USB device to your browser. So we have this device, and when you plug in a simple SFP now, it's going to be read and shows the technical specifications of your transceiver and the simple configurations which you can make out of it. So if we just swap it back, press the button, that is it, that is the whole procedure behind it. There is nothing more to show, actually; it's so simple, I have to swap back over. But to be honest, when we count everything together now, the main thing in getting the simplicity into that tooling was 50% reading specifications, just to get all the bits and pieces together: the specification of USB and I2C, and the specifications for the pluggable modules. What we figured out is that people didn't read to the end, just the summary at the beginning; this is how this complex system started, actually.
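The "reading specifications" part Thomas mentions is mostly byte-offset bookkeeping. As a rough illustration, here is how a few identity fields of an SFP's serial-ID page decode under the public SFF-8472 layout; the helper names and the synthetic EEPROM contents are our own invention, not Flexoptix's code:

```python
# Minimal sketch of decoding identity fields from an SFP's A0h EEPROM
# page, following the SFF-8472 byte layout: identifier at byte 0,
# nominal bit rate at byte 12, vendor name at bytes 20-35, vendor part
# number at bytes 40-55.

IDENTIFIERS = {0x03: "SFP/SFP+"}  # small subset of the identifier table

def decode_a0h(page: bytes) -> dict:
    """Decode a handful of identity fields from an A0h page."""
    if len(page) < 96:
        raise ValueError("need at least the serial-ID section of the page")
    return {
        "identifier": IDENTIFIERS.get(page[0], f"unknown (0x{page[0]:02x})"),
        "vendor_name": page[20:36].decode("ascii", "replace").strip(),
        "vendor_pn": page[40:56].decode("ascii", "replace").strip(),
        # Nominal bit rate is stored in units of 100 Mbit/s
        "bitrate_mbps": page[12] * 100,
    }

# Build a synthetic page for illustration (a real one would come over I2C)
page = bytearray(256)
page[0] = 0x03                     # SFP/SFP+
page[12] = 103                     # 10.3 Gbit/s
page[20:36] = b"ACME OPTICS     "  # 16-byte, space-padded field
page[40:56] = b"SFP-10G-LR      "
info = decode_a0h(bytes(page))
print(info["vendor_name"], info["bitrate_mbps"])
```

The "additional information" Markus talks about later lives outside this standardised 95 percent, which is exactly where the compatibility trouble starts.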
What else happened? We put in a little bit of software, for sure, to get everything running. We have a server-side application which runs in Ruby together with a framework, and we used JavaScript and C# to get access to the USB driver and so on. At the beginning we thought the hardware might be really tricky to get up and running, but it's so simple at the end of the day, it's just one chip, so it was only about 10 percent of the work; it was not that much.
So, we also thought, OK, it must be simple, so we started over to do something else. There are some nice products out there as well which we might simplify, and this is a prototype we developed in the last couple of weeks. What does it implement besides what we saw so far? Tunable optics for transport networks. This is cool stuff: you can have 80 channels of 10 gig running in your network, but it's not supported at the moment by your core equipment. So we actually abused our Flexbox and said we can do this tunability as well, and again it's quite simple: you have 80 channels, select whatever you want, press "tune" and it changes the channel. And that is it.
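Behind the "press tune" step is the ITU-T G.694.1 DWDM grid: channel frequencies sit at fixed offsets from a 193.1 THz anchor, and an 80-channel C-band plan typically uses 50 GHz spacing. A small sketch of the arithmetic (the slot numbering here is illustrative, not any vendor's channel scheme):

```python
# Map a DWDM grid slot to its frequency and vacuum wavelength.
# Grid definition per ITU-T G.694.1: f = 193.1 THz + n * spacing.

C_M_PER_S = 299_792_458  # speed of light in m/s

def channel_freq_thz(n: int, spacing_ghz: float = 50.0) -> float:
    """Frequency of grid slot n relative to the 193.1 THz anchor."""
    return 193.1 + n * spacing_ghz / 1000.0

def freq_to_nm(f_thz: float) -> float:
    """Vacuum wavelength in nm for a frequency given in THz."""
    return C_M_PER_S / (f_thz * 1e12) * 1e9

for n in (0, 1, 40):
    f = channel_freq_thz(n)
    print(f"slot {n:+d}: {f:.2f} THz = {freq_to_nm(f):.2f} nm")
```

Tuning the laser then means telling the optic which of these slots to sit on, which is all the "tune" button has to communicate.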
MARKUS ARNOLD: So, coming back to the initial things: in the end there was definitely a big part of complexity in it. The specs are not two pages, and it's not only the specs, it's so many different protocols and things you have to take into account, starting really from simple things: how to implement that in a web application, how to make the hardware really work, communicating with the web server, and making all this really work and simple for the user to do. Because, as the telecom guy said, the guys in operations are not paid that high, and they normally have a hard time understanding what is in this small piece of hardware. We just wanted to keep it simple: just plug it in, it will be identified, and then you can choose whatever you want it to be.
So, in the end, what is it really? Again, it is a simple thing. In each of those optics there is a big part, say 95 percent of it, that is standardised, which is necessary for the optic to be identified by the system. And then, for different systems, you have to add certain pieces of information to make it easier for those systems to understand what kind of optic you plug in. For us it was really a big effort to implement the functionality to identify what is plugged into the Flexbox: I can put something in there and see, is it an SFP, multi-mode, or gigabit only. So I understand systems that really want some additional information put in there to make it easier for them to identify the optic, because not every transceiver, switch or router needs to really understand what optics you put in there. And I think we really found some kind of easy solution to that. In the end it is not really about cheap optics replacing some gigabit; it is about enabling new optics, some exotic optics, to work in systems you have already deployed, units or switches out there, or some new SFP-plus-based switches that the vendor does not support at the moment, or ER optics. It would be cool for you to build up some interesting networks, but it's a mess to follow their application notes: you would have to take the switch with the SR optic, put another converter on the outside, and plug their SR optic and a standard DWDM optic into it. It's much more handy to put the optic directly in the switch.
So it's really, for us, the main application: to make things work which didn't work before. We have only ten slides, so we are pretty much done. The last thing is a new project where we are looking for contributing partners. We showed this project already at DENOG and SwiNOG; it has nothing to do with compatibility, but it definitely has to do with transceivers. Thomas, that is your part again.
Thomas: It's also a little bit about simplicity again, especially in terms of: do you want to know when your network switches off the light? This was a question we got that we can't answer at the moment, but a lot of customers ask us: do you know how long an optic will last in the current set-up? The datasheet says it will run 30 years, but we see that after five years some of the optics die. So we started to say there might be a solution to figure out how you can calculate the actual D-day of an optic, where it actually will die at the end of its normal operating lifetime. We make heavy use of the digital diagnostic monitoring information which is in the optic, and some data mining techniques and algorithms. It started in September, and there are some contributing partners already, so if you want to know how long your optics will last in your network, just join us and we will work on that; hopefully we will have the first results mid-next year.
MARKUS ARNOLD: Just to summarise that, because I wouldn't understand it if I didn't know the project: in the end it is really about collecting operational data from optics, which is, for example, the temperature or the optical power budget you get from the optic, and collecting this data over a certain time period, over various networks, and then really finding out if there is a correlation. If the operating temperature is maybe getting higher, and/or you see the power budget of an optic decreasing, then you can make the assumption that this optic will die somewhere in the future. It is really about finding a correlation between maybe the temperature, maybe the optical power budget, and the lifetime of the optic, to have a prediction of when a specific optic might die, just to have an idea when to replace it somewhere in the next years. Therefore we started, with some of the partners, to really get this data, every five or ten minutes, collected in the network by some scripting; I think the tool they used was RANCID. If there are questions afterwards, you can talk to Thomas. There are some script-based things which take all the data from the ports and store it somewhere, and our algorithms will dig in and find out if there is some information in all this data we collect there. So it's really just the start of a project. We don't know where it will lead. I think we will have the first operational results somewhere in the middle of next year. It's just an idea; if anyone is interested in participating, just let us know over the next days. So, that is it.
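As a toy version of what such a prediction might do (the samples, threshold and rate of decay below are invented for illustration; real input would be the DDM values scraped from the ports every few minutes, as described):

```python
# Fit a straight line to a transceiver's sampled TX power over time and
# extrapolate when it would cross an assumed end-of-life threshold.
# Real lifetime models would be far richer; this only shows the shape
# of the idea.

def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def days_until(threshold_dbm, days, tx_dbm):
    """Extrapolated day on which TX power reaches the threshold."""
    slope, intercept = fit_line(days, tx_dbm)
    if slope >= 0:
        return None  # not degrading; no crossing predicted
    return (threshold_dbm - intercept) / slope

# Synthetic DDM samples: TX power drifting down about 0.01 dB per day
days = [0, 30, 60, 90, 120]
tx_dbm = [-2.0, -2.3, -2.6, -2.9, -3.2]
eol = days_until(-6.0, days, tx_dbm)
print(f"predicted end of life in {eol:.0f} days")
```

The project's actual data mining would correlate several such signals (temperature, power budget) across many optics and networks, but the extrapolation step is the same in spirit.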
Questions? No questions. OK then, applause.
CHAIR: Would Lorenzo be in the room now? No, never mind. It's time for the coffee break now. Be back here in half an hour for more interesting content. Thank you very much.
(Coffee break)
LIVE CAPTIONING BY AOIFE DOWNES RPR
DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.