Routing Working Group - 2 p.m. - Wednesday, 17 November 2010.
CHAIR: Hello, welcome back. It's 2 p.m., so let's get going. This session is the Routing Working Group. The Cooperation Working Group is taking place in the side room downstairs; if that's the one you want, then you are in the wrong room, so you have been warned.
So, let's start with the agenda. Thank you for coming and being here and participating. I'd like to begin by thanking the RIPE NCC for providing once again the minute taker, and our IRC gateway, Anand. Thank you very much.
Whenever you want to say something to the room, please walk up to the microphones so that people who are listening remotely can actually hear you. And when you do so, mainly for the benefit of remote participants, state your name, so people can find out who is talking and saying what.
The minutes from the previous meeting were circulated some time ago. I haven't seen any comments, so unless anyone has any objection, I declare them final.
Thank you very much.
One more point that's not there that I just wanted to mention. Over the most recent years, we have had three co-chairs for this Working Group. Unfortunately one of us keeps being tied up with other activities and hasn't been able to attend, participate or contribute in a significant manner recently. So, the two other chairs have decided to propose to the Working Group that this person ceases to be a co-chair. We notified him of our intent a few months ago. Does anyone... We are talking about Joachim Schmitz, for those of you who remember. So no objections there.
What you see on the screen is the proposed agenda. There might be a little bit of time for AOB, if any of you have any other items to bring up, otherwise I suggest we follow this agenda. It's also published on the web site so you don't get lost.
And with that we can just kick off. The first talk is by Erik Romijn of the RIPE NCC, about the experiments and how to analyse what's going on in the network with the tools that they provide.
ERIK ROMIJN: Good afternoon, my name is Erik Romijn. I work at the RIPE NCC as a senior software engineer. As was already said, I will be talking about incident analysis using the RIPE NCC tools. The case study that we are going to discuss has already been covered on RIPE Labs. Who has actually read that RIPE Labs post already? That's not too many people.
I have some new content as well, so you'll find it interesting. First, a little introduction to the tools we offer. We have the Routing Information Service, which collects BGP updates from about 600 peers worldwide, and it stores everything. It has just over ten years of routing history now. It currently stores tens of millions of updates every single day. We have the DNS monitoring service, which monitors the critical DNS infrastructure of the Internet, which means root and TLD servers. We have about 100 vantage points. Every server gets 120,000 measurements every day. And as you can see on the slide, this particular server didn't have a very good day.
Then we have the Test Traffic Measurement service, which does one-way latency, jitter, loss and traceroutes, with nodes spread worldwide. They're in full mesh. The system does about 18 million measurements every day.
So the case study here is the RIPE NCC and Duke University BGP experiment. I'd like to start a bit more generally about these experiments. RIS was started about ten years ago and we have a long tradition of supporting research. We were, just after Geoff, the second AS in the world to announce a 4-byte AS number, helping in the Quagga development of 4-byte AS support. For the last eight years we have been providing beacon prefixes which follow predictable on and off patterns. People use these to look at the propagation of routes, the speed of that and things like that. We also do debogonizing, where we test the reachability of /8s recently allocated by IANA. And we do some other things: we did an experiment with 1/8 as part of the debogonizing, where we looked at how much traffic was actually attracted to addresses 1.2.3.4 and 1.1.1.1, which was later extended by APNIC.
The BGP experiment we conducted on 27 August this year. Basically we announced a route with an optional transitive BGP attribute. It was an unusually large one, at 3,000 bytes. It was completely valid according to the RFC. There were no errors in the announcement. However, some other routers, due to a bug, actually corrupted it and then passed it on, which caused their peers to recognise the corruption and drop the session. And that in turn caused disruption to traffic. The announcement was there for about half an hour, using one of the prefixes we also used for debogonizing. After withdrawing it we saw there had been negative impact from it. We immediately started investigating extensively. It turned out after a while, based on input from people who were affected and what we could see from our own systems, that this was Cisco IOS XR specific. As far as we know no other router had this problem. At that time, we sent out a very extensive announcement, with hindsight maybe with slightly too much detail regarding security aspects. We also provided Cisco with all the details on what happened, what we sent and what we saw. And they very quickly released updates for IOS XR.
Now, basically to see how this actually happened. This is a small subset of the Internet. Everything works here, and at some point our announcement starts. As usual, this propagates through the rest of the Internet. This router here was actually a custom Quagga that announced the custom attribute. This router, which is our own collector connected to AMS-IX, and all the other routers have no idea what it means. But as they follow the BGP spec, they will announce it along with the prefix to the rest of the world, despite not actually knowing what it is.
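The behaviour described here, where routers pass on an attribute they do not understand, follows from the attribute flag bits defined in RFC 4271. The following is a minimal Python sketch of that rule, not the code of any router: the function names and the type code 200 are hypothetical placeholders, and only the flag values and the extended-length rule come from the RFC.

```python
from typing import Tuple

# Attribute flag bits as defined in RFC 4271, section 4.3.
OPTIONAL = 0x80
TRANSITIVE = 0x40
PARTIAL = 0x20
EXTENDED_LENGTH = 0x10


def handle_unrecognized_attribute(flags: int) -> Tuple[str, int]:
    """Return (action, outgoing_flags) for an attribute this router does not know."""
    if not flags & OPTIONAL:
        # An unrecognised well-known attribute is an error condition.
        return ("error", flags)
    if flags & TRANSITIVE:
        # Optional transitive: pass it on unchanged, with the Partial bit set.
        return ("forward", flags | PARTIAL)
    # Optional non-transitive: ignore it and do not pass it on.
    return ("drop", flags)


def encode_attribute(flags: int, type_code: int, value: bytes) -> bytes:
    """Encode one path attribute; values over 255 bytes need the extended length bit."""
    if len(value) > 255:
        flags |= EXTENDED_LENGTH
        length = len(value).to_bytes(2, "big")
    else:
        flags &= ~EXTENDED_LENGTH
        length = len(value).to_bytes(1, "big")
    return bytes([flags & 0xFF, type_code]) + length + value


# Hypothetical example: a 3,000-byte optional transitive attribute, type code 200.
wire = encode_attribute(OPTIONAL | TRANSITIVE, 200, bytes(3000))
action, out_flags = handle_unrecognized_attribute(wire[0])
```

A 3,000-byte value only fits with the two-byte extended length field, which is presumably one reason an attribute of this size exercises less commonly used code paths.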
Now the problem, as I said, was that this router actually had a bug, and this resulted in it sending out a corrupted message on all these peering sessions. In turn, the other routers followed the BGP spec and they tore down the sessions. Now, with BGP, of course, within about two minutes, I think, one of these routers will try to reconnect, and they will repeat the entire exercise about every two minutes, maybe even more often, until the announcement disappears.
And then things go back to normal again. The experiment was triggered by a request from a research group at Duke University, who were interested in how transitive attributes work on the Internet. Their intention was, or maybe still is, to use this for secure routing, with the idea of putting certificates in actual BGP attributes. They do not have an AS number, they don't have address space, they don't have the facilities, so they asked us to run this to support their research. They provided us with a modified Quagga, a BGP daemon, and that is what we ran.
Basically, what we expected to see here was one of three things. Ideally, if everybody was RFC compliant, the route would propagate with the attribute intact, we would see it at our other collectors and everything would be fine. The other option we expected was that the route would propagate but that some ASs would just drop the attribute. Or that it would even propagate along a different path because some ASs just do not propagate the route at all.
We have actually done something similar, or actually it's part of the protocol for 4-byte AS numbers, where optional transitive BGP attributes unknown to many routers are successfully used. In the early days of that, A and B were actually seen. Some routers would drop the attributes, of course in violation of the spec, but no other ill effects were seen.
So, based on the information we collect through all our tools that I showed in the beginning, we worked on estimating the impact of the experiment. What I should add here is that a lot of this data is based on RIS and RIS only sees external BGP traffic. We may not see things which happen in IGP. So this is not guaranteed to be complete.
So basically, you can see in our graph that the blue highlighted area is the time at which the announcement occurred, and you can see a little blue line at the bottom here, which is the amount of unstable prefixes on a scale from 0 to 100. A prefix counts as unstable here if it has more than 100 updates within five minutes. So this was the impact on the Internet, roughly summarised. This is the volume of e-mail that was received about it. In total, we have seen 280 e-mails spread throughout the day.
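The instability criterion mentioned here, more than 100 updates for a prefix within five minutes, can be sketched in Python as follows. This is only an illustration: the talk does not say whether RIS uses a sliding window or fixed five-minute bins, so the sliding window below is an assumption, as are the function name and the input format.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300   # five minutes
THRESHOLD = 100        # "more than 100 updates"


def unstable_prefixes(updates, window=WINDOW_SECONDS, threshold=THRESHOLD):
    """Return the prefixes that saw more than `threshold` updates in any window.

    `updates` is an iterable of (timestamp_seconds, prefix) sorted by timestamp.
    """
    recent = defaultdict(deque)   # prefix -> timestamps still inside the window
    unstable = set()
    for ts, prefix in updates:
        q = recent[prefix]
        q.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while ts - q[0] >= window:
            q.popleft()
        if len(q) > threshold:
            unstable.add(prefix)
    return unstable


# Hypothetical input: one prefix flapping once a second, one updating every 10 s.
sample = sorted(
    [(i, "192.0.2.0/24") for i in range(101)]
    + [(i * 10, "198.51.100.0/24") for i in range(101)]
)
result = unstable_prefixes(sample)
```

In this sample only the first prefix crosses the threshold, because the second never has more than 30 updates inside any five-minute window.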
Now, zooming in a bit on the previous graph, we see that initially there was a very high peak, and the peak is about 1.4% of the prefixes on the Internet. We are talking about 5,000 prefixes here, nine times higher than normal. You can also see that after the initial peak it stabilised a bit at a still much higher level, and it quickly recovered about 15 minutes after the prefix was withdrawn.
Now, some prefixes actually became completely invisible. This graph looks at, given that a prefix became invisible during a two-hour time frame, how long it took to come back. This is actually a normal day. This is July 30, from 8 to 10, and we can see that it's actually pretty common that prefixes stay invisible for less than 5 minutes, and as the time gets longer, this gets rarer. The scale of this graph is about 500 prefixes at the top.
If we look at some other dates, we can see there is some variance; these are two Fridays and the day before the experiment, at the same time of day. If we add to this the measurements at the time of the experiment, we can see there is actually a massive increase in invisibility from 10 to 30 minutes. That's not surprising, of course, because that matches the duration of the announcement. So, if you look at roughly 10 minutes of invisibility, we can see it's about five times as much as normal.
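The invisibility graphs discussed above count, per duration, how many prefixes disappeared and later came back. A rough Python sketch of that bookkeeping is shown here; the event format, the function names and the five-minute bucket size are assumptions chosen for illustration and are not taken from the RIS implementation.

```python
from collections import Counter


def invisibility_gaps(events):
    """Return the durations (seconds) for which prefixes were invisible.

    `events` is a list of (timestamp_seconds, prefix, visible) sorted by time,
    where visible=False means the prefix was no longer seen by any peer.
    """
    went_dark = {}   # prefix -> time it became invisible
    gaps = []
    for ts, prefix, visible in events:
        if not visible:
            went_dark.setdefault(prefix, ts)
        elif prefix in went_dark:
            gaps.append(ts - went_dark.pop(prefix))
    return gaps


def histogram(gaps, bucket_minutes=5):
    """Count gaps per bucket; bucket 0 covers 0 up to bucket_minutes, and so on."""
    counts = Counter(int(g // (bucket_minutes * 60)) for g in gaps)
    return dict(counts)


# Hypothetical events: one prefix gone for 2 minutes, another for 20 minutes.
events = [
    (0, "192.0.2.0/24", False),
    (100, "198.51.100.0/24", False),
    (120, "192.0.2.0/24", True),
    (1300, "198.51.100.0/24", True),
]
gaps = invisibility_gaps(events)
buckets = histogram(gaps)
```

Comparing such a histogram for the experiment window against the same window on ordinary days gives the kind of "five times as much as normal" figure quoted in the talk.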
We have also looked at how the critical DNS infrastructure held up under this, so we looked at DNSMON, and that actually showed that none of the root servers were affected in any way. There was no visible impact from the view of DNSMON. Almost two-thirds of the TLDs showed no effect in the graphs, nothing at all. There were some minor effects for about 40%, as in a few dropped queries for some of their servers. Two of them actually had more significant effects. One of them is this: the bottom server is one of them. You can also see some effects here. The thing I should stress here, though, is that at no point did .FR become unavailable due to this, because this is DNS and it is designed to handle some of your servers failing. We can actually look more at why that server became unavailable. This is BGPviz, our visualisation tool; you can go to our website and query this. This is looking at the prefix which contains a.nic.fr around the time. Basically, it shows us how all the routes changed around this time. So this is the graph as we saw it from RIS. It lets you look up who this upstream is, which is, I believe, the French research and education network. You can see behind it all the ASs changing as links break. This is their other upstream. The entire video you are seeing actually took place in about a minute, and you can see that, for some ASs, the link disconnects and they reappear behind another AS, behind a longer path. For example, this one here, AS 9002, was very much closer and lost a link. This AS was connected directly here but then appeared behind here. So basically, as links break, the paths get continuously longer, which may in this process also involve some unreachability.
We can also look at this from the angle of TTM, which monitors the full mesh between 100 nodes. We see here, at the time of the experiment (green is good, not green is bad), that initially all connectivity was actually lost. This is a measurement from the TTM node in Prague to our node in Amsterdam. This is the latency: normally it's about, let's say, 20, but here you can see that the latency is significantly higher, around 125, as far as packets get through at all. The red line shows the number of hops, and you can clearly see quite some instability during the experiment.
Now, this is the actual number of updates we received, corrected for the number of prefixes, for IPv4. For IPv6, it looks a little bit different: actually nothing happened. So we could say that nobody who actually runs IOS XR deploys IPv6. Or, maybe a bit more accurate, the BGP sessions that flapped in this probably did not carry IPv6 routes, which is pretty reasonable because the common case is to separate them. Another interesting thing is to look at the number of prefixes that became unstable and the total volume of updates. We can see that while the number of affected prefixes was going down, the number of updates was going up. I have not found the cause for this. Somebody suggested route flap dampening to me. It could also be that some of the prefixes managed to stabilise on other routes, but in the meantime the propagation gets wider so other prefixes get more affected. We are not entirely sure, but it's definitely interesting. We can actually see the same thing here in our TTM graph if we look a bit closer, because we can see that initially it was actually quite bad: no packets were getting through at all. However, during the rest of the experiment, the packet loss was more like 20%, until everything came back after the announcement was withdrawn.
Now, we have collectors in about 15 locations, and this is the amount of updates that we received at each collector, corrected for the number of peers we actually have there. You can see Vienna there at the absolute top, the VIX; also clearly elevated are Milan and DE-CIX. If we look at withdrawals, the picture is suddenly very different, because here Milan is by far the largest exchange in number of withdrawals. The initial expectation was that these Internet exchanges simply deploy a lot more Cisco hardware. So, we looked at this. These are three Internet exchanges which were not so much affected, and, this is based on ..., we can see they are 50, 60 or 42% Cisco. If we combine this with the two that we saw sending a lot of updates, actually there is nothing different about them. Vienna is noticeably very large in Cisco but DE-CIX is not at all. The reason we think is behind this is that not all Cisco devices necessarily run IOS XR; normal IOS had no problem with this. In addition, it might of course be the fact that the fibers are not connected to the Internet exchange which is close to it.
So, we picked up a number of things from this. Of course the events in full are regrettable. Basically, what we have picked up from this is that future experiments will be announced with sufficient time for operators, to make sure that people know what's going on. We also intend to handle vulnerabilities more carefully. We sent out the initial announcement because we thought it was important to get all the information out there: there were a lot of problems happening and we thought it was very important for everybody to understand what was going on. With hindsight, we should maybe not have given the full details on the vulnerability.
Also we plan to have more extensive impact assessments before we do these experiments in order to try to prevent this unfolding. If you have any input on that, it is very much appreciated. And that is the end of my presentation. Do you have any questions? I know you do.
AUDIENCE SPEAKER: Randy Bush, IIJ. IOS XR is a normal Cisco image. Since you didn't announce the bad attribute on v6 sessions, you know, since everybody runs separate sessions for v4 and v6, as Sander says, that's why you are not going to see it in v6. Thank you for kind of hinting and saying oops, sorry; that's normally what we do, we cause a mess even though, you know, other people participated in making the mess. It's hopeless. You are not going to be able to anticipate what happens when you do something different, because there are many vulnerabilities in routing code, in Cisco, in Juniper and in all the other things. It's just end user, you should try, we tried when we conduct experiments, but it's hopeless, so when something goes wrong, being able to react quickly, as you said, is what's critical. You are going to tickle bugs out there. That's why, when there is a new release, we stick it in our labs for a little while before, but the live net is always different.
CHAIR: Anyone else with questions for Erik?
AUDIENCE SPEAKER: Ruediger. Just for a little additional information: what we are seeing is, unfortunately, broken BGP implementations. As Randy pointed out, we have to live with that. And the next step is that, unfortunately, the BGP standard requires tearing down sessions whenever anything seems to be wrong. As was pointed out last week at the IETF by one of the groups, at the point in time when BGP was invented, which is something like 20 years ago, that indeed seemed to be the right and appropriate thing. Today, things need to be handled more resiliently, and in the IETF we are expecting appropriate changes. Nevertheless, it is extremely important to get advance notice of experiments so that help lines and operators can anticipate and quickly resolve problems. And actually, a half-hour experiment, pretty well analysed, resulted in a fairly quick bug fix; well, okay, the next time something is done, we are not sure that circumstances will be as nice. And the impact, I think, for some operators probably has been worse than what you have been seeing in your statistics. Fortunately, I think, for my network there were some boxes on the path which prevented the problem from spreading to me, but I am sure there were different places.
AUDIENCE SPEAKER: Daniel Karrenberg, RIPE NCC. Just to be perfectly clear, we actually stuck this in the lab. We were just unfortunate enough not to have an instance of the broken thing in the lab, and maybe you might want to spend 30 seconds explaining what we did in the lab before we went ahead.
ERIK ROMIJN: Roughly what we did is we set up the modified Quagga for testing. We obviously looked at the patch and checked that there was nothing suspicious in there. Based on that, we verified the packets it output to make sure that everything was actually good in there. Then we ran that through another BGP instance and made sure the output of that was valid. So, that is the level of testing we did, but we do not have an IOS XR device, so we have run no testing against those in our lab.
AUDIENCE SPEAKER: Randy again. A, you can't have everything in your lab. B, in this case, if you had had an IOS XR in your lab, you'd have caught it. In other cases, you never catch it until it's in the real net. There is no bug-free routing code. It's about the level of bugs and what it takes to expose them.
AUDIENCE SPEAKER: Dave Wilson. I haven't heard yet from, I think, anyone whose network was directly affected by this. Ours was, quite severely, for the time of the experiment, and I want to say: keep up the good work. It's a great relief to me that this was encountered first in a situation where a well-known entity, a neutral entity, was doing it and was willing both to mitigate quickly and to document thoroughly afterwards. There is nothing more that we could ask for, and I appreciate the steps you are taking to try and reduce this in future. I just want to be assured that this won't actually stop experiments happening in the future which will again have a positive outcome.
AUDIENCE SPEAKER: Nick Hilliard from INEX. From a slightly peripheral point of view, I think this particular experiment makes a very strong case for asking vendors to provide virtualised router images so that you can run their router image on your particular hardware platform, because, let's face it, most of them, or an awful lot of them, can be virtualised, you know, using dynamips or whatever. The router vendors are not going to lose revenue because of this; they will probably gain revenue because people gain more experience with their products and learn to like them. But if they had done this, and if Cisco had a virtualised platform available for end users, this probably wouldn't have happened.
CHAIR: Thank you very much. I mean, it is clear that the RIPE NCC have learned some lessons. I am hoping that the router vendors also learn, and not just produce the patch but also integrate these cases into the future testing environments for new releases. So overall, I think the experiment has resulted in an improved Internet. After all, if you were here for the complexity panel, the Internet is a complex system, and the only way we have any hope of understanding it better is by conducting experiments like these; the consequences of not doing them would definitely be much worse than the occasional mishap that we might come across along the way. So thank you very much. I think the whole Working Group expressed its support for the work you are doing.
The next speaker is...
SPEAKER: Thank you. I am going to talk about exploiting router programmability to ease routing and traffic analysis. Before getting to the main point of this presentation, I just want to introduce ourselves and our group. We are from Roma Tre University, computer science and automation department, and our research group works on networks and visualisation, mostly on routing and BGP in particular. Our most famous project, which probably some of you already know, is BGPlay, which is available at routeviews.org, and we have also collaborated with the RIPE NCC for many years.
Router programmability opportunities. Well, the idea that you can put your code within a router allows you, first of all, to avoid having another box around, and so it reduces costs. You have proper event handling directly where the event happens, so it is really responsive, and dynamic configuration change, and probably other stuff. There are several tools and options. Some are script based, so interpreted languages. Others rely on architectures that allow you to write fast code, let's say compiled code, code that can handle traffic, not just routing events or something like that.
We started to be interested in these things after starting a collaboration with Juniper, with the idea that we can provide some research based on the Junos SDK, their software development kit, and this started in April 2010. We also have some support from Casper for that.
Projects that are active at this moment at the university. I mentioned traffic and routing. About traffic, we are trying to exploit the features of the Junos SDK for traffic matrix computation. This is an ongoing project, and the very nice objective, I think, is to have two commands, one for the configuration and one to get the output, at least in the simplest usage you can imagine. That is the ideal result of this research. Considering routing, we are thinking about a system that allows checking for violations of service level agreements that are expressed on BGP, which usually are not really complicated; it's not clear whether they are complicated or used at all, whether because one cannot check them or because they are not useful in business.
So, starting to talk about the first project.
I will be very quick. A traffic matrix basically represents the demand, that is, what those external to your network ask of your network in terms of what has to be routed and where. Of course the output is a matrix, and you can imagine these are all the POPs, so you have POPs from and to, and you need it for capacity planning, for traffic engineering and so on.
Well, the state of the art is very large, and the first works are very old. There are surely nice tutorials around on the Internet, and yesterday we had a nice presentation that at least in part dealt with this topic. For people who were not in the Plenary yesterday, I provide here a quick review, only a quick review of course.
Well, there are several ways to compute traffic matrices. Some solutions rely on computations that are performed on samples of the interface counters. There are in the literature some comparative studies that provide you with the errors, and there are some problems with these solutions, like the need to tune some parameters, and you can have even very large errors. But you don't know that, because you need the real traffic matrix to compare against; you can do that for research, but it's not so easy to do in real life. So, these are nice pieces of research, but it's not clear if they can be easily adopted in real life.
Well, if you tell a vendor that you need to compute a traffic matrix, you will probably get one of these two results. This is the easiest way: just provide a full mesh of tunnels and read the counters. The other is to read the counters of forwarding equivalence classes. There are arguments to say that you may not want to do that; probably you know those arguments better than me.
So, there is another option, and this is the easiest one. What I want to do is just count bytes destined to a certain BGP next hop. Well, there is no way to do that in the current architecture. Why? The idea is this: the forwarding engine of a router does not need the BGP next hop, it just needs the next hop, so the forwarding engine does not have the information to do this computation at the same speed as the forwarding. So you cannot do that at forwarding speed in a regular router, and so you do it outside the router, for example with NetFlow v5, v9, sampling, etc. Yesterday's speaker found that v9 was probably the simplest way at this moment to compute the traffic matrix. I agree with him. Of course you have a router load versus precision trade-off, and in the literature you can find this as well. It may happen that what you get is an underestimation of the measurement.
Well, what is our objective? Our objective is to put some software within a router that independently computes one row of the external traffic matrix, with the idea of having easy activation with one command or a few lines within the configuration, and easy retrieval of the data from the router with a command line interface or SNMP.
Well, we focused of course on Junos because this is the tool that we have at our disposal. Junos can natively define destination counters and you can apply them to an interface. The interesting thing is that this configuration is supported by the Junos SDK: you can change this kind of configuration. But the support goes beyond that; you can also put in some special rules, counter rules, and we will talk a bit about that later.
So this is what you can configure on a plain Junos. Probably you know the syntax better than me. You can put in a filter. In this filter you can have terms, which are formed of two parts: conditions, and what to do when a condition is matched. The condition can be many things, including destination addresses, which means prefixes. And the action can be many things, for example drop, but also count with a counter name.
So, the idea is very simple: you can use one counter for each BGP next hop. Basically, each counter is a cell of the whole traffic matrix and each router computes a row of the traffic matrix. You end up having many terms, one for each counter, and you are surely going to have a huge amount of prefixes in there. Well, this sums up to the 300,000. You can optimise: you can take advantage of the distribution of next hops, and there are papers on optimal execution of these operations. We, of course, tested this on a GARR routing table, where you have about 370 counters, of which actually far fewer are really meaningful, but for the moment we don't want to rely on that. That is another optimisation: to put aside counters that are not very interesting because there is not very much traffic.
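The relationship described here, where each counter is one cell and each router contributes one row, can be illustrated with a small Python sketch. The data layout, router names and byte counts are invented for the example; how the counters are actually read from a router (CLI or SNMP) is outside its scope.

```python
def assemble_traffic_matrix(rows):
    """Combine per-router counters into a matrix.

    `rows` maps an ingress router name to a dict of
    {bgp_next_hop: bytes_counted}. Each dict is one row of the matrix,
    each entry in it one cell. Returns (egress_list, matrix), where
    matrix[ingress] is a list of byte counts aligned with egress_list.
    """
    egress = sorted({nh for counters in rows.values() for nh in counters})
    matrix = {
        ingress: [counters.get(nh, 0) for nh in egress]
        for ingress, counters in rows.items()
    }
    return egress, matrix


# Hypothetical counters read from two routers.
rows = {
    "router-a": {"10.0.0.1": 1200, "10.0.0.2": 300},
    "router-b": {"10.0.0.2": 50},
}
egress, matrix = assemble_traffic_matrix(rows)
```

A next hop that a router never sent traffic towards simply shows up as a zero cell in that router's row.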
Well, the interesting thing is that using router programmability you can update those terms so that you basically track the RIB changes. I would like to explain this on the architecture. The idea is this: you have a router, and you have the forwarding engine, a very fast one, which is basically an ASIC. The counting rules are in a form like those that we have seen before, and those are hopefully (we'll talk about it later) as fast as packet forwarding, which means that they are also interpreted by the ASIC. Then the BGP daemon sends BGP updates, basically updates of the association of a prefix with its BGP next hop, to a piece of software that computes and optimises the counting rule updates, which are then inserted, indirectly, into the ASIC. And then you have counters that can be retrieved.
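The piece of software in the middle of this architecture, which turns prefix to next-hop associations into counting rules and keeps them in step with the RIB, might look roughly like the Python sketch below. It is not the speaker's prototype and does not use the Junos SDK: all function names are hypothetical, the rendered text only loosely follows Junos firewall filter syntax, and no optimisation of the prefix lists is attempted.

```python
from collections import defaultdict


def counter_name(next_hop):
    """Derive a counter name from a BGP next hop (naming scheme is made up)."""
    return "nh-" + next_hop.replace(".", "-").replace(":", "-")


def build_counting_terms(rib):
    """Group prefixes by BGP next hop: one term and one counter per next hop."""
    terms = defaultdict(list)
    for prefix, next_hop in rib.items():
        terms[counter_name(next_hop)].append(prefix)
    return {name: sorted(prefixes) for name, prefixes in terms.items()}


def apply_update(rib, prefix, next_hop):
    """Apply one BGP update to `rib` (next_hop=None means withdrawal).

    Returns the names of the terms whose prefix lists must be rewritten.
    """
    touched = set()
    old = rib.get(prefix)
    if old == next_hop:
        return touched
    if old is not None:
        touched.add(counter_name(old))
    if next_hop is None:
        rib.pop(prefix, None)
    else:
        rib[prefix] = next_hop
        touched.add(counter_name(next_hop))
    return touched


def render_term(name, prefixes):
    """Render one term as text, loosely modelled on Junos filter syntax."""
    lines = [f"term {name} {{", "    from {", "        destination-address {"]
    lines += [f"            {p};" for p in prefixes]
    lines += ["        }", "    }", "    then {",
              f"        count {name};", "        accept;", "    }", "}"]
    return "\n".join(lines)


# Hypothetical RIB: prefix -> BGP next hop.
rib = {
    "192.0.2.0/24": "10.0.0.1",
    "198.51.100.0/24": "10.0.0.1",
    "203.0.113.0/24": "10.0.0.2",
}
terms = build_counting_terms(rib)
touched = apply_update(rib, "203.0.113.0/24", "10.0.0.1")
terms_after = build_counting_terms(rib)
text = render_term("nh-10-0-0-1", terms_after["nh-10-0-0-1"])
```

Returning only the touched terms reflects the idea of pushing rule updates rather than the whole filter each time the RIB changes; batching several updates before a push would be a natural extension.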
What we have now is a prototype that provides these interfaces and is implemented within the router, and an external piece of software that prototypically performs this computation.
So, some critical points. First of all, you would have all these strange counting filters within your configuration, and you really would not like that. Do you have them there? No, because the SDK allows for it in filters, so the regular configuration is not touched.
So, another problem: the RIB changes very quickly. What do you have? A full commit of a configuration every time you get an update? Well, at the moment, sort of. We have old hardware, and on this old hardware it took about 8 seconds to process a batch of updates. We took a batch every 10 seconds, and 17 is the biggest batch of rule updates we have had. For a whole RIB it will take 40 seconds. The hardware is an M7i with old C F B F.
What about throughput? What about the precision of the counters? Well, we observed no packet loss, and we observed a 3-second hole in which there is no counting. But this is quite old hardware, and with new hardware we expect that this goes to, let's say, 0. Of course, tests on the new hardware would have to be performed. We cannot say the final word about this problem of performance penalties because, well, we are in a university so we don't have so much funding, so our hardware is just Fast Ethernet. We have already found a lab where we can run some tests; if some of you are interested in being involved, just talk to me.
So, the second project is about BGP monitoring, which usually means collecting best routes using Quagga etc., or collecting all updates using the BGP Monitoring Protocol or other techniques that we have published here. Actually, for certain applications, maybe you don't want to collect anything, and so you can avoid the collector. For example, for a BGP SLA, you would like to ask one of your providers: I want to reach a certain destination directly from you, for 99% of the time. This can be really useful for people who have their computational power remotely, for example for cloud computing software. So the objective is to analyse BGP updates as they arrive on the wire, with TCP reconstruction and BGP decoding using open implementations, to express service level agreements with a proper language in the router, and to report service level agreement violations.
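The kind of check described here, that a destination is reached directly for 99% of the time, can be sketched in Python as a computation over a timeline of BGP updates. This is a simplified illustration under stated assumptions: "directly" is interpreted as an AS path no longer than a configurable limit, the state before the first observed update counts as non-compliant, and the function name, event format and AS numbers are invented.

```python
def sla_compliance(events, start, end, max_path_len=2):
    """Fraction of [start, end] during which the route met the SLA.

    `events` is a list of (timestamp_seconds, as_path) sorted by time,
    where as_path is a list of AS numbers, or None for a withdrawal.
    The route is compliant while it exists and its AS path has at most
    `max_path_len` entries (an assumed reading of "directly").
    """
    compliant_time = 0.0
    state_ok = False      # before the first update, treat as non-compliant
    last = start
    for ts, as_path in events:
        ts = min(max(ts, start), end)   # clamp into the measurement window
        if state_ok:
            compliant_time += ts - last
        last = ts
        state_ok = as_path is not None and len(as_path) <= max_path_len
    if state_ok:
        compliant_time += end - last
    return compliant_time / (end - start)


# Hypothetical timeline: direct route, withdrawn for 60 seconds, then back.
events = [
    (0, [64500, 64501]),
    (9000, None),
    (9060, [64500, 64501]),
]
ratio = sla_compliance(events, start=0, end=10000)
violated = ratio < 0.99
```

With this timeline the route is compliant for 9,940 of 10,000 seconds, so a 99% target would be met; reporting then reduces to comparing the ratio against the agreed threshold at the end of each period.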
So, at present we have a prototype which is implemented with other techniques: it is implemented externally using mirroring, so using selective traffic mirroring. Someone said traffic mirrors actually hurt the functioning of the router. We didn't see that. The idea is to have this feature within the router in the future, so no need for a mirror, no additional external system, and to report the service level agreement violations on syslog, or, let's say, a summary by e-mail or a variation on that.
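The SLA idea described here (a destination reachable "directly" for 99% of the time) can be illustrated with a small sketch. This is not the speaker's prototype: the event format, the `sla_compliance` function and the "at most two ASes in the path" test are all assumptions made for illustration, and the AS numbers are taken from documentation and private ranges.

```python
def sla_compliance(events, window_start, window_end, is_ok):
    """Fraction of the window during which the current best path satisfied is_ok.

    events: time-sorted list of (timestamp, as_path) tuples, where as_path is a
    list of AS numbers, or None for a withdrawal (hypothetical format).
    """
    ok_time = 0.0
    current = None          # path in force at time `last`
    last = window_start
    for ts, path in events:
        if ts <= window_start:
            current = path  # state inherited from before the window
            continue
        if ts >= window_end:
            break
        if current is not None and is_ok(current):
            ok_time += ts - last
        current, last = path, ts
    # Account for the interval between the last update and the window end.
    if current is not None and is_ok(current):
        ok_time += window_end - last
    return ok_time / (window_end - window_start)


# Assumed meaning of "directly": provider AS plus destination AS, nothing between.
def is_direct(path):
    return len(path) <= 2


# One hour of updates for a single prefix: a 30-second detour via a third AS.
events = [
    (0, [64500, 64510]),
    (600, [64500, 64520, 64510]),
    (630, [64500, 64510]),
]
ratio = sla_compliance(events, 0, 3600, is_direct)
violated = ratio < 0.99
```

A real implementation on the router would work from the live update stream rather than a stored list, but the bookkeeping (time spent in a compliant state divided by the window length) would be similar.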
So to conclude:
So basically, router programmability provides new opportunities for doing things that you previously couldn't do, or doing things differently, maybe at more cost, I don't know. For example, for monitoring, or for custom services, usually without additional boxes. Usually with good performance, because we have direct contact with the router, so no need for moving data. And the capability to talk with the ASIC in some way. Of course there is a risk in doing this. It is the risk of new technology; basically it's the risk of research. This can be mitigated, for example, through collaboration with a university, and Juniper supports this form of collaboration. We talked about this model of collaboration between three parties, I mean university, Juniper and a third party, in a presentation at the TERENA meeting this year.
This concludes my talk. Questions?
AUDIENCE SPEAKER: Lorenzo, Google. I'm intrigued about the performance characteristics of the next hop counting. I know you only tested on Fast Ethernet, but have you done any back of the envelope calculations to figure out if it will scale at all? Because it seems to me that you have to do a linear scan of the BGP routing table on every packet. Because the filter is not a trie, right? It's a linear match.
SPEAKER: You'll have to ask Juniper for that. I don't know how this is implemented. It's a piece of hardware that does that.
AUDIENCE SPEAKER: Juniper hardware is not magic.
SPEAKER: I know. We asked Juniper: do you think this is feasible? And they said, well, it's quite an unusual application, but we think that it can be possible, but you have to test it. So the main next step is to test this on hardware with real load, or sort of real load, because otherwise there is no point in doing this research. But we are at the point where it's worth putting some effort into making this test, I think. This is what I can say, and I have the same curiosity that you have.
AUDIENCE SPEAKER: Surely a back of the envelope is easier than a test. With a back of the envelope calculation, presumably you can say, okay, if it's a linear scan then it's going to be impossible because it requires bandwidth that's not available in the commercial...
SPEAKER: No, you are right.
AUDIENCE SPEAKER: So then you can make hypotheses, because it seems to me that it won't work at all. I mean because...
SPEAKER: I think that these kinds of filters are implemented as a trie. I think.
AUDIENCE SPEAKER: But the trie doesn't have the concept of ordering.
SPEAKER: Do you need a concept of ordering?
AUDIENCE SPEAKER: You do because the filters are in order, right?
SPEAKER: It's possible, it's possible. But, well, we should think about this.
AUDIENCE SPEAKER: Just a quick comment on that. As far as I know how it's implemented, it uses Juniper's destination class usage stuff which is not a linear filter it scans through. It's hardware counters that are implemented.
SPEAKER: But you have a limit on the number of counters, so... I don't know.
AUDIENCE SPEAKER: I suppose testing is straightforward. If you were using a platform which has forwarding engines for 1 gigabit or more to pass packets on fast Ethernet, so that you can't even get close to what it's supposed to be able to do.
SPEAKER: Yeah, no, I am sure. This is just a functional test.
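The distinction raised in this exchange, an ordered filter scanned linearly versus a trie that returns the longest matching prefix, can be sketched as follows. This is only an illustration of the two lookup models, not a description of how any Juniper hardware works; the rule list and counter names are made up, and the trie handles IPv4 only.

```python
import ipaddress


def linear_first_match(rules, addr):
    """Ordered filter: scan the rules top to bottom, return the first hit. O(n) per packet."""
    ip = ipaddress.ip_address(addr)
    for prefix, action in rules:
        if ip in ipaddress.ip_network(prefix):
            return action
    return None


class BinaryTrie:
    """Bitwise trie over IPv4 prefixes; lookup cost is bounded by 32 steps."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix, action):
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})
        node["action"] = action

    def longest_match(self, addr):
        bits = format(int(ipaddress.ip_address(addr)), "032b")
        node = self.root
        best = node.get("action")
        for b in bits:
            node = node.get(b)
            if node is None:
                break
            best = node.get("action", best)  # remember the deepest match so far
        return best


# Hypothetical rule set: the less specific prefix is listed first.
rules = [("10.0.0.0/8", "counter-A"), ("10.1.0.0/16", "counter-B")]
trie = BinaryTrie()
for prefix, action in rules:
    trie.insert(prefix, action)

first = linear_first_match(rules, "10.1.2.3")   # rule order decides
longest = trie.longest_match("10.1.2.3")        # prefix length decides
```

With this rule set the two models disagree for 10.1.2.3: the ordered scan stops at the /8, while the trie prefers the /16. That is the ordering concern from the floor: a trie gives bounded lookup time, but it only reproduces an ordered filter when the filter's order is consistent with prefix length.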
CHAIR: Thank you very much.
Next speaker is Nick Hilliard on RPSL which has been a traditional subject in this Working Group.
NICK HILLIARD: Thank you very much. I am Nick Hilliard from INEX and I want to talk about RPSL. This is actually something that follows on from last year where, at the Lisbon meeting, I did a bit of a hatchet job on IRRToolSet; some people might remember the presentation. I think it was actually the only way to wake people up on a Thursday morning. But actually there was some interesting data in the presentation: a very simple analysis of the frequency of the different types of IRR DB objects. That's the linear frequency; you can see there that inetnums are, well, pretty pervasive. Domains, obviously you want DNS for your inetnums as well, but, you know, things like route and inet6num, they seem to be dropping off fairly quickly. If we plot that on a logarithmic scale, it's quite interesting. You can see that things drop off fairly quickly. It's more or less completely logarithmic. It indicates that the IRR DB in some respects is actually a roaring success, but I don't think it's a success in the way that it was really originally intended to be, because the initial vision was to create a system of writing a public policy which would allow you to, you know, do everything from configuring your routers to toasting your bread in the morning, as far as I can see.
So, it certainly has uses and there are a lot of very, very legitimate use cases. Being an Internet exchange head, I think it's wonderful because it allows me to do exactly what I want to do for route servers, but there are a lot of other service providers out there who use it to a certain extent. Some use it for limited means, just documenting existing routing policy. Others write really complicated RPSL expressions which will describe their entire routing policy, pipe that into IRRToolSet and just bang it out to the routers, which is quite cool as well.
But the user experience is generally pretty poor. Most people don't understand RPSL terribly well and only use tiny amounts of it. Most people would also agree that RPSL is very necessary. We need some way of publicly documenting our routing policies. And I think, probably because RPSL is too complicated, we only have one semi-functional RPSL implementation, which is IRRToolSet. Now, there are several others. Marco d'Itri has RPSL tools; Richard Steenbergen has written another RPSL tool whose name I have forgotten, it's written in PHP, it's very good. They only implement a very small subset of RPSL, and for more complicated stuff neither of those is particularly good.
Getting back to IRRToolSet, we have a really serious problem with it because we are network engineers and we are just a bit stupid about coding. Or maybe we are just a bit stupid, I don't know which it is. But none of us understands IRRToolSet. We can't figure out how to add in new features. Fixing problems is difficult. And debugging, I mean, I have spent days trying to figure out how on earth to debug it and I just have no idea, absolutely no idea.
But more importantly, well, there are two extant programmes in IRRToolSet which are still of relevance today. peval is an RPSL evaluator and RtConfig is a tool designed to pull in your routing policy and spit out a routing configuration. But the approach is completely broken. The way it should be done is that you should have a reasonably complete library which would allow you to hook into a templating system, and that would allow you to take your RPSL and use one template for producing your Cisco IOS configuration, another one for Junos, a third one for XR, whatever. But we can't do this because RtConfig is written in C++, a very static configuration, and it's a bit of a cow actually.
So, a better approach would be to start off probably with script bindings and link that into a template toolkit of some form. Template Toolkit, by the way, also refers to a Perl module which is actually really nice. I like Perl. Does anybody else here like Perl? Okay. We have some Perl lovers. This is wonderful.
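The library-plus-templates approach described here can be sketched in a few lines. This is a minimal illustration, not an existing tool: the `TEMPLATES` table and `render_prefix_list` are hypothetical names, the prefix list stands in for the output of a real RPSL evaluation step, and Python's `string.Template` stands in for a fuller templating system such as the Perl Template Toolkit mentioned in the talk.

```python
from string import Template

# One template per target platform; the policy data stays vendor-neutral.
TEMPLATES = {
    "ios": Template("ip prefix-list $name permit $prefix"),
    "junos": Template("set policy-options prefix-list $name $prefix"),
}


def render_prefix_list(vendor, name, prefixes):
    """Render an already-evaluated list of prefixes as config lines for one vendor."""
    tmpl = TEMPLATES[vendor]
    return [tmpl.substitute(name=name, prefix=p) for p in prefixes]


# Assumed output of an RPSL evaluation for some peer (documentation prefixes).
prefixes = ["192.0.2.0/24", "198.51.100.0/24"]

ios_lines = render_prefix_list("ios", "AS64500-IN", prefixes)
junos_lines = render_prefix_list("junos", "AS64500-IN", prefixes)
```

Supporting another platform then means adding a template, not changing compiled code, which is the separation the speaker is asking for.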
But, taking a step back. I think we actually have a really serious problem with RPSL as well, because the specification is really, really complicated, and that means that it's really, really complicated to parse. You need to be a computer linguistics person to be able to write a parser which will do a good job on it, and frankly, that isn't our business and, you know, all we have left are monolithic dinosaurs. We also have various other problems. Okay, it's quite complete in its own way, but there are lots of things that it doesn't have. It was specified more than ten years ago in its current form, and the original specification goes back into the early 1990s. Now, you know, things have moved on since then. We all use MPLS, we all use VRFs. It doesn't cope terribly well with interior routing. IPv6, it's okay; the specification isn't too bad for IPv6, but the implementation in the toolkits is slightly messy. It doesn't do really, really funky modern BGP stuff like, I don't know, max AS limiting and enforce first AS and BGP default. You know, the sort of stuff that you have in your original configuration, but it just doesn't seem to be in the RPSL specification.
So, where do we go from here is the question? We have probably got two options, I think. The first is to simplify the language. And there are very good reasons to do this. The first is that we could write good quality toolkits and actually make them comprehensible, and this would also enable end users to use the language and to use the toolkits. But of course, there are various deoptimisations. It's going to break the language as it currently stands, and it means that the original vision of having RPSL as this system which would allow you to do complete router configurations just isn't going to work, and maybe we need to ditch this idea as a precept for the project. Alternatively, we could dive into RPSL and add more stuff to it, and take something that's complicated and make it even more complicated.
This would make it even more difficult to write tools. It doesn't solve the problems for the end users who, like me, find it just difficult to get their head around, you know, all this sort of mp-export and long lines of configuration spreading all over the place. On the other hand, it would be backwards compatible and probably nobody would break.
I think we have to look at this. I think we have to spend time and probably a little bit of money looking at this. A good starting point would be to see what we are actually doing with this already. Whether the easiest thing to do of course would be just to trawl through the RIPE database. We do know that there are a lot of organisations out there who have private RPSL databases and who do, you know, interesting things but it's not publicised on the RIPE database. So to a certain extent, although there is interesting stuff in the RIPE IR db it's not a complete picture. We have to realise this. If we could get a handle on what sort of things people are using it for. Then we can get an understanding of how to bring it forward, how to make it something that's going to be more relevant and more useful for today. It's going to take a little bit of research. It's going to take a bit of work to look into it. It's the sort of thing that would be good to, probably good to hand back again to a university department and say well look, you know, we have this issue here, can you do this on the basis of a research grant of some form?
And as an end goal, I would like to see a language that end users and Internet operators can use for managing the routers and for managing what they need to do on the Internet. We need to end up with a language that it's actually possible and relatively easy to write toolkits for. And I also think we need to ditch our 100% solution that we have today, because the 100% solution is actually only ending up as a 10% solution. We need to go for a 95% solution that will work quite well for that 95%. And the people who need more complicated stuff, maybe they can write their own RPSL or something like that and an RPSL toolkit, and keep with what we have already.
So, at this stage, I want to throw it open to the floor and solicit some input on what people think, what direction we ought to go in. Whether there is interest and whether people see value in having a look at this problem.
AUDIENCE SPEAKER: This is Shane Kerr from ISC. So, I couldn't not come up to the microphone. I have a lot of experience with RPSL. I started working on it back when I joined the RIPE NCC about ten years ago, and the standard for RPSL is actually one of the worst written IETF documents I have ever read: it's confusing, hard to follow, incomplete, inconsistent, you name it, it's a mess. So, I think anything that we can do as far as standardisation goes would be an improvement.
I think we can possibly do both of your directions. Sorry, I am getting a little lost here. We can simplify the language at the same time as expanding it. As you mentioned, parsing RPSL is very, very difficult. It doesn't have to be that way. A lot of attributes have their own language. The beauty of the predecessor to RPSL, RIPE 181, was that it was quite a simple language in terms of parsing, and I think they decided, well, we need something a little more elaborate, and went in the same. So I don't think the ideas of making it simpler and more feature complete are necessarily at odds. Of course, simplifying means changing the spec, but you know there is already some discussion in the Database Working Group, which I hope people will go to tomorrow, which actually I think will break compatibility with RPSL anyway. So, it looks like we are already heading in that direction. I don't think it's a bad thing necessarily, and I don't think we should be scared of that. There are other things that we can do as well. Perhaps trying to look at it not just as a declarative language, but also trying to figure out how it fits into the overall process, what the administrators use it for. That sounds a little weird; what I mean by that is there are things that a computer can look at the Internet and figure out: it can look at the routing system, it can look at your routers and things like that. You don't need to write declarations of that. So we could set up systems that automatically do the things that could be automated, and just have this for the core parts of public policy declaration that you want to do. And the final thing I have to say, sorry for taking so long, is about the idea of doing it in a research environment. I am not necessarily opposed to that as a way forward, it's certainly a cost effective way to do things, but the current IRRToolSet was written in a research environment.
And what I have seen throughout the years is that code written in a research environment is really good for exploring new ideas, but in terms of making production quality, sustainable, maintainable software it is probably not the best direction. So...
NICK HILLIARD: Thanks very much. If I could answer just a couple of the things that you brought up there, Shane.
The RFCs are pretty grim. There is no doubt about that. And in terms of the evolution from RIPE 181 plus plus to RPSL-NG, which is where we are at the moment, there was just all sorts of stuff stuck in there. Some of which is really, really interesting and, you know, produces some really nice results, but it's fundamentally difficult to manage. Like, for example, advanced regular expression optimisation and stuff like that; you know, that's beyond the limit of an awful lot of coders in terms of its relationship to theoretical computation and stuff like that.
In terms of the research project, yes, I would agree. The current implementation came out of research. And I am not trying to do down research here. It was a really, really good way of examining a particularly interesting problem, and it came out with, you know, the sort of the reference implementation which was intended, you know, to be just that, the reference implementations and then there would be other implementations which would be easier. Then everybody looked at the spec and thought oh my God, what can we do here? And as a result, we have some very, very partial implementations, some of which are very useful and produce, you know, 70 to 80% solutions for an awful lot of people.
In terms of the research I am proposing here, I think that an awful lot of people in the operator community are kind of too busy to look at it themselves. If we can get somebody else to look at it and give some sort of third party opinion in terms of, you know, what it's used for, where it is, maybe how to simplify the language itself or how to modify the language to make it more attractive to coders and to end users, we don't necessarily have to worry about them producing more code in the future. We can write our own code so long as it's simple enough for us to do so.
SHANE KERR: That makes a certain amount of sense.
CHAIR: When you say we can do this, are you volunteering?
SHANE KERR: Could you ask that again? What?
CHAIR: I think Nick could use some help here.
SHANE KERR: I saw this on the agenda and thought that sounds really interesting but I really didn't know what to expect. I am going to wait till the line clears before I make any commitments, so...
AUDIENCE SPEAKER: Daniel Karrenberg, RIPE NCC. Full disclosure: I think I am co-author of RIPE 81 and RIPE 181, and I think my name is unfortunately on the first IETF document that describes RPSL. So, I could sort of be described as having a vested interest here.
Just to enlighten you a little bit on the history is that, yes, your analysis that when it went into RPSL, a lot of stuff was added that didn't make it nicer or more easy to use, and I have to confess publicly that I was, I think, more or less assigned as the person to keep it clean in that particular Working Group. And I failed. My excuse is at the same time the RIPE NCC grew from like 20 people to 45 or something and I had my hands full elsewhere.
So having said all that, I think discard the thing. I fully agree with what Shane said and what you implied: it's way beyond its time, and it's probably a good idea to look at something new. What I am a bit worried about is that I don't think you look at the purpose first and make a distinction between two purposes that I see here. One purpose is to make a language that's vendor-independent and multi-router, to specify a system of routers. We had this discussion yesterday in the complexity part, where Geoff said quite rightly: we are configuring individual routers and individual ASs in the Internet and then expect the whole thing to be sort of consistent, converge very fast and be not complex. That doesn't work. So, one way to attack it is to say: make a language where we can at least configure a system of routers. It may even be across vendors, and the scope that I have in the back of my mind here is an AS, right. That's one purpose. Quite clear in my mind, and it was sort of in your slides, at least I saw it there.
And then there is the other purpose, the one that RPSL, and especially RIPE 181, was designed for, which is to publish what you are doing so that you can coordinate with other ASs, so that other ASs can see what you are doing. And I have a slight suspicion that one tool for both purposes might not be what we want. So I would encourage some discussion in that direction. If you indulge me for another ten seconds: I think actually engaging academia is a good thing. You know, people like the ones who just talked, those kind of research people, they are interested in really applied research. So if you find people like that, then that can be useful, and certainly, if you guys think that the RIPE NCC could play a role again, you know, after having failed slightly, then I'd be happy to talk about that with you. My main point is: think about the purpose. One tool to configure a set of routers and to publish between ASs is probably not going to be good at serving both purposes. That's my main point. Thank you.
AUDIENCE SPEAKER: Dave Wilson, HEAnet. This is an interesting one for me because most of my job is operational, or development on operational networks, but a bit of my job is research networking, trying to develop the kind of stuff we think we'll need to use in the future. And we have run into exactly this. In fact we have run into RPSL itself while looking at this. One of the hot topics at this minute is virtualisation. I am not going to lay out what we are doing here, it would take all day and be very boring. But we did find that if we were trying to get ourselves into a state where we can somehow use the virtualisation technology that is there today to turn a handle and have an IP network pop out, which is in principle doable, we need some way to describe it. We thought, great, there is a language that does that already and there is even a toolset to turn out the configs.
SPEAKER: Oh how naive.
AUDIENCE SPEAKER: Then we saw your presentation and we had another look, and our developers took a look and went, well... I would say what happened most of all is that IRRToolSet and RPSL, as it stands, falls between two stools. It is exactly as Daniel described: it's, in principle, capable of doing both, but in practice not the right tool for either. What I did want to say is, we should think carefully about separating them. I think it might be the right thing to do but I am not yet sure, because one thing I would hate to see in the future, if there is some mysterious future where we come back in five or ten years and our jobs are no longer about provisioning links and numbering them but provisioning networks and handing them out, and we do this on an automated basis: it would be a pity if we have some interface between the description of those networks internally and what we end up putting in the RIPE database or whatever replaces it in the future.
AUDIENCE SPEAKER: I think I can say some of the problems were already identified when we were working with RPSL in the context of the RIPE database. I think it was pretty clear that RPSL is trying to solve two problems. It's trying to be a tool for documentation, which is supposed to be human readable, though it doesn't succeed in that way too much. And it's supposed to be a network configuration tool, right. And from what I heard last year when I spoke to some network operators, apart from exotic cases, I think this part of RPSL which is supposed to be for network configuration is not that difficult. So, I think that might point in the direction of simplifying RPSL: make the parts that are supposed to be used in network configuration machine readable and easily parsable, and still preserve some of the communication stuff, but don't pretend that you can parse and understand it in a manageable way.
From the database point of view, I think we can look at actual practical uses of RPSL. We can look at query problems and try to figure out how people really use RPSL today.
CHAIR: Thank you. Maybe a few of us need to get together after this and discuss. There is a question for you Randy. Do you want to expand on this or do you want to give your two presentations?
RANDY BUSH: I'll skip my presentations. I have given those presentations twice elsewhere, you know. Look them up.
RANDY BUSH: Could the people in this room who wholly configure the routers in their network so that you don't touch them with a keyboard, in a multi-vendor network, please raise their hands?
There was one over there. Yes. Okay. Glad to see.
I have been doing this, I was the poster child for RIPE 81 in '94. I have been beating my head against this thing for over 15 years. The American idiom is: You are trying to make a silk purse out of a pig's ear. Okay. Dave Wilson touched a critical point. I cannot, for business reasons, publish my routing policy. So, the router configurations are built from internal data. They are not going to go on RIPE's database, period. Okay. So separating those two is, I think, critical towards making something that's useful. But then I think you have to say useful for what? What do I really need in RIPE's database as an operational community? I think there are two sets of things. Those things that the RIPE operational community feels come on the public side of the line of what we are willing to publish to be able to run the community successfully. And those things which are discussed in other Working Groups to attest to ownership of resources. And that's about it. Could you pull up your second slide please, actually the third if you are counting the title.
Cannot say anything about that curve. Basic math 101: it is not a curve.
AUDIENCE SPEAKER: Rüdiger Volk. Well, okay, I probably need some help to stop at an appropriate time, but I think, well, okay, lots of true stuff has been mentioned. What I would emphasise is that actually figuring out focused subsets of functionality that are actually used and useful, and finding and supporting tools for those places, I think is a helpful thing. Doing all the complex stuff for all purposes obviously creates problems. And well, okay, what I am seeing is that there is very basic stuff that we are using, like just the simple set objects and route objects, and well, okay, dealing with them in the context of RPSL expressions I think is fairly easy and pretty useful, and easy tools for that can be derived from IRRToolSet and evolved, and put as modules into more complex policy tools. And well, okay, the policy and configuration tools need to split information, like Randy indicated, between what's public and what's private, but also what is a local matter: how am I dealing, how do I implement internally what I want to do as service on the outside, and well, okay, that should not be in a public language. Doing the full configurations out of policy defined in aut-num seems to be very little used and extremely hard to maintain, and well, okay, we actually got help from the university to do our next generation configuration and, well, okay, policy definition thing, and dropped in evaluation modules for the bulk data from RPSL. And well, okay, actually a lot of the bulk RPSL data that we have at RIPE now I would expect to essentially start to migrate into RPKI, because the certification that Randy mentioned is an essential part there, and well, okay, certification authorisation is not done well in ops. There is something called RPSS; it is better to have it implemented, but well, okay, it's not really security and not even state of the art for the 1990s.
NICK HILLIARD: Just there is one point for this. I think if we are going to write some sort of toolset, whatever language subset and whatever feature subset you know we would choose to go ahead with, IRRToolSet is absolutely the wrong point to start out from.
DANIEL KARRENBERG: What Randy has just said, apart from the very last part, was very, very useful. But I think, again, if we want to manage the complexity, we need two things: we need to configure a whole lot of routers inside an AS, and we have to coordinate between ASs. I respect that most of us don't want to publish their full routing policy, and I also agree that RPKI and ROAs and so on are very useful. But I think we need to explore what kind of coordination and publication we need to manage the inter-AS complexity. The one point where I don't agree with Randy is: I can't publish my routing policy, so a publication mechanism for anything is not useful. That's what I heard and I don't think that's right. So, we'll have to have some way of telling each other what we do, enough so that we can manage the inter-AS complexity.
CHAIR: What you intend to do definitely. Just by having a router you can look at it ?? have access to a lot of information. It would be nice to have a way to check that what you see is what the person intended to have you see.
CHAIR: So, moving on. Thank you very much Nick, I think you'll have a further discussion on this.
RANDY BUSH: Just want to give credit to Nate Cushman who started working on this and didn't do anything for a couple of years and then finally tossed it to us.
You will remember, back oh ten years ago, we published a bunch of stuff to say don't do route flap damping because it over-damps horribly. Essentially topology amplifies flap, and so in a rich topology you are dead, because we kill what we jokingly call mice as well as the elephants. But we still have elephants, we still have prefixes, as in Geoff's presentation in Istanbul showing horribly performing prefixes, flapping like mad, continually. So is there a minimal change that can address this issue?
Okay, here we see the mice versus the elephants. It's pretty clear, you know, 1,000 /?PBT of the prefixes, 10% of the updates. 3% of the prefixes, 36% of the updates. Okay. There are elephants in the room. So, there are two current techniques for trying to keep BGP noise down. One is MRAI, which tends to operate in a very small time window. And route flap damping. The problem is today's route flap damping gets the mice along with the elephants. So, can we raise the suppress threshold and save the mice, and still get some good churn reduction? Just raising the threshold is trivial, as far as code and configuration goes.
So we have a measurement set up here. We have a switch where we mirror, so we can record all BGP updates. We have a bunch of peers, this is in Dallas, Equinix peers, some of them full peers. NTT. Going into router 0. A week of measurement. A little more detail is that there is modified code in router 0. Essentially, three modifications. It doesn't actually damp. The penalty is still assigned to the routes, so we can measure them. And there is a very high maximum penalty instead of the current one that's in all the routers of 12,000, so it's raised to 100,000 or so. We retrieve the damping counters about every 4 to 5 minutes. Here is what we look at. Here is the churn. Okay. Here is what would happen, in the blue, with the default. Watch out, that is a logarithmic scale, and that is a valid logarithmic scale, and so you would cut from 525, you'd get down to about 247. Unfortunately, you would be cutting at this point on the curve, and you have got a lot of mice still living right here. Okay. Over the week's period, 14% of the prefixes in the Internet would have been damped with today's default parameters. Those are the dying mice. If we raise it to 12k, or 15k, look at the drop in the curve, and we'll go into more detail on this right here. This is the number of damped prefixes during the week; instead of that, if we just got down here, you'd be getting the elephants and not the mice. That's 6,000. This is K. So if you raise the threshold to 6k, 8k... This is the inverse, which is the percent of the churn, the number of announcements, okay, that you would reduce compared to no flap damping. Okay. So, again, you get a tasty win in this space.
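The effect of raising the suppress threshold can be illustrated with a toy penalty model. This is a sketch, not the modified router code used in the measurement: the per-flap penalty of 1000, the 15-minute half-life and the 2000 baseline threshold are assumed values chosen for illustration, and the flap patterns are invented. Like the measurement set-up described above, the model only tracks the penalty and does not cap it at a maximum.

```python
def max_penalty(flap_times, per_flap=1000.0, half_life=900.0):
    """Peak damping penalty for one prefix.

    flap_times: sorted flap timestamps in seconds. Between flaps the penalty
    decays exponentially with the given half-life; each flap adds per_flap.
    """
    penalty = 0.0
    peak = 0.0
    last = None
    for t in flap_times:
        if last is not None:
            penalty *= 0.5 ** ((t - last) / half_life)
        penalty += per_flap
        peak = max(peak, penalty)
        last = t
    return peak


def would_be_damped(flap_times, suppress_threshold):
    """True if the penalty ever reaches the suppress threshold."""
    return max_penalty(flap_times) >= suppress_threshold


# A "mouse": three flaps within two minutes, then quiet.
mouse = [0, 60, 120]
# An "elephant": one flap every 30 seconds for 50 minutes.
elephant = [30 * i for i in range(100)]

mouse_low = would_be_damped(mouse, 2000)        # assumed low threshold
mouse_high = would_be_damped(mouse, 6000)       # raised threshold from the talk
elephant_high = would_be_damped(elephant, 6000)
```

Under these assumptions the mouse crosses the low threshold after its third flap but never gets near 6k, while the elephant's penalty climbs well past it, which is the separation the raised threshold is meant to achieve.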
So, we then said, well, this is kind of a funny configuration. Do these conclusions hold in other places in the Internet topology? If I was in Africa, if I was at some big exchange point? So we took Route Views data for a number of interesting places and exchange points in Africa, and the ten biggest, the people who peer with Route Views with the biggest number of peers, the highest out degree, and we timed (Geoff, get to the microphone if you are shaking your head) and we timed them in as if they were being announced. So essentially we simulated other places in the topology.
GEOFF HUSTON: I am not going to argue with the methodology, I was going to argue with the outcome afterwards.
RANDY BUSH: Okay. So, RFD settings are too aggressive. It's turned off. Just do one little thing: raise the max to 50k and then we can tune our parameters. The maximum in routers is currently 12k.
GEOFF HUSTON: So, this strikes me as a weird answer to the problem insofar as route flap damping always struck me as a blind man with a hammer. What you are doing...
RANDY BUSH: You are not doing anything to analyse the quality of the routes, why they are there, etc.
GEOFF HUSTON: Correct. Wouldn't it be nice to actually say, look, I am not going to smack things on the head, I just want to know what the final answer is. I don't care about this crap that sort of gets there. Tell me where the answer is.
RANDY BUSH: MRAI attempts to attack that.
GEOFF HUSTON: That doesn't, because MRAI...
RANDY BUSH: And it's poorly implemented. There are two parts of that problem as we went to look at it. One is to actually retain the data in the routing structure. If we could analyse the quality, in other words, what's causing that piece of noise? Actually retaining that in the data structure is expensive and complex. I am asking for a major change.
Number two is, I have been unable to actually see, for instance, path hunting. I have been unable to differentiate path hunting in a complex topology from noise across the complex topology given the different MRAI implementations and settings from vendors and operators.
GEOFF HUSTON: So what we see right now, in response, is that there are around 19,000 prefixes a day that typically deliver 100% of your updates, and that's been constant for some years. I don't know why the bad boy room has only 90,000 places, but the bad boy room has 90,000 slots.
RANDY BUSH: Some of them are girls.
GEOFF HUSTON: I don't think so. They better behave. And what you actually see when you look down deeper at the microscope at a prefix is that its instability is tightly coupled in time. That it's not noisy all day. It's noisy for a small amount of the day. Now...
RANDY BUSH: I am not seeing that. I am seeing that sometimes and not others. I think we have work to do.
GEOFF HUSTON: I suspect so. Okay. Yeah.
RANDY BUSH: And it's fun. Other questions?
Okay. I think we are out of time, yes?
So, for those who want, I will blather on to a second short presentation. And you can stay or go.
CHAIR: We are going to run over. If you are really, really desperate for coffee, go. But if you have ever configured BGP routers, I would advise you to stay.
RANDY BUSH: So, this is more work by the usual suspects. So, we assume that the BGP decision process is costly. It's taking a lot of time in our routers. We kind of have the intuition that as we go down the tie breaking, the further down we have to go to make a decision given an update, the more resources we spend. So, do we know where the decisions are actually being made? How many people even know how many BGP announcements they are seeing per minute, hour or day in their network? Right. Okay. So we said, it sounds interesting. So, we asked my good friend Keyur to hack. And he made a nice little hack to add, this is Cisco IOS running on a 7200, show ip bgp internal. What he does is he keeps a table of the stuff, and I'll show it to you. Then the minute the Internet draft deadline came, some internal Cisco copycats took this simple thing, which we need operationally, and turned it into a very complex protocol proposal, which I suggest we all ignore.
Okay, just to give you an idea. There are going to be two setups. One is at the Westin in Seattle, which also connects to the Internet Exchange in Seattle. This is the router under test. Okay, note the difference is this one has two connections to NTT. So therefore, there are MEDs. So, we are listening to the MEDs. So we are going to see heavy MED decisions. Just to show you, here is the neighbour counts, who cares? This is: a prefix bundle comes in. How many attributes are different? Not where the decision is, this is just a gross measurement of which attributes are changing. Okay. And this says I got new. Okay. Here we go, MEDs, this is, remember, the Westin, so I have got two facing things. A bunch of AS paths. So this just gives us a feel for what attributes are changing. Now, imagine what would happen if, when an update came in, we went through the decision making process and, when the decision was made here, a counter was incremented. And that's what we have here. And unfortunately Excel got me. This is a massively big number, we'll get into it.
So this says where the decision is being made. This one says the decision that was made, the update in question, was a new best path. And there is a third invisible column here, which is this minus this, which is that the decision was made but the existing best path beat it.
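The counting technique described here can be sketched along these lines. This is a simplified illustration under stated assumptions, not the IOS hack itself: the `Route` fields, the step names and the shortened tie-breaking order are hypothetical, and MED is compared between any two routes here, whereas a real router normally compares it only between routes from the same neighbouring AS.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Route:
    # Hypothetical, heavily reduced set of attributes.
    local_pref: int = 100
    as_path_len: int = 1
    origin: int = 0
    med: int = 0
    router_id: int = 0


# Ordered tie-breaking steps; a lower key wins at every step.
STEPS = [
    ("local_pref", lambda r: -r.local_pref),  # higher local pref wins
    ("as_path", lambda r: r.as_path_len),
    ("origin", lambda r: r.origin),
    ("med", lambda r: r.med),
    ("router_id", lambda r: r.router_id),
]

decided_at = Counter()   # step at which the comparison was settled
new_best_at = Counter()  # subset where the update became the best path


def compare(candidate, best):
    """Return the winning route and bump the counter for the deciding step."""
    for name, key in STEPS:
        c, b = key(candidate), key(best)
        if c != b:
            decided_at[name] += 1
            if c < b:
                new_best_at[name] += 1
                return candidate
            return best
    # Every compared attribute was equal; keep the existing best path.
    decided_at["identical"] += 1
    return best


best = Route(as_path_len=3)
best = compare(Route(as_path_len=2), best)          # wins on AS path
best = compare(Route(as_path_len=2, med=50), best)  # loses on MED

for step in decided_at:
    lost = decided_at[step] - new_best_at[step]  # the "invisible" column
    print(step, decided_at[step], new_best_at[step], lost)
```

The third figure printed per step corresponds to the invisible column: decisions made at that step where the existing best path beat the update.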
So, now, remember this is not an interesting place in the topology. The purpose of my talk here is to show you a measurement technique that I am hoping we all will start to use. Because it will tell us what's going on in our network.
So, this horribly big number, 60% of the updates is disgusting, and it's because the two links to NTT, deterministic MED was enabled. There is a BGP scanner that goes through the table every ten minutes because it might have made a mistake. There might be bad data in the table. So, every ten minutes we go around and we clean up the table. And we make all those decisions all over again and that counter gets bumped at least once for every net that's evaluated. But, if we look, we will see that we are spending 60% of our decision time there and it never actually affects the result. So, maybe we don't want to do that. Okay.
I can understand why. I'll tell you an anecdote, which is: we always have this five nines thing against the Telcos. The 5ESS switch that went so wonderful, it's got five nines etc., half the code in it is data integrity cleanup because of the denormalised data splattered all around it, and if you turn that cleanup code off, the five nines goes to a crash in a single digit number of hours. Okay. So, what's interesting is, our data doesn't look like it needs cleaning up. So, that's got to go.
Here is the other setup. Much simpler, Sprint out one side and NTT out the other. The communities are changing madly. A lot of new stuff. Communities, AS path, some MEDs, but big community changes. The actual decisions, most are being made on AS path. Most of those AS path changes cause a new best path. So, tentative conclusions:
This needs to be run in a richer topology. Currently the code is only on a 7200. 7200s don't live well in rich topologies. So other implementations might be coming. I hope that we will see this implemented by the significant router vendors in all production code, so that we all can actually see how our networks are making decisions and we can adjust our policies so maybe they spend less of our money making unnecessary decisions. If anybody can run this for us, we'd really like the data as routing researchers, and of course we hope you will run it for yourselves.
I want to make one side point which is I cut out a third presentation. That's the end of this one by the way I think.
I had a third presentation, which we decided not to try to run. As you can see, we are a little out of time. One of the things we found: I don't know if you know Albert Greenberg and Aman Shaikh's paper from about 2000, where they did very deep analysis of where OSPF was spending its money. Was it in the protocol? Was it installing in the RIB? Was it the SPF calculation, which we all thought, just like we all think it's the BGP calculation? Or was it putting it in the FIB? And 80% of the cost was moving it into the FIB after the decision was made. We have got some BGP measurements we did for something else, and we actually discovered that these routers are spending most of their time moving it to the FIB. So, worrying about the BGP calculation may not really... this may be a... what's the... but anyway, a false... red herring. Good to have a local boy to fix American idioms.
That's it. Thank you.
CHAIR: So. With that, the last formal item on the agenda is AOB. I don't know if anyone wants to keep going at this point. I guess not. So thank you very much for being here. I hope you enjoyed the session as much as I did, and coffee is out there. Thank you.
LIVE CAPTIONING BY MARY McKEON RPR
DOYLE COURT REPORTERS LTD, DUBLIN, IRELAND.