PLENARY SESSION - NOVEMBER 19, 2010 - 9:00 A.M.:
SHANE KERR: Hello. Good morning everyone. Welcome to the last day of this RIPE meeting. We are going to start off this morning with a Plenary Session, and first up is going to be Geoff Huston from APNIC. Welcome.
GEOFF HUSTON: Good morning everyone. I assume it was a really good social, because very few of you have made it through this morning. It must be very, very early.
This morning, a talk about what I have been doing through the year, or some little bit of it. Looking at IPv6, and rather than looking at the traffic in IPv6 and the domain names in IPv6 and this and that in IPv6, I thought I'd have a look at the bits in between, the holes in IPv6. Look at actually what happens when you aim your router at addresses that shouldn't exist. So, this is a little bit of a presentation on what I call background radiation in IPv6.
Most traffic is wanted, you know, it comes because someone did a SYN, here is a SYN-ACK and an ACK back, and so on and so forth. Most traffic is two-way interaction. There is a certain amount of traffic that's completely unsolicited and obviously unanswered. Because this is traffic that actually comes at you or anyone else for no particular reason, because it's part of some other thing. Predominantly, what we see, or what we suspect, is that this kind of orphan traffic, the traffic that nobody asked for, is actually probes and scans. In other words, traffic that's slightly hostile.
Of course, not everything is hostile. It's also from hosts that are badly configured. Someone has dropped the wrong address somewhere and it's actually trying to start a conversation but it can't because the address is wrong. And sometimes you see bad traffic because either stuff is leaking out from private networks, whatever they thought, or maybe you have configured your own laptop in a private domain, take it out into the public, static addresses, and all of a sudden you are leaking traffic from that old private domain.
There is this constant level of this kind of traffic in the network. This kind of background radiation. The next job is to try and find it. This is the first such rig in an astronomical context but it's a ripper, isn't it? This is how you discover cosmic background radiation and after digging this up, we thought this is an ideal model for the APNIC research labs. I can see myself in there pronouncing and looking for traffic from that.
So, we have been doing this work in IPv4 looking for toxic background traffic. There is this feeling actually that IPv4 is heavily polluted. So, this happened earlier this year when APNIC was given network 1, and after some bad experiences when the RIPE NCC, for us, as part of their bogon testing, set up a route for 188.8.131.52 and instantly saturated their probe link. We thought maybe we should study this a bit more, so with some good folk at YouTube and at Merit and at NTT actually, we did an extensive test of advertising all of network 1. And interestingly, there is a lot of traffic. For those of you up the front you can see, it's about 160 megabits a second continuously directed at that /8. And there are these sub-second bursts of up to 850 megabits a second that just seem to happen as a short sharp peak. And when we looked really, really hard, oddly enough, we didn't find a lot of malice inside that. Most of it was actually RTP directed at just that one address. And it actually comes from demented phones: the number you are dialling does not exist. And that's a huge amount of traffic in network 1.
If that's network 1, what's really normal in v4? So, since then we have been testing every single /8 we have received from IANA, and actually putting it through the grinder for a week and actually looking at what's out there, and network 1 was abnormal. It was mutant. Other /8s exhibit a different behaviour. There is a couple of signatures down there of what we see and, interestingly enough, the first thing is that this kind of background radiation isn't constant. It has its sleep cycle. Folk wake up, emit background radiation, go home and turn it off. This is bizarre.
The other thing about this is there is more TCP than UDP. So there is an awful lot of soulless folk out there sending SYNs into the ether: hello, is there anyone out there? What is going on? So this is not always UDP, this is a huge amount of SYN traffic.
The other way of looking at this is actually, rather than looking at it by time, I am now taking the entire week and looking at each /16. So on the left is 14.0 /16, then 14.1 /16, all the way up to 14.255 /16. It's a log scale. So when you move up, every gradation is a factor of 10. The red is the average amount of traffic for the week. Who has an address in the low half of their /8? In other words, the first two bytes in v4 are something-dot-something where that second number is between 0 and 128? Come on. Right now, who has one of those addresses? Come on, half the room should have. Hello, who has a low address? Losers. You knew that was coming.
Because weirdly, if you are in the low half on any /8, you will get at least eight times more background traffic than the high half in every single /8 in v4. It's bizarre.
Here is 223. Exactly the same signature. The low half has more crap in it than the high half by around, you know, a factor of five or so. So if you are in the high half, 20 kilobits a second on a /16. If you are in the low half, at least 100 kilobits per second on every /16. Conficker gets the prize for the best virus we have seen so far. It's a ripper. All of this stuff is Conficker. It's the most virulent we have seen so far. A huge amount of shit in Conficker. So what we are actually seeing is that the low half of each /8 cops around 24,000 packets per second. And the only reason why it's in the low half is that someone got the algorithm wrong in the random address selection and bit 9 is always 0. Damn. They never bothered correcting it either, which is weird. I think it's the same old Conficker. There is just this one version which is virulent and will not die.
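To see why a stuck bit 9 confines traffic to the low half of every /8, here is a minimal Python sketch. It is not Conficker's actual generator; `buggy_random_ipv4` is a hypothetical stand-in that picks a random 32-bit address and forces bit 9 (counted from the most significant end) to zero. Bit 9 is the top bit of the second octet, so every generated address has a second octet below 128.

```python
import ipaddress
import random


def buggy_random_ipv4(rng):
    """Hypothetical generator: a random IPv4 address with bit 9 forced to 0.

    Bits 1-8 (from the most significant end) form the first octet, so bit 9
    is the top bit of the second octet, i.e. bit index 23 counted from the
    least significant end of the 32-bit value.
    """
    value = rng.getrandbits(32)
    value &= ~(1 << 23)  # clear the top bit of the second octet
    return ipaddress.IPv4Address(value)


def second_octet(addr):
    """Return the second octet of an IPv4Address as an integer."""
    return addr.packed[1]


rng = random.Random(2010)  # fixed seed so the run is repeatable
sample = [buggy_random_ipv4(rng) for _ in range(10000)]

# Every target falls in the low half of its /8: x.0.0.0 through x.127.255.255.
print(max(second_octet(a) for a in sample) < 128)
```

The first octet stays fully random, which matches the observation that the same low-half signature shows up in every /8.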
I compare now the low half and the high half. So here is this diurnal thing; Conficker is in some kind of day/night cycle. Folk turn it off. Machines keep getting reinfected, and that's why in the low half there is this huge amount of TCP. These are all SYNs going to port 445, going hello, SMB? Answer. Are you there? On the high half, which is that other end, it's all UDP. The red is the total. The green is the traffic type. It's all UDP. So if you are in the low half, you are Conficker ridden. Pretty much all toxic. Here is the traffic breakdown by TCP ports. Most of the stuff in the low half is port 445. In the high half there is almost no port 445. The rest of these ports, look at what they are: the MS server, Conficker. Port 1433, SQL server. That's Slammer. Port 22, it's folk probing for SSH. Etc., etc. Viruses are fashionable. Have you ever noticed that? They are fashionable. A year ago, if you looked at port 9415 you'd find nothing, but now it's Facebook. There is some virus wandering around trying to infect your Facebook profile that sits around on port 4915. All of this stuff, what we are seeing is that predominantly all that probing traffic is not accidental. It's malicious. It's actually trying to discover vulnerabilities.
So how much traffic? Oh, about 5 1/2 gigs a second. So per /14 about one every second or so. That's not true. Remember those people in the low half, 5 packets a second. All those people in the high half, a packet every 2 or 3 seconds. So, when you next get an address from RIPE, take a high number.
There is a very heavy tail too. Some addresses are incredibly bad. So bad that we have decided in APNIC that when a /24 gets in excess of 10 kilobits a second continuous, we are actually holding it back at this point. For each /8 there are somewhere between 4 and about 30 /24s in the /8s we have been getting that are just rabidly toxic. The hottest point is network 184.108.40.206, it does give a 100 megabits a second and it's all RTP. It's ghastly. That's about the largest address we have seen so far.
We have always been told the mantra in the RIRs, got to do v6. It's everywhere. So if you are looking for crap in the network, where else to look except in v6? We turned our attention to v6 and said v4 glows in the dark with five and a half gigs of this stuff, what about v6? How much does that glow in the dark?
So, I have seen people trying a /48 with this; Matt Ford did some work on a /48. That's for children. Let's go big. A year ago somebody announced a /3 in v6. Two months later somebody noticed. We can do this too. So, we have got a /12. We have been allocating from it. We have been working very, very hard and so far we have allocated 1.6%. So, there is actually only 0.9% of that block advertised on the global network. But, you know, most of the block is empty. 98.41%. So we thought, let's advertise it. Let's just advertise the covering aggregate. So, nobody noticed, but in June we advertised the /12, coming out of AS 7575. We were very nice to you. We didn't answer. We are doing some answering on some network 4 blocks so that the next time you send your unsolicited SYNs towards some blocks we are advertising, you'll get a SYN-ACK. We are interested to see what happens. But in this one, no, we are being very nice, we are not answering.
And here is a profile. There is some traffic in 2400 /12 and it's around 500 kilobits a second. Now it's a /12. It's huge. That's an awfully little amount of traffic. The average /8 in v4, which is a much smaller span of address block, gets around 20 to 30 megabits a second. This /12 in v6 is doing around 500 kilobits per second. Not UDP. Not TCP. It's all ICMP. And the largest peak we saw was that tiny 2 second burst at 3 and a half megabits per second. Great. Something to look at. What is it? Well, predominantly ICMP, as I said, about 600 packets a second. What is it? It's pings. Hello, is there anyone out there? No, it's dark traffic. This is not advertised space. Forget it. But there are also some really curious things about this traffic profile that we see. So that's a snapshot of a couple of seconds, right. There is a huge amount of 2002. Massive. And some of those are end systems, because Windows doubles up the v4 address in both the high and the low bits of the 2002 address, whereas things like AirPorts and other forms of CPE tend to put a MAC address down the bottom. So some of these are CPE tunnelling in 6to4 and some of these are end system 6to4. There is a huge amount of Teredo doing pings. If you actually try and capture Teredo in the wild and flush it out, you don't get anything. It's less than fractions of a percent of v6 traffic. When you do this kind of work, all of a sudden all these Teredo zombies come wandering out of the closet. They are not even rendezvous. These are ICMP Echo requests. It's not part of the Teredo protocol. It's something else. There is almost no unicast v6. And the other thing about this is that I am not seeing uniform radiation. Whenever I look really hard, all I see is that magic prefix, 2408.
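The source-address breakdown described here can be sketched with Python's standard `ipaddress` module, whose `IPv6Address.sixtofour` and `IPv6Address.teredo` properties decode the embedded IPv4 addresses. The function name `classify_source` and the labels it returns are illustrative assumptions, and the "doubled-up" test is only a heuristic based on the description in the talk (embedded IPv4 address repeated in the low 32 bits), not a definitive fingerprint of any operating system.

```python
import ipaddress


def classify_source(text):
    """Roughly classify an IPv6 source address by its transition mechanism."""
    addr = ipaddress.IPv6Address(text)
    if addr.teredo is not None:          # 2001:0000::/32
        return "teredo"
    embedded = addr.sixtofour            # IPv4 address inside 2002::/16, or None
    if embedded is not None:
        # Heuristic: the same IPv4 address repeated in the low 32 bits
        # suggests an end system rather than a CPE using a MAC-derived suffix.
        if int(addr) & 0xFFFFFFFF == int(embedded):
            return "6to4-end-system"
        return "6to4-other"
    return "native"


# Addresses below use documentation ranges and are made up for illustration.
for sample in (
    "2002:c000:204::c000:204",
    "2002:c000:204:0:21b:63ff:fe00:1",
    "2001:0:4136:e378:8000:63bf:3fff:fdd2",
    "2001:db8::1",
):
    print(sample, classify_source(sample))
```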
But, there are still some really, really bizarre things. I put these up because I love them. This is an ICMP v6 destination unreachable. You can't get there from here. But I am not seeing the inbound packet. I am seeing the result. In other words, the source address isn't reachable either. So, somehow someone with a crap source address is sending a packet to a crap destination address and the message going you are really, really stuffed is going nowhere. This is the kind of nonsense you get to see in this stuff. Whoever is doing this, well done, I couldn't have thought of that.
So anyway, the profile of traffic:
I broke out that /12 into /20s and plotted them across the week, and oddly enough one /20 just dominates with a day/night cycle and all the other /20s are down there in the noise. More like what you'd expect. If you do the profile again, what you find is one particular /20, 2408, just dominates. But also in this spectrum you find a number of other peaks. And then you go and look at the allocation table. NTT East has that block. It's allocated to them, but they are using it as private space. So what we are actually seeing is a huge amount of leakage.
Similarly, when I look at all the others I find a bunch of allocations. So the traffic that I am seeing is traffic out of so-called private networks. So in v6, whatever you do in private leaks like crazy. Why is v6 special? It's not. Whatever you do in so-called private networks leaks like crazy. Remember that next time you set up your private network. It just leaks.
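As a rough illustration of breaking the /12 into /20s, the following Python sketch bins destination addresses by their covering /20 using the standard `ipaddress` module. The helper names (`slash20_of`, `count_by_slash20`) and the sample destinations are assumptions made for the example; the actual tooling used for the study is not described in the talk.

```python
import ipaddress
from collections import Counter

COVERING = ipaddress.IPv6Network("2400::/12")  # the advertised aggregate


def slash20_of(addr_text):
    """Return the /20 containing the address, or None if outside the /12."""
    addr = ipaddress.IPv6Address(addr_text)
    if addr not in COVERING:
        return None
    # strict=False masks off the host bits for us.
    return ipaddress.IPv6Network((addr, 20), strict=False)


def count_by_slash20(destinations):
    """Count packets per /20 from an iterable of destination addresses."""
    counts = Counter()
    for dest in destinations:
        net = slash20_of(dest)
        if net is not None:
            counts[net] += 1
    return counts


# Made-up destinations, for illustration only.
counts = count_by_slash20(["2408:10::1", "2408:20::2", "2401:1::1", "2001:db8::1"])
for net, packets in counts.most_common():
    print(net, packets)
```

A /12 splits into 256 /20s, so a week of packets reduces to a 256-bin histogram in which a single dominant block is easy to spot.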
So, realistically, I see it in v6 more than anywhere else because in v6 there is no net 10 right. So you just go and get an allocation and use it in a private context. That's fine. But it leaks.
So, I am not really interested in that kind of traffic because it's not my traffic, and that's someone else's problem. But in the really, really dark space, the space that isn't leakage, what do I see? So, I now separate out the two. The red stuff is leakage from private allocated networks. The green stuff, yes there is green stuff, seriously, it's down there on the death line. It's right down the bottom. That's the bit that's the real background radiation in v6. If I blow it up an awful lot, what I see out of that entire /12 is less than 1 packet per second.
So, realistically there is almost no background radiation in IPv6 worth a damn. The biggest thing I actually saw across that entire week was a phenomenal peak of, yes, 16 UDP packets a second; and weirdly, it happened every 24 hours, regular as clockwork, for five seconds. Someone is waking up and going, must send dark traffic. I'll go to sleep again. What the hell are they doing? Across an entire week you see this bing, bing. No answer of course, because it's dark, you idiot. So, you know, get a clue.
Across the entire profile, one packet every 36 seconds. There is almost nothing there, predominantly ICMP and a small amount of TCP and UDP. There is little enough to look at every packet. So I did.
Now, you know, SYN packets. I am there. Send off, got the address wrong, send some SYN packets, I am there. SYN-ACK, who are you? I have got my own address wrong. I send off the SYN to someone real. They respond with a SYN-ACK off into nowhere. You are not communicating. You are not communicating 7,000 times. You know. Hello. And the rest, which I find really weird: this is TCP, don't forget, I can't send packets until I get a SYN/ACK exchange, but I see data. Well done. Bugger this handshake business. I am just going to send packets. I am feeling good today. And who is feeling good? Someone over there at UPIN.edu is sending DNS TCP packets all by themselves. I have no idea what they are thinking about or how they manage to do this. But it's a perfectly legitimate packet in the middle of nowhere.
There is a tiny, tiny amount of TCP probing. If you look very hard you'll find it's coming out of somewhere in China. They are just ACK probes, and actually in v4 you find this too. Because oddly enough, most systems come configured without TCP black-holing. It's part of your sysctl: whenever you get a TCP packet towards a nonexistent local TCP port, it will respond with a reset packet back. If you want to see if someone is alive and responding, and you get back a reset, you have actually managed to find someone on that address. This is what I am seeing in v6, precisely that. They are actually just probing into this space, sending ACKs. We'll talk about the futility of this in a little while. But I tell you, this is futile. It's going nowhere.
The next one, this is Teredo, same source address, varying destination addresses. He seems to be probing in some sort with SYNs but it's a remarkably weird way of doing it. Can't tell.
This one I like. Who is running dual stack? How do you know your v6 is well configured? You don't. Because when the v6 doesn't work, it falls back to v4, doesn't it. Yes. So this guy got his mail server completely misconfigured. So, he is sending off, he has got the wrong source address in v6, and every time he wants to send mail he tries to get in touch with the mail server at Hurricane Electric, because he is part of their tunnel configuration. He goes, hello, but he's got his source address wrong. So Hurricane Electric goes SYN-ACK back to me. Every time he wants to send mail I get a burst of packets. So over that week he sends 780 mails. And he sent them in v4, and he probably was thinking, why is it taking so long for my outbound mail to get out? I can tell you why. You've got your v6 misconfigured. But because you are running dual stack, it really didn't matter. It just slowed you down. That's what we are seeing.
The other one too, dark DNS. I am seeing DNS queries into the ether. You are never going to get an answer. He's got this dual stack. He's configured up some DNS server in v6 but he's got the address wrong. But because he has got v4, ultimately he'll get the answer somewhere, but I saw about 3,000 queries from just four people who got addresses wrong. Of course they go unnoticed because the dual stack saves you. And the rest, dark ICMP. Most of it is ping. These zombie destination unreachables, which I just love. It's impossible to do this deliberately. You have really got to be working hard somewhere.
Most of the stuff that I saw is really leakage from private networks. The big message is out there: if you really think that you can set up a private system by routing filters alone, you can't. Private networks really aren't private. This is pretty obvious.
Folk can't type v6 addresses. They just can't. They just get it wrong. They just stuff up the typing, and I reckon there are errors in the DNS that cause errors in traffic. And errors in configuring your own interface statically, that just cause this zombie traffic. It's not intentional. You just got it wrong and you don't see it yourself because dual stack saves you. It just reverts back to running v4. Even though you have stuffed it up, it's not obvious. Oh, the v6 network isn't working? You can't tell. Dual stack saves you.
And is there really scanning in v6? No. In v4 almost all of the traffic that we see in the dark is scanning. And interestingly, apart from Conficker, which does random address selection, almost all of the rest of the scanning is plus 1. Not minus 1. Virus writers are not that inventive. They are inventive, but to a point. I have only ever seen one backwards scanner that starts at a high address and goes minus 1. All the rest go forwards.
Could you translate that to v6? Let's take a simple exercise. This is a /48, and let's say you have got a really good scanner and you can process one million addresses per second. As you see there, that will take you 36 billion years, billion, and just to remind you, the universe is a mere 13 billion years old by comparison. If you set off your scanner at 0:0 blah, blah, blah 1, and do a +1 address scan, good luck.
So, the futility of randomly scanning into the v6 address space is obvious. Scanning won't work.
So instead, what do you think they are going to do? They start at 1 and go up to about 100 and go to the next. Who statically numbers their static ones in v6 at 1? Well done everyone. Really good. Don't. Pick a number larger than 1, okay. Really big. Preferably random. Use some other function. Don't start at 1, because if there is going to be scanning traffic, that's where they are going to go. They are going to walk the reverse space. So, if you enumerate statically in the reverse space, that's what's going to happen. They are going to walk the forward space. But they are not going to walk the address space because that cannot work. And so far, no one has even tried.
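One way to follow the "pick a big, preferably random number" advice is sketched below in Python. It is only an illustration: `random_host_in` is a hypothetical helper, the 2001:db8::/64 documentation prefix stands in for a real subnet, and rejecting host parts of 0xFFFF or below is an arbitrary choice made here to stay clear of the low numbers a scanner would try first.

```python
import ipaddress
import secrets


def random_host_in(prefix_text):
    """Pick a random static address inside an IPv6 prefix of /64 or shorter."""
    net = ipaddress.IPv6Network(prefix_text)
    if net.prefixlen > 64:
        raise ValueError("expected a prefix of /64 or shorter")
    host_bits = 128 - net.prefixlen
    host = 0
    # Re-draw in the (very unlikely) case of a small, guessable host part.
    while host <= 0xFFFF:
        host = secrets.randbits(host_bits)
    return ipaddress.IPv6Address(int(net.network_address) | host)


address = random_host_in("2001:db8::/64")
print(address)
```

An address chosen this way still has to be recorded somewhere, so it does not help against someone walking the DNS; it only removes the easy guess at ::1.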
So, at this point, when I look at v6, Hanlon's razor comes in. I really, really, really don't see malice. But what I do find is the most astonishingly amusing stupidity. It's great.
Thanks a lot.
GEOFF HUSTON: Anyone willing to own up to being some of these people? Or any other questions for that matter?
RANDY BUSH: Let them find my router at point 1, enjoy it. I have lots of questions but will do them off line. But I have just one that I think is a public concern. It is your observation of leakage of space intended to be private. And we have seen that in v4 too. And I just want to underline and emphasize that these proposals for new private space and so on and so forth, every one of them tries to convince us that private space will remain private and won't leak. And every time we look, we leak. And here you are seeing leakage that far exceeds anything else.
GEOFF HUSTON: Precisely right. Private space is a complete misnomer. It's as public as anywhere else. If you think routing filters will give you security, not like that. Corral that path.
AUDIENCE SPEAKER: You mentioned these 5 second bursts of UDP every 24 hours. Did you take a look at those? I don't think you mentioned the content of those.
GEOFF HUSTON: Of course I did. It wasn't that interesting in the end. It really was just a bunch of UDP that had random source and dest port. All kinds of weirdness happens.
ROB SEASTROM: While you rightly observed the futility of scanning the entire IPv6 address space, some narrowly scoped scans in places where you believe autoconfigured devices with certain MAC address ranges with certain security holes in them might be a lot more fruitful. And I wonder how far down into that possibility you might have drilled?
GEOFF HUSTON: In this case, simply because I am announcing the gaps around everyone else and I am actually trying hard not to look at leakage from private, because I still have some ethics left, not many but a couple, I really was looking into the absolute dark space.
Now, if someone is clueful enough to actually figure out that blind scan probing of +1 doesn't work, then they are actually going to target allocated space, aren't they? I would have thought so. So, in this experiment I really wouldn't have seen it. But I tell you what, if you are running a prefix and you are running services on a v6 prefix, there is a small enough amount of traffic that it's a useful exercise to set up your own dump of all the bits that you are not using and look at the incoming traffic for all the other addresses, to see what's sort of circulating close to where you have got announced services.
AUDIENCE SPEAKER: While I agree with you, the allocation is something that happens at multiple different layers and that space is, in fact, allocated in the IANA sense of the word. So, if it's a virus that had a scanning behaviour that was supposed to be good long term but only looking for certain MAC address EUIs, sorry, OUIs, that's something that you might see even in space that you at APNIC had not allocated.
GEOFF HUSTON: So far we haven't seen that.
SHANE KERR: Thank you very much. Next presentation, a content provider's experiment.
JOHANNES ENDRES: Hi, I'm Johannes Endres, senior editor for c't magazine in Germany and for Heise Online also. We went dual stack some time ago and I'll tell you how we made it happen, why we made it happen and what happened to us on the way.
So, who is Heise Online anyway? We are a major German IT site. Major in the sense that the German term for slashdotting a site, to bring it down by setting a link to it, is to Heise it, okay. We run news, we run background articles, how-tos, best practices. We host a load of online tools that are valuable and we have a very active community. So there is a load of user-generated content on our site. That community of course is technology savvy because we are an IT site. And they tend to complain to us if anything doesn't work. If they can't read Heise Online with Links on Plan 9, they complain.
And of course we don't have a hundred percent up time. It usually takes about ten minutes from Heise having a problem until we see the first e-mail complaints arriving. So we are pretty sure if there is a problem, we'll hear about that.
And of course as any major site, we are ad funded.
So, why be interested in IPv6 anyway? If you are a regular content provider, you wouldn't feel any pressure to go IPv6 now. It doesn't hurt us yet to run IPv4 only, because as we all know, the near future will be dual stack on the access side, on the network side, and so there is no immediate pressure on us to go dual stack. But of course there are reasons that hold for many content providers, like learning how to do IPv6 before there is a lot of traffic there. Debug your sites, find the address-dependent parts of your site and things like that. And of course, 15 years after the first RFC covering IPv6, you could still be an early adopter. And Heise is among the small number of people who really have an IPv6 business case, because we make money out of telling people how to do IPv6. We inform them about that. So, we lose street cred if we tell them about IPv6 and we don't do it ourselves. So for us there is a certain pressure to go IPv6, to do IPv6.
And among the things we are doing to inform people about IPv6 is the annual German IPv6 congress; the call for presentations is still open, up until Monday. There is a personal reason to go with IPv6 too. At that congress there is a presentation by Gert Döring every time, and he keeps telling us to go IPv6, and I want him to talk about something more interesting next year.
So, how did we do it? That's Heise Online before IPv6. We have a server farm, we have a load balancer, we are connected to IPv4 obviously and we have clients coming in over IPv4. Since we expected something like one or two percent of IPv6 traffic, we didn't upgrade the load balancer. That would have been expensive because the upgrade costs a lot. And that would have been a firmware upgrade, but the admins looked at other people doing that firmware upgrade and it is buggy, so we decided not to spend money on something that's buggy. We did this instead. We put an Apache in proxy mode and connected that with an IPv6 address. And to run 1 or 2% of traffic, that's absolutely okay.
And like everyone else, we first called that 6.heise.de. We started in 2009. We saw about 1% of traffic and we saw a sharp drop in traffic the second day, because we posted that address on our main site and as soon as that posting fell off the first page, traffic ceased because nobody knew about the address. That's the obvious thing that will happen if you do 6.example.com or whatever. And because that proxy had to do rewriting, not everything on www.6.heise.de worked over 6. We had about a million pages with editorial content and there are absolute links in there. So we had to rewrite those to not throw everybody back to IPv4 as soon as they click a link. And that breaks things. So we had to decide to either fix that and invest serious money into 6, or to go dual stack, which we wanted to do anyway.
This is where management comes in and we had to convince them. So, we had an advantage there: Heise is a family-owned business, so we only had to convince a very small number of persons, two actually. They saw that there was no immediate ROI on that but they were used to that. You know, they were used to investing in infrastructure to keep Heise running, there is no problem there. And it's a low budget project anyway.
What they were concerned about is ad blocking. Your usual content provider in Germany sees about 5% of users having ad-blockers in place. Our users know a little bit more about computers, so we have 25 to 30 percent of people using ad-blockers, which breaks our business model. Anything that might raise that amount is a problem for management. And as things are, IPv6 works as an ad-blocker. IPv6 only works as an ad-blocker because the ads come from different servers which are not under our control and they come in by IPv4 only. If you look at Heise IPv6 only, not dual stack on the client side but IPv6 only, you don't see the ads.
But we could convince management that anyone who is going IPv6 only right now on the client side has an ad-blocker in place anyway. I wouldn't click on an ad or whatever. That was not the problem. The more important problem was broken dual stack access.
As opposed to what Lorenzo says for Google, for us time-out is not the major concern, because people want to read Heise Online. Who did read Heise Online today? Who did yesterday? Who did the day before? Okay. I could come back to that. People suffer in silence. And they know enough to fix their side of the problem. Or if there is a time-out, we can still tell them what to do. If they come in some seconds later, you can still tell them how to fix the problem. Our problem, our major problem, is people we cut off entirely. And that includes people with Mac OS computers behind black holes, because that's a time-out of 75 seconds and that's cut off. Nobody waits that long. And I think there is another problem, that's iOS 4; it's not the IOS you are dealing with, it's the iPhone OS. And I have seen it, it doesn't fall back to IPv4 at all. I have only seen it once, so if anyone here has an iPhone, please ask me after the session.
So, why didn't we do our own measurements? There are several reasons. The first one is there is already excellent research done into that. And the main part of doing that research is to interpret the data, to learn what those data mean. And we didn't want to recreate that expertise in our house; that wouldn't have been too useful for us given the other reasons.
The second being that we had implementation problems. We run a commercial CMS. It's either costly because we have to talk to the CMS company, or takes a long time for our web developers because it's kind of complicated.
The other thing is that we have to change our logging system in the server cluster because that is hand knitted, and to aggregate it out, it drops part of it early, which we would have needed to see the difference between dual stack.
The next reason is that management could not have interpreted the number. Tore said that the sites he is working with will go dual stack as soon as they're on the long tail of the Mac OS problem. But how is management to decide that they are on the long tail? Actually I didn't want to discuss with them about 0.01% or 0.2% or anything like that.
And the last reason we want to tell people about IPv6. We want to tell people about IPv6 problems. So we are interested to know why things break. And the number alone would not have told us that. So we decided to do something else, we decided to just try it for a day.
And we did inform our users about a month before by postings in the news stream, by how-to articles on how to identify the problem and how to switch off IPv6 if they had a problem. We did press releases and we had a countdown timer on our main page counting down to the day. We communicated an e-mail address to complain to if anything went wrong on that day, and we set up an additional virtual host on the Apache machine already running 6.heise.de.
So, this is what our site looked like about a month before the day. I think there is no way to miss the IPv6 logo over there. And you also see the countdown timer.
What did we do on the day? It was Thursday September 16, 2010. We chose a Thursday on the basis of our regular weekly traffic distribution. We just added the AAAA record, and the DNS TTL was minimal already before; if you do your own test you might have to reduce it beforehand, for switching off the AAAA in the evening. And then we monitored the reaction on the net. Like Twitter, blogs, or any place where people might complain off-site.
And we had sites that discuss Heise Online off-site. We watched those too, and we waited for e-mail reports coming in. What did we see? That looks good. We didn't have any off-site complaints. We had something more than 2,000 comments on our site. Some of them contained problem reports, but these obviously are not too relevant because people could reach the site to post their comment. We had 0 calls to our hotline, and from the e-mails we identified four dual stack problems. Well, actually on the day. We were off by one as we learned later, but okay, that's still five problems we identified. Given that we had about a million visitors that day, five problems is not in the 0.something percent range where it's useful to calculate that at all. In the evening we considered briefly to forget to switch it off. As we try not to lie to our users and we said we would switch it off, we did switch it off. But what we did later, I'll tell you in the next slide.
As we learned from those a little bit more than 1,000 postings, we made one major mistake. That was calling the whole thing the IPv6 day. Our audience considered themselves geeks or nerds but they think that IPv6 still is something for the supergeeks and supernerds and they are not interested. As soon as there is anything technical like IPv6 in that, maybe even AAAA, they are not interested and they actively ignored that. We saw that from the postings. People saying, well, you could have told us in advance. You saw the slide. We did. But they actively ignored the information. So, a choice of something like more generic like the Future Internet Day, Internet2day, The Next Internet Day, something like that I think would have been the better choice.
So, with this experience, we went dual stack two weeks later in production. We started silently and we had posted that on our news some days later and we had very positive reactions. We had lots of cheering crowds who really liked it.
This was the fun part. Let's look at the problem reports or the reports we got via e-mail. The largest part of those were just supportive. That's okay. Everything works with me. Go on like that. Second part were people who had different problems. Most of those were people who tried 6.heise.de on the IPv6 day who were just confused. They didn't have IPv6 access obviously and they didn't have an idea of what was going on anyway. I really love the one I exchanged four e-mails with to find out what the problem was and he told me, well, I had a problem the day before. Okay.
So what remains are 25% of maybe dual stack problems, okay. Those are problems that are not identified to be other problems. Let's look at that distribution. Of course the largest part is Mac OS problems. There is broken CPE. There are some things we couldn't identify and I tend to say those are not problems, because those are people who didn't reply to our e-mail questions. We have people who know a little bit about computers, so we can ask them for their IP configurations, for their routing table, for a traceroute via 4 and 6. That's useful information. These are people that didn't answer that. These are people, I think, who tried to reach 6.heise.de and didn't dare tell us. There is this problem with an ISP proxy there and there is a peering problem. That's why I am here actually.
So a little surprise here with the Mac OS problem. It's not the 6to4 bug that Tore sees a lot in his measurements. I think that's a special situation in Germany because we have a 70% market share of one CPE manufacturer and some OEM products, and these didn't do 6to4 for you on our test day. Okay. Actually, in the evening, they released a firmware update.
So, in Germany, even Mac users used those devices that don't spark the 6to4 problem that Tore sees a lot. What we saw, all those problems were the Rogue RA problems and they came from university campus problems.
Now the broken CPE problem. There is one product line by one manufacturer that cuts you off dual stack sites. And actually it's a broken DNS cache in the device. It handles every DNS reply as an A record. It puts it in the cache as an A record. So if you do an AAAA for a name first and an A afterwards, you get the first four bytes of the AAAA record, which usually isn't the correct answer. We knew about that. We knew that would be coming because we saw this mistake in our review of those routers. We told everybody don't buy those devices, you'll have problems later. They bought the devices, they had problems later. Those are ancient devices. We saw that problem in our review for the last time in 2006. But, they are already broken from the start so obviously they don't break again and there is no need to buy a new one.
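As an illustration of the cache bug described above, here is a minimal sketch of what a client would receive if a cache served the first four bytes of a stored AAAA answer as an A record. The function name and the example address are assumptions made for this sketch (2001:db8::1 is a documentation address, not Heise's), and the real device firmware is not known.

```python
import ipaddress

def broken_cache_answer(aaaa_text: str) -> ipaddress.IPv4Address:
    """Mimic a cache that stored AAAA data but serves it as an A record.

    Hypothetical model of the bug: only the first four bytes of the
    16-byte IPv6 address survive and are read as an IPv4 address.
    """
    packed = ipaddress.IPv6Address(aaaa_text).packed  # 16 raw bytes
    return ipaddress.IPv4Address(packed[:4])          # truncated to 4 bytes

# Documentation address used as a stand-in for a real AAAA answer.
bogus = broken_cache_answer("2001:db8::1")
print(bogus)  # 32.1.13.184
```

Any site whose AAAA record starts with 2001:0db8 would appear to live at 32.1.13.184 under this model, which is why the affected users were cut off from dual stack sites entirely.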
Now, for the mobile ISP, that's in Spain, a major European mobile carrier. It has a transparent proxy in place to reduce the amount of data to send and that proxy has broken dual stack. It sends "site not reachable" back to the mobile device. The user who told us about that opened a ticket. I haven't heard anything about that ticket since then. Let's wait and see.
Now, for the last problem, I only showed the problem distribution up to now; these are the actual numbers. These are the number of error reports we got. Taking away those two that are possibly not real dual stack numbers, these are 11 identified problems. But, let's have a look at the number of people affected by the problems we were told about. It's this: Most of the reports were about one person having a problem. But the peering problem affected 80 people there and it's pure luck that it was only 80 people. What happened?
A company buys IPv6 connectivity from a major European carrier. It's in their business offering to do IPv6 for them. He couldn't reach Heise, he did a traceroute towards us and he saw that IPv6 traffic would not leave the AS of that carrier. He opened a ticket and he waited and he waited and he waited. And on the second day he asked us for help. And we tried to find somebody to contact; whoever is on the v6 providers mailing list might remember my cry for help about who to contact, and we brought in someone from DE-CIX who might know somebody. We asked around. In the end we got through to a person responsible for the router configuration. So he fixed it within ten minutes. Some minutes to do the traceroute and open the ticket at the start of the problem, some to fix it in the end, and nine days in between. So, I think I don't have to comment on that. I do anyway.
Obviously, that provider didn't have a process in place to escalate the ticket to the correct person for IPv6. And that's unbearable.
At that point I said, well, interesting, that kind of thing happened. I was slightly annoyed and that feeling grew stronger when I arrived here on Wednesday, because we are blackholed again. That's why I asked who read Heise Online on Wednesday, Thursday, Friday. There is a time-out problem for you. You are reading it anyway, you are suffering in silence, only two people told me. But there is a blackhole close to the network we are in. Most of you will know better than I do what exactly happened. I heard there is talk about this incident. And it's been going on for IPv6 for three days now, okay. For IPv4 it was fixed in hours and this, for us, is the worst case. It's a blackhole close to us. It affects almost all IPv6 users apart from Gert Doering, who, I think, has direct contact with PlusLine. Which is about 2% of our readers. We don't have to talk about .001% if such an incident cuts off 2% of our users. And this is not something to look at in a monthly average, you know. If they don't reach us at that moment, we have a 2% problem. And as I said about iOS 4 in the iPhone, this cuts us off from almost all devices with an Apple logo on them, and I'll tell you why that's especially bad for us.
This is the reason. It's the iPad. For the whole publishing company, which we are a part of, the iPad is the hope for the future. We see declining sales in printed papers and every publishing company is going towards the iPad to sell its contents there. You know the App Store model and that's exactly what we are going for, the whole industry, to have people buy our contents for the iPad. And that has to be instantaneous. It doesn't work if somebody goes to the store and says I'd like to read CT or maybe in three days or maybe in nine days when v6 is fixed. And I think that that is really what will happen because the iPad will have iOS 4 shortly; and if I am right, if it's right what I saw for the iPhone, it will choke on blackholes entirely. It will not fall back to IPv4. What's even worse, you can't switch off IPv6 on the device. So, there is no way to fix that on the client side.
So, we have a system of our server and the device, and we need a reliable connection between these. And if we continue to see those blackhole problems and we can't fix them by switching off on the device, we might have to do something we really don't want to do, but we have to have a reliable connection and if IPv6 continues to be as unreliable as we have seen in the last seven weeks, we will have to go back to v4, which I don't like and which we'd have to explain to our readers who are using IPv6 already. But this is where all the fun stops, where we have to make our business case with the iPad.
Okay. Now, going dual stack. It was easy to do on the technical side. It needed somebody in the organisation to push it. The worst problem we are experiencing is routing blackholes, actually, up to now; that's what we are seeing in the seven weeks we are doing this. And the mantra of us all is to offer the same service over IPv4 and IPv6. We tried to do that in every part that's under our control directly and we see that, to the reader, we are currently not doing this.
Okay. Thank you.
LORENZO: Two comments. On the peering blackholing issue, I think that's not acceptable. Your transit providers need to be aware that if such an outage affected v4 for three days you'd be taking your business elsewhere. So actually I do have a suggestion. Can you put your money where your mouth is and tell your existing transit provider that you are leaving? There may be people in this room that can offer you real v6 service that does not have this problem. I am very, very serious. We cannot fix these problems unless the transit networks that are providing this service understand that this is unacceptable.
So, there are people in this room that have good v6 connectivity. I suggest you talk to them. This will be the most effective thing that you will ever do to fix this problem.
Number two, going back to your problem reports, I heard people in the room laugh when you said, okay, six people e-mailed us, one person e-mailed us about this. In our experience, or at least as good as we can make the numbers about brokenness, the ratio of people who have problems to people who e-mail us is about 1,000 to 1. I mean, we have actually seen this in real networks that we have enabled Google over IPv6 for, and we got a handful of e-mails. But according to our best data there were thousands of users affected. So, take that with a grain of salt.
JOHANNES ENDRES: I think that the ratio is a little bit better for us. It's certainly a high number of people who suffer in silence. And that's why we are not turning this into percentages. Okay. This is just to give you an impression. It's not numbers we can use to calculate how many people really have problems. As with the peering problems you know. I didn't count how many people were affected right now.
GEOFF HUSTON: Thank you, that was very interesting. I was wondering, as a server, there are kind of two things you can do to minimise some problems that may occur in the path. One is to drop the MTU slightly below 1,500. 1280 is an extreme but quite safe setting, but generally low enough that you are not facing the whole ICMP issue. And the second one was to actually do an outbound 2002::/16 relay so that your reverse traffic to 6to4 effectively tunnels to v4 right the way through. Did you do either of these in your experiment? And secondly, if you didn't, is it a routing blackhole or an MTU blackhole?
JOHANNES ENDRES: It's really a routing blackhole. So, we didn't change the network configuration as for MTUs or anything. We just went on and, from what we can deduce from the IP configurations and everything, I don't think that we saw MTU problems, okay. And after all, the routing blackholes are our main concerns currently.
As for the gateway, as I understand it, that's the opposite side for Teredo and 6to4 to give them better connectivity to our side, correct?
GEOFF HUSTON: When they come at you with a 2002 address, if you have a local relay, it just goes straight back, yes.
JOHANNES ENDRES: We considered it and we didn't for several reasons. First, 6to4 and Teredo I think are dubbed the IPv6 of last chance. And we didn't want to put things up that shouldn't be there because it's only there to come around real problems, okay. The other reason, as I said, we almost didn't see a 6to4 problem there. So, it wouldn't have helped us with the problems we really heard about.
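To make the "2002" remark concrete: a 6to4 client address carries its IPv4 address in the 32 bits after the 2002::/16 prefix, so a server-side relay can work out where to send the encapsulated reply. The following is a minimal sketch using Python's standard `ipaddress` module; the function name and the sample addresses (taken from documentation ranges) are assumptions for illustration, not anything used in the experiment.

```python
import ipaddress
from typing import Optional

def sixtofour_endpoint(client: str) -> Optional[ipaddress.IPv4Address]:
    """Return the IPv4 tunnel endpoint embedded in a 6to4 source address.

    Returns None when the address is not inside 2002::/16, in which case
    the reply would simply follow native IPv6 routing.
    """
    return ipaddress.IPv6Address(client).sixtofour

# 2002:c000:0204::/48 embeds 192.0.2.4 (a documentation IPv4 address).
print(sixtofour_endpoint("2002:c000:204::1"))  # 192.0.2.4
print(sixtofour_endpoint("2001:db8::1"))       # None
```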
GEOFF HUSTON: Thank you.
GERT DOERING: Publicly, thanks for letting me talk you into this, and I am sort of feeling embarrassed with you coming here and then running full blast into a blackhole problem.
Answering to Lorenzo actually, they do that, they put their money where their mouth is and they have a regional provider that's providing proper v6 service but they connect to a certain international provider that's using 6PE or MPLS and they have no valid monitoring of everything. So we only notice if customers yell at them and the chain of things is a bit slow if everybody is here and nobody from this specific provider is here. And this is a real shame and we are customers of them as well. We had breakage v6 from here too. My net on network on Wednesday morning and they went about and rebooted random routers and then the v6 came back.
So yes, there needs to be serious monitoring of things or turn off v6. Right now the routing layer is advertising the path and the packet layer is just dropping. This is completely unacceptable. If routing had said this path is broken, BGP would have routed around it, but this way it's just annoying.
AUDIENCE SPEAKER: We'll make sure we complain to these people. And they are peers of ours and we will tell them that it is not acceptable to reach their customers through broken networks.
A comment to the earlier 6to4. Geoff, I think that really installing 6to4 relays is the job of their transit provider. I don't think that they can be expected, as the customer of a hosting company, to have to install their own 6to4 relays. It's a maintenance burden. It's a reliability issue if there is no proper coverage. And it's operational issues that they do not have the staff or the expertise to deal with. Remember what we are trying to do is get to the ?? add the force 9 to the v6 Internet, right. We cannot expect people to use half-baked solutions. It has to be dealt with by their ISP.
Installing of local relay, if you are a content provider and you don't operate a backbone is something that you don't have expertise with, right, that's my point.
AUDIENCE SPEAKER: You said on an earlier slide that you thought that the Mac OS bug you were seeing wasn't the same as the one I am seeing. It sure did look the same as the one I am seeing. It's mostly from university networks and enterprises, where there is a shared layer 2 with Rogue RAs, and those Rogue RAs, from what I am seeing, are advertising 6to4 prefixes, which are tickling the Mac OS bugs. Were you being told of Rogue RAs that were not for 6to4 prefixes, or for other invalid prefixes?
JOHANNES ENDRES: No, what we saw was Mac OS with no global IP address at all, not 6to4, not nothing else but trying to reach our global address with their local address as a source, the link local address as a source, so we looked into the IP configuration and it didn't have a 6to4 address.
AUDIENCE SPEAKER: That's interesting.
WILFRIED WOEBER: Wilfried Woeber one of the local IPv6 back home. First of all, I really sort of, in the public, I would like to say thank you for this presentation, not only for the test and for the experiment, but thank you for the presentation. I am taking quite a few bits of information back home to consider when we expand our IPv6 thing.
Coming back to Lorenzo's thing about, and also Gert's thing about the close?by transit or backbone or upstream provider, that's correct what was said, but my line of thinking is, even if you get your end fixed, and even if you happen to select another ISP upstream provider or backbone provider, if we, at the other end, are connected to a broken one, it wouldn't solve your problem. We would just sort of get half of it maybe under control or we just may shift the thing. Because it is not one blob; and if you replace that blob with another one everyone is happy, it's just a meshed network. The message I am taking back home so there is a lot of things to do which should be our responsibility to make the IPv6, as a whole more reliable, because you can do some stuff but if we are shuffling, if we are dragging our feet, it's not going to fly on a global scale. I see Geoff getting up and wanting to react to that.
GEOFF HUSTON: Yes, I think at this point if you are relying on the kindness of strangers to help your IPv6 packets, you are introducing more dependencies than you need. If I was a server today, running dual stack, the observation is that there is still an awful lot of 6to4 out there, and I would be tempted to run a 6to4 interface on my server that frankly does the encapsulation locally and sends out the v4 packet, which was what I was referring to with Lorenzo, because that removes one more dependency on someone else. And what you are trying to do inside all of this exercise is to minimise the risks of externalities destroying end-to-end and that's the same comment that you are making. Whatever I can do as a server at this point, do it.
LORENZO: If you are a transit network or pass packets through from somebody else, putting 6to4 relays inside your network can be risky because they allow people to tunnel through and do things like spoof net 10 addresses. It really has some security implications, but yes, putting 6to4 interfaces on servers, that's something that I would like to do, but my security people are telling me that this is a list of all the things that...
GERT DOERING: 6to4 on the servers is somewhat risky and you need to know what you are doing. If you are sending out the encapsulated packets with the normal unicast address of the server, they will not pass the firewalls that open for the return protocol 41 packet only if the source, destination and protocol tuple matches in the opposite direction on the way back. So you need to source the packets from 192.88.99.1 in order for that to work. And if you do that, then your upstream provider might just filter that as not part of your range. You really need to check if you are doing this right if you want to do it on the server.
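The firewall behaviour described here can be sketched as a simple tuple comparison. This is a deliberately simplified model of a stateful firewall, assumed for illustration only: the `Flow` type, the function name and the client and server addresses (documentation ranges) are hypothetical, while 192.88.99.1 is the 6to4 relay anycast address mentioned in the discussion and 41 is the IP protocol number for IPv6-in-IPv4.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    src: str    # IPv4 source of the encapsulating packet
    dst: str    # IPv4 destination of the encapsulating packet
    proto: int  # IP protocol number

PROTO_41 = 41                  # IPv6 encapsulated in IPv4
ANYCAST_RELAY = "192.88.99.1"  # 6to4 relay anycast address

def reply_allowed(outbound: Flow, reply: Flow) -> bool:
    """A stateful firewall admits a reply only if it mirrors the outbound flow."""
    return (reply.proto == outbound.proto
            and reply.src == outbound.dst
            and reply.dst == outbound.src)

client = "198.51.100.7"   # hypothetical 6to4 client behind a firewall
server = "203.0.113.10"   # hypothetical server unicast address

outbound = Flow(client, ANYCAST_RELAY, PROTO_41)
from_unicast = Flow(server, client, PROTO_41)
from_anycast = Flow(ANYCAST_RELAY, client, PROTO_41)

print(reply_allowed(outbound, from_unicast))  # False: source does not match
print(reply_allowed(outbound, from_anycast))  # True: mirrors the outbound tuple
```

The second case is the one that may then be dropped by an upstream provider's source address filter, which is the trade-off raised above.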
MARCO: Just the really short answer, yes or no. Would you recommend going to other content providers who are watching this to do this or would you say no?
JOHANNES ENDRES: I am sorry I can't answer that with yes or no. As I see it, we are a medium sized content provider. We have our own infrastructure, which is we have the service under our control. We can do that proxy thing ourselves. Okay. On the other hand, we are not using a CDN and we are not doing DNS load balancing, which is the other side where there may be problems on the scale of technical size, okay. That's one.
The other one is, as I said, people suffer in silence, here in the room, to read Heise Online. Okay. Other content providers might be in a different situation, if you are one of 60 providers providing essentially the same service, you can't do it I think. So, there is more than yes or no. Basically, I'd recommend it, but look what you are doing. Okay.
I have one question to you. If anybody is here with an iPhone, I'd like to see what happens if it tries to reach heise.de with Safari while we are still blackholed. I am not quite sure if it really breaks or falls back. Thank you.
SHANE KERR: Okay. That was great. I really appreciated that. So, for our last presentation, we are going to move away from IPv6 I think. I hope that's okay with someone. If not, you know, please bear with us while we try to do something other than IPv6.
STEFANO PREVIDI: Good morning, and welcome to Rome. My name is Stefano Previdi. I work for Cisco and I hope you enjoyed Rome as much as I did the last eight years now since I live here.
So the presentation is not focused on v6. It is focused on content and network infrastructure. It started with a simple question that has been asked by both service providers and content providers: how can we optimise both sides of the picture? From a content provider perspective, that means optimising what we used to call user experience; from an infrastructure provider, network provider or service provider perspective, it means either mitigating the impact of certain content or making sure that you use the appropriate amount of resources in the infrastructure.
So, in parallel to this, there was an activity in a forum called P4P and later this activity has been shifted to IETF the Alto Working Group, because a very similar problem was experienced by service provider regarding P2P traffic. It may not be the same situation today, but at that moment, I am talking about a couple of years ago, it was the case. And the idea of Alto Working Group in IETF, was to define an API that would have been used between service providers and content provider in the sense peer to peer community. So that the peer selection in P2P clouds would have been enriched by topology derived information, whatever this would have meant at some point.
So, this is very similar to the problem we were addressing in the content, I would say, private content provider space, and why do we need this? From a peer to peer perspective, it was quite easy to understand: if I connect to a BitTorrent cloud and want to download any piece of content, the selection could have been half random and select a peer from Rome to Australia while another peer maybe was available close by. So this incurred bad transit traffic for service providers.
So from a content provider perspective, the problem is, if not the same, very similar. How to select a given service, how to select a given cache or streamer or whatever, based on different criteria: the location, the IP address, the profile, the connectivity type of the user requesting this service or content. How to do this? How to implement a mechanism that will help both the service provider to optimise its resources and the content provider to get better service from the infrastructure? Well, the Alto Working Group in IETF doesn't work on the specification of the mechanics that would help this. It's out of Alto scope. So it's an implementation aspect and in our implementation, what we have done is that we leveraged the routing information you have in the link-state and BGP databases plus some other components that I will mention later.
So, I have more slides than time, so I will probably skip some slides, but you can download the whole presentation for your use. So this is the basic picture of the architecture. On the top you have the application layer. At the bottom you have the infrastructure. And in the middle you have what we call the Network Positioning System (NPS), which is an Alto server that collects topology derived or topology based information from different sources. One of those sources is the set of routing protocols. So we collect the databases.
We also have a geolocation API to collect geographical coordinates information, plus the state and performance of the network itself. Here we are not talking about applications. And so the IETF API, what is being discussed in the Alto Working Group, will allow the different application components to request and obtain a service and so let's see what this kind of service could be.
I am just skipping this because I think it's trivial. What I am going to talk about is three different use cases: The CDN, because this is what we have done in collaboration with content providers and service providers, here in Europe and in the US. We collaborated with a few of them on the content networking context, Cloud Centric networking. Again something that is under development and how to integrate those technologies in the cloud computing, the cloud networking context. And also the peer to peer, because from I would say from historical reasons, it deserves to be mentioned even if now maybe it's less a priority for not all but most of the people I am talking to.
So, how does it work? Well, the service is quite simple. It's a simple request reply model. Very simple packet types. The request is based on end point addresses. Obviously there are plenty of extensions that are coming out. But for the moment the service is very simple. You know a certain set of IP addresses. You want to localize them, so understand where they are and what is the distance between them, and distance may have different meanings. It may have the meaning of routing distance, geographical distance, policy distance and so on. So, if you take the picture here from the left to the right, the left is the user that connects to the CDN, to the content network, and requests, I don't know, a movie, and the CDN has to redirect the user to the closest streamer. So the CDN will send a request with the IP addresses of the streamers and the user, and the Alto server will reply saying, from the user perspective, this is the best list of locations.
And then the CDN may or may not take this into account. So, the job of the Alto server, of the NPS, stops with sending back the reply. There are no methods that enforce the use of the information. It's something that is outside the scope of Alto and NPS.
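The request reply model described above can be sketched as follows. This is an illustrative model only, not the Alto wire format or the NPS implementation: the class and function names, the addresses and the distance table are all assumptions made for the sketch, and "distance" stands in for whichever metric (routing, geographical, policy) the server applies.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class RankRequest:
    user: str              # address of the requesting user
    candidates: List[str]  # addresses of the candidate streamers

def rank(request: RankRequest,
         distance: Dict[Tuple[str, str], int]) -> List[str]:
    """Return the candidates ordered from closest to farthest from the user.

    Only the ordering is returned; the topology behind it stays private.
    """
    return sorted(request.candidates,
                  key=lambda c: distance[(request.user, c)])

# Hypothetical distances, as a server might derive them from routing state.
distance = {
    ("192.0.2.10", "198.51.100.1"): 30,
    ("192.0.2.10", "198.51.100.2"): 5,
    ("192.0.2.10", "203.0.113.9"): 120,
}
request = RankRequest("192.0.2.10",
                      ["198.51.100.1", "198.51.100.2", "203.0.113.9"])
reply = rank(request, distance)
print(reply)  # ['198.51.100.2', '198.51.100.1', '203.0.113.9']
```

The caller is free to ignore the reply, which matches the point that nothing in the service enforces the use of the information.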
So in the architecture there are two ways of operating this. One is by embedding the Alto API inside the content network, for example the CDN, inside a server that will proxy the service to its clients. This is most of the cases for CDNs, where you have control boxes somewhere that do the control functions for the CDN, and so among the control functions you will also have, and you do also have today, the Alto NPS service delivery. So the application client is completely unaware of the interaction between the CDN and the network layer in terms of transactions, and just operates as usual. Another way of operating this is that the application client itself has the Alto client, so the application client will be able to request the service directly. We are working with a mobile operator because the context is slightly different from the CDN case. The mobile operator would like to have the Alto client inside a mobile device directly for some applications that you have on your mobile phone, I don't know, your Facebook account or your YouTube or whatever, in order to be able to select your best entry or exit point.
So, here is a very basic example of what has been deployed with a couple of content and service providers. So you have on the top the content network, with streamers and a box that does the control functions of those streamers. A user requests the content. At this point the CDN obviously knows where the content is located. But before redirecting, it takes into account different aspects. You have the load of the different streamers, I don't know, the current open TCP sessions with the streamers and so on, and also the CDN wants to know exactly the location of the user, the location of the streamers, and what is, from a routing perspective, from an infrastructure perspective, the closest streamer. Once this is known, the CDN takes all the different components and decides to which streamer to redirect the user. So this is a very basic example of what has been done in the past two years, in order to optimise the redirection of users without having to heavily configure the CDN by installing files with zillions of IP addresses and given streamers, but just taking the dynamicity of the infrastructure, so leveraging the routing state of the network.
The same here, I'll go through this quite rapidly, because it doesn't add much. But in the initial work on this technology back in 2008, the target was peer to peer. And in fact, the picture on the bottom right of the slide explains why we started to work on this: because the transit traffic between service providers was quite high for peer to peer, while with a more intelligent peer selection you could have reduced transit traffic and so optimized hopefully both sides of the context, the peer to peer side and the infrastructure side.
Another example here is the service provider. Again, the transit link is what costs most and so this is where you want to save traffic. So if you have a mechanism that allows you to prevent transit traffic without obviously paying a price for it, so without impacting the performance of the user experience, this is exactly what you want to do.
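One way a CDN control function might combine the network ranking with streamer load is sketched below. The selection rule (take the closest streamer whose load is under a threshold), the threshold value and all names are assumptions for illustration; the talk does not say how the deployed CDNs weigh these inputs.

```python
from typing import Dict, List, Optional

def pick_streamer(ranked: List[str],
                  load: Dict[str, float],
                  max_load: float = 0.8) -> Optional[str]:
    """Pick the closest streamer that is not overloaded.

    `ranked` is ordered closest-first (as returned by the ranking service);
    `load` maps streamer name to utilisation between 0.0 and 1.0.
    Streamers with unknown load are treated as fully loaded.
    """
    for streamer in ranked:
        if load.get(streamer, 1.0) < max_load:
            return streamer
    # Everything is busy: fall back to the closest one, if any.
    return ranked[0] if ranked else None

ranked = ["streamer_rome", "streamer_milan", "streamer_paris"]
load = {"streamer_rome": 0.95, "streamer_milan": 0.40, "streamer_paris": 0.10}
choice = pick_streamer(ranked, load)
print(choice)  # streamer_milan
```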
The cloud computing ?? or the Cloud Centric networking, it's also another interesting case. You have different data centres spread out on the network and at some point you have the resources request, and you want to locate those resources and once you locate the resources, you want also to be sure you point the requester to the closest resource and by closest you mean the one that has the most sense for the user location, the user profile the user connectivity and so on.
So, again, we are now developing the concept of cloud VPNs where you have the three pieces of the architecture. First, the cloud specific, finding the resources, where is the resources? The NPS Alto component that tells you what is the best location for that user and that resource and finally, the provisioning, obviously, of the, I would say, the VPN mechanics underneath that would give you the connectivity to the resources that you have requested.
So, how does it work? As I said in the beginning, it's not in the scope of Alto in IETF to standardise the mechanics through which you are able to compute the location and the distance between end points. This is more an implementation thing because there is no real need for interoperability between implementations. But what we can say is that obviously our approach is to leverage what you have in the infrastructure, in the network layer, plus all those policies that you usually have on service provider networks, especially if you think about BGP.
By introducing the concept of policies, obviously in BGP one of the most powerful tools for implementing policies are BGP communities, and I don't know a single service provider not using BGP communities. So, leveraging this was a kind of a natural thing to add to the implementation.
How to rank addresses, how to determine that A is closer than B than to C. Well, we have developed a few algorithms based and derived from what you have already in the routing layer, because the choice you are going to make, or the choice you are going to suggest for a given ranking should reflect what is going to happen at the end of the day. So the packet will be routed and forwarded according to the routing layer so cannot stay too much away from the routing decisions.
One of the key requirements was confidentiality. So, you don't want to leave your BGP database up to the content provider especially if the content provicer is peer to peer cloud. The opposite is true you may not want to leak application specific information, who is doing what, down to the network layer. And so the confidentiality is mostly, I would say, granted by the architecture because you have just a transactional model where you request some ranking, you get some ranking but you don't get the topology information or any sort of topology database.
So, how to reconstruct the network top rolling? Well, again, we have developed a set of algorithms and mechanics that allows you to recompute the topology of a set of networks, because now one service provider can represent one autonomous system but in many cases one server provider represents many autonomous systems and so you have thousands routing boundaries in the network layer. I mean, even inside and autonomous system you have an IGP that can be deployed in multiple areas or levels, so you have those boundaries you have to take into account if you want to reconstruct the full view of the topology. So you have to go a graph of the whole network. Those mechanisms exists. This is not really rocket science, it's something that's derived from the well known routing algorithm that you have in the others.
So, a very simple example here. If you have a user on the left side who wants to access content that is located in different places, and you rely on a server that gives you the location, the Alto service inside one autonomous system, it will be able to locate and compute the distance between the user and the different locations inside the autonomous system. For the locations that are outside, the computation will stop at the exit point, which in many cases is very good, because a service provider may want to optimise traffic flows inside its own network and doesn't care about the outside. So if the computation ends at the exit point, it may be just enough. We have practical cases where it is like this.
Now, if you want to compute the distance between end points that are spread out across different autonomous systems and service providers, you need some collaboration between Alto servers or NPS servers, and we do have all the technology for doing this. We have implemented and deployed what we call the inter-Alto communication or the inter-NPS communication, so that you will be able either to share information in order to compute the end-to-end path and understand more precisely where you are and how far you are from the location you want to reach, or, in the way which is most used today, I would say, to redirect requests. So if a server thinks that another server will have a better view of the topology than itself, it will just redirect the request.
Now, this is obviously more a deployment issue. It's not really a technology issue. It's how far autonomous systems and service providers are willing to share information about their topology. They already agree to share topology information, because you share your BGP databases somehow, but you may want to share more details, and you may want to control which level of detail you share. I doubt a service provider wants to share internal details such as how many links there are or what the loopback addresses of the iBGP routers are; that doesn't make a lot of sense for this purpose. This introduces the notion of grouping.
So you want to group prefixes, and you have a kind of natural way of grouping prefixes, which is done by considering summary addresses and so on. But those are really linked to the routing technology itself. You also have a more arbitrary way of grouping addresses, which is also used very widely today, which is BGP communities. You can assign a BGP community to a set of prefixes regardless of which class, which address family or whatever they belong to, because the BGP community doesn't need to reflect anything from the routing layer.
So, an example here is what has been deployed at a service provider that owns multiple autonomous systems. When a content request comes to the CDN, the CDN issues a request to the NPS server, to the Alto server, and if the Alto server considers that it can't correctly compute the distance between all the addresses, so the user addresses and the streamer or cache location addresses, well, it will just redirect to another server. The other server will then reply, and the user can get the content from wherever the CDN tells him. This is an example where, in fact, there is a sharing of information between the two autonomous systems, but very minimal. Just enough to understand to which other servers you want to redirect the request. Not the content of the topology.
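The redirect behaviour in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RankingServer` class, the `query` helper, the server names and the cost tables are all hypothetical, the addresses come from the documentation ranges, and nothing here reflects the real NPS or Alto message formats.

```python
import ipaddress


class RankingServer:
    """Toy ranking server; not the real NPS/Alto protocol."""

    def __init__(self, name, known_prefixes, costs, redirect_to=None):
        self.name = name
        self.known = [ipaddress.ip_network(p) for p in known_prefixes]
        self.costs = costs              # hypothetical {address: cost} table
        self.redirect_to = redirect_to  # peer with a wider view, if any

    def can_locate(self, address):
        ip = ipaddress.ip_address(address)
        return any(ip in net for net in self.known)

    def handle(self, candidates):
        """Return ('ranking', addresses) or ('redirect', other_server)."""
        if all(self.can_locate(a) for a in candidates):
            return "ranking", sorted(candidates, key=self.costs.__getitem__)
        if self.redirect_to is not None:
            return "redirect", self.redirect_to
        raise LookupError(f"{self.name} cannot locate every candidate")


def query(server, candidates, max_hops=4):
    """Client side: follow redirects until some server returns a ranking."""
    for _ in range(max_hops):
        kind, payload = server.handle(candidates)
        if kind == "ranking":
            return server.name, payload
        server = payload
    raise RuntimeError("too many redirects")


server_b = RankingServer(
    "as2-server",
    ["192.0.2.0/24", "198.51.100.0/24"],
    {"192.0.2.10": 30, "198.51.100.20": 10},
)
server_a = RankingServer(
    "as1-server", ["192.0.2.0/24"], {"192.0.2.10": 5}, redirect_to=server_b
)

answered_by, ranking = query(server_a, ["192.0.2.10", "198.51.100.20"])
print(answered_by, ranking)
```

The only thing the first server reveals to the client is which other server to ask; no topology crosses the boundary.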
One of the very first requirements we had in the initial tests with a couple of providers was obviously policies. I know that if I ask my IS-IS link-state topology or my BGP topology, the topology will tell me that if I am located in POP 1, the closest will be POP 2, because POP 3 has a routing cost which is higher, so IS-IS will only tell me that POP 2 is closer than POP 3. But for policy reasons I may not want to do this. So I need another way of forcing or overriding the decision of the routing layer.
Again, what we have done is a mechanism that works in two parts. The first part is grouping addresses, and we leverage BGP communities because it's a tool which is already widely deployed, and in most cases, at least at most of the service providers I have met, you already have a BGP community numbering scheme which reflects the location of the prefix. In some cases it is not complete. In some cases it doesn't have the granularity you would expect, but you already have something that goes in that direction. The point is to get from the left side of the slide to the right side of the slide. You want to take, as an input, the fully detailed topology of your network and go to the right side, where you have groups of prefixes with a kind of cost matrix between them, so that you know, from the prefixes in group 1, what the cost is to reach the prefixes in group 2. Obviously taking the direction into consideration, because traffic may flow in different directions.
So in the implementation we have something that works like this. If you have a community scheme that somehow represents the location, well, addresses that share the same community will be considered as co-located, and the different communities have a cost from one to the other that can be overridden. So either you let the NPS server, the Alto server, dynamically compute the cost between groups, or you can override this with configuration commands.
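A minimal Python sketch of the two components, groups and a directional cost matrix with overrides, might look as follows. It assumes one location community per prefix; the community values (private AS 65000), the prefixes, the costs and the function names are hypothetical and chosen only for illustration.

```python
def group_by_community(prefix_communities):
    """Prefixes that carry the same community are treated as co-located."""
    groups = {}
    for prefix, community in prefix_communities.items():
        groups.setdefault(community, []).append(prefix)
    return groups


def group_cost(src, dst, computed, overrides):
    """Directional cost between two groups; a configured override wins."""
    if src == dst:
        return 0
    return overrides.get((src, dst), computed[(src, dst)])


# Hypothetical community scheme: 65000:1xx encodes the POP.
prefix_communities = {
    "192.0.2.0/25": "65000:101",
    "192.0.2.128/25": "65000:101",
    "198.51.100.0/24": "65000:102",
    "203.0.113.0/24": "65000:103",
}

# Costs derived dynamically from the routing layer (made-up values).
computed = {
    ("65000:101", "65000:102"): 10,
    ("65000:102", "65000:101"): 12,  # direction matters
    ("65000:101", "65000:103"): 30,
    ("65000:103", "65000:101"): 30,
    ("65000:102", "65000:103"): 15,
    ("65000:103", "65000:102"): 15,
}

# Policy: make POP 2 look more expensive than POP 3 when seen from POP 1.
overrides = {("65000:101", "65000:102"): 100}

groups = group_by_community(prefix_communities)
print(groups["65000:101"])
print(group_cost("65000:101", "65000:102", computed, overrides))
print(group_cost("65000:101", "65000:103", computed, overrides))
```

With the override in place, POP 3 ranks ahead of POP 2 from POP 1 even though the routing layer alone would say the opposite.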
The goal of the grouping method is obviously to scale, to give visibility with a granularity that represents what the application needs to know. The application doesn't need to know the P routers' loopback addresses or link addresses. The application needs to know the cost of going from one place to another place. The cost is given by the network layer. It represents the routing cost. It can represent the state cost, the performance or even the monetary cost: which links you want to use for a certain type of traffic, and so on.
So, the two components are groups and the cost between the groups.
And the idea is to come to this. Once we have, and we do have it now, the mechanics that allow you to virtualise your topology, so that you can define an abstraction level at which the topology is represented, well, you can do it multiple times and in multiple different fashions, so that to one application you may give a certain visibility and to another application you may give another visibility. Not everybody needs to see the full network topology. So this allows you to manage different cases for different applications.
And I will conclude with what we are working on. All I have presented so far is what has been implemented and initially deployed. We have now to deploy on both sides of the ocean, I would say, the US and Europe. What we are working on is to extend the NPS computation to take into account network resource utilisation, network performance, and geolocation, integrating the geolocation into the policy system that a service provider may want to operate. And layer 2 topologies: in some cases there are applications, I am thinking more about the cloud computing case, where you really want to know the exact topology. In that case you really want to go into the details, up to the switch port that connects a given server in the data centre. Luckily for us, we now have IS-IS extensions for layer 2 that may be leveraged as well, and so we are working on those extensions.
And obviously the support of all existing past, present and future address families, so IPv4, IPv6, VPNv4, VPNv6, whatever. We are completely agnostic about the packet colour that you have on the wire.
I hope I didn't go too fast. I mean, if you have any questions or you want to know more, just ping me or send me an e-mail and I'll be able.
So, just as a summary. The way we see NPS Alto initially deployed today is mostly for service providers that want to deliver the service to content providers. In some cases, the content provider and the service provider are, in fact, the same company. I thought initially that this would have made things much simpler; in fact, that's not always the case. There is still strong isolation between network and applications. In other cases you have a content provider leveraging the infrastructure of another service provider, so two different companies. And they want to interoperate, because the content provider wants to optimise its costs by using only what it needs to use, and the service provider wants to reduce the impact on its resources.
So, today, the implementation delivers a ranking service. You send a bunch of IP addresses and the server replies with the same bunch of IP addresses, but ranked by distance, whatever distance means according to the policy and routing definitions. Obviously it is shipped and deployed.
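The request and reply shape of such a ranking service can be illustrated with a short Python sketch. It assumes the server holds a table of prefix costs as seen from the requesting user's location and resolves each candidate by longest-prefix match; the table, the addresses and the function names are hypothetical and are not the product's actual API.

```python
import ipaddress


def lookup_cost(address, prefix_costs):
    """Cost of the longest matching prefix, or infinity if nothing matches."""
    ip = ipaddress.ip_address(address)
    best = None
    for prefix, cost in prefix_costs.items():
        net = ipaddress.ip_network(prefix)
        if ip in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, cost)
    return best[1] if best else float("inf")


def rank(addresses, prefix_costs):
    """Return the same addresses ordered from closest to farthest."""
    return sorted(addresses, key=lambda a: lookup_cost(a, prefix_costs))


# Made-up costs as seen from one user's location.
prefix_costs = {
    "192.0.2.0/24": 40,
    "192.0.2.128/25": 5,   # more specific prefix, much closer
    "198.51.100.0/24": 20,
}

candidates = ["192.0.2.10", "198.51.100.7", "192.0.2.200", "203.0.113.1"]
ranked = rank(candidates, prefix_costs)
print(ranked)
```

The reply contains only the reordered addresses, so the caller learns a preference order but nothing about the underlying costs or topology. Addresses the table cannot place are sorted last rather than dropped.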
We have two ways of computing the ranking. There is a dynamic way, the default, of computing the distance and the location between addresses, but you also have this policy-based system that allows you to leverage your existing policies, especially if you think about BGP communities. And obviously we are extending this to other information sources.
The service runs on a dedicated platform, but we have also started to work on an implementation in the router itself. So the NPS or Alto server will be a function inside a router. This helps in that you could enable the service almost everywhere in the network. And the client is supposed to be embedded in the application, in whatever element of the application; it could be in a control box or it could be directly in an application client.
And I will, I think, stop here, so as not to exceed the time slot too much.
RICHARD BARNES: I have just a couple of questions for you. Yesterday in the DNS Working Group we talked about Alto as an example of one way that reverse DNS might be used for discovery and you have discussed in your presentation a couple of scenarios where different organisations are using different Alto servers. I was just wondering, in the deployments you are seeing, these early deployments, how are the consumers of this Alto information finding the Alto servers they are getting it from? Are they using dynamic discovery or is it all manually configured?
SPEAKER: At this stage, the deployments we are doing and seeing are in the CDN case, so a CDN operated by either the service provider or a separate company. The way it is done, the interaction between the CDN and the NPS Alto server is handled by a control box in the CDN itself. So you only have a kind of static configuration of where the NPS is, and there is no need for any dynamic discovery at this stage. Obviously, when it comes to having an NPS client or an Alto client embedded in a mobile device or web browser or whatever, those mechanisms currently discussed in the Alto Working Group would obviously be useful. DNS seems the most trivial solution to me, but there are many other possibilities, like Anycast or whatever.
RICHARD BARNES: It would also be necessary if you wanted to have a content provider that wanted to use Alto servers from many different networks, not necessarily pre-configured, but in any case.
The other question I had is you mentioned a few times in your presentation the use of geolocation information in addition to topological information for this ranking of information services. I was just wondering if you could say a little bit more about how that information might feed into these algorithms?
SPEAKER: It's a very specific problem that most of the large providers have. It's off-net addresses. Usually in a large service provider you obviously have perfect visibility of your own set of addresses. But the addresses of your friendly competitors come as very strongly aggregated prefixes. So if you rely on your own BGP database to understand the location of your peers' addresses, well, that won't work that well. One way is to collaborate between providers and give more information, but that is maybe not very practical. The geolocation, I mean, we have it in the implementation, so that's why I mention it. The geolocation gives you a hint about where on earth the address is. In most cases it does the job, in the sense that, okay, if you tell me that the address is in Rome, it already helps me, because for me the address could even be in Sydney, so if you tell me it's in Rome, that may already be helpful. The idea is to correlate this with the local BGP information that you have on the Alto server itself. I mean, in our implementation the Alto server stores a private copy of the BGP table and we kind of hack it, in the sense that we add more information to it. Information derived from geolocation is one example, and also from state information and link utilisation. In other words, it tries to combine or translate the geolocation information into a BGP community numbering scheme that will tell you, for the prefix or part of the prefix, where it is located. We are quite at the beginning of this part, I would say.
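As a rough illustration of translating geolocation hints into location communities on a private copy of the BGP table, consider the Python sketch below. Everything in it is an assumption made for the example: the table layout, the per-prefix city hints, the city-to-community mapping and the community values are hypothetical, and the speaker describes this part as still at an early stage.

```python
def annotate_with_geo(bgp_table, geo_hints, city_to_community):
    """Return a copy of the table with a location community added where known."""
    annotated = {}
    for prefix, communities in bgp_table.items():
        tags = set(communities)
        city = geo_hints.get(prefix)
        if city in city_to_community:
            tags.add(city_to_community[city])
        annotated[prefix] = tags
    return annotated


# Private copy of the BGP table: prefix -> communities received (made up).
bgp_table = {
    "198.51.100.0/24": {"65001:500"},  # off-net, heavily aggregated
    "203.0.113.0/24": set(),           # off-net, no communities at all
    "192.0.2.0/24": {"65000:101"},     # on-net, location already known
}

# Hypothetical geolocation hints and location community scheme.
geo_hints = {"198.51.100.0/24": "Rome", "203.0.113.0/24": "Sydney"}
city_to_community = {"Rome": "65000:201", "Sydney": "65000:202"}

annotated = annotate_with_geo(bgp_table, geo_hints, city_to_community)
print(sorted(annotated["198.51.100.0/24"]))
```

Once the off-net prefixes carry a location community, they can go through the same group and cost machinery as on-net prefixes. The original table is left untouched, which matches the idea of a private, enriched copy.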
SHANE KERR: Okay everyone, that concludes the first Plenary Session this morning. So we'll take a short coffee break and then come back here for the Closing Plenary session. Thank you.
LIVE CAPTIONING BY MARY McKEON RPR
DOYLE COURT REPORTERS LTD, DUBLIN, IRELAND.