PLENARY SESSION ON THE 16TH OF NOVEMBER, 2010, AT 11 A.M.:
CHAIR: Welcome back everybody. Sorry for the short delay. That's what you get when you have a Working Group Chair trying to sort of run a Plenary Session while standing out there organising another Working Group. We have a couple of, I think, fairly interesting talks now. The first speaker, let me introduce John Quarterman. He's been doing, well, everything related to IP protocols and IP research for about 20 years, so he is one of the really wise guys, and wants to, well, let us take part in some of the recent research about transparency. Welcome.
JOHN QUARTERMAN: Thank you. First RIPE meeting I went to had like a dozen people. This one is bigger.
Today, I am representing a research project funded by the National Science Foundation in the US operating out of the University of Texas at Austin. And it's actually all described in that figure on the right. I guess we are done now.
We all know, this is what we'd like to have. You know, we connect to the net. We get e-mail and other services. And software vendors get paid for selling the software and, you know, it just works. Everybody wants to use this for one thing or another: banks really want to send statements, professors communicate with students, and so on. And the fly in the ointment is the vulnerabilities that come with the software, because there is a whole criminal economy that's built up: you know, exploit writers write exploits for vulnerabilities and sell them to bot herders, who build BotNets of tens of thousands of bots and rent those to spammers, who send spam, and the few users who actually pay for the products advertised through the spam fund the whole thing. And the real problem is indicated by the figure: notice that on the left there are the technical layers, and on the right there is the criminal economy. The root of the problem has to do with that. We are trying to solve this with technical solutions when it's really an economic and organisational problem. That's what this talk is about.
The bad guys, they are in it for the money, they make a profit. The good guys, the people doing Internet security and IT security in general, they are mostly considered a cost centre, something to minimise. And not only that, the organisations that are paying people to do security, they are mostly going it alone: secure this organisation, secure this computer, while the bad guys are all cooperating because they make money doing it. So how do we change this?
We have already tried some things to deal with this. Block lists, for example, where spam traps collect spam, which is then used to compile lists of IP addresses that are put in a block list that some electronic mail servers use to block spam. That works pretty well but doesn't really solve the problem. The law of course has existed longer than the Internet and is now involved in the Internet and it does some stuff, but the thing about the block lists is that 90% of e-mail is still spam, according to the ENISA survey last year, and it's a stand-off. Meanwhile there is a lot of effort, you know, resources, time, people, equipment, going into blocking that spam, and the existence of spam is eroding trust in e-mail that banks and businesses want to be able to trust. And law enforcement, it's great if you can finally arrest a spam gang but it doesn't happen very often. There is not a lot of funding for it and usually another BotNet pops up.
What we have, going by that great prophet, Scott Adams, is a confusopoly, because not only is this going on, but I have tried asking around at multiple conferences, I have been to NANOG in Atlanta lately, [Metrocon] in DC, and Internet2 had a conference, a little invitation-only workshop in St. Louis, asking questions like: which organisations send the most spam? I don't mean intentionally, I mean coming out of BotNets in that organisation. Who knows? Who can answer that question? Exactly. Nobody knows. Which means that the organisations where that's happening, they are going to keep it quiet because they don't want it to be known. They are not trying to let spam out, but if they are, they don't want people to know it, so the users, buyers and investors can't really distinguish between the organisations that are and the organisations that aren't.
But we know. This is the answer worldwide for one particular one-month period. And it's probably not who you would have guessed. There is kind of a common pattern here; don't get your hopes up just because your network isn't one of these. There is a common pattern to this particular lot, which is the top ten worldwide according to volume. We are getting some custom data from the block list that's not just a list of IP addresses. It's also the number of spam messages seen from each of the IP addresses. So that's what we mean by volume. And then we are, you know, attributing the IP addresses to the ASNs that they belong to. And you will notice these are mostly national or state scale networks, many of them actually owned by governments. Which means they don't have any competition. You will also notice that a lot of them are in countries where a lot of the software, a lot of operating systems, particularly Windows, is stolen, which means they don't sign up for the automatic updates. So, you know, networks that have no competition, may be somewhat clueless, and a lot of pirated, unpatched software.
What about North America? Now, some of the names here you would guess. But what order? And you can't guess and be right all along, because it changes over time. It can change rather rapidly. This slide is actually slightly misleading because that's not just volume, that's volume normalised by dividing by address space per ASN. That particular graph, that one, is Level 3. The other ones you probably never would have heard of. In the unnormalised version, Comcast shows up about four times, so I would call them the winner. But even that changes over time. And the reason for doing this kind of graph normalised is that some of the big guys don't really like it that they show up in the top ten fairly consistently if you don't normalise. If you do normalise, then you are emphasising tiny ones. This is where we could use more interaction with the organisations involved, for evolving different rankings.
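The volume and normalised rankings described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `rank_asns`, the input dictionaries and the sample numbers are all assumptions, and the addresses and AS numbers come from the ranges reserved for documentation.

```python
from collections import defaultdict

def rank_asns(spam_counts, ip_to_asn, asn_address_space, normalise=False):
    """Rank ASNs by spam volume, optionally divided by address space per ASN.

    spam_counts:       {ip: spam messages seen from that IP} (hypothetical input)
    ip_to_asn:         {ip: ASN the IP is attributed to}      (hypothetical input)
    asn_address_space: {ASN: number of addresses it holds}    (hypothetical input)
    """
    volume = defaultdict(int)
    for ip, count in spam_counts.items():
        asn = ip_to_asn.get(ip)
        if asn is None:
            continue  # IP could not be attributed to any ASN
        volume[asn] += count

    if normalise:
        scores = {asn: v / asn_address_space[asn] for asn, v in volume.items()}
    else:
        scores = dict(volume)

    # Highest score first: the "spammiest" ASN heads the ranking.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up sample data: one large network and one tiny one.
spam_counts = {"192.0.2.1": 900, "192.0.2.2": 600, "198.51.100.7": 500}
ip_to_asn = {"192.0.2.1": 64496, "192.0.2.2": 64496, "198.51.100.7": 64497}
asn_address_space = {64496: 1_000_000, 64497: 1_000}

by_volume = rank_asns(spam_counts, ip_to_asn, asn_address_space)
by_density = rank_asns(spam_counts, ip_to_asn, asn_address_space, normalise=True)
```

With this sample data the large network leads the pure-volume ranking while the tiny one leads the normalised ranking, which is the effect the talk describes.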
But the point here is, what if everybody knew where the spam is coming from? A lot of customers would not want to be associated with spam havens. Now, remember, we are not just talking ISPs here; we are talking about any organisation that sends electronic mail. That can be a service provider, and any such organisation is a target for bots. Bot herders don't care where they put their bots, as long as they can get them there. You want to do business with a bank that's a spam haven? Probably not.
There is some potential that if this started to happen, if customers would avoid spam havens and go to companies that were clean, this could turn IT security cost centres into profit centres. And the spammy ESPs might start blocking outbound spam. What are we proposing to do about this? We are proposing to build a reputation system, a reputation system organisation, which you can see there, which will produce rankings similar to the ones which you have just seen. And there will be multiple rankings: normalised, non-normalised, rankings by different classes of organisations, and so forth. The drill-downs are for this: suppose that your organisation shows up as being really spammy, you probably want to know why that is. Well, the drill-downs can provide some clues as to why that is, because we have got a lot of information in the database. To build a reputation system, which is what we are talking about here, letting people know, you know, what the actual reputation of organisations is, there are several things that are quite useful. It needs to be comprehensive, to cover the whole world, every electronic mail service provider. We can do that because the block lists cover the whole world. It needs to be frequent. Daily is probably the minimum frequency. More frequent than daily might or might not be useful. We also need longer periods so people can see how it changes over time. It needs to be as accurate as possible, but extreme accuracy is not really the point. Accurate enough to rank organisations is the point. It needs to be transparent: you know, normalised by dividing by IP address space per ASN, I mean, that's simple enough that anybody can understand that. Normalised by dividing by IP address space plus number of users plus market cap, that might start to get a little confusing. And multiple rankings, so you can compare different aspects, plus an overall ranking.
And once you have gotten this kind of thing running, it's also possible to build certifications, to, like, certify a company. Think of them as, like, bond ratings. You get a class AA certificate where somebody else gets a class BBB certificate, and this is done like this: the certification authority will look at the rankings and determine which organisations, over some conventional amount of time, tend to be in the same class, and then can issue certificates to the companies that are in that class. There is a lot of precedent for this kind of thing. The Financial Times ranks business schools and every business school pays attention to that. US News ranks colleges; there are even things like, in the US, the Kelley Blue Book that says what price cars should be, used cars, new cars. And for certificates there are bond ratings, there are companies that do that, also things like Underwriters Laboratories, and, you know, I just love business jargon. Reputation systems endogenise externalities. Endogenise, I love that word. Basically, outbound spam at this point, most companies don't care, it's somebody else's problem, but if everybody can see which organisations are letting it out, then it becomes a reputational problem, which gives each company an incentive to do something about it. Endogenising the externality. This can have a broader effect than spam or BotNets. BotNets have broader security implications to start with, but reputation can produce economic advantages even in different settings. If you have got a certificate that says, you know, you are class AAA, then people will maybe think that reflects on the rest of your activities. I talked about what we are specifically talking about doing; a little more jargon. The purpose of the reputation system with rankings and certification is to provide market signals about ESPs and security, so as to produce economic incentives for more effective information security.
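The idea of issuing a certificate to organisations that tend to stay in the same class over a period can be sketched as below. The class boundaries, the use of a median and the name `certificate_class` are all hypothetical choices made for illustration; the talk does not specify how classes would be defined.

```python
from statistics import median

# Hypothetical class boundaries, NOT from the talk. Each daily value is the
# fraction of peer organisations that ranked worse (spammier) that day.
CLASS_THRESHOLDS = [(0.90, "AAA"), (0.75, "AA"), (0.50, "A"), (0.25, "BBB")]

def certificate_class(daily_percentiles, thresholds=CLASS_THRESHOLDS, lowest="BB"):
    """Map a period of daily ranking percentiles to a bond-rating-style class.

    The median is used so that one unusually bad (or good) day does not
    decide the class; that is an assumption of this sketch.
    """
    typical = median(daily_percentiles)
    for cutoff, label in thresholds:
        if typical >= cutoff:
            return label
    return lowest


# An organisation that is usually among the cleanest, with one bad day.
mostly_clean = certificate_class([0.95, 0.92, 0.60, 0.93])
```

A real certification authority would presumably also fix the length of the observation period and publish the boundaries, so that the classes stay transparent.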
Now, for those national telecommunications companies, I don't know if they really care about economics, but maybe they'll care about policy incentives and this all helps the users, the banks, the ISPs, the law enforcement, everybody cooperate for a more secure Internet.
But, you know, do organisations really care about that? According to the ENISA 2009 spam survey, yes, they do. Because some of the questions were: Is spam prevention a factor? More than half said yes. Less than a third said no. And ENISA interprets this to mean that generally all providers consider it necessary to have effective anti-spam measures for the sake of attracting and retaining customers.
And if you look even more generally, this is from a 2007 thing by PricewaterhouseCoopers and the Economist Intelligence Unit. 28% of financial services executives felt that reputational risk was a significant threat. Now, what if reputation for sending outbound spam was suddenly visible? I bet an even larger percentage would think it was a significant threat. And more than half of the survey respondents look to risk management to contribute to improved shareholder value. What we are talking about is not what's traditionally called risk management, but it works towards the same goal. Also in business jargon, it turns out cheap talk is a technical term in contract theory. And, you know, everybody says they are doing good security, but what if you can actually see it by some sort of semi-objective measurement, like metrics taken from the spam block lists?
Measurements that are comparable across organisations. One thing I noted at [Metrocon], a little invitation-only security workshop in DC, was that, you know, a number of organisations had lots of really good metrics about exactly what security they were doing inside their organisation and how effective it was in their organisation, and I asked: how does that compare to your competitor? No idea. This compares everybody.
So, reputation and certification to turn cheap talk into effective communication.
How many of you have ever heard of Elinor Ostrom? That's about four or five people. That's about the usual percentage. Winning a Nobel Prize is not enough to make you famous. But she does really good work that's quite related to a number of problems that have come up in this conference. Managing complexity, that's what her work is about. The commons: think about a village commons in a medieval European village where lots of people graze their livestock. One traditional view of this has been, well, that's just going to end in tragedy, the tragedy of the commons, because some people try to graze more livestock because they get more benefit. If everybody does that, it gets overgrazed. The catch to that idea is that village commons existed for centuries. The people who lived there found some way to work that out. One of the keys to that is everyone could see who was grazing how many cattle. Right. So, two of the solutions that are usually proposed are: let the Government handle it. That really only works if the Government has perfect understanding and monitoring, which costs more than most governments are willing to spend. Or pure private solutions, which have similar problems: unless you have got a really transparent market, they tend to end up in monopoly. What really works, and she has looked at a number of things, historical and current, is hybrids, and one of the key features to make this work, to make management by the users themselves work, which is a key part of that, is transparency: everybody needs to know how everybody else is doing. In other words, a reputation system.
So, Elinor Ostrom's work provides a theoretical basis for why this should work. And you can build more on top of this. You can take the certificates or the rankings and use them in addition to service level agreements to turn that into self-insurance, with an independent external audit, the certificates or rankings, to say that you are actually reaching certain levels of service, for example, that you have a class AA certificate. And organisations that are doing that, and by the way, these organisations don't have to be ISPs, they could be banks. I mean, banks have sort of a service level agreement in that they are going to be available every day, so you can do your banking. So, organisations that have done this, used certificates to turn SLAs into self-insurance, could resell that to insurers who could sell that as an insurance policy, or insurers could roll their own insurance policies based on certificates and rankings. And when insurers do that, they tend to also have requirements that go along with the policies: you know, in order to get the policy, first you have to agree to have, you know, certain kinds of security procedures. Just as in most any large hall, you'll find that there is some kind of fire prevention measure, because you have to have that in order to get fire insurance.
Here is more business jargon. That reduces moral hazard. Moral hazard is where you might buy the insurance and then burn the building down in order to get the insurance. Well, if you get the sprinkler system in there, that's less likely to happen. And adjustments of the insurance could also use the drill-downs from the reputation system. This is basically what I just said.
So, we are proposing three new organisational levels. The basic reputation system, which is the rankings and the certificates. The SLAs and the self-insurance. And the actual insurance policies. You could count them as four, if you like, whatever. The beauty of this is you don't have to do them all at once. What we are going to do first is just start rolling out rankings and the others can build on top of that.
The figure on the right. If you look at RIPE Labs, there are two papers we have put in there recently, thanks to Mirjam, which spell this all out in more detail than we are talking about here, and that figure is in there, so you can see how they all fit together.
But the next question is: We claim this works. How do we know it works? Well, that has to do with this guy, Leon Festinger, whom I never heard of until last week. Social comparison theory. It turns out there are 50 years of social science theory and experiment that demonstrate, amongst other things, that people really do care how they are doing compared to similar people and they will change their behaviour if they know how they compare to people that they consider similar. Now, it turns out this also works online. The Ba 2002 paper looked at eBay, and eBay has reputations of sellers; they really do work. The Chen 2010 paper has to do with movie rankings. And it works not only with individuals; it also works with organisations. The Frei 2010 paper, there's more about this in the actual references in the paper on RIPE Labs. Look for Internet experiments. Stefan Frei works for an outfit called Secunia, which collects data on software running in the field and which software has the most unpatched vulnerabilities, and it turns out that since they have been publishing this kind of stuff, it has caused some companies that turned out pretty bad in that reputation system to change their behaviour.
So, how does this apply to what we are talking about? We'll need to do rankings by organisation types, so each organisation can be ranked with its peers. I mean, a bank in Belgium probably doesn't care how it compares to a co-lo centre in the US, for example. But banks in Belgium probably care how they compare to banks in Belgium, and co-lo centres probably care about... I don't know, there are a lot of people from co-lo centres here, do you care? Well, you don't yet. But will you when these rankings come out?
Okay. So, how do we actually conduct experiments to find out if it works? We don't want to do the usual social science thing of, here we tell you this part works for the control group but we are actually feeding you fake data. That wouldn't be good for what we are doing. That would not be a good idea. Rolling out multiple rankings is going to take time anyway, so... and please note these are just examples. I don't know that Belgium and the Netherlands are actually comparable. I just picked them out of the hat as vaguely similarly sized countries. I have probably made everyone from Belgium and the Netherlands mad by doing that. Yes, they are angry. Now I have done it.
We can roll out the rankings for one country, as in make them public, first, while we can actually see the rankings for both countries internally, and we can see if there are any changes going on within the rankings that are made public that are not matched by similar changes in the ones that are not yet public. Then we can roll out the other country and do the same kind of thing. Plus, remember, there are multiple rankings that we are going to be rolling out. So we can do this more than once for each country. Like, for the whole country, for banks in the country and so forth.
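One common way to make the comparison described above is a difference-in-differences estimate: the change in the group whose rankings were published minus the change in the group whose rankings were not. The talk does not name a statistical method, so this is a swapped-in illustration; the function name and the spam volumes are made up.

```python
from statistics import mean

def diff_in_diff(public_before, public_after, private_before, private_after):
    """Difference-in-differences on spam volume (any consistent unit).

    'public' is the group whose rankings were published; 'private' is the
    group still visible only internally. A negative result means spam fell
    more (or rose less) where the rankings were made public.
    """
    public_change = mean(public_after) - mean(public_before)
    private_change = mean(private_after) - mean(private_before)
    return public_change - private_change


# Made-up per-organisation spam volumes before and after publication.
effect = diff_in_diff(
    public_before=[100, 120], public_after=[60, 80],
    private_before=[90, 110], private_after=[85, 105],
)
```

Here the published group dropped by 40 on average and the unpublished group by 5, so the estimate is -35. Whether two countries are comparable enough for this to be meaningful is exactly the caveat raised in the talk.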
And speaking of Belgium, since I have made everybody in Belgium mad at me anyway, this is the actual top ten rankings for Belgium for October 2010, who is sending the most spam? By sending, I mean probably it's BotNets. It is mostly BotNets but these are the organisations letting most outbound spam out. And by the way, this leads to a couple of other questions like, Uganda Telecom, okay, I didn't know Uganda was part of Belgium. Actually, I asked Mirjam about this and she inquired at the RIPE NCC, there is also one that's in the Congo, which is even more obvious colonial history. But Uganda Telecom is apparently because AfriNIC didn't exist yet so they registered through RIPE. Now, this doesn't quite explain why they didn't register as Uganda.
And that's one reason that we would like organisational participation in these experiments and questions, because we'd like to tune these rankings. I mean, it's all in a relational database. We can just say okay, not that one, or yes, this one, but on what basis do we decide that? Colonial history? Network topology? Headquarters location? Phase of the moon? How should we decide that? We can just roll out multiple rankings and see which ones people like the most, but it would be nice to rationalise this in some way that makes more sense than that.
And there are also other kinds of experiments that we can do. The normalisation by ASN size, by number of IP addresses, that actually is partly motivated by some significant ISPs that are already saying we don't like the pure volume ones because we come up in the top ten all the time. We can do both. And another kind of experiment is for an organisation that's in the rankings: they can just change what they are doing for information security and see if it changes their rankings. You don't have to construct a detailed chain of this affects this, this affects this. Just change something and see if it affects the rankings. See what information security actually has an effect. And the reputation system organisation provides drill-downs which, as I mentioned, can give clues as to why organisations rank as they do. Combine that with the one about changing information security and you might get a feedback loop. Then, a more long-term thing is pricing correlations with rankings or certificate changes. According to the business theory, these kinds of reputation systems should eventually have an effect on prices of the services. The thing is, there are just so many different variables and you need to watch for a long time to find out. But hey, it's a research project. So, that's another thing we can work on. We are probably talking years for that one.
Right. You know, I could talk about this forever, but I am probably over time already. We have a few minutes for questions?
CHAIR: We are a bit tight on time because we started too late. My fault sorry for that.
JOHN QUARTERMAN: Thanks to a whole bunch of people. This would not be possible without these people. Interestingly enough, the registry records of ASNs mapping to IP addresses, we have not found a single RIR where those are reliable. And to a whole flock of other organisations, particularly Mirjam in RIPE Labs for publishing things, and of course to RIPE itself, which, even though it apparently doesn't exist and none of you are members, maybe you can take at least one question.
CHAIR: Thanks, John, for bringing in, well, non-technical insights into this, which is definitely helpful, since there is too much technical content in here, so we need to have an outside view. We are a bit short on time, but I think we can take one or two questions. So if one of you wants to clarify or discuss...
AUDIENCE SPEAKER: Just a couple of quick things. First, normalising by size of ASN isn't going to work when we switch to IPv6 because everyone gets the same network. I don't know if you have ideas about that.
JOHN QUARTERMAN: There are a lot of things that have been suggested: by number of users, by number of employees, and, if we are comparing, for example, the Fortune 500, by market cap. The problem is finding something that's available for every organisation. But if we are going to be doing rankings per organisation type, then it only has to be something that's available for every organisation of that type.
AUDIENCE SPEAKER: Okay. That's a good, my next quick comment, which is that I don't know that you actually necessarily needed to spend so much effort defining the different categories of rankings, because once you have the raw numbers, you can make rankings all day long, right and you can slice and dice, right?
JOHN QUARTERMAN: Yes. And we will do that. We will have multiple rankings, we'll just keep adding to them. However, we need some of them to become sort of standard. Think of bond ratings, they don't just make up stuff... okay, maybe they do... you know, there is a whole new kettle of fish for you. This thing would have to be funded in a different way than the bond rating agencies are, where they are paid by the people they are rating and nobody else. You know, some of the rankings need to become sort of standard so people will be used to looking at those as the real comparisons.
CHAIR: One last question.
AUDIENCE SPEAKER: I am George Michaelson from APNIC. I am glad you just mentioned the bond rating agencies, because one of the outcomes of the GFC is that a lot of eyeballs have gone on the notional independence of Moody's and other ratings. There is a huge collective dissatisfaction with that relationship, that, for money, they will certify anything, and clearly junk bonds were certified.
The second thing is, I am drawn to something, I think, I didn't see there, which is a parallel with countersigning company paper in debt where, as well as moral hazard, you have a financial hazard. The discount you give and the yield on that sign essentially is a reflection of your trust in the person's money, because if you think you can't ever recover on the paper, you have to take that into account. I think that class of financial risk, while less socially equitable to the model of the commons, it has that nice balancing property. You don't countersign something that's too high risk. There are other models that are similar in mental spirits that you are promoting but more aligned to incur a risk when choosing to validate your status. Interesting talk. Thank you.
JOHN QUARTERMAN: In the actual renewal there is about five different business theories that come into this. That's one I don't think we have explicitly listed, but we will now.
(Applause)
CHAIR: Thank you again. Now, let me introduce the next speaker, Bengt Gördén, from Resilience AB in Sweden. He has been active in the networking field at NORDUnet, SUNET and the Royal Institute of Technology for a couple of years now, and went into a small spin-off company, Resilience, which is, well, focusing on making networks more resilient, which is adding complexity, but in a good way, sort of. If I may ask you to try to do this in 25 minutes, instead of the 30 we promised you.
BENGT GORDEN: I will try that. Hello. As I was introduced. I am trying to talk about open source routing today.
Just a short introduction. We focus on routing and infrastructure in this company. And we also do IP registry, but we also have some products, and those are the open source router and filtering software, and we do system development for web directory services like the old, what sometimes was referred to as the SUNET web catalogue. We do health care systems and small stuff. We are...
I formerly was employed by KTH, which is the Royal Institute of Technology in Stockholm, where I worked in the network operation centre and we did the operations for SUNET and NORDUnet. We also did a lot of development for these networks. In the early days we did EBONE, if you remember that one. It was a collaboration network all over Europe. And we also hosted the first Internet exchange in Sweden, and that was a distributed exchange. It should have been done, but it never came to that.
We also did the, what was it called? .se domain.
A few links to our projects that we have had for this open source router. I will present the links later and you can download it from the website.
The three different projects are actually sponsored by IIS, that is the foundation for Internet infrastructure in Sweden. And we have been sponsored by Intel for the cards that we were using for the open source router. The first project was open source routing: how to try to forward on a PC platform at 10 gigabit per second. That is sort of interesting to do.
The second project emerged from the first project, and that was multi-queue with multi-core CPUs, and the third project was actually to try to separate the flows, and that emerged from the second project. So the findings we had in each project actually drove the next project.
The hardware we used in this is two things. Two platforms. One is Intel-based and one is AMD-based. The first one was actually the AMD. So the first project was AMD with two dual-core processors on a Tyan 2915 motherboard, and the other one was Xeon-based, the 5630, and that was two quad-core processors. And that was a Tyan also, a 7025 motherboard. The Intel cards we used had the 82598 and 82599 chips, and these two cards are highly usable for forwarding because they have multi-queue, which means that they have multiple queues for RX and TX. And the main differences between these cards are that the 598, the first one, has 64 receive queues and 32 transmit queues, and the 599 has 128 receive and transmit queues. So they are a little bit different, and the second one draws a little bit more power. It's about 5 watts, and the first one is 4.5 watts power consumption.
The first project was open source routing, as I said, and this is just a schematic of how it looked. We had the two processors. It's two nodes with HyperTransport in between, and it's HyperTransport down to the PCI bridge.
The first finding we had was that with small packets it is hard to sustain those kinds of rates. The bits per second here show that we have full wire speed at about 384 bytes per packet. And then it's down to 3 gigabit per second with 64 byte packets.
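To see why small packets are the hard case, it helps to convert bit rates into packet rates. The sketch below assumes standard Ethernet framing, where each frame carries an extra 20 bytes on the wire (7 bytes preamble, 1 byte start-of-frame delimiter, 12 bytes inter-frame gap). Whether the figures on the slide include that overhead is not stated in the talk, so the numbers are indicative only.

```python
# Per-frame wire overhead on Ethernet: preamble (7) + SFD (1) + inter-frame gap (12).
ETHERNET_OVERHEAD_BYTES = 20

def packets_per_second(bit_rate, frame_bytes, overhead_bytes=ETHERNET_OVERHEAD_BYTES):
    """Packets per second needed to fill `bit_rate` with frames of `frame_bytes`."""
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return bit_rate / bits_per_frame


line_rate_64 = packets_per_second(10e9, 64)    # about 14.88 million packets/s
line_rate_384 = packets_per_second(10e9, 384)  # about 3.09 million packets/s
```

Under these assumptions, filling 10 gigabit per second with 64 byte frames takes nearly five times as many packets per second as with 384 byte frames, so a per-packet processing limit shows up first with small packets.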
The packets per second seemed to level out at 4.5 mega packets per second, but you should know that this is not real forwarding. This is fan out how much it can handle. So, it's fan out from the ?? it's a Linux system and it's fan out from the Linux system there and we say it's about 3, 3 and a half mega packets per second we could hold.
And to go to the next project, the second project: we found out that when we profiled the kernel with a single CPU, we were mostly in kfree. We were 14, 15% of the time in the symbol for kfree and, as you see, this is how it should look. And when we went to multiple CPUs, we actually spent a lot of time in the qdisc, and the qdisc is where you try to feed the packets out from the system. We were caught in a situation where we actually were serialising all the packets in the qdisc in a refill, and we spent a lot of time in the qdisc, and this is wrong. As you see here, kfree is three and a half percent. This should look as in the single CPU case. So we had to do something about that.
So, the second project looked almost like the first one, but it's quad-core, and it's HyperTransport and the NUMA nodes and stuff like that.
The multi-queue work tried to separate the flows and send them to different cores. And that needs multi-core CPUs and it needs multi-queue on the interfaces. The findings we had there were the transition to this third project, so there is not very much to say about that. We found that when we patched the kernel, yes, we can do this, but we need more time and we need more tests to actually get there. So, the third project: now we come to the Intel-based system here. It's a little bit more potent in the buses, it's QPI, and I have written this wrong, it should be QPI in between the Tylersburg chipsets there also. And we can have more cards in this platform. We can have five cards, so that we are not oversubscribing the buses, and we can have five cards with dual ports on, so we actually have five times two 10 gig ports, so it should be able to do 100 gigabit on this PC platform. We are not really there, but we are actually quite near.
And what we did in this third project, we tried to separate the control plane here, and this is crucial to a router. You need to separate the control plane from the forwarding plane. The control plane consists of routing, BGP, OSPF, SSH, statistics, whatever is not forwarding. The point is that the router isn't hurt if it's overloaded with packets, if the lines are filled. Of course there are security problems if somebody tries to bang on the SSH port, something like that. It's not very good. Anyway, receiving is taken care of by the NAPI system in the kernel, which means that you have an interrupt, and when you pick up the interrupt, you just poll the queue, and when you are done with that, you turn on the interrupt again. That's how you do packet receiving on Linux. But the control plane we separated out to go to CPU 0. In this very case we had eight CPUs, so CPU 0 was to take care of the control plane and the rest, 1 to 7, were to take care of the forwarding plane.
General forwarding is done on CPU 1 to CPU(N) and it should scale. Fast buses are essential. So you need QPI and PCIe second generation, and now we have the third generation coming and it's quite a bit more potent than the second generation, I would say.
The classification on the 82599 is threefold, actually. There is RSS, that is, receive-side scaling. That is a specification from Microsoft, and I think it was a 6.0 specification. That is: you get a packet, you do a hash of it and then you steer it to a queue. There are six specified versions in RSS, if I remember correctly. The different vendors that make the cards have actually made progress on that, so the Intel cards have two more RSS hashes to separate the flows with, and we have the n-tuples, how to programme the cards. There are actually other cards with classification on them, but we focused on one card, and that was the Intel card, and we also focused on the ixgbe driver in the Linux kernel. P2P did this. And we have the Flow Director in the Linux kernel to programme for the receive packet steering, and that was done by NetDev, and we are part of NetDev. We have a team specialised in this. Robert has done a lot of the work for this in NetDev.
RSS can be programmed. It was designed to be done in the hardware from the beginning, not really meant to be programmed, but it can be programmed. So, Jens Låås and Robert Olsson found a way to fill the redirection table but skip index 0. What we did was patch the driver to skip index 0 and fill the rest of the table, so that the other queues were steered to CPU 1 to CPU 7, in our case. And that means: don't use RSS for CPU 0.
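The redirection-table idea described here can be sketched in a few lines. This is only an illustration under stated assumptions, not the patched driver: the table size of 128 and the use of CRC32 as the flow hash are placeholders chosen for the sketch (the card computes its own hash in hardware), and the function names are hypothetical.

```python
import zlib

NUM_CPUS = 8        # from the talk: eight CPUs in total
CONTROL_CPU = 0     # from the talk: CPU 0 is reserved for the control plane
TABLE_SIZE = 128    # assumption: size of the redirection table


def build_redirection_table(num_cpus=NUM_CPUS, size=TABLE_SIZE, skip=CONTROL_CPU):
    """Fill every table slot round-robin with forwarding queues, never `skip`."""
    forwarding_queues = [q for q in range(num_cpus) if q != skip]
    return [forwarding_queues[i % len(forwarding_queues)] for i in range(size)]


def flow_hash(src_ip, dst_ip, src_port, dst_port):
    """Stand-in hash over the flow tuple (assumption: CRC32, not the NIC's hash)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key)


def steer(table, src_ip, dst_ip, src_port, dst_port):
    """Pick the receive queue (and so the CPU) for a flow via the table."""
    return table[flow_hash(src_ip, dst_ip, src_port, dst_port) % len(table)]


table = build_redirection_table()
queue = steer(table, "192.0.2.1", "198.51.100.7", 40000, 80)
print(queue in range(1, NUM_CPUS))  # hashed flows never land on CPU 0
```

Because queue 0 never appears in the table, hashed traffic is spread over CPUs 1 to 7 only, which is the separation the talk describes; packets of one flow always hash to the same slot and so stay on one CPU.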
And here are the facts for it. The hard limit for the platform is that we can hold just over 90 gigabit per second for fan-out of the whole system, of the whole platform. It is actually 92, 93 gigabit per second, but every time we try to measure it, it goes up and down, though it never goes below 90, 91, 92. That first figure was the fan-out. The forwarding is 25.8 gigabit per second. And the max packet rate, as I said before, is 3.5 mega packets per second. That's quite a lot, I would say, really a lot. But that is for fan-out, as I said: we can use packet generation inside the box and fan this out to the cards. What we got for forwarding in each CPU, if I remember correctly now, is about 200,000 packets a second. And there are seven of those. So it should be somewhere around 1.3, 1.4 mega packets per second.
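The per-CPU arithmetic above can be checked with a short calculation. This is a sketch that assumes forwarding scales linearly across CPUs; the talk does not state the packet size behind the 25.8 gigabit figure, so the packet sizes below are placeholders, and the helper names are hypothetical.

```python
PER_CPU_PPS = 200_000    # from the talk: roughly 200,000 packets/s per forwarding CPU
FORWARDING_CPUS = 7      # from the talk: CPUs 1 to 7 do the forwarding


def aggregate_pps(per_cpu_pps, cpus):
    """Total packet rate if forwarding scales linearly across CPUs (assumption)."""
    return per_cpu_pps * cpus


def pps_to_gbps(pps, packet_bytes):
    """Convert a packet rate to Gbit/s for a packet size, ignoring framing overhead."""
    return pps * packet_bytes * 8 / 1e9


total_pps = aggregate_pps(PER_CPU_PPS, FORWARDING_CPUS)
print(f"{total_pps / 1e6:.1f} Mpps")
for size in (64, 1500):  # hypothetical packet sizes, not taken from the talk
    print(f"{size} bytes: {pps_to_gbps(total_pps, size):.2f} Gbit/s")
```

Seven CPUs at 200,000 packets per second each gives 1.4 million packets per second, the upper end of the range quoted.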
The conclusions for the projects are:
It is possible to do forwarding at gigabits per second and above on a PC platform. I would say not any PC platform. Selected hardware is crucial. You can't really take, you know, a laptop to do this with, even if it's a PC. You need to have a selected platform.
Use hardware selection on the packets. So you need to use the right type of interfaces, as we did. They should have multi-queue, very important. You have the older 82576 that could be used for 1 gigabit per second NICs, and you have these 10 gigabit per second ones and they are very good, and you need the flow separation to separate the control plane. So there, I should say that the IETF specification for ForCES, for separating the control plane and the forwarding plane, is actually proven to be working. This was all done with forwarding. For routing we are using Quagga, and VRRP with keepalived. Of course, you could use XORP, you could use BIRD for routing. It doesn't really matter, because it's not the forwarding that's interesting in this case.
So, there is the links again. Download, if you like to click on those ones. The first two ones are for different papers. They are not gathered, because they are presentation, so they are sent to different conferences, so, they are little bit here and there.
Okay. Did I manage to...
CHAIR: You took far too little time. So we have lots of interesting questions for you now. Thanks for bringing up this very technical insight into how fast hardware forwarding can be done. I have a question of my own. And that's pretty much this 25 gigabit forwarding rate. Is that independent of forwarding table size or can you do it with 10 routes or can you do it with 300,000 routes? How efficient is the Linux kernel route look up these days?
BENGT GORDEN: It's independent up to the full BGP table today.
CHAIR: So what's the scaling limit of a Linux kernel regarding routing table size?
BENGT GORDEN: I would say that, as we have seen, it's actually the 25.8, that seems to be the limit on this platform. But on the Linux side, we haven't been able to prove how much the driver can actually push through. So we haven't had the platform to... we see that we have a need somewhere there for this platform.
CHAIR: So the limit is pretty much what the hardware can put through to the interface board, and not table size, number of routes and all these.
BENGT GORDEN: As we see today, yes.
CHAIR: What about IPv6.
BENGT GORDEN: IPv6 has been tested for the last project, but we have no figures to actually present there. The problem we have there is the forwarding mechanism for IPv6. It should be sort of rewritten anyway.
CHAIR: It's still the kernel routing lookup and decision thing.
BENGT GORDEN: Yeah. We did see that IPv6 wasn't a problem as far as this third project went. But we have to investigate more about IPv6. IPv6 is the same thing there: there is forwarding and there is routing, and the routing isn't a huge problem, but I mean, Quagga isn't... there is, as I said last night, a minefield there.
CHAIR: I was actually thinking more about forwarding and raw forwarding power in the Linux.
Okay, questions from the audience please. We have actually time left so... Shane again.
BENGT GORDEN: I didn't think I was talking that fast. I am quite a slow talker.
AUDIENCE SPEAKER: Shane Kerr. Did you do any cost analysis comparison in this? Because you said it required carefully selected hardware and things like that. I think the thing that will make this interesting is if it's cheaper than a commodity...
BENGT GORDEN: Yeah, and the dynamics of what you can do with a router. But the cost analysis is not very much done, except that we did buy all the stuff and we had quite a small, limited budget. The cards from Intel were actually, for the first project, 20,000 Swedish crowns, which should be about €2,000. And those were populated with SFPs from the beginning. The prices went down for the cards for the next projects. We were down to about 1... about 12 hundred, I think it was, for the cards, the dual ports; there are single ports, at half the price. And the platform itself, the PC with the CPUs, the first one with the AMD, I think we landed on 22, 23 thousand Swedish crowns, 2,200 euros. So if we take two of those cards and go with the prices of today, I think you will end up at 4,000 euros. Probably you can get it cheaper somewhere else, because Sweden isn't the cheapest country to buy hardware in.
The second platform is 10, 20% more expensive.
CHAIR: So, what I am hearing is you can build a 40 gigabit router with commodity hardware for 4,000 euros. I think some of the vendors had better improve on their software, because the hardware is quite expensive then.
BENGT GORDEN: It's quite expensive, yes.
CHAIR: Another question or?
Okay. As far as I can see, everybody is happy with the talk, or already waiting for lunch. Thank you for coming up.
(Applause)
CHAIR: The next speaker you might have... ah, another question.
AUDIENCE SPEAKER: I am sorry if I am asking a question that was already covered, since I was a little bit distracted in the middle there. Have you looked into filtering? Because when you are a router you want to protect your router from all kinds of log-on attempts and so on. I understand that when you just load the iptables module into the Linux kernel, the performance gets kind of dramatically reduced, so I was wondering if you have actually tried this?
BENGT GORDEN: Yes, we did. In the second project, on the links there, you can see one of the graphs where we load iptables. Sorry, I don't remember the number of rules we had there, but it wasn't significant. I would say that in our case, it was not more than a few percent of the load. So we were quite surprised by that. An earlier platform that was used actually had thousands of rules. I wasn't involved in that, so I can't really say anything about it, but I know it had thousands of rules and it didn't wind up on its back. So I think iptables nowadays is, what would you say... yeah, it can handle that.
AUDIENCE SPEAKER: Did you try with a connection tracking?
BENGT GORDEN: No, no connection tracking. That could be the culprit of it, yeah, that could be a real problem.
AUDIENCE SPEAKER: Thank you.
CHAIR: Next speaker, Nicolas Fischbach. Here, you might have noticed this morning already, he is working as a network engineer for COLT since I don't know, ages, 10 years, 20 years. We have seen him on the complexity panel this morning and I think this talk is pretty much trying to get the complexity down in the COLT network and they are looking forward to it.
NICOLAS FISCHBACH: Thank you. I am actually not really in between you and lunch. And I love lunch so it's going to be difficult for me too.
You will see some of this talk is actually going to piggy-back on some of the discussion we had in the panel with Michael on complexity. And let me start with one thing, this thing about layer 2 integration: you have to embrace MPLS. If you are against this, like some people in the room it seems, forget about this and go for lunch now, because MPLS is actually the enabler that allows us to do that.
I should actually be able to fit into the half hour which is left. I will explain the difference between what we call carrier Ethernet services today and IP services, and what the driver is for doing this. This is not just a PowerPoint; this is a project that has started deployment internally. This isn't something we did for the fun of doing it. It's in execution mode. I will show you what we plan to do. Because I used to run security engineering for 10 years, I have a little bit of a focus on the security side of things, because, you know, the more you integrate the networks, the complexity goes up in some places and goes down in others, but the security implications and your security requirements also go up.
So, quickly, I think you guys all know what these are. And what the benefit is, this is what you see on the right-hand side in blue on the screen. I don't want to go into these; you really know what this one is about. What I quickly want to say is a couple of words about the Ethernet services. When I say Ethernet services, in COLT's view it's carrier Ethernet services, private wire; basically think of SDH applied to Ethernet with the same constraints. 50 millisecond protection time. Reserved bandwidth, over-subscription, a static path, a predictable back-up path, a round-trip time which is static. So virtually 0 or no... really, something people used to have from the old-school SDH services, and a lot of the customers have actually accepted the move from SDH to some sort of Ethernet services. People call that an MSP platform, because those requirements still exist today and are maintained by the service providers. The big difference is you have no routing in such an environment. Everything is mostly static. The intelligence is not a routing process inside the box, it's actually held in the OSS system. So all the path information, be it the primary or the backup path, is computed and pushed out by the OSS tool into the network. The network is virtually dumb and is there just to carry packets, not to make any routing or forwarding decision. That's the difference. What we are going to do is actually merge those two networks, and you will understand why this makes sense to us.
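The idea of an OSS computing a primary and a backup path offline, with the network doing no routing of its own, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the five-node topology is hypothetical, and the two-pass method (shortest path, then shortest path avoiding the primary's links) is a simple heuristic, not a description of any particular OSS tool.

```python
from collections import deque


def shortest_path(links, src, dst):
    """Breadth-first search over undirected links; returns a node list or None."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def primary_and_backup(links, src, dst):
    """Primary = fewest hops; backup = fewest hops sharing no link with the primary."""
    primary = shortest_path(links, src, dst)
    if primary is None:
        return None, None
    used = {frozenset(pair) for pair in zip(primary, primary[1:])}
    remaining = [link for link in links if frozenset(link) not in used]
    return primary, shortest_path(remaining, src, dst)


# Hypothetical topology: two link-disjoint routes between A and D.
links = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "E"), ("E", "D")]
primary, backup = primary_and_backup(links, "A", "D")
print(primary, backup)
```

Both paths are known before any failure happens, which is what makes the backup path and its delay predictable; a production tool would also have to account for reserved bandwidth and shared-risk links, which this sketch ignores.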
Today, if you look at this picture from left to right, on the right-hand side you have the customer premises. We own the fibre usually, so we have dual induction into the buildings, and we have a transmission CPE that provides the service. We have the Ethernet network, which is depicted in red, and then if the customer has access, IP transit, whatever, we have an NNI to the COLT IP/MPLS core, then the peering and transit edge towards the Internet. What you see here is that for IP services the Ethernet is the transport of choice. We also have for an IP access and DB transport service, plus the IP VPN transport service on top. All the NNIs are physical NNIs, so not NNIs are actually Ethernet to the boxes.
If you look at the bottom, there is a clear difference between the Ethernet and the IP space, very static and predictable and the IP being more flexible.
The Ethernet solution we use, like all the ones we see on the market today, actually relies on standard MPLS. Some of the vendors had proprietary solutions; this is gone now. Our IP backbone has been MPLS enabled since early 2000. This is something we have been running for quite some time. It's also something that's going to enable us to do 6PE in an easier way.
And you'd ask me: you probably deployed the carrier Ethernet solution long ago? It's not so old. It was deployed in 2007. But there were quite a number of constraints from the carrier Ethernet service that we couldn't resolve with, you know, the MPLS functions or additional functions in the IP/MPLS core. BFD was not supported or we weren't able to deploy it, the same with fast reroute. Just looking back the last ten years, I haven't spoken at RIPE for years and years, but a presentation like the one we have seen was about getting to sub-second for IP networks, while we have been talking sub 50 milliseconds in those spaces for years. So that's something we couldn't do in the IP backbone back then. There was no relevant automation; carrier Ethernet is automated, there are a lot of tools. As you know, for most of the service providers in the IP space, there wasn't so much automation. There was no tool that could manage the system end-to-end, from service activation to service assurance. This has also changed.
Bandwidth. I mean, today, the carrier Ethernet network we run is ten times the size of the IP/MPLS backbone, which isn't small. Nevertheless, back then, there was no way to scale up the IP/MPLS to actually serve the traffic forecast from the carrier Ethernet services.
Then there is this fear of the bad Internet, right, and all the Internet-originated attacks. Something today... and we discussed this with Michael long ago, the security. Most of the service providers today should have, you know, security in hand, properly managed. The boxes properly secured. Proper plane separation, proper DDoS detection and mitigation mechanisms in place, so there is no excuse any more today. If you get hit by a DDoS and cannot manage it, you are probably unprepared.
So I think today, you know, most of the limitations for not running an integrated network in a situation like this are mostly psychological. The project management teams are different, and the operations teams. There is a lot of change in the way you do your products, the way you operate your two networks. The resistance is more human resistance than technology. What we want is to make sure we don't weaken the layer 2 services and their characteristics by introducing more kind of flexible and cool IP services on top of them. So that's something we had to keep in mind. And there is also one major change in the way we operate the Internet side of things. Today the IP/MPLS network is actually publicly IP addressed. It's somehow exposed to the Internet. We use MPLS core hiding, so the Internet is going to be carried inside a VRF, which is quite a change from the way we have been operating the Internet for the last 15 plus years.
So what are we going to do? Today, as I said, we run two separate networks on top of an optical layer. At this stage we are not touching the optical layer; I am going to talk about that later on. I basically want to take those two layers, the Ethernet and the IP layer, and combine them into an integrated IP Ethernet layer running on top of the same optical network. From a service point of view, as I said, if you look at the top here, I am pointing to the Ethernet layer here. On the customer premises, in green for Ethernet services, there is always an Ethernet CPE, and if you have an IP service on top and the Ethernet is just used as a transport mechanism for IP, you have this blue layer. And I think, as most of you know, this layer 3 CPE is quite dumb; what does it do except routing from A to B? Maybe doing some NAT and that's about it. Route v6, and that's going to go away anyway. I don't believe it, but, you know, this is what people push for. A lot of these functions can be moved away from a CPE on customer premises. What we are looking at: if you look at the picture here on the top right-hand side, this blue CPE is actually very light blue. What we are trying to do is get rid of the layer 3 CPE on customer premises if you don't need it because of the product features.
So, a very colourful picture. Let me start on the left-hand side, which is the CPE. As I said, for a layer 3 service we have to put a layer 3 CPE connected to the layer 2 CPE, which is one point we want to have a look at. The second one is the middle: we still have a layer 2 provider edge and a layer 3 provider edge with an external NNI between the two. So every time, we have to go and create new NNIs when some of them are getting full, and it's using ports on both ends. Not something that scales very well. On the right-hand side, we have a separate layer 2 core and a separate layer 3 core. Then you have the Internet sitting over here.
So the first thing we want to simplify is the CPE side of things. You see here, the blue CPE is gone. So for all the services that only require an on-site layer 3 CPE, we are going to remove it going forward and put the features on the layer 3 provider edge in a VRF, a virtual routing instance, you name it, depending on what the vendor calls it. There are some limitations. It's going to be limited to static routing, some ACLs, maybe some NAT. If you need a physical port on the customer premises to connect the PBX for voice over IP, you clearly need a CPE. If the customer wants dual access, you know, with the same service provider, you need a CPE on site to manage the IP locally and things like that. So I am not saying it's going to go away for all the services, but for all the basic services it's a nice way to save money on equipment that you have to put there, maintain, replace, manage and so on. That's one of the first steps.
The second thing we are going to change is actually the provider edge device. The provider edge device is going to be a combined one. Most of the vendors on the market today, if you look at the three, four, five largest players in the Ethernet and IP market, actually have provider edge devices that can do that and can scale up to be able to do that. So that's the main change. This device is actually increasing in complexity. We do many more things on this box: terminate layer 2 services, layer 3 services, blend layer 2 and layer 3 services, start to virtualise some of the services to simplify the CPE. So a lot of things are happening there but, you see, we reduce the number of boxes, reduce the number of... so quite a good thing to do.
The next step is going to happen on the right-hand side. It's going to be the collapsing of the layer 2 and layer 3 cores. That is probably going to be the last step in the project. So, the objective is ending up with a core that can serve layer 2 and layer 3. And this core will actually be pretty dumb, IP/MPLS only. All the intelligence will reside outside that core. We don't want to do any IP routing in that core. Everything should sit outside and should be LSP based.
So the end view looks like that. Serving, in red, layer 2 services to customers, layer 3 customers in blue, and extending the layer 3 services, you know, either to IP VPN or Internet services in grey. What we have on the left-hand side, in the layer 2 access aggregation, is not going to be one layer of devices. You have to be clear that in larger nodes like tier 1 cities, London, Frankfurt, Paris, to name a few, you are probably going to have between 2 and 4 layers of aggregation and edge, right. This is something that you need because there is no box that can handle, you know, that much bandwidth today, and you still want to keep some service separation; you don't want to put everything on to one box. What we will have is probably only 2, moving down from 4 to 2, because we are getting rid of dedicated ones. We still keep the external connectivity as we have it today, but make sure that the external connectivity will still reside on dedicated devices: so MPLS NNIs on dedicated devices, peering and transit, that's the blue box up here, you know, at least one such box per IX, maybe two, there being multiple IXs. So on the peering and transit side nothing is going to change. What is going to change is more the core of the network and the customer-facing side of the network, but the external-facing side of the network, all the NNIs, be it to the Internet or to other providers to extend the MPLS reach, is not going to change.
A couple of other things to keep in mind. It looks simple, you know, and cool, but there are always things that you forget about or that you don't think of. What we want to do going forward is to go dual vendor. It's something people can debate. Is dual vendor good? Is it bad, is it more complexity? Is it just for the commercials? These things can be discussed. But what you need is some sort of end-to-end OSS. Without that there is nothing you can do. So the OSS system will have to be able to have a full view end-to-end for delivery, activation, assurance, that's clear. What we want to do is make sure the vendors we pick are flexible. We want to move away from... implementation, especially on the access side. On the Internet side of things, or layer 3, there are a lot of standards and people have been working together, we have seen it in these rooms for years and years, so this is pretty open and the interop is kind of a given. On the left-hand side, the access piece was quite different. What we were pushing vendors for was the ability to use any CPE. Any CPE that lives up to a number of MEF standards should be able to interoperate with the provider edge we are going to put here. That's going to open the door for seamless MPLS, from the CPE to the CPE on the other side of the service.
Other things to keep in mind, and these are, you know, the tricky ones. What I didn't put here, these blue boxes sitting somewhere kind of on a stick connected to the core, is how you do route reflection. So, you know, doing v4, v6, etc., how do you do that? How do you make sure the core can stay routing free without it being suboptimal? Multicast. Multicast is a pain in the back when it comes to things like that. If you look at the draft today, it's already an ugly hack. What we are really looking at is what the vendors are going to do between mLDP, so multicast LDP, and point-to-multipoint traffic engineering. I think we are still waiting for the vendors to agree and the IETF to come up with a published standard on that one. But it is something we are also looking at, which can jeopardise a project like that.
That's also one of the reasons, those two things, that we are actually keeping the core integration as the last phase of the project. Because the phasing, starting with the CPE, doing the provider edge and then doing the P or the core in the end, gives us, buys us more time to look at some of those issues.
And then the security requirements. This is kind of a list of things, you know, that came to mind that I actually want to enforce. This is actually the longest list of some of the requirements we have to review going forward. As I said, it's going to be an MPLS-only core; it has to be routing free. It's not just because we think it's cool. It's also because there is a lot of education to do with all our customers. You wouldn't believe how many large customers, especially in the financial industry, have policies that say that for any private wire type services, the service provider is not allowed to use IP anywhere in the network. Right, so, you know, this is also one of the drivers but, at the same time, we are working on educating those customers. So even though the core will be routing free, we will still have management. But it's not on the traffic path. That's also one of the reasons I said that plane separation and proper control plane management is very important.
One big change is also driven by the fact that we are removing the layer 3 CPE on customer premises. Is your layer 2 broadcast domain going to extend, instead of staying inside the customer premises? It's actually very important that you have the right mechanisms to manage that.
The other thing on the customer premises: even though we are actually getting rid of the... there is still one feature you need on your layer 2 CPE. It is QoS. Even though the device will be a layer 2 CPE only, one of the expectations is for it to be able to do layer 3 QoS. Nothing fancy, but you need to be able to protect the access link and also have some of the layer 2 security features in that CPE, instead of having to deal with broadcast storms being pushed out from the customer network to your provider edge device, for example.
One thing that was limiting some of this, especially the CPE integration, is that, up to now, all the functions and the security functions especially were not available inside the VRF. It was a general config statement, but you couldn't have it per customer or you didn't have the same flexibility. And this has changed so that, today, most of the features you need to do some sort of... are actually available inside the VRF.
AAA will be very, very important, because the boxes will have many more customers on them. It will be much more complex. A lot of it will be done by tools. But even though it's driven by tools, you still want to make sure that the tools can't do the craziest things on the network. The AAA policy we are going to push out, be it in... northbound interfaces, whatever, needs to be there and exist and actually enforce some of it.
You need some sort of IPFIX or NetFlow v9 support to feed your traffic management tool, for your DDoS detection tool. The way we do DDoS mitigation is not changing. I don't know if you remember the presentation I gave seven years ago; we use BGP-triggered MPLS to do this mitigation, and the same thing will apply here. And the other thing is making sure that all the security features are actually available in hardware. Right. And, you know, today, there is still a lot of stuff being done in software. I think most of us have been pushing vendors hard. I think they got the lesson and understood the message. This applies to IPv6: today it's too easy to blow up an IPv6-enabled network because a lot of things are still happening in software. As I said, the whole management is changing. The fact that you are mixing carrier Ethernet services that need dedicated and committed bandwidth with services which are more flexible, like IP access, the split between CIR and EIR, makes bandwidth management in the network very complex, so the umbrella OSS tool we need will have to be able to understand that, manage that and provision that. So that's one point where the complexity is actually going up, you know, so I am not saying it's just a win-win game. We are actually losing somewhere else, where we are coming up with more complexity.
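The kind of check an OSS would need when mixing committed and flexible bandwidth on one link can be sketched as follows. This is an illustrative admission rule under assumed policy, not COLT's tooling: committed rates (CIR) must fit within the link, while excess rates (EIR) may oversubscribe whatever capacity is left by a factor. The service names, the 10 Gbit/s link and the factor of 4 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    cir_mbps: float          # committed information rate, always guaranteed
    eir_mbps: float = 0.0    # excess information rate, best effort on top


def admit(link_capacity_mbps, existing, candidate, eir_oversubscription=4.0):
    """Accept the candidate only if CIR fits and EIR stays within the allowed oversubscription."""
    services = existing + [candidate]
    committed = sum(s.cir_mbps for s in services)
    if committed > link_capacity_mbps:
        return False  # committed bandwidth must never be oversubscribed
    excess = sum(s.eir_mbps for s in services)
    burstable = (link_capacity_mbps - committed) * eir_oversubscription
    return excess <= burstable


existing = [
    Service("private-wire-1", cir_mbps=4000),               # carrier Ethernet style
    Service("ip-access-1", cir_mbps=1000, eir_mbps=8000),   # flexible IP access
]
print(admit(10_000, existing, Service("private-wire-2", cir_mbps=2000)))
print(admit(10_000, existing, Service("private-wire-3", cir_mbps=6000)))
```

The first candidate fits because the committed total stays under the link capacity and the excess still fits in the oversubscribed remainder; the second is refused because the committed rates alone would exceed the link.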
To summarise. And that's going to be the last slide before we get a couple of questions, if you are not starving completely. The continuity of the requirements of the carrier Ethernet services is mandatory. So, you know, the pushback we had internally was the carrier Ethernet people, be it on the product management side or the engineering and operations side, saying: come on, the Internet, it's too flexible, too weak, it's bad, you are going to weaken the services, customers won't be happy about that. So we had to do a lot of education internally to actually explain how we are going to maintain the service quality, or the quality of service, we have for the old-school SDH type services. The good thing is, it's going to enable us to blend IP and Ethernet services. A lot of customers that we have today have a mix of IP VPN services and Ethernet services, and this is going to make it much easier for the customers to deploy that and make it evolve. It's a benefit for customers to go down that route.
As I said, security is key. And security not so much any more as, you know, securing the box itself, because I think people got that, and securing the network, I mean, Michael has been preaching that for, I don't know, ten years now; if you didn't get it yet, I don't know what we can do. So it's really about making sure the network is available. Availability is going to be key, and having the proper segmentation, because when you start to virtualise, there are always leaks in between domains, and you have to make sure that's not happening. And visibility. Don't do this if you don't have tools to visualise stuff. The RIPE people have been putting up great tools about visibility. And still a lot of people, especially on the Ethernet network side of things, even in the large networks, have no view into the network. You wouldn't believe it: the only thing they have is traffic on the interfaces. I think it has also been preached, you know, for years and years that visibility is key. If you don't have a view, be it NetFlow based, be it a sniffer somewhere, you are blind. It's going to kill us. This is very important. And then, as I said, network automation is a must. You don't want people to fiddle around with end-to-end service delivery by hand. A lot of this will have to be automated.
The next steps: As I said, this is something we are doing and rolling out. So it's not just, you know, some PowerPoints. It is actually happening. We started it, you know, the easy way, by picking vendors we already knew, that we had already introduced to the network, and turned them into this next-generation combined layer 2, layer 3 network. We are not forgetting our dual vendor solution. We are looking at possible second vendors, probably introducing, you know, third-party CPEs on the premises and having the proper umbrella OSS, because the OSS is going to be one of the key enablers for having a dual vendor solution.
And what you are going to do next is, I don't have a picture of that in the slides, but do the same with layer 1 integration. So, you are running packet optical. You probably know that in the history, there has been a lot of the ?? about MPLS TP and especially the ?? side of MPLS TP, what our vision is, is that between three to five years from, now the optical backbone that we run today will be packet enabled. Is it going to be optical or packet? Packet over optical? I don't want to start that debate. But nevertheless, what we see is that, you know, we'll have a very, very intelligent service edge in the damp ?? you saw in the slides today will be running on top of optical gear in the future that can talk you know LDP and LSP. That's kind of where we think we are going.
Any questions?
CHAIR: Thanks, Nico, for giving us an insight into your network, what you are doing and why you are doing it. Please don't run away all of you yet. We have another very small talk coming up, that got sort of ambushed on me, so it's on the agenda. I think the next talk is going to be very short, so we can have one or two questions, please.
AUDIENCE SPEAKER: Derek Friedman. Just a quick question: you talked about Internet inside of a VRF. What sort of challenges do you have? I take it you share reflector infrastructure?
NICOLAS FISCHBACH: That's one of the things we are looking at at the moment, because the fact that we do this is going to change the way we do it, because, today, in, I think, some of the instances it's actually a shared service on some other boxes. So we actually have a running project of moving to dedicated route reflectors. Some of the vendors have some solutions, some of us are working on them. Where we actually position the route reflectors in the network will be key. I don't have the answer yet. It's actually part of the project. Unless you have something more to say...
CHAIR: Okay. Thank you again, Nico.
(Applause)
The next speaker will be Lorenzo from Google. I think, you know, you all know him, so there is not much I have to say. He is running around and telling people to turn on IPv6 and that Google is great or something like that, or that IPv6 is great or turn on Google, whatever.
LORENZO COLITTI: Sorry for standing between you and lunch. Sorry for doing this so late. What are we talking about? As we have talked about various times before, Google and others can't turn on v6 because there are issues with sort of persistent low level brokenness scattered around the Internet, and that would basically bring the performance and reliability of our websites down to a level that's unacceptable to us and to you. So, a little bit of detail and, you know, also one of the ideas that we have to try and tackle this problem, which I see as a real problem that's facing us in this industry as we really look to transition to IPv6.
So, the problem, the fundamental problem, is that, you know, most of our stacks today, if they have an IPv6 address and they are fed a AAAA record, will try to connect over IPv6. If that doesn't work, they will try to connect over IPv4. If you don't have IPv6 connectivity, it will just go very fast: you will try to connect, the socket stack will give you a local error, and you will immediately fall back to IPv4. The problem is if this doesn't work properly, for example if your computer thinks it has a v6 connection and it doesn't, or your router is telling you the same and it's just throwing packets away. What happens then depends on the operating system. You can expect to wait for 20 seconds, or on the Mac it depends if you get unreachables or not; on Linux, if you get unreachables it's instant, but if you don't it's three minutes. So you can expect to wait a time that's not acceptable to you or to us in loading a web page these days. We want web pages to come down in 500 milliseconds, not in 20 seconds. And this is multiplied by the number of AAAA records you have in the DNS. For example, if you look at Google's v4 addresses, we have 4 or 6 or 2 depending on where you go. If you have 6 IPv6 addresses, you are going to wait for two minutes if you are using Windows. You can sit there and try to connect to Google and it will spin for two minutes. On Linux I clocked my box SYNing faithfully for, I think, 19 minutes before the page finally loaded. It was timing out three minutes for each one.
So, here is an example. This is my favourite, and please don't look at the MAC address to find out who it is. But anyway, don't do that. But this is a real user who e-mailed us saying: I can't get to Google, what's the problem? What did you do? So, you'll see that this address looks kind of strange, and we'll talk about the bugs in the next slide. It is connecting to this address, which is one of the addresses for www.google.com for IPv6. It's sending a SYN, and the router is saying unreachable, I can't get there. It's ignoring that. It's sending another SYN, unreachable, unreachable. After four tries it tries to do neighbour unreachability. Then it tries again on the next address. So, yeah...
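The arithmetic behind those waits can be sketched as follows. This is a minimal illustration only, assuming the client tries each AAAA address strictly in sequence and waits a fixed timeout on each before moving on; the function name and the timeout table are hypothetical, and the per-address figures are simply the rough numbers quoted above (about 20 seconds, and about three minutes on Linux when no unreachables come back).

```python
# Rough per-address connect timeouts in seconds, taken from the ballpark
# figures quoted in the talk. They are assumptions, not measured values.
PER_ADDRESS_TIMEOUT_S = {
    "twenty_second_stack": 20,
    "linux_no_unreachables": 180,
}


def worst_case_wait_s(num_aaaa_records: int, per_address_timeout_s: int) -> int:
    """Delay before IPv4 fallback if every IPv6 address times out in turn."""
    return num_aaaa_records * per_address_timeout_s


if __name__ == "__main__":
    # Six AAAA records, as in the example given in the talk.
    for name, timeout_s in PER_ADDRESS_TIMEOUT_S.items():
        total_s = worst_case_wait_s(6, timeout_s)
        print(f"{name}: {total_s} s, about {total_s // 60} min")
```

With six addresses this gives two minutes for the 20-second case and 18 minutes for the three-minute case, which is in line with the "two minutes" and "I think 19 minutes" mentioned above.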
So that is what's happening and you may laugh, but this is what's happening in the real world. So what's the problem?
There are five bugs here that I can count. Number 1 is that the home gateway is sending out an RA for a prefix that is not even designated by the IETF as unicast space. It's reserved for IPv4 compatible addresses, and the router is saying: this is my prefix, please have at it.
Number 2 is the host is actually believing that prefix and it's choosing to use it.
Number 3: the home gateway doesn't have any v6 connectivity. However, it is telling the host it does have v6 connectivity, using the protocols prescribed for this purpose.
Number 4: the host is attempting to connect to a global address even though it doesn't have a global address.
Finally, the host is ignoring the unreachables: for four seconds it keeps trying anyway, and then it gives up.
Also, there is user error here. User misconfiguration. The user put a home gateway in 6to4 mode behind a NAT. That's not the way it was intended to work. It will never work, and now, you know, whichever way you pass the blame around, you could say the host should be smarter, you could say that the home gateway should be smarter. I suppose you could say that the user should be smarter. I don't know. The point is that, you know, if this user tries to get to Google, he is going to wait 24 seconds no matter what he does. He is going to go to Maps, News; every map tile is going to wait 24 seconds before it pops up. We don't want this and you don't want this and the users don't want this. So, how do we fix this in the year of time that we have before we all run out of v4 space and we start thinking: oh, what are we going to do tomorrow? Because that's the way the transition is going to go, because nobody has really done much yet.
So, what's the real problem here? The real problem is that the home network is broken and the equipment is misconfigured; it's broken and it's not operating in a robust fashion. What it looks like to the user is: Google is broken. Google is slow. So we asked them, can you get to Hurricane Electric, can you get to RIPE, whatever? No, I can't get to those sites either. But, you know, I didn't notice that; I notice it if Google doesn't work. So essentially this becomes a Google problem and it becomes our problem. So the user doesn't think there is a problem with his network. He thinks that just the Google network doesn't work. He suffers in silence or goes to another website or complains that Google doesn't work. And, you know, we have data, and it shows about 0.05% of users have this problem today, and it's not really getting much better except when OS vendors make it better. And these users, they have the problem all the time. Every time they connect, they plug in, it doesn't work; they try to connect again, it doesn't work.
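The first two bugs come down to a check that either the gateway or the host could make before using an advertised prefix. The sketch below is a hedged illustration using Python's standard `ipaddress` module; the function name and the example prefixes are hypothetical (the actual prefix from the slide is not reproduced in this transcript), and real stacks implement address selection quite differently.

```python
import ipaddress

# Global unicast space currently allocated by IANA.
GLOBAL_UNICAST = ipaddress.ip_network("2000::/3")
# Space reserved for the (deprecated) IPv4-compatible IPv6 addresses.
IPV4_COMPATIBLE = ipaddress.ip_network("::/96")


def classify_prefix(prefix: str) -> str:
    """Classify an advertised prefix (hypothetical helper for illustration)."""
    net = ipaddress.ip_network(prefix, strict=False)
    if net.version != 6:
        return "not-ipv6"
    if net.subnet_of(IPV4_COMPATIBLE):
        return "ipv4-compatible-reserved"
    if net.subnet_of(GLOBAL_UNICAST):
        return "global-unicast"
    return "other"


if __name__ == "__main__":
    # Example prefixes are made up for illustration.
    for candidate in ("::c0a8:100/120", "2001:db8::/32", "fe80::/64"):
        print(candidate, "->", classify_prefix(candidate))
```

A host that only treated "global-unicast" prefixes as usable for reaching global destinations would not have attempted the connections shown in the trace.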
This is the number one reason why we can't turn on v6 today. Assuming we had magic wands and we could enable all our infrastructure, we'd still be stuck, and we don't know how to fix this and we can't fix this by ourselves.
One of the ideas we have been kicking around in collaboration with the ISOC is for a day in which we try to call attention to these problems. Essentially, for one day, we would turn on AAAA records for all our main websites and try to get users to understand that their set up is broken. Try to get them to call their ISP and have their ISP tell them: your set up is broken, please disable v6 or please fix it like this, right.
And so, why a sort of global flag day? The idea is, basically, right now it's "Google is broken", and so the user might not notice. If Google and Facebook and Yahoo, if they are all broken, then the user might start to think: my Internet connection is really sucky today, maybe I'll call my ISP and find out what's up. And then the ISP can first of all find out who the user is, because remember, we know who the user is because we monitor this, but we can't tell the ISP due to our privacy policy. We can't tell them this user has a broken Internet connection because we never got our users' permission to do that. The lawyers will come down on us if we do that. So, yeah, we share the impact. We can essentially point to a day which the Internet community as a whole can target as a flag day, if you will, where, for example, we can tell vendors: if you don't fix this bug, you know, on day X, 1% of your users will break and they will sort of have problems and they will, you know, contact you. And so, we can warn users in advance, we can do marketing, we can do information, and sort of in the runup, we can do things like try to warn users if we see they have broken connectivity and so on. There is a lot of outreach. That's one of the benefits of a global day.
So this is the current thinking, and why am I talking about this so early and so late and why have I not got slides? The reason is that this is not fully baked at all. It's an idea that we are kicking around and we are discussing, but many of us feel strongly that the operational community, the ISP community, needs as much advance warning as possible that something like this is coming. We are kicking around a date of the 6th of June, just because it looks nice and it's the anniversary, the five-year anniversary, of the deprecation of the 6bone, and it's 6/6, and also we will be well into IANA runout then, so we'll hopefully have some mind share.
Various major content providers, if you look at that list, that's basically half of the top five already in there. Nobody has committed to anything yet, but people are interested, and we are trying to see how to do this.
So, what can the various actors do? Content providers can deploy IPv6 and participate if they run their own infrastructure. If they use a third party CDN: tell your vendor that you want to participate. Tell the CDN provider that you want them to allow you to participate. They are listening, they are in this discussion. Go tell Akamai: I heard about this AAAA day, I want to do it on my website, can you do it for me? And marketing wheels will be set in motion and things will happen.
If you are an ISP, you might have, you know, a higher than normal level of support calls on that day. You can help by providing an easy script that tells your support personnel how to deal with this. Some ISPs, and rather major ones, have said: how many broken users, one in 1,000? No problem. We'll buy the CPE for them. And we said: we can't tell you who they are, sorry. But if you take this view, then by all means feel free to, you know, give new CPEs to your users with proper v6 support. For OS vendors, this is the only long term fix. The OS vendors have to make stacks more robust in the face of failures. And bear in mind, I don't know if there are any OS vendors in this room, you have to patch old systems as well. These bugs are fixed in the most recent versions of Windows and Apple licences, but there is an installed base that doesn't have these fixes, and until these numbers go really down, we can't do anything.
For home router vendors, please stop shipping 6to4 enabled by default. It doesn't work. The evidence is out there. It just doesn't help.
So, also, yes, please provide proper v6 support in CPEs. For everyone that says there are no v6 capable CPEs: go buy a D-Link, they have 20 models on the store shelves that do proper v6. Hopefully that will trickle down to the service provider OEM market, and it will trickle into DSL as well.
With that, please feel free to ask me any questions now. This is essentially being run by the ISOC; Phil Roberts is the person there, but I am happy to put you in touch. If you have contacts at Microsoft or Wikipedia, whom we are still trying to reach, please let me know. And, that's it.
CHAIR: Lorenzo, thank you for bringing this up. It's a very important message. We have a talk on Friday in the Plenary from one of the biggest German IT news portals, and they actually did this: turn on v6 for one day, look at the responses, turn it off, evaluate, and they have now permanently turned it on, and they are giving their reasons why they think it's safe to turn it on, which is actually an interesting point.
SPEAKER: We have those numbers. What we concluded is that users suffer in silence.
CHAIR: I can see the different points of view and I find it interesting to have both views presented here. So this is quite good. I see a long queue. I would say that well, if you are too hungry, then just run away, we are not offended and I am happy to see actually people willing to discuss this. I have no idea who was first, so please sort it out among yourselves.
AUDIENCE SPEAKER: In the spirit of talking to your vendor: if I remember correctly, Google produces a web browser. You introduce all sorts of nice little stuff in there; you implement a mechanism to deal with NXDOMAIN, to avoid ISPs showing you adverts when in reality the domain you are asking for doesn't exist. Why don't you do something that helps the user?
SPEAKER: Absolutely. This is a great idea.
AUDIENCE SPEAKER: Rather like don't wait for June.
SPEAKER: The question there, though, is... really the web browser is the wrong place to fix it.
AUDIENCE SPEAKER: But you control that one.
SPEAKER: And we can fix 10% of the problem, right, at most, so yes, this is certainly on our list of things to do, and we would like to make Chrome smarter about doing things like this. We can fix 10% of the problem, and also, it would be best if we could contact the OS vendors and get them to fix the OS, as opposed to putting these things in every application, but yes...
AUDIENCE SPEAKER: Daniel Karrenberg. I am also an ISOC trustee, so I doubt very much that there is ISOC activity in cooperation with Google and all these other CDN people, and I think this is an important thing. I am really not in charge of the RIPE NCC's web presence, but I'd like to pledge that the small content that we have will participate in this.
The other thing is, if I was organising this, I would be very concerned that when the day actually happens and the complaints start rolling in, people will renege on their earlier promises. What I would advise you is to actually make this as hard as possible to do, because if one or two of the big ones do it, we will not have the chance of ever organising something like this again. So, it's really, you know, we all, or most of us, hate lawyers, but I think you need to get lawyers involved in this.
SPEAKER: We will need to get lawyers involved. We need to get marketing involved, product. All these people that have so far been ignoring IPv6 completely, we will have to talk to those people. As regards the backing out, I think some of the ideas that have been tossed around are that the ISOC will have a dashboard, a web page that shows green if each ISP is currently handing out AAAAs and red if it isn't. And there'll probably be coordinated back-out procedures, things like that. And also, one of the things that's being kicked around is there being a joint press release, at least from the major players signing up to this day, you know, it's going to be out, so at least in the court of public opinion, if something goes wrong, then at least it will be documented who is in for it, and...
AUDIENCE SPEAKER: Martin Levy. Can you go back a couple of slides to your list of companies that you had on there? So, on a scale of 1 to 100, what's the commitment level today from Google, Facebook, Yahoo, and all those companies you have there?
SPEAKER: None. As I told you. Look, there it is. Nobody has committed to anything. We are talking about this. We are kicking around a date. We are thinking about how to do this.
AUDIENCE SPEAKER: Second question, which is really a completely different subject. You have talked about the brokenness for a couple of years now, and you have a large amount of traffic and testing to prove it. Would it not have been more interesting to have basically just turned this on a year, two years ago, and, for want of a better way of saying it, put your money where your mouth is and just let the brokenness sit in the industry so that more people were aware of it? I think that this audience agrees with you, and understands, and knows the subtlety. I disagree with the 6to4 comment, but you know that anyway. But by having it broken, if it had been broken for, let's say, even a year, we would have seen a lot more recognition of this in the general public, not in the specific community that we live in. So, counter to the one day idea, and asking you to put your head on the block, because that's what you would have done, it would have cost you money to do this: what, except for losing money, were the arguments against simply saying at Google, right, we are going to go do this, and we are going to take the flak, and we are going to make people realise, by hook or by crook, that many elements in the path are broken?
SPEAKER: So, unfortunately for me, that's not a decision that I make. And the people who make this decision have made it clear that this is not an option, absolutely. It's not to do with revenue loss; if you run the numbers, run 0.05%, what do you come out with? It's a fluctuation on the balance sheet, right, but it's about reliability, it's about image, it's about providing the best service we can. And talking to other content providers, we are not alone in this. Talk to Yahoo and see what they tell you. And also, I think that even if we did this, I don't know that if Google by itself did this it would actually help. I don't know. I honestly don't know.
AUDIENCE SPEAKER: I just want to hit that last point. Let's say you did this a year or so ago, and I am saying this whether you go into a Dixons in the UK or a Fry's in the US: what if the store manager put a sticker on a box that said "This works with Google, this doesn't work with Google"? That would affect purchasers, and money is the thing that talks. This is my last question. Just for you to think, ponder about and maybe respond.
SPEAKER: That would be evil. We do no evil.
AUDIENCE SPEAKER: John Quarterman. Since a lot of the source of this problem appears to be old unpatched operating systems, and a lot of those exist in places like China and India and Brazil where they are probably not going to call Google either, this leads to a thought: have you considered contacting the Linux distributors? Because if you do this flag day, it's going to be a great marketing opportunity for Linux.
SPEAKER: Actually, Linux has its fair share of bugs. I am trying to fix one right now. So...
AUDIENCE SPEAKER: Another reason to talk to them.
SPEAKER: As I said, I just committed one patch last week, but yes.
AUDIENCE SPEAKER: Hi, Gus to say, I'd like to share our experience a bit. We run a dual-stack VIX pilot, and the first thing we did, in order to offer the users some IPv6 content, was that we contacted Google to whitelist our resolvers to actually get the AAAA records. Well, we sent the first e-mail and then no answer. The second e-mail, after two or three weeks, still no answer. I think we didn't send anything now. We tried to escalate the thing to the CEO of Google in Greece, to personal contacts. We didn't manage to do anything. And in the end, from what I know, the solution was given by our upstream provider; somehow a communication channel got established and we actually have the AAAA records now. The question is: did this experience have to do with all those problems, and did you revisit your policies for IPv6?
SPEAKER: So, let's see, there are a lot of questions in there. So, well, I apologise for the delay in response.
AUDIENCE SPEAKER: We actually received no response.
SPEAKER: You did, you got the AAAA records.
AUDIENCE SPEAKER: Yes, we got it, but I really don't know who talked to whom.
SPEAKER: So, okay, so that is due to...
AUDIENCE SPEAKER: I also sent a personal e-mail.
SPEAKER: I will state here that I am unable to respond to all e-mails I get, I am sorry. I will state... please, stone me now, but... so the root of the issue is that whitelisting does not scale. That is the root of the issue. And everybody tells us that whitelisting doesn't work, and we agree, but we still think it's better than nothing, as imperfect as it is, and the fact that we have to process these requests manually so far is really slowing things down. And we are hoping to fix that. But we also need to be careful about automating this, because turning v6 on and off based on brokenness and based on user numbers could lead to sort of switching traffic between v4 and v6 in your network. And at a time when we are pushing v6 traffic natively and v4 traffic through carrier grade NAT, if we fall back from v6 to v4 we could overload those NAT boxes and things like that, so we have to be careful once this goes to production phase. So, we are working on automating this, though; we hope to find a way to keep it safe. And as regards... there was another question there... so, again, we have changed the requirements. The requirements changed in response to direct evidence from users that asking ISPs to support their users is not sufficient to avoid these problems. Sadly, we were convinced that the users would call their ISPs and complain. Nothing. We exposed a large number of users to quad-As for a large period of time. They did not contact their ISP. They e-mailed us, and the number of people that e-mailed us was tiny in comparison to the users that were suffering the problem. So, maybe one out of a thousand. So that is why we have to be so careful, and now the policy states, and actually what you as ISPs can do, as you are rolling out v6 to your users: please put them on separate DNS servers. Many other ISPs have not. That is very important.
But from the perspective of content providers, handing out AAAAs to their resolvers, where there are only v6 capable users, at least you are saying: well, I am handing out AAAAs to 100 thousand users, and they will all have v6. As opposed to saying: I am handing out AAAAs to 10 million users and there are 300 v6 users in there. That's not helping. So yes, thank you for doing that.
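The whitelisting idea discussed here, returning AAAA records only to resolvers known to serve v6 capable users, can be sketched roughly as below. This is a hedged illustration and not a description of how Google or any other content provider implements it; the function name, the whitelist format and the example networks (taken from the documentation address ranges) are all assumptions.

```python
import ipaddress
from typing import Iterable


def should_serve_aaaa(resolver_ip: str, whitelisted_networks: Iterable[str]) -> bool:
    """Return True if the querying resolver falls inside a whitelisted network.

    Hypothetical helper: an authoritative DNS server could call this to decide
    whether to include AAAA records in its answer to a given resolver.
    """
    addr = ipaddress.ip_address(resolver_ip)
    for net_text in whitelisted_networks:
        net = ipaddress.ip_network(net_text)
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return True
    return False


if __name__ == "__main__":
    # Example whitelist using documentation prefixes, not real operator data.
    whitelist = ["192.0.2.0/24", "2001:db8::/32"]
    for resolver in ("192.0.2.53", "198.51.100.1", "2001:db8::53"):
        print(resolver, "->", should_serve_aaaa(resolver, whitelist))
```

Because the decision is made per resolver rather than per user, this only helps when the users behind a whitelisted resolver really do have working v6, which is why putting v6 users on separate DNS servers matters.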
CHAIR: Last comment.
AUDIENCE SPEAKER: I think it's a good idea to raise awareness of this problem, but I really think it only works if you really combine it with media attention, warning users in advance, like: your connection seems to be broken, on this date it will fail. But, yeah, I see a lot of potential here.
SPEAKER: I know we have a lot of work to do in warning users and we have to run through legal hoops and marketing hoops and things like that, yes, so, again, nobody has committed anything yet.
AUDIENCE SPEAKER: If this is going to happen, I'd be happy to try in the Netherlands to get as many websites there as possible and to get some media attention.
SPEAKER: That would be great.
AUDIENCE SPEAKER: Suzanne Woolf. I don't think this is ever going to happen, for all of the reasons we have been talking about: why it would be difficult to explain to people, why it would be difficult to get the lawyers on board and so on. I hope you continue to have the conversations anyway, because it's very obvious there is a lot to be learned by even thinking about what's involved in doing something like that. And some methods of incrementally improving things will come out of that discussion.
SPEAKER: Well... hopefully, yes. One of the ideas of having a target date is that we can tell people, look, you know, this is when we are all going to make a concerted effort. And having the conversations, I mean we have been having conversations, but...
AUDIENCE SPEAKER: This is the thing: having a target date and a target goal, a state of affairs you are working towards for that date, does provide some opportunities that, as you say, having less focused discussions might not.
SPEAKER: It's only true as long as the date is far enough away because it's not going to happen anyway.
AUDIENCE SPEAKER: That's the problem. As it gets closer, it's going to get more difficult.
SPEAKER: One thing we have to avoid as an industry is v6 scaremongering. Too many parties have been burned by doing too much v6 work in the past. Look at the Japanese industry, look at all these people that put a lot of money and work into v6 and then nothing happened. We have to be very careful to avoid that. Extremely careful. And that is one of the reasons, really, I think, why we are in the mess we are in now: nobody is listening any more. V6, v4, wasn't that going to happen in 1998? And so, right, talk is cheap. We have to try and get it done, or something like this done.
CHAIR: Thanks again, Lorenzo. This is obviously such an interesting topic; it was interesting enough, otherwise people would have left. So thanks.
(Applause)
CHAIR: Thanks to you for your feedback and enjoy your lunch.
LIVE CAPTIONING BY MARY McKEON RPR
DOYLE COURT REPORTERS LTD, DUBLIN, IRELAND.
WWW.DCR.IE