The MAT Working Group session commenced on the 18th of November 2010, at 4 p.m., as follows:
CHAIR: Hello everybody. Good afternoon. Welcome to the MAT Working Group. We have three co-chairs for this: myself, Christian, and Richard Barnes of BBN, who is obviously a hard-working man; he is upstairs in the Database session at the moment. We will have him for the rest of the session once he has finished. We have a scribe, and Chris is manning the Jabber, so he will relay any indications from there.
A quick note on microphone etiquette: the session is being webcast, so if you come up to the microphone, please give your name and affiliation. That is going to be particularly important during the last item on the agenda, which is a discussion where we are looking for a lot of participation from people. We have a very packed agenda and we have to finish on time because this room is wanted for the BoF that follows, so I would be very keen for the speakers to stick to their time as much as possible.
To start with, we have Alberto Dainotti from the University of Naples; we have NeuStar; and Owen from Hurricane Electric. Sean is going to give an update on what the team has been doing, Claudio from CAIDA has a talk on visualising Internet measurements, and Robert will fill in some of the technical details of the new Atlas system which was introduced by Daniel earlier in the plenary. Daniel is then going to give his vision for a measurement framework, and we are going to have a discussion within the Working Group about what we want from that measurement framework, how we can make it happen, and any other issues around privacy and sharing of data; it's going to be a free-for-all discussion, and we have to finish on time. If there is any time left and there is anything people want to talk about, we will have any other business. I think that covers everything. So I'd like to invite Alberto to come up and talk.
ALBERTO DAINOTTI: Thank you. So, good afternoon, my name is Alberto Dainotti, and because we don't have much time, I thought it was a better idea to start from the last slide, just to be safe. I am from the University of Napoli, where we are a research group working on network measurements. The purpose of this talk was to meet people from the ISP world, from the operators, people at RIPE; it's my first RIPE meeting, by the way. The objective of this presentation is to talk to you about our research activity on traffic classification and about a platform for traffic classification.
What I would really like as feedback (all feedback is obviously welcome) is to know what problems operators and ISPs really feel are important about traffic classification, and how they think we can help them, perhaps with joint collaborations on traffic classification.
So back to the first slide. I will go very quickly; I guess all of you know very well what traffic classification is. Basically, traffic classification means assigning traffic flows to the network applications that generated them, and there are several motivations for doing this. Some of them are just related to understanding what is happening on network links: how people are using the Internet, what the killer application is, whether it is important to model a specific application in a specific context or not. But there are other motivations that are more related to operational problems, like enforcing quality of service and policies, for example, or doing accounting, network provisioning and so on.
Basically, the approaches that are available for traffic classification can be roughly grouped into three categories. We have the traditional port-based approach, which is considered very inaccurate and is based on transport-level ports; then we have payload inspection techniques, which are considered quite reliable but have issues related to privacy and to the increasing use of encryption and protocol obfuscation by network applications. And in the last years, especially in the last five years as we can see in this graph, we have seen an explosion of research activity in the field of machine learning and pattern recognition techniques applied to network traffic classification.
So right now the situation is quite interesting because there is a lot of good work, but most of it is at an experimental stage; there are some issues related to the availability of data and tools. Specifically about tools, I mean there is a lack of implementations of the techniques that have been presented, and there are some traditional problems related, probably in general, to the Internet measurement research community.
But there are also a lot of opportunities. This is quite an interesting field. We foresee that it will be a hot topic in the future as well because of the trends in network applications, and there is interest from scientists, providers, industry and also society: problems related to traffic classification also touch on network neutrality, for example, or user privacy. And we see there are promising approaches.
I will start talking about TIE, the Traffic Identification Engine. In these years we have been doing research on several network traffic classification techniques. One of the contributions we think we made to the community was to develop this platform, which is not a specific traffic classifier; we call it a multi-approach framework for traffic classification. The idea is to support different techniques for traffic classification, all the categories that I showed you earlier. It is an open source project, we claim it is quite fast, it is modular and quite easy to modify and update. Specifically, it supports multi-classification, which means building traffic classifiers that can combine the results of different stand-alone traffic classification techniques to improve accuracy and also to gain other benefits, and it supports on-line traffic classification, which means it is not only a tool for experimenting with new techniques, it is a tool to be used on-line, to classify traffic on the fly, listening to live traffic from a network link.
The development started in 2007 by people from the Traffic project inside our research group, but in these four years it has been the subject of collaborations with several research groups around the world, in the USA, Canada, and also in Asia and Europe, some universities, but there are also some collaborations with industry, with some manufacturers and also companies doing consultancy. And obviously there are some European co-funded research projects in which TIE was involved in collaboration with other groups. You can check our website to read more about the implementation of TIE and the technical details. Here I give you a very quick overview.
As I said, TIE can operate off-line, looking at traffic traces that you have previously captured from a link, or you can use it in realtime mode, looking at live traffic and trying to classify what we call a session, usually just a network flow, as soon as it is possible to do that, and give this information to another system that can perform some operation related to this flow. You can also use it in a cyclic mode, which is almost like realtime mode; you can use this for web reports like the one shown here in this figure, generating some output at regular intervals and updating databases and network graphs that you can check.
For this, for example, we use the framework for generating web reports. About the architecture, a few words to show you the modularity of the project. In this diagram you see the information flow: packets are filtered and then they are aggregated into sessions. Sessions can be classical network flows, the classical 5-tuple with a timeout, but they can be something different defined by the user; usually we work with bidirectional network flows, and we have some fast heuristics to recognise them better, so you can have different kinds of sessions. Then when a session is formed, we extract features from it, which are classification properties like packet sizes or statistics related to the flow, or it can even be the payload of the first packet or of several packets, reassembled or not. After we have these features there is a decision combiner, which is a module that manages different classification plug-ins. The basic idea of TIE is to make available to researchers an API for developing plug-ins that implement a specific traffic identification technique, so you can forget about all the other problems related to implementing a new software application for traffic classification and just focus on your specific problem, and at the same time you have several advantages: you can combine your classifier with others and compare it with them, because it is inside a framework in which we have well-defined formats, definitions and so on. At the end we generate an output. One of the important aspects really is the classification plug-ins. Here we have a partial list of plug-ins that we are developing and have developed; some of them are made available with the platform. We usually ship it with a port-based classification plug-in, we have a deep payload inspection plug-in which uses the L7-filter project signatures and code, we developed a lightweight payload inspection technique, and we are also going to release a beta version of a plug-in based on the OpenDPI classifier from ipoque. You are welcome to contact us about developing a new plug-in if you are interested in a specific problem, for example.
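To make the plug-in idea concrete, here is a minimal sketch in Python of what a classification plug-in interface and decision combiner could look like. The class names, signatures and payload patterns are illustrative assumptions only; they are not TIE's actual API (TIE itself is implemented in C).

```python
# Illustrative sketch of a plug-in based traffic classifier in the spirit of TIE.
# Names and interfaces are hypothetical, not TIE's real API.

from collections import Counter

class Session:
    """A bidirectional flow plus the features extracted from it."""
    def __init__(self, five_tuple, packet_sizes, first_payload):
        self.five_tuple = five_tuple          # (src, dst, sport, dport, proto)
        self.packet_sizes = packet_sizes      # per-packet sizes, a simple statistical feature
        self.first_payload = first_payload    # payload bytes of the first packet(s)

class PortPlugin:
    """Fast but inaccurate: decide on the transport-level destination port only."""
    WELL_KNOWN = {80: "http", 443: "https", 53: "dns", 25: "smtp"}
    def classify(self, session):
        return self.WELL_KNOWN.get(session.five_tuple[3], "unknown")

class LightPayloadPlugin:
    """Lightweight payload inspection: look only at the first payload bytes."""
    SIGNATURES = {b"GET ": "http", b"\x16\x03": "tls", b"BitTorrent": "bittorrent"}
    def classify(self, session):
        for prefix, app in self.SIGNATURES.items():
            if session.first_payload.startswith(prefix):
                return app
        return "unknown"

def decision_combiner(session, plugins):
    """Combine the verdicts of several stand-alone classifiers (simple majority vote)."""
    votes = Counter(p.classify(session) for p in plugins)
    votes.pop("unknown", None)
    return votes.most_common(1)[0][0] if votes else "unknown"

if __name__ == "__main__":
    s = Session(("10.0.0.1", "192.0.2.1", 51514, 80, "tcp"),
                packet_sizes=[74, 1460, 1460], first_payload=b"GET /index.html")
    print(decision_combiner(s, [PortPlugin(), LightPayloadPlugin()]))  # -> http
```

The point of a shared framework is exactly this: each plug-in only implements `classify`, while session building, feature extraction and combining are provided once and stay comparable across techniques.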
The output is usually given in text format, with each line corresponding to a specific flow, or session, let's say. It is also possible to send the output quickly, as soon as it is available for a specific flow, through a socket, for example. We used this for a project related to quality of service: as soon as a traffic flow was classified, let's say after two or three packets, this information was sent through a socket to another system that assigned the specific quality of service class to that flow.
And I don't think I have enough time to go into the details of this case study. This is a project in which TIE was heavily involved; you can check the paper that we presented this year, and there is also a patent pending on this project. Actually, the idea here was not to show the details of the project but just to show you how we used TIE to compare a new technique on real traffic and to come out with real numbers, like numbers related to the performance of the approach that we proposed. The idea was basically to take the best of two different worlds: the not very reliable port-based approach, which is fast and privacy-friendly, and something from the DPI world, which is still considered accurate but has problems related to computational complexity, for example. I will go directly to the results, basically.
What we showed, thanks to TIE, was that in terms of accuracy this approach was able to reach high values compared to a classical deep payload inspection approach such as L7-filter. Here you can see, for example, the accuracy in terms of correctly classified sessions, the percentage of correctly classified flows, and here we have flows weighted by bytes, so small flows do not count a lot. We can reach very high accuracy values compared to L7-filter, while the port-based approach does not achieve good results in terms of accuracy. In this line instead we show performance values: the approach that we proposed here took just about seven microseconds on average to classify a session, while the port-based approach takes about 2.5 microseconds and the DPI approach takes more than 200 microseconds. So the idea is to have a fast classifier with very good accuracy, obviously making a trade-off, and this was possible thanks to TIE: because we implemented all three of them inside the same environment, it was quite easy to compare them.
So, a last few words about TIE: how can you use it? Basically you just need a Linux or FreeBSD box and a network adapter; a DAG card is preferred. If you want to generate live web reports you need the right setup; right now we are running TIE as a live monitor on a 200-megabit link, with FreeBSD and one of the cheapest of these cards. So again, the last slide: we are very interested in collaborations with ISPs. We think TIE could be useful to you, but we want to know how you think it could be useful for you. That is it. Thank you for your attention.
(Applause)
CHAIR: Thank you very much. Are there any questions? No.
AUDIENCE SPEAKER: Steve Nash from Arbor Networks. We have a DPI tool and I would be very interested to talk to you about the plug-in concept; I will come and talk to you later, but I think the idea of the plug-ins, to pick up on new applications and new novel uses that particular operators might want, does sound like an interesting concept. But 200 megabits per second is rather a low speed compared with what some of these guys are running at; that is part of the problem.
ALBERTO DAINOTTI: That is just an example.
CHAIR: Any other questions? No. Thank you again. The next talk was supposed to be given at the last RIPE meeting, so thanks very much for coming back.
EDWARD LEWIS: I am giving this talk basically for somebody else, which also makes it a little more difficult for me, besides six months of trying to remember what it was about. So, I am going to talk about the statistics gathering we do; for about a year we have been running an improved system for keeping track of all the DNS traffic that we manage, in response to what customers wanted and the fact that we actually need to have it ourselves. What was the hard part of the problem? Probably many of you are familiar with this: the hardest thing here is that we have these devices all over the world and we had to bring the data back to some central location. There are a couple of ways: one is to sample the data, just take some of the data back and look at it over time; another is to sum the data up, looking at it locally, and bring that back; or you just capture the packets and try to bring everything back.
We could store and send, meaning you take packets locally, store them locally on a disk, process that, and then send them back at some point; or we put some intelligence out there: do we put processors all over the world and look at the data we see going by? What we are looking for is activity per hour, per week and so on, or by some event. Some of our customers, when they go through events, want to see what the impact of that event was when they reconfigured something. A rough approximation of what is going on, and trending: people want to know whether traffic is going higher or lower, what is happening to their traffic. We see a lot of data, so it is important that we compress things, and that is what this was about: even though we get about 7 terabits every day in total around the world, we can compress it down to about 3 terabits. And as you know, DNSSEC has been mentioned twice already; whenever I speak, DNSSEC must be somewhere in the presentation. For those who don't know, I do DNS a lot more than this stuff.
When DNSSEC comes around we can't compress nearly as well. Keys, signatures, random types of data do not compress inherently.
So, to get this done, we had to focus on what was important. One thing we want to know is what people are asking for: what is the name of the data and what kind of data they want, the address record of a web host, for example. We are also interested in knowing who asked, where the request came from; you want to know where people are looking up your information. As far as granularity in time, we found hourly was way too coarse, seconds would be too fine, but minutes are what we can handle, and that usually gives us good enough data to get along. Also important to us is which of the servers answered; you want to know where the queries were being served.
Accuracy is an important topic. I added this slide at the last minute so I hope I don't make any mistakes here. I remember when we did this transition we were concerned a lot about whether the statistics gathering agreed with the billing data, because we charge people by activity, and that is a big concern. But it wasn't so much about the billing data and the statistics matching; it was just correctness in general, and I think that is what a lot of people tend to forget. When you start collecting data you have to make sure it was collected correctly; I think we overlook that a lot of the time when we talk about statistics gathering, and the ultimate test is: would we use this for billing? We do now. We reconciled what happened and found places where things were miscounted, where we were tapping the network in the wrong place and missed some traffic coming certain ways, got everything ironed out, and then we went live with this.
So our approach: this is a kind of high-level idea of what we do; it is nothing really surprising, I think. We categorise our sites into two classes, high volume and low volume. High volume are our big DNS servers that are around the world; we have probably a dozen or more. There we collect data off the box; we don't collect on the same box as the DNS server, that can't really work that well. At low-volume sites we are forced to share hardware between the collection and the serving, so we do that together. Every minute or so, I think it is, we send data back to a Netezza box, the commercial name for the database that handles this stuff, and an engine that we use to access the Netezza storage.
We collect some data locally. We do process some of the data at the sites before we send it back, every second actually, and we also send back a sampling of data. We look at the unique names, the QNAMEs, every so often. And we bring all the information back to our central location. Every minute we get a lot of detail. We bring back the protocol that is used, v4 or v6, the query name and the type; that is an important thing to many people. Response codes we track: when we send back SERVFAIL, when we are getting data we shouldn't be getting or queries we shouldn't be getting. Response-time buckets: how fast we answer. We also get back a lot of NXDOMAIN traffic, which is important to some of our customers, and also unanswered questions: we look at questions that come in to us that we don't respond to. For example, there are certain requests that come into the DNS system and, because they are the source of an attack, they are just discarded; some of our name servers filter away certain source ports, so we know they are never going to answer. Sample data we track at 1% of all traffic, and whenever we try to analyse that 1% we find it does match the 100% tracking when we are looking for something in particular. We keep the source IP and the response in that case. For 1% of the data we get everything; everything else we sum up and send back.
Compression is a big part of this, because a lot of data is coming back and forth. The Netezza product compresses numeric data really well, about six times; what is not numeric are the QNAMEs, so we save those somewhere else and try to mash down all the data as much as we can. We store messages as bitmaps: we take a DNS message and we have one bit for whether it was UDP or TCP, one bit representing whether it was an A request or a AAAA request, which is more efficient than storing the whole type field. We count how many times the same message appears in a minute, because a lot of things get repeated over and over again. We also notice that minute by minute we have the same traffic appear again, so we keep that summed up as well. Generally we can compress the raw packet stream 30 to 40 times by just binning all the stuff into certain bitmaps and sending those around as opposed to the entire pcap traffic. This slide is a comparison of some of the features we have; the only reason I want to look at it is that we have reporting by account: customers have with us the zones they have and the QNAMEs, and we just add more stuff as time goes by.
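As a rough illustration of this kind of aggregation, the following Python sketch packs a few attributes of a DNS query/response into a small bitmap key and counts how many times the same key repeats within a minute. The bit layout and field choices are invented for the example and are not NeuStar's actual encoding.

```python
# Hypothetical sketch of bitmap-style aggregation of DNS traffic, one bucket per minute.
# The bit layout is made up for illustration; it is not the production encoding.

from collections import defaultdict

PROTO_TCP   = 1 << 0   # 0 = UDP, 1 = TCP
QTYPE_AAAA  = 1 << 1   # 0 = A, 1 = AAAA (one bit instead of the full 16-bit type field)
IPV6        = 1 << 2   # 0 = IPv4, 1 = IPv6 transport
RCODE_NXDOM = 1 << 3   # response was NXDOMAIN
RCODE_SFAIL = 1 << 4   # response was SERVFAIL

def message_key(minute, server_id, proto_tcp, qtype_aaaa, ipv6, rcode):
    """Collapse one query/response pair into (minute, server, small bitmap)."""
    bits = 0
    bits |= PROTO_TCP if proto_tcp else 0
    bits |= QTYPE_AAAA if qtype_aaaa else 0
    bits |= IPV6 if ipv6 else 0
    bits |= RCODE_NXDOM if rcode == "NXDOMAIN" else 0
    bits |= RCODE_SFAIL if rcode == "SERVFAIL" else 0
    return (minute, server_id, bits)

counters = defaultdict(int)

def observe(minute, server_id, proto_tcp, qtype_aaaa, ipv6, rcode):
    counters[message_key(minute, server_id, proto_tcp, qtype_aaaa, ipv6, rcode)] += 1

# Identical messages within the same minute collapse into a single counter,
# which is what allows 30-40x compression versus shipping full pcap data.
observe(1, "site01", False, False, False, "NOERROR")
observe(1, "site01", False, False, False, "NOERROR")
print(dict(counters))   # {(1, 'site01', 0): 2}
```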
Some of the things that we have seen in running this: one of the first questions is, is every query answered? We have to keep track of all responses and make sure there are queries to match them all. We also need to measure how fast: in our installations our DNS servers aren't the only things out there, there are a bunch of other devices, and we want to make sure the packets get through as quickly as possible to the appropriate answer. A lot of people like to have the DNS server itself do its own statistics, but the servers themselves are too busy to actually measure themselves, and if queries are being dropped they don't even know about it; the queries never got to them.
Problems we have encountered in monitoring: the QNAME population size is the number one thing. For TLDs, and we have a number of TLD customers, they get a lot of NXDOMAINs; people ask for anything. We want to keep track of what is being asked, and some customers want to know which names that are not in the DNS are being asked for. The second two here are interesting: there are some recursive servers out there that, to defeat cache poisoning, put random prefixes on the names they want, so we get a lot of different queries for things that don't exist; it is just a way of making sure this is a unique request back to us, and the recursive server can tell: did I ask for that name? It is like stuffing a little bit of DNSSEC into the query, salting it on the way up. We have had some DNS service customers use random or unknown names for a reason in their services, to help stop poisoning attempts in other places. Recently, we thought we saw evidence of the Kaminsky attack in places. We investigated, thought there were six places, which would have been really good news for DNSSEC, but it turned out the customers were having their names purposely mangled so that they were defeating cache poisoning in this way. Kind of unusual.
DDoS is the other thing we have to deal with, because when you are doing packet capturing you are capturing the entire DDoS, and you don't want to keep all that data around.
Things we have seen: what we have called sudden traffic growth syndrome. Why did my traffic just go up? And it does at times; we have had different reasons for that, and some people are mad at us when they see it going up. Rogue recursive servers: we can identify servers that are going wild, basically targeting one or two names. Routing issues: we wonder why certain IPs are sending traffic to certain places. DDoS: there have been some small DDoS attacks that we would never have noticed before because they didn't overwhelm us or trigger our sensors. And we know the world isn't all that big: there are a thousand DNS recursive servers out there that send over half of our queries to us. And some stats, although the stats are a little weak on this: most everything is UDP and v4, which shouldn't be news to anybody. Three-quarters of the queries are still A records, 10 percent are MX and AAAA, and what is interesting is that the AAAAs were ten percent but the v6 traffic was only a quarter of a percent, so the one is not really a good measure of the other.
So, this is a slide added by some other department. There are no mistakes, only lessons, and we are going on from here. And finally, these are things we are thinking of doing for the future. The person who added the little picture on here, where this road goes to nothing, the road to oblivion or I don't know what; we had a little argument about that. These are things we plan to do with our reporting as we get it more developed, basically making these interfaces better. I don't want to read through all of this as a sales pitch: maybe so you can see your data better from what we have, have it e-mailed to you, have access for the customer to write into it, and also mapping to geographic sources and so on. So, I will leave it there for questions.
CHAIR: Thank you, Ed.
(Applause)
ROBERT: Internet citizen. You mentioned that statistics should be accurate, right, and you actually compared them to reality. How do you do that?
EDWARD LEWIS: We have been checking internally: we have done sanity checks on our statistics, we have had other people saying something didn't jibe right, but we went back and made sure we have good stuff coming in.
KLAUS: There was something about a one percent sample and I haven't understood that completely. What does it mean? Does it mean that only 1% of the whole traffic...
EDWARD LEWIS: We did two things. For 100 percent of the traffic, we look at it, make up these summation bitmaps and reduce the data down to small pieces, and then also reduce those minutes into smaller and smaller packages; but we also return 1% of every query/response pair, so we have a 1% full packet capture of everything plus a 100 percent analysis of what has gone on. So if at 11:00 something weird went on, we can go back and see what packets we saw and find out the details, and we will also know how many times somebody asked for the A record.
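To make the two paths explicit, here is a tiny, purely illustrative Python sketch of summarising 100% of the traffic while keeping the full packet for roughly 1% of it; the field names are invented for the example.

```python
import random

def process_packet(pkt, summaries, sample_store, sample_rate=0.01):
    """Summarise every packet, but keep the raw packet for roughly 1% of them."""
    qtype = pkt["qtype"]
    summaries[qtype] = summaries.get(qtype, 0) + 1   # 100% aggregated counts
    if random.random() < sample_rate:                # ~1% full capture for later drill-down
        sample_store.append(pkt)

summaries, samples = {}, []
for i in range(1000):
    process_packet({"qtype": "A", "qname": f"host{i}.example."}, summaries, samples)
print(summaries["A"], len(samples))   # 1000 summarised, roughly 10 kept in full
```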
CHAIR: OK.
DANIEL KARRENBERG: RIPE NCC. I assume this is a proprietary package that will never go outside?
EDWARD LEWIS: It's not just a proprietary package; it's built into our core.
DANIEL KARRENBERG: I had to ask.
EDWARD LEWIS: There is no deliverable software, we are using these other devices and so on.
DANIEL KARRENBERG: Seriously, would there be any possibility to actually at least give us an insight into the spec of how you return stuff so that maybe other people could build something like this as well?
EDWARD LEWIS: My salespeople would love to talk to you. Seriously, though, I think it would be a sales presentation that did that; that is what you are asking for, what does the product do, and if you want more than just a sales brochure, we don't even have those right now. It would be a sit-down type of talk.
DANIEL KARRENBERG: Thank you.
CHAIR: OK, thank you very much, Ed. Next up we have Owen from Hurricane Electric talking about IPv6 network measurements.
OWEN DELONG: This is not the slide that you should have.
CHAIR: This is Sean from the Information Services Group.
SEAN McAVOY: So my name is Sean McEvoy, I am from the Information Services Department of the RIPE NCC and I am here to give a quick update on what we have been up to.
So since RIPE 60 we have done a couple of TTM deployments, in Nepal and Cambodia; they are both on-line and active. We have Portugal and Ukraine coming up, which should be active fairly soon. As well, the RIS RRCs are being replaced; just recently we have done Stockholm. Also, one of the big things is the new resource database being developed, the INRDB.
The first service to benefit from this will be RIS, and in the meantime we have done a release of the DNSMON beta.
NetSense progression: the new back end, the INRDB, is looking to give us 60 times the insertion speed that we currently have, so it is much, much faster; searches are going to be a lot faster, as they are now parallelised; and the availability of the data, which is currently three months, is being increased to ten years, so every piece of data we have in RIS will be available.
All of this is enabled by the new technology we are using, Hadoop, which is based on the HDFS file system; there is an article published on labs.ripe.net regarding the new INRDB if you want some more details on that. As for DNSMON, the subscriber beta was successful; it didn't crash. We received a lot of great feedback from the community and actually implemented some of the ideas that people had and fixed some of the bugs they discovered. We are always looking for feedback on the projects and any ideas people may have for enhancements. We have also increased the speed of the new DNSMON and we are still looking at some further performance enhancements so that it is as quick as possible.
The public beta was announced last week, which I think was mentioned in the MAT Working Group on Monday, and the system has been stable; we are really happy with that, the new design is working out. We are also of course looking for new subscribers, any ccTLD or gTLD providers; the beta is live now, at beta dot..., so you can check it, maybe not all at once. I wanted to show a couple of examples of the new system. Here, you can actually move your mouse over a row and see what the row is, instead of having to look down at the number and figure out what it is and what is going on, and you can actually click and go to whatever that may be, whether it is a probe, a domain or a server. Another feature is the click-and-drag zoom, where you can see an event and, instead of having to figure out where the time begins and where it ends, you can just highlight the area and it zooms in immediately. And the other thing should be a real time-saver for some: for subscribers we have trace routes that can be run as you please, from any of the probes to any host, with just a simple couple of clicks, and you have a trace route for yourself. So you don't need to e-mail us.
What we are working on now is creating a unified suite of tools, trying to integrate all the tools we have into a single suite under the headings of routing, connectivity and DNS, making sure that the information that makes sense to flow between them does, and that the interface is fairly common among them. We are also looking at integration with Atlas and TTM; as Atlas matures we will have more ideas on how it fits together. And of course we are now looking for RIPE Atlas sponsors, TTM hosts and DNSMON subscribers. If anybody has any TTM questions I would be happy to take them at the dinner tonight or tomorrow. So, questions?
CHAIR: Any questions for Sean? OK. Thank you very much.
(Applause)
We have the right presentation, so Owen.
OWEN DELONG: Sorry about the confusion there, I found out I was doing this about a week ago, so it's a fairly fluffy presentation but here goes:
So, things we don't want to measure in v6: bit fields. We don't want to bother measuring headers, but things that might be interesting are how many countries are on IPv6; currently that is at 123, compared to 225 on IPv4, so we have still got a ways to go there. We are seeing lots of growth in IPv6 actually: the top graph shows you the number of IPv6 prefixes over time, and the bottom graph is the number of IPv6 autonomous systems over the last 240 days, so actually a fair amount of growth; it is starting to pick up. We are starting to see enough traffic that we are seeing discernible diurnal cycles. UDP is interesting at the top: we have DNS, OpenVPN and lots of other stuff, and just HTTP and HTTPS there account for a fairly large amount of traffic.
We are finally starting to see some IPv6 attacks. These are fairly meaningless attacks because they are limited to the gigabit Ethernet output of the tunnel broker, but somebody was actually trying to launch a DoS attack over a tunnel broker service and they managed to pump a good 900-and-some megabits out of a gigabit tunnel broker, so that is not bad. But everybody has been looking for the killer business case that will drive people to IPv6. We may have actually found it, because here are our results:
When we announced that we were handing out T-shirts if you got Sage certification, we experienced dramatic growth in the number of people that wanted to go and get their certification. And it's not just a US fad; we got quite a few people distributed all over the world, even six people in Africa have managed to get Sage certification and are actually sporting their T-shirts; some of them were wearing them at the last AfriNIC meeting. And we are still accelerating towards the wall: in 6.5 days we got 101 days closer to IPv4 exhaustion. This is literally last week and this week. These are Geoff's fine numbers that we are using as the basis for this, so you can ask him about why that jumped so dramatically just before we handed the /8 to AfriNIC.
We have also seen IPv6 added at the top-level domains: 46 were added, and more than 3,000 domains added IPv6 records to their zones, so that is all good. We are making progress, but not nearly enough; we need to really get moving on v6. And with that, I will open it up to questions.
CHAIR: Have we got any questions for Owen? OK. Thank you very much.
(Applause)
We have got Claudio.
CLAUDIO SQUAROELLA: Today I will be using my weird Italian accent to talk a bit about the things that I have been doing at CAIDA in the last three months. It is about visualising geolocated Internet measurements. When we talk about Internet measurements these days, we tend to think about systems that get bigger and bigger: for example, starting from RIPE services like RIS and TTM, the Ark monitors at CAIDA and of course Atlas, which is new and already seems to be getting bigger. And we also have, of course, geographical information, some of which is very explicit and easy, for example where the probes and nodes that we are using are; some of it can be inferred, for example with a lot of the state-of-the-art tools and techniques that we have nowadays.
Why do we need to visualise this data? The answer is always the same: we can look at databases and tables, but it is much easier and simpler to use something more visual and clever, so visualisation in a way; with these tools we can do analysis and even observe geographical trends, for example. And how do we do this? Geography in this case helps us because it is not only data, it is something we already know: when we look at the world map we know where the countries are, so we can play with that. Of course, we need to create interactive and dynamic tools.
So, the first example is this idea of cartograms that I have been working on there. This is nothing new in itself: it is still a world map, but we apply some distortion to it such that the map itself shows some data about the countries or about distances between end points. For example, in this case the data is simply the number of AS numbers per country, so, as we might expect, we see the US inflating, and also many of the European countries, while other continents are shrinking, like South America and Africa. This is one of the nice ways to actually visualise this data instead of just adding numbers. The only problem, though, is that at the moment doing this automatically takes a bit of time, because the algorithms are quite complex, so it is not really suited for realtime tools.
So, another option is this idea of animated maps. In this case, think of a kind of YouTube movie: the user can drive the animation and can also interact with the interface, hovering over countries and getting additional data on what is actually visualised. It is a web app, so it works with JavaScript; the hard work is done server side. I think I am going to switch to my laptop now for a brief demo.
So this is nothing new because Daniel already showed it in his presentation, but anyway.
So the idea here is that we have the map of the world at the beginning. The data I am using here comes from the DNSMON service, and in particular we are using measurements done against the A-root server. The server is placed in the centre of the map, though this doesn't really make sense in this case because it is of course an anycast server, but anyway it is just a landmark there. If we press "play", we actually see countries moving around, and the idea is that as time goes by the RTT values are changing; in this case we are using aggregated values, and if we hover over countries we can see the RTT values, for Sweden, Italy and so on. Looking at it we can see a few things happening: in this case the US gets further away, and also quite a few European countries, meaning there was some problem in connectivity and their RTT values increased. So yes, this is one of the many ideas that we could use with Atlas or with whatever measurement system. Of course it is open to suggestions and improvements, and that is it. Thank you.
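A minimal sketch of the underlying geometry, assuming each country gets one aggregated RTT value per time step: the measured server sits at the centre as a landmark, and each country is moved radially so that its distance is proportional to its current RTT. This is only an illustration of the idea, not the code behind the demo (which is JavaScript on the client with the heavy work done server side).

```python
import math

def place_countries(rtt_by_country, base_angle_by_country, scale=1.0):
    """Return (x, y) positions: the angle is fixed per country, the radius follows RTT."""
    positions = {}
    for country, rtt in rtt_by_country.items():
        angle = base_angle_by_country[country]   # keep the country's bearing stable
        radius = scale * rtt                     # further away = higher RTT
        positions[country] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

# One animation frame: as the RTT values change over time, countries drift in and out.
frame = place_countries({"SE": 12.0, "IT": 35.0, "US": 110.0},
                        {"SE": 1.2, "IT": 1.9, "US": 3.6})
print(frame)
```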
(Applause)
CHAIR: Thank you Claudio. Do we have any questions?
KLAUS: I wonder what kind of software you have used for the visualisation?
CLAUDIO SQUAROELLA: Google Web Toolkit and a JavaScript library called Raphael, and there is a Java port to use it with the framework, so it is nothing more than Java translated into JavaScript.
CHAIR: Thank you very much. I am sure everybody has got their RIPE Atlas probes by now, so Robert is going to tell us how they work.
ROBERT KISTELEKI: Indeed. So, you have already heard from Daniel about the why and the what, so I am going to talk a bit about the how. We got many, many questions in the hallway about how this really works, so I am going to give you an update on that, diving into the details.
This is the overall system design that we have; we will go into the individual components in a bit more detail. What you can deduce from it is that it is highly hierarchical: at the very beginning, in the design phase, we realised that if you want to control potentially tens of thousands or even hundreds of thousands of these measurement units and want them to work together, the best way of doing it is to distribute the pieces of information down the hierarchy as low as you can and have each component do what it is supposed to do. If you want to combine it all into a single service, in general into a cloud, you are going to have a problem maintaining everything.
So, all of these components are hierarchically arranged and they are connected in a secure way with mutual authentication. The reason for that is that we don't want anyone to start posing as a component and falsifying measurements and so on. We would like to believe that each of these can individually be scaled up, so if you have a big database it can always be made bigger, or if you have a user interface you can make it bigger, without affecting any other components; whether that holds 100% in practice we will have to see, but that is what we would like to believe. At different levels of the hierarchy we use different kinds of data aggregation. For example, if there is a measurement going on and what you want is to have five probes in the same AS and to know how that AS behaves, it is perfectly fine to aggregate the data coming from the probes at the AS level and then report it back up the hierarchy; you don't need to spam all the top-level components with every little detail. We also believe that in order to be scaleable we have to restrict the pieces of data that each component knows, so we apply a kind of need-to-know principle, so that individual components only know about the stuff that they need to do their job.
Let's look at the individual components. All of these can be multiple, but I am using the singular here. Registration servers: these are the trusted entry points for the probes. When you take a probe home, it knows nothing but an IP address and a key, and it tries to connect there; those are the registration servers. In turn, the registration servers pick a suitable controller (I will talk about those a little later, but they are down here) based on a couple of properties. For example, how close the probe is to a specific controller: ideally you want to control an individual measurement point like a probe from nearby, so you don't have high latency when controlling it. Also, the controller should not be too busy; if it already has enough probes then you should pick another one. In any case, the registration server picks a strategy and a controller and tells that to the probe, so it can go and connect to that specific controller. Registration servers also have a low-level overview of the system; they don't know what is happening exactly, but every couple of hours they get an update saying, by the way, that controller is now full, so don't assign any probes to it.
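A sketch of the assignment logic just described, assuming the registration server knows each controller's rough location and a load figure refreshed every few hours; the data structures and scoring are invented for illustration and are not the RIPE NCC's actual code.

```python
def pick_controller(probe_location, controllers):
    """Pick the nearest controller that still has capacity.

    `controllers` is a list of dicts such as
    {"id": "ctr-ams", "location": (52.4, 4.9), "probes": 480, "capacity": 1000}.
    """
    def distance(a, b):
        return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

    candidates = [c for c in controllers if c["probes"] < c["capacity"]]  # skip full ones
    if not candidates:
        return None                      # nothing suitable; keep the probe waiting
    return min(candidates, key=lambda c: distance(probe_location, c["location"]))
```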
OK. We also have a central database, which consists of a couple of sub-databases if you will. There is a store for administrative purposes; for example, in this store we have the full list of probes. If you want to act as a probe inside the RIPE Atlas framework you actually have to be listed there, and all the probes that you have now are listed in there, so you can connect, that is fine; but no one else should be able to act as a probe and start sending in false measurements, because the system will not allow them to join in the first place. We have a measurement store: everything that is ongoing, like what measurements we are doing, the sub-results and so on; that is natural. We would also like to have a data store, which is kind of the archive store, so data that flows out will end up in the large Internet number resource database and be queryable by time, prefix, AS number and whatever properties it has. And, as Daniel mentioned, we want to roll out a so-called credit system, so that you can gain credits and you can spend them on measurements; obviously that needs to be stored somewhere.
We are living in the 21st century, user interfaces are good, you want to interact with the system, and we have one, obviously: you can look at your own probe status and the measurement results and interact with the community forums, share ideas and so on; it is all built into the user interface.
More interestingly, there is a component which we call the brain. The brain, as you can imagine, is responsible for higher-order functions. If you only look at the verbs mentioned there: coordinate, process, draw conclusions, incorporate other sources; that is what the brain does. It doesn't deal with the nasty details of what is going on, but it has a high-level view and says this should be done, that should be done, and it interacts with the lower-level components in order to do that. Of course, it receives information from the database, because that is where things are stored. What it really does is this: if there is a measurement that should be done because some user said I want to do this, the brain will say, OK, we should do this, pick 100 probes from here and there, who can help me, that controller, that controller, and it just tries to distribute the measurement. The controllers will report measurement results back to the brain, and the brain processes them further, or throws them into the database, or creates graphs in order to come up with actual results. More interestingly, it could actually draw conclusions based on what it receives, especially together with other sources of information it incorporates, and react to what is going on. Example: a user starts a measurement, asking, where do I end up if I want to go to YouTube? OK, that is a fair question, the system is probably going to be able to answer that. Then another user asks the same question, and another, so after the tenth or so you reach a threshold, and after that you start to wonder why all these individual people are asking the same question. So you can offer: hey, you don't even need to do this, because I have done it already, here are the results; do you want to take them or do you really want to do your own measurement? It can also alert us to say there is something going on: the crowd says something is going on with YouTube, so you might want to look into that.
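A toy sketch of that "draw conclusions" behaviour: if the same measurement specification has already been requested more than some threshold number of times and results exist, offer the existing results instead of scheduling it again. The class, threshold and data structures are made up for the example; this is not the real RIPE Atlas brain.

```python
from collections import defaultdict

class Brain:
    """Very small model of the deduplication idea, not the production component."""
    def __init__(self, threshold=10):
        self.request_counts = defaultdict(int)
        self.known_results = {}          # spec -> previously collected results
        self.threshold = threshold

    def handle_request(self, spec):
        self.request_counts[spec] += 1
        if spec in self.known_results and self.request_counts[spec] > self.threshold:
            # Many users asked the same question: offer the cached answer first.
            return {"action": "offer_existing", "results": self.known_results[spec]}
        return {"action": "schedule", "spec": spec}   # distribute to controllers

brain = Brain(threshold=2)
brain.known_results[("traceroute", "youtube.com")] = ["...previous results..."]
for _ in range(3):
    decision = brain.handle_request(("traceroute", "youtube.com"))
print(decision["action"])   # offer_existing
```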
It also incorporates other sources of information, like BGP. I think the credit for this goes to the University of Washington, with the Hubble idea: if we get a feed of information from BGP, and we have enough measurement points around the world, we can zoom in and see what is going on there. With hundreds or thousands of probes we can actually do this, so the brain can say, something interesting is going on there, can you look into the details. So it is an active reaction.
Controllers. Brains are connected to the controllers, so they are one level lower, and the controllers are the ones that talk to the probes. If you plug in your probe, it will go to the registration server and then end up at a controller, stay there, and do whatever it needs to do. So in this sense the controllers actually control the probes, hence the name. They also assign probes to measurement requests that are coming from the brain, so they can say, you asked for 50 probes in the US, I will give you 20, and the brain will say, fine, I will go to another controller to get the other 30. The controllers do this by looking at how busy the probes are, where they are, and so on. They also collect the results from the probes, that is easy, and regularly report to the brains. All right.
Probes. These are the foot soldiers, of course. They listen to the measurement commands flowing down the hierarchy and they execute them: when the controller says do a ping measurement to this host, this many times, for this long, and report to me every so many minutes, something like that, that is what happens, and they report to the controller. In other words, they just do what they are told. They also have built-in measurements, so if you plug in your probe at home now, it will immediately start pinging the first hop, the second hop, and you will see the results on your own personal page; but these are also measured and reported back to the controller.
Other stuff: we have a way of remotely operating the probes, so they do that when needed. And they try to maintain their own state, meaning that if they are disconnected they try to reconnect and continue where they left off.
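A simplified sketch of the probe behaviour described in the last two paragraphs: execute the commands the controller sends, report results, and reconnect and continue where it left off if the connection drops. The structure is illustrative only; the real probe firmware is a small embedded program, not Python, and these function names are assumptions.

```python
import time

def run_probe(connect, fetch_command, execute, report, retry_delay=30):
    """Foot-soldier loop: do what the controller says, keep reconnecting on failure."""
    pending_results = []                     # survives a dropped connection
    while True:
        try:
            controller = connect()           # outbound connection only; the probe opens it
            if pending_results:
                report(controller, pending_results)   # continue where we left off
                pending_results = []
            while True:
                cmd = fetch_command(controller)        # e.g. "ping this host every minute"
                result = execute(cmd)
                pending_results.append(result)
                report(controller, pending_results)
                pending_results = []
        except ConnectionError:
            time.sleep(retry_delay)          # try again later; state is kept locally
```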
All right. Coming back to the big picture: we imagine we will be operating the highest-level components, so the brains and web UIs and the critical stores and so on. The controllers are very likely going to be operated by us; currently all of them are, but it is entirely possible that we will go and develop some partnerships and say, can you run a controller for us in the APNIC region; if somebody says yes, fine, it is then not under our control, but that is fine. Probes are not operated by us; they are plugged in in your homes and ISPs and so on. We get asked what the actual device is: it is an XPort Pro. If you go to the Atlas website you can click on the link and see the fact sheet from the manufacturer. The beauty of this device is that it has very low power usage; it is powered over USB, and the actual top power usage is two-and-a-half watts or something, so your TV on standby uses more. But it is not really powerful; it has eight megabytes of RAM and 16 of flash, and that is not gigabytes, it is megabytes, so you can imagine that it is pretty tight. What is a bit worse is that it doesn't have a floating point unit or a memory management unit, so for those who know what that means, it is really difficult to get it right. If you are not careful with the programs you run on it, the probe will run out of memory and never recover, and the only way to do something about it is to reboot the probe. Luckily a reboot is relatively cheap; we squeezed it down to 3 or 4 seconds, and within 15 seconds it is up and running again. On the negative side, it takes 30 seconds to join because that involves RSA or DSA or whatever key is used; that could be a bit faster, but we don't have that. Once connected it is relatively cheap to maintain. We can remotely update it: if we build in a new feature it will be rolled out, and we can enable the remote microphone and video camera and all that. Of course this is only about software; we have been thinking about remotely upgrading the hardware, but once we can do that we will tell you. Security aspects:
I mentioned this before: all the components are connected over secure channels, which is obvious nowadays. I have no idea what the traffic classification process would make of our approach, because we are connecting with SSH over port 443; Alberto might be able to tell us, but that is fine. And all the information exchange, up or down, all the information flowing up and down the hierarchy, happens inside this single connection. For those of you who are interested in the details, this is actually a reverse port forward inside the SSH channel; it just works. The beauty of this is that no services are exposed on any of the components outside this secure SSH connection, and if you don't have the key there is virtually no attack surface here; or I should say the attack surface is small.
The probes have hard-wired trust material; that means the ones we gave to you contain the IP addresses and keys of the registration servers (we can run multiple, anycast, it is all there). Whenever they talk to anyone else, all the trust information about who they should talk to, for example when they connect to a controller, which happens automatically, is derived from this connection, which is good. It also means that the probes don't have any open ports, nothing whatsoever. If you plug one in, it boots up, it connects to our system over a secure connection, and then the information flows inside that connection, so there is virtually no attack surface on the probe itself. This also means it works fine behind NATs, because outgoing connections are just fine. It also means that probe-to-probe measurements are difficult to do; we imagine that most of the probes will be deployed behind NATs and firewalls and all that, so that is not really what we are aiming for.
Just to confirm: the probes don't listen to any kind of local traffic; it is safe to plug one in at home. We are not snooping around. We are not the CIA; even if we were, we wouldn't tell you. If you are still concerned, you are very welcome to plug it in on a switch or behind a firewall and constrain the probe so it cannot really talk to your local network. We made the promise that we don't do that, but if that is not enough for you, feel free to constrain it. We also know that we will lose probes as we go along. Some will actually die; we have a commitment from the manufacturer that they will not die within two years, but they didn't say anything about after two years, so something is going to happen there. More to the point, some more adventurous people will disassemble one to see what is inside and reprogramme it, and that is fine. We would like you not to do that, but we cannot stop you, so it is just a fact of life that we will lose probes. We would like to believe that the number of lost probes will be low enough, but only time will tell. In any case, if you disassemble, hack or do anything with the probe that is physically in your possession, you will not get any kind of information that could be used to attack other probes, because there are no shared keys and no common information beyond the software. I am trying not to say feel free, but if you did it you wouldn't gain too much.
OK. Many, many people asked about IPv6. In general we fully support IPv6, but going down to the details, we don't yet. The problem is that there is a single component which is crucial on the probe, and that is the SSH client; I can tell you that this SSH client is v4 only. The changelog makes it look as if it does v6, but it doesn't. The good news is that as time goes along we can upgrade the firmware, and as soon as the client is capable of doing v6 we will be OK.
So for now, that means that you have to run v4 in order to run the probe. Sorry for that.
AUDIENCE SPEAKER: Do we do v6 measurements?
ROBERT KISTELEKI: Did I mention that? No, I didn't. V4 is only needed for the control infrastructure; virtually all the components that we run today do v6, but the probes don't, so they connect over v4. But we do v6 measurements: if you plug in your probe we do pings to K-root over v6 and things like that, as long as your network supports it, of course. Also, as you could see, the probe doesn't really have a user interface on it, right? Unless you understand the blinking lights, it is difficult to understand, and even if you did, there is no way to tell it anything. That is going to be possible via the web user interface, of course. For now, there is no way you can set the IPv4 or IPv6 address; someone suggested we should have small DIP switches at the back, but about 128 of them is just too much, so we decided not to do that. For now, DHCP is a must for IPv4 and RA is needed for v6, so in a server room environment where you may not have those, you cannot plug it in; well, you can, but it will not do you any good. We have ideas on how to solve this, so we imagine that in the next couple of months we will roll out a feature that lets you get over this problem: you would probably have to connect your probe to the infrastructure from a place that actually has DHCP, and you can then tell the probe that the next time it should do something else. And that was it, I think, because of time constraints. I would love to tell you a whole lot more, maybe next time.
CHAIR: And hopefully have some results.
ROBERT KISTELEKI: Beautiful results.
CHRIS BUCKRIDGE: Chris Buckridge from RIPE NCC and a couple of questions on the Jabber channel. One is from Wolfgang Tremmell: To prevent hacking the probe, why don't you provide firmware for download?
ROBERT KISTELEKI: That happens via the trusted channel: the registration server tells the probe, you should be looking for a firmware and it should have this kind of signature, so even if you were able to push down a firmware different from ours to the probe, the probe would just refuse it, because the trust material says what the authentic one is and that it comes from the right place. So we have measures to protect against that.
CHRIS BUCKRIDGE: He just provides a correction: he says he meant the firmware sources.
ROBERT KISTELEKI: The point is, I could go into the details, but it doesn't matter where the probe gets the firmware from; I can tell you it gets it from the hierarchy itself, so there is no way you can hand-feed it one, but it doesn't matter where it comes from: the probe will actually verify that it is authentic and it will refuse it if it isn't.
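As an illustration of that check, here is a minimal Python sketch that refuses any firmware image whose digest does not match what was announced over the trusted control channel. The real probes use a signature scheme tied to their built-in trust material; this is only a stand-in showing the verify-before-install idea, with hypothetical function names.

```python
import hashlib

def apply_firmware(image_bytes, expected_sha256, install):
    """Refuse any image whose digest does not match what the trusted channel announced."""
    digest = hashlib.sha256(image_bytes).hexdigest()
    if digest != expected_sha256:
        return False          # not authentic: refuse, keep running the current firmware
    install(image_bytes)      # only an authentic image ever reaches this point
    return True
```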
AUDIENCE SPEAKER: I am guessing he means the source code.
ROBERT KISTELEKI: That is a different question, sorry. I think that goes back to a question on Monday, about whether we will open source the whole thing or not. OK, I don't know at this point, honestly. It is going to be a decision; Daniel also mentioned that. We may or may not, depending on... maybe you want to answer it.
DANIEL KARRENBERG: RIPE NCC. Not any time soon. Because quite frankly, we are not proud of how it looks right now; Robert has told you what the constraints are. That is one reason. The other reason is that we have a concern that, if we do this, it would dilute our efforts. The worst thing that could happen right now is someone taking this investment from the RIPE NCC and commercialising it in a competing product. That would quite likely kill any such initiatives from the RIPE NCC for the future, so I am not going to risk that. But what happens in, say, six or nine months' time is a totally different question, and, like it or not, this was developed with money from the RIPE NCC membership to start with. Once we have established it, we have sponsors and we have a little bit more experience in how this goes, we are very open to input and requirements from the sponsors, the RIPE NCC membership, that is the RIPE community, and this Working Group on what we should do, but making hasty decisions in this area is not a good idea.
CHAIR: The speaker at the back.
AUDIENCE SPEAKER: ISC. You mentioned that there was no memory management unit and that it might lock up, which isn't really much of a problem if you need to reboot it, but how will this work, will we get an e-mail?
AUDIENCE SPEAKER: Watchdog.
ROBERT KISTELEKI: You don't even notice it, I can tell you that. Within 30 seconds it comes back to the measurement network, within a minute or so in total; you may not even miss a measurement point. But just to expand on that, we have probes which have been up and running for more than a week now, without reboots.
(Applause)
CHRIS BUCKRIDGE: Another question on Jabber, this time from Ruben, private Internet citizen. He says: given the need to avoid the situation where a misbehaving device sits for months without action, how fast could you provide updates in case of misbehaviour or security issues?
ROBERT KISTELEKI: That is about development cycles. I have a better answer to that: we can disable probes, that is easy. We can disable individual probes if they misbehave, and I think the terms and conditions say that; if we see that, we can do that. But devising a new firmware depends on how many changes we want to make. Deployment can be done within days if needed.
DANIEL KARRENBERG: Just to add: we have two test beds that we use, and before updating anything in the widely distributed production network we do thorough testing.
RICHARD BARNES: Just a couple of comments. Putting on my security hat, I wanted to commend you on the design of this; I think you have put a lot of thought into it, it's quite well designed. Along the same lines, and echoing earlier comments, I think this could benefit from opening up some of the interfaces so people can build stuff that feeds into this network. That broadens the scope of sources of information if you want the measurement network to grow, and you have that cryptographic structure with which you can authorise things. We can talk about details off line, but I think in general that is a good thing and a good path to have in mind.
The other thing I wanted to suggest: I appreciate that you need a brain to keep track of things and to advance things, but maybe we don't want the brain to be too smart; we don't want to be assimilated here. I think the major value of this thing is the volume of raw data it produces, and I think we want to open that up and share it with the community as much as possible. We can do analyses on top of that, but I think it's important to maintain that separation between the data and the analyses. In the same spirit, you have the interface box on your diagram; I think that is mostly focused right now on graphical interfaces and user interfaces, and I think it would be good if we had programmatic interfaces that people could use.
ROBERT KISTELEKI: In actual fact, as you may or may not have seen, there is a data store, which I mentioned; that does not really have a web interface but some other interface to access the data. The web UI is for controlling the system and seeing stuff.
AUDIENCE SPEAKER: Two comments. I am one of the people who wanted to do this in IPv6-only networks, and obviously you need to do some work before that is operational, but in addition to the SSH stuff I think you have to change or add something about DNS discovery, so you can find a DNS server, maybe via RA.
ROBERT KISTELEKI: RA is supported as such, so we already get that. If you supply it through RA we pick it up. Currently that doesn't help, though.
ROBERT KISTELEKI: I have been told I didn't understand your question.
AUDIENCE SPEAKER: You have to support RA...
AUDIENCE SPEAKER: No vendor... no vendor...
ROBERT KISTELEKI: Take this off line.
AUDIENCE SPEAKER: The other comment I had: I want to echo the point about openness. I think it would be useful to open this up at least in some way so others could perhaps put interesting stuff on top of this. Your decision, of course, but it's kind of a choice between doing your own interesting research versus working with the rest of the research community on other interesting topics as well. I think there are lots of things that could be done, you know.
ROBERT KISTELEKI: It's not about that. Lots of interesting things could be done with it but, as you have heard from Daniel, it's not happening yet, and whether it will happen later is a decision still to be made.
CHAIR: We have got one more item we want to fit in and we have to finish on time, so we will take the two questions we have here and move on to the next item.
AUDIENCE SPEAKER: First of all, just to say thanks for doing this, it's really great. I like it a lot, it's really, really cool. Second, we are an open source company, we believe in that a lot, and opening it up would let security researchers have a look at it and make sure there are no hidden vulnerabilities in it. So we want this to succeed and to see it open sourced, and I have had a chat with my boss about it and I have gotten approval for sponsoring it provided that you open source it.
ROBERT KISTELEKI: That is called blackmail.
AUDIENCE SPEAKER: Or incentive. That is it. Thanks.
CHAIR: At the back.
ROBERT KISTELEKI: We definitely want to do a security audit but that is something different than open sourcing it.
AUDIENCE SPEAKER: John Quarterman. I have a commercial hat on, and I think an API is indeed key, as Richard Barnes and the other speaker said. In particular, the localisation issue that Richard Barnes was raising in the other session is one this data could solve: you would just have to drop one of these devices in the network you wanted everybody to know the location of, and say you want the data used for that. Problem solved. If you had an API, any organisation that wanted to produce the algorithms to mine the data to do that could do that. There are other applications like that which RIPE will never want to do directly but plenty of organisations could.
ROBERT KISTELEKI: Yes.
AUDIENCE SPEAKER: Adrian from Canada. We had some interest in having some of these devices in our network, but we have gotten some feedback that you are not as interested in data from North America. Would you mind clarifying that or setting the record straight?
ROBERT KISTELEKI: I can try. We are focusing on the RIPE NCC service region, but as you can already see if you go to the website, we do accept registrations from anywhere around the world, and we have active probes in the US now; yes, three of them, and another one got stuck with DHL. Anyway, the point is that we would like to concentrate on Europe and surrounding areas, our service region in general, but that doesn't mean we don't want to have probes up and running in different networks. Maybe you misunderstood or maybe we gave you a not so acceptable answer, but we do want to have probes and we will have probes in different regions. Maybe not yet.
AUDIENCE SPEAKER: Google. Do you support Power over Ethernet?
ROBERT KISTELEKI: We have been looking at that and the short answer is no.
CHAIR: Thank you.
ROBERT KISTELEKI: I can elaborate but I don't think I have the time. Let's talk off line.
CHAIR: OK. Thank you very much. Richard, Christian and I regularly talk in preparation for the meeting, and one thing that has been exercising us is having some kind of portal or measurement framework. This is something that is of interest to the RIPE NCC as well. Daniel is going to give a very short overview of what he sees as a possible framework, and then Christian is going to make a proposal for discussion by the group.
DANIEL KARRENBERG: My name is Daniel Karrenberg, I am the chief scientist at the RIPE NCC, and, first of all, let me say that I am very, very happy that this Working Group has reinvented itself. I can tell you that the three co-chairs that you have here are very active and forward-thinking people who actually engage, i.e., draw people into rooms and corners, and I like it, it's very good. I didn't say anything more, did I?
OK. I have also taken a little bit more responsibility for the operational side of the measurement stuff that the RIPE NCC does, and what I wanted to share with you very briefly is my personal vision of where we are going with the RIPE NCC activities, and then we can discuss all sorts of other things.
Let me first just steal 20 seconds on this open source business, and the API business. I think we have to discuss this here and we have to discuss it openly, and I actually value all the comments here. We have to put it in the right framework: what pieces of the system do we want open sourced, to what pieces of the system do we want APIs, and things like that. So we need to have a rational discussion about that. And let me start by saying that APIs to the data store, and having people do new kinds of analysis on the data and so on, is absolutely something that we want to do from day one. My vision of the RIPE NCC measurement activities is that we do a couple of things:
First of all, we will strive to have all our products available as raw data streams, so that you can go and say, you know, for RIS, TTM, DNSMON and Atlas, you can get near-realtime, as near realtime as we can make it, raw data, and do your own analysis. And of course we have to have a discussion, like the one that came up on Monday, about whether it helps the bad guys. We have, for instance, some delays in DNSMON for non-subscribers, and that is just to prevent the bragging thing, where the high school kids go and launch a DDoS with the latest botnet and then brag to their friends, see what I did. So these kinds of things we need to discuss, but in principle I want all the products to be there in raw form, as realtime as possible, so that people can run their own analysis.
Sometimes the amount of data is very big, so I also want to support you bringing your code to us and running the analysis locally if that is more feasible; so APIs and plug-ins and things like that for data analysis. Yes, let's start that discussion now, no problem.
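A minimal sketch of the kind of consumption Daniel describes, pulling a raw near-realtime feed and running one's own analysis, might look like this; the URL and the JSON field names are purely hypothetical placeholders, not a real RIPE NCC API:

```python
# Sketch: fetch a hypothetical raw measurement feed and compute a median RTT
# per target. Endpoint and record fields are assumptions for illustration.
import json
from urllib.request import urlopen

FEED_URL = "https://example.net/raw-feed/latest.json"   # hypothetical endpoint

def median_rtt_by_target(feed_url: str = FEED_URL) -> dict:
    """Group raw ping results by target and compute a median RTT per target."""
    with urlopen(feed_url) as response:
        results = json.load(response)            # assumed: a list of measurement records

    rtts: dict[str, list[float]] = {}
    for record in results:                       # assumed fields: "target", "rtt_ms"
        rtts.setdefault(record["target"], []).append(record["rtt_ms"])

    def median(values: list[float]) -> float:
        values = sorted(values)
        mid = len(values) // 2
        return values[mid] if len(values) % 2 else (values[mid - 1] + values[mid]) / 2

    return {target: median(values) for target, values in rtts.items()}
```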
I interpreted some of the questions about open sourcing the probe and so on as people wanting to do specialised types of measurements on the probe. Yes, let's discuss that; maybe open sourcing the probe is not even necessary for that. But to come back to the original question, since this was such an onslaught I think I need to answer it: if you want to do a certain measurement, come talk to us and we might implement it.
I already talked a little bit about the realtime stuff and so on, but in which framework do we want to offer our products in the future? I said realtime; what we really want is to build on the NetSense foundations we have here, which at the moment are just a toolbox, and turn that into something much more useful, which I would call a sort of community platform, where you can not only get at the raw data, you can also get at some analyses and visualisations that we provide, and some that other people come up with and contribute. That is the first thing.
The second thing is to also add the possibility to actually discuss the results. What we had in DNSMON was that some of the DNSMON customers came up in the early days and said we'd like the possibility to annotate the results; for instance, if one of the servers is down and shows in bright red, we would like to put a note on it saying this was planned maintenance, so everybody sees that. I would like to expand on that idea and have an environment where you can highlight some measurements and some visualisations, some statistical analysis or whatever, some raw data, and say: hey, I saw this, I think it means that, or I saw that, what does it mean? And there can be others in the same community who can then go and say, oh, maybe this is related to that, and do this in a very convenient way. Also have things like, once a thread like this has developed, the ability to republish it somewhere, put it into a RIPE Labs article, tweet it, whatever the method du jour is to communicate. My aspiration here, and I think it's the aspiration we should all have, is that in a year or two's time, when something operationally relevant happens in Europe or in the RIPE region, it's discussed in that community with those methods and not on the NANOG mailing list; nothing against that, but I think it's kind of archaic. So that is my vision about things.
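The annotation idea could be captured with a very small data structure; the field names below are illustrative only and do not describe any existing RIPE NCC format:

```python
# Sketch: attach human notes to a highlighted measurement window, e.g.
# "this red period was planned maintenance", so everyone sees the explanation.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Annotation:
    author: str                      # who is commenting, e.g. the server operator
    note: str                        # "planned maintenance", "I think this means ...", etc.
    created: datetime = field(default_factory=datetime.utcnow)

@dataclass
class AnnotatedResult:
    measurement_id: str              # which measurement or visualisation this refers to
    window_start: datetime           # the highlighted time range
    window_end: datetime
    annotations: list[Annotation] = field(default_factory=list)

    def annotate(self, author: str, note: str) -> None:
        self.annotations.append(Annotation(author=author, note=note))
```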
The other thing is that we should not limit this to data produced by the RIPE NCC; we should also get data from anywhere. My little quip about this is: we are interested in data about the Internet, that is us; as long as it has an IP address in it, an AS number and a time element, we want this data, and we want access to it. We have done a couple of activities you might have noticed, the data repository which Emile has been running and things like that; I don't want to go into the details. We are already marching in this direction. And I think Christian has some more ideas on this.
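Daniel's rule of thumb, an IP address, an AS number and a time element, translates into a very small schema; the record layout below is a hypothetical illustration, not an existing format:

```python
# Sketch: a minimal "data about the Internet" record and a check that an
# externally contributed record carries the three required elements.
from dataclasses import dataclass
from datetime import datetime
from ipaddress import ip_address

@dataclass
class InternetDataPoint:
    address: str         # IPv4 or IPv6 address the observation is about
    asn: int             # origin or observing AS number
    timestamp: datetime  # when the observation was made
    payload: dict        # whatever the contributing source measured

def is_in_scope(record: dict) -> bool:
    """Accept any external record that carries address, ASN and timestamp."""
    try:
        ip_address(record["address"])
        int(record["asn"])
        datetime.fromisoformat(record["timestamp"])
        return True
    except (KeyError, ValueError):
        return False
```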
CHRISTIAN KAUFMAN: So, let me take one step back and see if I can be faster than your 20 seconds. When the new Working Group was coming up, we were basically thinking about what we can actually do differently. Now we have a bigger scope and, well, even more people, so what is the thing we actually want to achieve? One of the things on which we agreed relatively quickly, and that is at least the vision from the Working Group chairs so far, is that we wanted to have a platform which is not just showing the RIPE NCC tools. Because when you look around you see a lot of research projects, there are even some companies who publish some data, and then, as Daniel was saying, instead of just having 20 different portals and websites where you can look the stuff up and which are not related to each other in any way, wouldn't it be a great thing if you actually had the possibility to look that up on just one site, see whether the data is correlated, even crossing or overlapping, see if they actually show the same thing, or at least expand it so that you don't just have a European view, or a view of HTTP, certain protocols, certain countries, but more or less a global view.
So I guess from a Working Group perspective, the question to the community is: is that something interesting for you, is that something you want to see, and do you think that this would be helpful and that you would like it? So that we can actually go back to the NCC and say: please don't just use your own portal, speak to different people, companies and institutions who have these data and would be happy to incorporate them. Questions? Comments on that? Are you guys still awake? We are not leaving the room before at least one or two of you have answered. We can have a mandate here, or feedback: no, that is a bad idea, or that is boring.
AUDIENCE SPEAKER: Try it out and see how it works.
CHRISTIAN KAUFMAN: That is kind of feedback. Well, I guess that is the idea, but is that, first of all, something other people share as well? Because at the end of the day, to be honest, if we say yes, that is the portal we want to have, and we give the NCC, well, not really a little push because they are already going in that direction, but we try to steer them and tell them what we want, that needs some manpower, and that is at the end of the day taken from our membership fees, your membership fees. So do you agree with the kind of, well, vision and idea we have so far, that this would be helpful? Not just for realtime purposes but also for basically historic data and trends, to see where that whole thing goes, and, as Daniel mentioned, probably combined with a wiki or blog, so whenever something comes up, some anomaly or strange behaviour of the Internet, you can comment on it, or people can say how they see it from their perspective.
AUDIENCE SPEAKER: This project is bigger than you think it is. Once you get beyond the project you are going to need income beyond RIPE fees; you are going to need economic input from outside to do it.
CHRISTIAN KAUFMAN: That might well be true. Right now, the question is, well, we also have the problem of how much data we get, and all the privacy concerns, and some other technical problems obviously, but again, the question is: is that something which would be helpful, which people would like? Let's do that differently: who thinks that this is a bad idea? Does someone want to raise their hand?
AUDIENCE SPEAKER: Just to put this in concrete terms, what we are talking about is a giant database with as many pings as we can collect across the Internet. Is there anybody that thinks it's a good idea? It seems like there are massive privacy implications. It would be handy, you can do topology inference and things like that, but there are lots of issues ranging from data storage to privacy to what this reveals about people's networks. Does this cause offence to anybody or cause people to get excited, or how does the room feel?
DANIEL KARRENBERG: Can I try? I got derailed a little bit by the open source issue, and so I forgot to say something very, very important: what I would also like to change, and will change, in the RIPE NCC's behaviour is that we will be much more incremental in the way we do things. So we will start something, we will make a small step and we will expose it to you, show you what we are doing, see whether we are on the right vector; as someone said, we don't need pushing but maybe some steering. Then we look at where we are going, and do this at intervals of a couple of weeks to a month, and see where we are going. So the suggestion, I think by John, to let's try it out and see where we are going is very much how I would like this to happen. And I think what I'm seeing here, maybe I am misinterpreting, is: yeah, we don't really know what you mean, show us something and then we will say whether we like it. I'd like to go there, and I see some people nodding, so that helps me a lot.
AUDIENCE SPEAKER: Do it.
CHRISTIAN KAUFMAN: See, that was a comment. Two people raising their hand. Thanks. Well, I guess in that case... oh, sorry, question, please.
AUDIENCE SPEAKER: Dave Wilson, HEANET. It's a comment. I don't really know what you mean yet, but I like the look of what you are doing, because, to my great regret, in the last few years I have seen a number of different projects in different areas of the industry, academia, non-profit and commercial, all doing somewhat overlapping but not really overlapping things to do with measurement, and there has been no real way to bring those together. I think that, finally, this looks like someone might be putting together some form of framework where we might be able to do that.
CHRISTIAN KAUFMAN: I was probably a little bit cryptic to save time. One reason why we think we will have success with collecting the data is that often, when you have a research project or when you get data from different sources, people, you know, are very secretive about their data and don't really want to share them. But the point, or the idea here, is that you have, in this case, the NCC, and can use them as a trusted and neutral organisation. And that is what we hope will make a difference, so that you can actually collect data and bring them together in a way you probably wouldn't have managed before. But yes, I certainly agree with your point.
GEOFF HUSTON: APNIC. This is research, it's not a product marketing demonstration, and part of this thing is: tell us what it will do, the shape of it and the colour. You can't. Right. This is research. We actually don't understand what the outcomes are until you try it. The second thing is, it's just ping and traceroute. There is no great secret science here, it's tools that all of us use all the time when things don't happen the way we expect. That is all that is going on. There are no great secrets here, no great wonderful packet format of hell going on in these boxes. It's really simple stuff. The real potential here is putting a lot of these things out there and looking at the aggregate of outcomes. That is where the magic is, if there is any: it's in the analysis. But I don't know what is going to happen, and I am not sure anyone really understands what the product will be at the end of this. In this kind of world, when you are looking at research rather than product marketing material, you have to take some degree of risk and go: I am not sure what it is, but it looks interesting. And, you know, from my point of view, I am really not sure what it is, but hell, it looks really interesting.
CHRISTIAN KAUFMAN: Thanks, apparently you are much better at summarising it than I am. So thanks for that.
CHRIS BUCKRIDGE: A quick comment on the Jabber from Ruben, thinking maybe people want to opt out from the data set.
AUDIENCE SPEAKER: Maybe people want to opt out from the data set.
CHRIS BUCKRIDGE: What he said.
RICHARD BARNES: That is one of the things that we will have to keep track of as this thing evolves, but it sounds like there is interest. Thank you for your feedback; it sounds like we have some consensus to move forward and do something we can show the Working Group next time. At this point we are going to wrap up. I want to thank the speakers for their time and thank everyone for coming to the meeting; I think we had some good discussion here. I wanted to make sure everyone is aware we do have a Working Group web page and we have a mailing list. Can I see a show of hands: how many people are on the mailing list right now? OK, that is a small fraction. So everyone go sign up to the mailing list, that is the URL, because that is where these discussions will happen. If you are interested in what you heard today and in staying plugged in, please do join. Daniel.
DANIEL KARRENBERG: Consumer warning: this is really going to happen on the mailing list. At least for the stuff the RIPE NCC does, we are going to change from a RIPE-meeting-to-RIPE-meeting operation to a much more frequent exposure of what we are doing and asking for direction, so if you are not on the mailing list you will miss out on that and you will be surprised at the next RIPE meeting. Of course, Miriam, I am sorry, Miriam: once we reach certain stages, I mean major stages, we will document them on Labs in a bit more redacted and reflected form than mailing lists usually take.
RICHARD BARNES: This mailing list has been really quiet since the last RIPE meeting; as these things get started it's going to get a lot more active, and it will be the venue where we get community feedback and discuss these issues. I would like to thank you for your time and for tolerating us running a little bit over time. But thanks again, and we will see you in Amsterdam.
(Applause)
LIVE CAPTIONING BY AOIFE DOWNES RPR
DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.
WWW.DCR.IE