
The Marshall Symposium: Technology Demonstrations: Farnam Jahanian
Farnam Jahanian: During the next few moments, I hope to share with you several of the challenges that we faced in developing and deploying the UARC system over the Internet, focusing particularly on network-related issues. To do this, let me start by describing several requirements of the UARC system from the user's point of view. A key requirement is the need to link hundreds of instruments and data sources distributed across the globe to scientists who are themselves distributed across North America, South America and Europe. A related requirement is to enable anyone in the world to participate via the Web. Regardless of your favorite browser, and regardless of who wins which lawsuit, you can walk up to a browser and access the UARC collaboratory simply by entering a URL. Another key requirement is to offer access to scientific data and software tools in what we refer to as virtual rooms. These virtual rooms, of course, enable real-time collaboration and real-time interaction. To build such a system over the Internet, we faced a number of technical challenges. In the next five minutes or so, I would like to share three of them with you. The first, and one of the main sources of complexity in building such a system, was its geographical distribution and scalability. Let's first ask the question: why a distributed system? There are really three related reasons. The first is the natural physical separation of scientific instruments, archival data sources and the scientists who, as I mentioned, are distributed across the globe. The second reason for such wide distribution has to do with the system's availability requirements.
We did not want the system to suffer a catastrophic failure as a result of a single fault, for example the failure of a single machine or a single link. I remind you of the recent Galaxy satellite failure a couple of weeks ago: as the result of a single failure, that of the satellite, 80 to 90 percent of the pagers in this country became inaccessible. Furthermore, it took several days to recover from that failure, and it required manually repositioning 1,500 ground-based satellite dishes. We did not want the system to be prone to a single point of failure. The third reason for distribution concerns performance and scalability. Scalability is somewhat of a loaded term. What we mean by scalability is the ability to bring more instruments into the system. Over the history of the UARC system, we integrated dozens of new instruments, particularly during the last year. We also allowed hundreds, perhaps thousands, of scientists to participate in multiple virtual rooms simultaneously. We wanted to do all of this without significant degradation in the performance of the system. The second technical challenge involves the variability of our networking infrastructure. I think Doug and Vint will agree with me when I say that there are many dirt roads leading onto the information superhighway. In fact, the Internet exhibits highly variable bandwidth, latency and loss characteristics, and the prediction is that the gap between the superhighway and the dirt roads will continue to widen as hand-held devices become more pervasive and mobile and wireless networks become more common. So, what's the implication? What is the impact of this variability in the networking infrastructure on our collaboratories?
To understand this, let me highlight how we interact with the Web. Usually we sit at a PC or a workstation and type in a URL; the request goes to some Web server, and eventually a page or an image comes back. What I want to point out is that this interaction with the Web server, or with the network, does not really affect anyone else. If I get up after 30 seconds or a minute and leave my room, or leave the building, and go do something else, that has very little impact on others. However, in a collaborative model like ours, where a number of scientists sit together in a so-called virtual room and require real-time shared access to instrument data and archival data sources, we must provide acceptable quality of service to all participants. Let me illustrate that with an example. Consider a scenario where you have a data server, or perhaps radar instrument data coming in, and this data goes to a number of clients who are participating in some scientific collaboration. If one of the clients has a high-capacity link to the network, then this image (well, this one probably wouldn't come from a satellite), this image of the balanced rock from Arizona, can be transported over that high-capacity link with 100 percent fidelity, which requires 275,000 bytes of data to be transmitted over the network. That's roughly 80 to 100 pages of text, I think. Now, if other clients in the system are connected to the network but do not have good connectivity, or the network is suffering transient congestion and packet losses, we may be able to degrade the quality of this image and send it over this medium-capacity link to the client. This degraded image, at 70 percent fidelity, would require only 25,000 bytes.
That's an order of magnitude less. Furthermore, if the link is congested so that many packets get dropped (this is one of those dirt roads I was referring to), we may be able to degrade the quality of the image by about 90 percent and transmit it using only 9 kilobytes of data. Let me go back and zip through these once again to show you the degradation in quality. This is 70 percent, which is fairly acceptable, and this one, which is still not unacceptable, is about 90 percent degradation. Finally, the third technical challenge that I would like to share with you concerns the exploitation of Java, an emerging technology. Many of you may have heard the word Java. As some of you know, it refers to a new programming language that has gained widespread use, particularly in the Internet community. How Java has changed our lives can be described very briefly as follows. The way we used to interact with others over computers, over a network link, was by exchanging multimedia documents. These documents include text, video, audio and so on. What Java enables us to do is exchange not only multimedia documents but also the tools and programs that manipulate them. What's the big deal here? Sending code is not in itself new: NASA transported code through the Apollo system 20 years ago. In fact, as recently as the Mars Pathfinder mission, code had to be transmitted to Pathfinder about three days into the mission to fix a reported bug. What is different here is a paradigm shift, and this paradigm shift has happened as a result of combining the Internet with Java technology.
Just to highlight a couple of examples: we are able to send data across the network together with a program that allows a client workstation to render that data. We can download a ready-to-run application from a remote Web site and conduct transactions directly using that application. In a collaborative environment, we are able to access a virtual room from our offices using a workstation, or perhaps very soon from a laptop on an airplane, independent of the browser of choice. Let me wrap up by highlighting several challenges for next-generation collaboratories. These include seamless integration of same-time and different-time collaboration; dramatically more scalable collaborations, where we're talking about numbers of participants that are orders of magnitude larger; security and access control; and, finally, wireless and mobile networks. I'll leave you with one final thought. Internet2 will be an important vehicle for the experimental development, deployment and evaluation of next-generation collaboratories. Thank you.
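The per-client fidelity adaptation described in the talk (the balanced-rock image sent at 275,000, 25,000 or 9,000 bytes depending on each client's connection) can be illustrated with a minimal Python sketch. The talk does not say how UARC actually chose a fidelity level for each client, so the policy here is a hypothetical one: send the highest-fidelity version whose estimated transfer time fits within a deadline. The names `Encoding`, `choose_encoding` and `transfer_seconds`, the example link bandwidths and the 2-second deadline are all illustrative assumptions, and the time estimate ignores latency, protocol overhead and packet loss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Encoding:
    """One pre-encoded version of the same image (hypothetical type)."""
    label: str
    size_bytes: int


# Versions ordered from highest to lowest fidelity. The byte counts are the
# figures quoted in the talk; the labels are illustrative.
ENCODINGS = (
    Encoding("full fidelity", 275_000),
    Encoding("about 70 percent fidelity", 25_000),
    Encoding("heavily degraded", 9_000),
)


def transfer_seconds(size_bytes: int, bandwidth_bps: float) -> float:
    """Idealized transfer time: payload bits divided by link bits per second.

    Ignores latency, protocol overhead and retransmissions.
    """
    return size_bytes * 8 / bandwidth_bps


def choose_encoding(bandwidth_bps: float, deadline_s: float) -> Encoding:
    """Assumed policy: return the highest-fidelity version that can be
    delivered within deadline_s, or the smallest version if none fits."""
    for enc in ENCODINGS:
        if transfer_seconds(enc.size_bytes, bandwidth_bps) <= deadline_s:
            return enc
    return ENCODINGS[-1]


if __name__ == "__main__":
    # Assumed clients in one virtual room: (name, estimated bandwidth in bit/s).
    clients = [("fast", 10_000_000), ("medium", 128_000), ("congested", 40_000)]
    for name, bandwidth in clients:
        enc = choose_encoding(bandwidth, deadline_s=2.0)
        secs = transfer_seconds(enc.size_bytes, bandwidth)
        print(f"{name}: {enc.label} ({enc.size_bytes} bytes, ~{secs:.2f} s)")
```

With these assumed numbers, the fast client receives the full 275,000-byte image, the medium client the 25,000-byte version and the congested client the 9,000-byte version. A real system would also need to re-estimate each client's bandwidth continuously, since, as the talk notes, congestion is often transient.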