Path: utzoo!utgpu!news-server.csri.toronto.edu!rpi!zaphod.mps.ohio-state.edu!think.com!mintaka!ogicse!milton!hlab
From: cdshaw@cs.UAlberta.CA (Chris Shaw)
Newsgroups: sci.virtual-worlds
Subject: Re: More on VR Architectures [LONG]
Message-ID: <1991Apr6.182031.22494@milton.u.washington.edu>
Date: 6 Apr 91 00:00:00 GMT
Article-I.D.: milton.1991Apr6.182031.22494
Sender: hlab@milton.u.washington.edu (Human Int. Technology Lab)
Organization: University of Alberta, Edmonton, Canada
Lines: 312
Approved: cyberoid@milton.u.washington.edu

In article <1991Apr5.030424.16993@milton.u.washington.edu> Herb Taylor writes:
>Sutherland's vision ... (as reported by Fred Brooks)
>
>1. Display as a window into a virtual world.
>2. Improve image generation until the picture in the window looks real.
>3. Computer maintains world model in real time.
>4. User directly manipulates virtual objects.
>5. Manipulated objects move realistically.
>6. Immersion in virtual world via head-mounted display.
>7. Virtual World also sounds real, feels real.

There's a lot of Brooks in this statement. Sutherland's 1965 IFIP Congress
paper basically says "We can simulate anything but taste & smell". Anyway,
I quote Sutherland's introduction here:

"We live in a physical world whose properties we have come to know well
through long familiarity. We lack corresponding familiarity with the forces
on charged particles, forces in nonuniform fields, the effects of
nonprojective geometric transformations, and high-inertia, low friction
motion."

Sutherland then goes on to talk about simulating these things to higher and
higher degrees of realism. The key point here is SIMULATE. You can't take a
picture of something that doesn't actually exist. What do you point your
camera at?

Even more to the point, almost all of Scientific Visualization is conducted
simply because the phenomenon being simulated is too difficult to observe.
Other phenomena are impossible to observe with a camera.

Design is based on the process of creation followed by critique. A CAD
system helps the creation, and simulation helps you provide the critique.
If you're designing something, you can't take a picture of the design until
you've built it. Anyway, simulation is usually cheaper. All of the
"canonical VR applications" have simulation as a key component.

>It is not so much the physical apparatus ("head-mounted") as the effect of
>total "immersion" which is critical to VR.

Yes. One example is numerous projection TVs showing the world model on all
sides, sort of like the 360 degree movie at Disneyland. This has the
advantage of not encumbering the user with head-mounted cables, etc. The
disadvantage is that everything's on the wall, there's limited stereopsis,
and so on. Still, a valid approach at the outset. Talk to Myron Krueger at
UConn and to Bryan Lewis' group at IBM TJ Watson Labs for this kind of
stuff.

> It was my contention that there is nothing in VR which requires that
>worlds be polygonally based. That is a choice made specifically to
>meet the above criteria - not fundamental to the experience.

Yes and no. If you allow other types of synthetic surfaces (bicubic
patches, implicit surfaces, voxels, etc), then you get an obvious yes.
Otherwise one is left with camera imagery.

>I further concluded that other approaches to the world processing function
>would require less floating point operation.

I don't see the point of avoiding floating point. If you avoid floating
point because your box can't do floating point, then change your box.

>telerobotics or teleoperation are considered worthy subfields of
>Virtual Reality?

Well, I think that telerobotics is telerobotics. There are probably 5-10
papers worth mentioning (not counting repeats) in the field of Virtual
Reality, but there are numerous telerobotics and teleoperation journals.
Which is a subfield of which?

I prefer the narrower use of the term "Virtual Reality", which essentially
means "Highly Interactive 3D Simulation". In any case, what's Virtual about
telerobotics? The operator is in some sense virtual, not the environment.
But really, this is a semantic argument.

> Other examples of non polygonal approachs would include the famous
>video based Aspen driving simulation. Does this qualify as a VR?

Again, no. What if you wanted to see Aspen on a cloudy day? Or anything
that's not a canned image? Well, you have to go back to Aspen & get more
footage!

>IF a system can provide a completely user directed experience of driving in
>a "virtual" city (in this case modeled on a real one)

It WAS a real city! It took about a month to shoot, and although they had
stacks of images, you could still only enter each store in a certain way,
and so forth.

>Chris Shaw does seem to consider E&S simulators to
>be VR - is that only because they use graphics?

It's because they are *simulation*.

>the DVI based Palenque walk-through .. it would not be difficult to configure
>the system with a head mounted display (criteria #6) and to track body
>motion to provide a true sense of walking through the ancient ruins.

Ignoring the lag issue, so far so good.

>Perhaps the most difficult of Sutherlands criteria involves the
>requirement for manipulated objects to move realistically under the
>control of the user (criteria # 4 and 5).

These constraints cannot be met with canned video. The communication is
one-way in this case. VR systems have two-way communication. Hence the
teleoperation distinction.

>The resolution of the head mounted display (in addition to frame
>rate and latency) has been identified as A MAJOR LIMIT TO VR.

Look. Unless you're talking < 1/60th second lag, latency has nothing to do
with the display technology. Latency arises from the other system
components, such as the head tracker technology, the computers driving the
trackers, and the renderers drawing the scene.
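
To make the latency point concrete, here is a minimal Python sketch of a
lag budget. The stage names and millisecond figures are illustrative
assumptions, not measurements of any real tracker or renderer; only the
60Hz refresh figure comes from this discussion. The sketch adds up the
per-stage delays and reports how much of the total the display itself
could account for.

```python
# Hypothetical lag budget for a head-tracked display pipeline.
# All per-stage delays below are made-up illustrative values (milliseconds).
REFRESH_HZ = 60.0


def display_share(stage_ms, refresh_hz=REFRESH_HZ):
    """Return (total_ms, display_ms, fraction_of_total_due_to_display).

    The display is charged one full refresh period, i.e. the worst case
    for a frame that just missed the previous refresh.
    """
    display_ms = 1000.0 / refresh_hz
    total_ms = sum(stage_ms.values()) + display_ms
    return total_ms, display_ms, display_ms / total_ms


# Assumed delays for the non-display components named in the text.
stages = {
    "head tracker": 40.0,
    "tracker host computer": 20.0,
    "renderer": 60.0,
}

total, display, share = display_share(stages)
print(f"total lag {total:.1f} ms, display {display:.1f} ms ({share:.0%})")
```

Under these assumed numbers the display accounts for well under a fifth of
the total lag, so a different display would change little unless the
tracker and renderer stages shrink as well.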
(Or, to be more general, the video source drawing the scene.)

Secondly, the "frame rate" limitation is not display refresh (fixed at
60Hz), but display update. In a CGI system, display update rate depends on
scene complexity -- the number of polygons/NURBS/voxels you have to
process. For video, of course, display update rate equals display refresh
rate.

Thirdly, the display resolution on VPL's EyePhone is very low: 320x240
max. The source of this problem, as Herb Taylor rightly points out, is the
LCD TVs that they use. You have to expand your fonts to 2-3 times a
"normal" readable size for text to be readable in an EyePhone.

>If all you want is to redirect your SGI polygon world output to the head
>display, fine, no problem.

Good. That's what I want. Curse my blinkered stick-in-the-mud
pig-ignorance.

>** What do you need MPEG for???
> I can only say that we have been approached by a number of groups
>about the feasibility of networked VR (and other applications) with a
>heavy emphasis on data compression.

My incredulity was based on the inherent silliness of sending computer
generated images around, given that these images were generated in real
time, and could be generated by the recipient in real time. A much better
approach is to send the model. It's a much better use of communication and
computational resources, and you can compress the polygonal models when you
send them, too. You also get the behaviour description of the model, and
you can customize it if you wish. MPEG would probably be useful for sending
images, but it ain't realtime, you still have to pay for the extra
bandwidth, and you're stuck with somebody's canned image.

>Chris asks:
>** Where's the interaction?
>
> For telerobotic applications a remotely located, variable frame
>rate camera (or multiple cameras) sends a 3D burst back to the VR.

Excuse me? What's a "3D burst"?

>This burst can be captured in virtual world managed frame buffers.
>There are two obvious modes of user interaction either via continuous
>teleoperation (at low resolution) or by traversing the captured 3D
>"volume" of video data.

Huh? Can Herb tell us what algorithm he's talking about here? Granted,
given my naivete about image processing, maybe you can do this if you have
enough cameras. But I am quite doubtful that you can do full 3D
reconstruction of 3D phenomena on-the-fly with video, which is what Herb is
essentially claiming.

>** Can you change your point of view?
> To the extent that the system has been configured to support either
>of the two modes previously described.

Be specific. This doesn't answer the question.

>If 3D cameras are used then in principal one could change point of view
>within a captured video volume. Arbitrary point of view.

3D cameras? In principle? What are you talking about here? This sounds as
far-off and high-priced as the direct neural interface. This also looks an
awful lot like a proof by claim. CGI can give you arbitrary point of view
in practice, right now, at under $100,000.

>The transmitted video contains an entire video
>volume into which you navigate.

What's a "video volume"? How many cameras do you need? If you get a video
volume of my house, do you have to go into all my cupboards? What if I want
to move the furniture? What if I want to check the state of my chimney?
Look under the carpets?

Now, it may seem that these are smart-ass questions, but I'm serious. What
kind of work do we need to do to create a "video volume"? I'm guessing that
you need to do a 6D scan of the entire volume of the house, including the
hidden surfaces. Three of the dimensions are position, and three are
orientation.

My point is that, in the absence of such an exhaustive scan, you're going
to have to accept some loss in position and orientation control. Strictly
speaking, it's a trade-off between static visual realism and dynamic visual
realism. The more you want static visual realism, the more appealing the
video approach is.
The more you want dynamic visual realism, the more you want CGI.

My basic point is that beyond stringing together a certain set of canned
still shots to simulate motion, the video approach fails because the
dynamic realism is lacking. Flaw number two is that you can't see hidden
surfaces, even if you want to. The fatal drawback, however, is the cost of
putting something like this together. I can't imagine a more tedious task
than scanning a room in 3D.

Of course, the CGI drawbacks are obvious. The images aren't real, and
producing them in real time depends on fast hardware. The more polygons you
have, the faster the machine needed to maintain a given update rate.

>** Can you change the experiment as it runs?
>within the limits of continuous teleoperation and if the timeframe of
>the experiment is long compared to human response times, then yes.

In other words, if you have true teleoperation, then the answer is yes.
But how much will it cost to build a teleoperation system that will allow
you to alter arbitrary experimental parameters? Given that system, is it a
general enough tool to be used for any experiment? My intuition tells me
that the costs for the basic tool will be astronomical.

In any case, Herb's answer dodges the real issue. If you can simulate it,
the time scale of the phenomenon is of no consequence; you slow it down to
suit. If you *must* do the experiment, then canned video is probably the
way to go. But the problem is that the video is all you get. THERE IS NO
MODEL.

>Alternatively, within a captured volume of data you can change
>visualization controls (such as opacity) while you walk through the
>data.

I don't understand this point. Are we doing video, or are we doing voxel
stuff? If we're doing voxel image processing, then we're in the realm of
graphics. (Strictly speaking, no polygons though.)

>** The situation you describe allows N camera views at pre-programmed
>** locations.
>** If you want a new view that your camera(s) didn't get, you
>** have to run the experiment again.
>
> Not exactly correct. Again, within the continuous video volume
>there is sufficient information to construct a new and entirely
>arbitrary point of view of the experiment.

What is the proof of this somewhat surprising claim? Having "sufficient
information" doesn't buy you anything! For example, one has "sufficient
information" to solve the satisfying assignment problem (satisfiability),
but it still could take exponential time in the number of variables to
solve! (The satisfying assignment problem requires that you find an
assignment to the variables of a product-of-sums boolean equation that
makes the equation true. The problem is NP-complete.)

>** There's nothing wrong with this, but it ain't virtual reality, because the
>** level of interaction is severely limited.
>
>If level of interactivity defines VR then this system can be VR...

Hardly. I get 1 interaction per second typing on an ASCII terminal over a
modem. Is this VR?

> Herb Taylor

Clearly, this could degenerate into a stupid argument over semantics, and
that's not something I'm really interested in following up. So, I'll
summarize what I think VR is, and leave it at that.

Herb has a radically different view of what VR is than I do. Herb gives the
impression that video is a suitable replacement for all Computer Generated
Imagery in a Virtual Reality system. I do not agree. I also don't know
anybody else who would agree. While video may be a useful adjunct to a
virtual reality system, it lacks the fundamental property of arbitrary
real-time view position and orientation control. CGI gives you this as part
of the package.

In fact, I would call a remote-controlled camera system such as Herb
mentions a teleoperation or telepresence system. Why?
Because the simulation component doesn't exist, because the view cannot be
arbitrarily controlled, and because the operator cannot manipulate
arbitrary objects, given a reasonable-cost system.

The teleoperation distinction is important also for the types of technology
that you need. Teleoperation relies on robotics if any view change is to be
possible. Computer Graphics and Robotics have much in common, but they are
also quite different. Similarly, VR and Teleoperation have common ground,
but only on the operator side. On the side being operated, the difference
is that VR is a virtual space, and teleoperation is a real space.

I have to wonder why Herb insists on calling his work Virtual Reality when
it so clearly is not. I think that the Princeton box is a solution looking
for a problem. This is why HDTV, video compression, MPEG, etc keep coming
up. The HDTV box wants to become a VR box. This will be a truly Procrustean
effort. Ten years from now, things may be different, but I can wait.

--
Chris Shaw    University of Alberta
cdshaw@cs.UAlberta.ca    Now with new, minty Internet flavour!
CatchPhrase: Bogus as HELL !

[MODERATOR'S NOTE: Thanks to both Chris and Herb for a stimulating and
provocative discussion, without degenerating into name calling. (Is it
okay if someone calls themselves a "smart-ass"? What humility!) Please
don't stop if there's more to be said. We're all learning a lot, from
two different but indicative points of view. -- Bob J., now wafting over
the Arctic....]