Path: utzoo!news-server.csri.toronto.edu!cs.utexas.edu!sdd.hp.com!usc!rpi!dali.cs.montana.edu!milton!hht@sarnoff.com
From: hht@sarnoff.com (Herbert H. Taylor   x2733)
Newsgroups: sci.virtual-worlds
Subject: Report from David Sarnoff Research Center.
Message-ID: <17993@milton.u.washington.edu>
Date: 7 Mar 91 07:30:32 GMT
Sender: hlab@milton.u.washington.edu
Organization: David Sarnoff Research Center, Princeton, NJ
Lines: 114
Approved: cyberoid@milton.u.washington.edu


  We are interested in exploring NOW what VR might be like for the
general computing world in another ten years - much as Xerox PARC gave
us a vision in the 70's of what computing would be like in the 80's
and 90's. And although the system we describe here is very expensive,
we believe that in another ten years it will be a typical system. In
fact at SIGGRAPH someone pointed out that the typical computer of the
year 2000 will have 1G of RAM, operate at 1 GOP, have 1G I/O, etc. -
our system exceeds that performance now.

  We have developed a Video Supercomputer (aka the Princeton Engine)
which can continuously process multiple simultaneous streams of video
input and output. When we originally conceived the machine in 1983 we
intended it to be used for simulating proposed digital television
receivers in continuous real time.  However, it has since found a
happy home in a number of research fields including algorithms for
HDTV, data compression, neural nets, image pyramids, scientific data
and volume visualization and, hopefully, VR. For example, we would like
to combine the processing power of the Princeton Engine with high
frame rate, high resolution displays - to create and manage a virtual
world built from "real" elements. To date all applications on the
Princeton Engine exploit in some way either real-time input or output
and usually both.  The following ascii-gram summarizes the
architecture.
  8Bit A to D's                                        9 Bit DAC's
  (48 bits input)   _____________________________      (64 bits output)
  Video In 1 ---->|    The Princeton Engine     |----> Video Out 1 (R)
                  |  2048x16BIT SIMD/DSP Procs  |----> Video Out 2 (G)
  Video In 2 ---->|                             |----> Video Out 3 (B)   
                  |  o Processor Architecture   |
  Video In 3 ---->| - Seven data paths          |----> Video Out 4 (R)
 (Optional D1/D2) | - Mpy and Alu               |----> Video Out 5 (G)
                  | - NN & Cut Through IP Comm  |----> Video Out 6 (B)
  Video In 4 ---->| - 144 Bit Wide Inst Word    | /|\
                  | - 64 3-Port Register File   |  |
  Video In 5 ---->| - 1GigaByte Video Rate Ram  | OUTPUT
                  | - Hardware LUT              | Clocked At 28, 56 MHZ
  Video In 6 ---->|_____________________________|----> Video Out 7 (D1/D2)
             /|\                /|\               /|\
   INPUT      |                  |                 |
   Sampled at 14,28,56,81MHZ     |        D1/D2 Clocked at 13.5/14MHZ
                      Instruction Clock at 14MHZ

  The Princeton Engine is a SIMD architecture (a la CM-2 and MasPar)
composed of up to 2048 16-bit DSP processors. It differs from those
machines in several respects including the ability to continuously
perform video rate I/O, flowing the video transparently through the
array of processors. The "front-end" consists of six Analog to
Digital converters while the "back-end" consists of seven D to
A's.  Alternatively, any of the analog inputs or outputs can be
replaced with a digital D1/D2 interface. All 13 video data streams
are independent of the instruction stream. With very little overhead
any or all of the six video input streams can be directed to processor
local memory based frame buffers. Video streams can then be "fused" or
individually processed.
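
  As a rough back-of-the-envelope check on the figures quoted above,
the following Python sketch multiplies them out. It is only an
estimate: it assumes one instruction per processor per clock at a
nominal 14 MHz and counts each instruction as a single operation. The
function and constant names are made up for illustration.

```python
# Back-of-the-envelope rates from the figures quoted in the diagram.
# Assumption: each processor issues one instruction per clock,
# and each instruction is counted as one operation.
PROCESSORS = 2048
INSTRUCTION_CLOCK_HZ = 14_000_000   # nominal 14 MHz instruction clock
INPUT_CHANNELS = 6                  # six video inputs
INPUT_BITS = 8                      # 8-bit A/D per input channel


def aggregate_ops_per_second(processors=PROCESSORS,
                             clock_hz=INSTRUCTION_CLOCK_HZ):
    """Instructions issued per second across the whole array."""
    return processors * clock_hz


def input_bits_per_second(sample_rate_hz,
                          channels=INPUT_CHANNELS, bits=INPUT_BITS):
    """Raw input bit rate when every channel samples at sample_rate_hz."""
    return channels * bits * sample_rate_hz


if __name__ == "__main__":
    print(aggregate_ops_per_second() / 1e9, "G instructions/s")
    print(input_bits_per_second(14_000_000) / 8e6, "MB/s input at 14 MHz")
```

  Under these assumptions the array issues roughly 28.7 billion
instructions per second, and the six inputs together deliver about
84 MB/s when sampled at 14 MHz.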
   
 "Video" Data Glove
 ------------------
 By positioning cameras (including IR cameras) spatially around the
virtual participant it will be possible to achieve a "whole body" to
virtual world interaction which is not possible with a physical data
glove. To our knowledge, this concept has never been tried in VR
because of the inordinate amount of video processing required - but it
can be done utilizing the Princeton Engine's unique video processing
power. In the Princeton Engine up to six simultaneous real-time video
input streams are possible. There is very little computational
overhead to process multiple video streams or to "fuse" them with
artificial world data, providing the "real" impression of a hand or
body within the virtual world. We would like to hear opinions on how
such a whole body interface would affect the design of the physical
data glove. Is the data glove still required? If so, how will it
differ from present designs? By having the "interface" in a sampled
video format, image processing algorithms such as filters, edge and
motion detectors can be applied, enhancing the transparency of the
fusion of "real" into the virtual world.
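
 To make the idea concrete, here is a minimal Python sketch of one
possible form of such a fusion: a simple frame-difference motion
detector marks the pixels belonging to the moving participant, and
those camera pixels are then keyed into a virtual-world frame. This is
an illustration under our own assumptions, not the code that runs on
the Princeton Engine; the frame layout (lists of rows of 8-bit grey
levels), the threshold and the function names are all hypothetical.

```python
# Hypothetical sketch: frames are lists of rows of 8-bit grey levels (0-255).

def motion_mask(previous, current, threshold=16):
    """Mark pixels whose value changed by more than `threshold`."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def fuse(camera, virtual, mask):
    """Use camera pixels where the mask is set, virtual pixels elsewhere."""
    return [
        [cam if m else virt
         for cam, virt, m in zip(cam_row, virt_row, mask_row)]
        for cam_row, virt_row, mask_row in zip(camera, virtual, mask)
    ]


if __name__ == "__main__":
    previous = [[10, 10, 10], [10, 10, 10]]
    current = [[10, 200, 10], [10, 210, 12]]   # something moved in column 1
    virtual = [[50, 50, 50], [50, 50, 50]]     # rendered virtual-world frame
    mask = motion_mask(previous, current)
    print(fuse(current, virtual, mask))
```

 Each output pixel depends only on the pixels at the same position in
the input frames, which is the kind of data-parallel work a SIMD array
is well suited to.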

 The Princeton Engine provides a degree of interaction with scientific
data in the HDTV framework which is not possible via other computing
resources. In fact, it should be possible to "walk through" complex
data without any perception of the latency found in present systems.
This walk through world will likely include a variety of high
resolution rendered objects in data views with which the scientists,
mission planners and commanders can directly interact. It should be
possible to virtually "grab hold" of critical data - much as one uses
a marking pen to highlight text in a reference document - and perhaps
to perform the "virtual" equivalent of cut and paste.

 Video Windows
 -------------
  One could further envision within the "Virtual World" a 2D high
resolution display or perhaps a window onto the "real" world. For
example, cameras at strategic remote locations could direct live video
back to the Princeton Engine host. This "live" video is then projected
into the virtual world participant's window. The "live" video window
might be coupled to a lower frame rate networked video communication
channel. Alternatively, one could envision a "television" within the
virtual world which VR participants can "switch" to a variety of
channels. This ability to integrate video into the virtual world will
be valuable to a number of applications. 
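
  As a simple illustration of a video window, the Python sketch below
pastes a "live" frame into a rectangular region of a virtual-world
frame, clipping it at the edges. It is a sketch under our own
assumptions (frames as lists of rows of pixel values, a hypothetical
insert_window function), not a description of the actual
implementation.

```python
# Hypothetical sketch: frames are lists of rows of pixel values.

def insert_window(world, live, top, left):
    """Return a copy of `world` with `live` pasted at (top, left).

    Pixels of `live` that would fall outside `world` are dropped.
    """
    out = [row[:] for row in world]          # leave the input frame untouched
    for r, live_row in enumerate(live):
        y = top + r
        if not 0 <= y < len(out):
            continue
        for c, pixel in enumerate(live_row):
            x = left + c
            if 0 <= x < len(out[y]):
                out[y][x] = pixel
    return out


if __name__ == "__main__":
    world = [[0] * 4 for _ in range(3)]      # 3x4 virtual-world frame
    live = [[1, 2], [3, 4]]                  # 2x2 "live" video frame
    for row in insert_window(world, live, top=1, left=1):
        print(row)
```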

 Status
 ------
 Two Princeton Engines have been in operation since 1988. This spring
three more will be added - one of which will be placed at NIST under
DARPA sponsorship - for the High Definition systems program. Although
the VR program at DSRC is just getting started at a minimum level (but
with several PE's to play with...) we still hope to demonstrate some
of the major ideas using the present video environment this year. We
have already demonstrated, for example, scenarios for multiple video
I/O channels, "fusing" an IR source with a monochromatic source while
driving multiple high resolution displays.

P.S. Last Thursday was the 100th Birthday of our Founder, David Sarnoff.