These notes are extracted from a document, written in 1993,
converted to HTML by Chris Hand
with the permission of the author Jerry Isdale.
I. What is Virtual Reality?
The term Virtual Reality (VR) is used by many different people with many
meanings. To some people, VR is a specific collection of technologies: a
Head Mounted Display, a Glove Input Device and Audio.
Other people stretch the term to include conventional books, movies
Some other people stretch the term to include conventional books, movies
or pure fantasy and imagination. The NSF taxonomy mentioned in the introduction
can cover these as well. However, my personal preference, and the convention
for this paper, is to restrict VR to computer mediated systems. The best definition
of Virtual Reality I have seen to date comes from the book "The Silicon Mirage":
"Virtual Reality is a way for humans to visualize, manipulate and interact
with computers and extremely complex data"
The visualization part refers to the computer generating visual, auditory
or other sensory outputs to the user of a world within the computer. This
world may be a CAD model, a scientific simulation, or a view into a database.
The user can interact with the world and directly manipulate objects within
the world. Some worlds are animated by other processes, perhaps physical
simulations, or simple animation scripts. Interaction with the virtual
world, at least with near real time control of the viewpoint, in my opinion,
is a critical test for a 'virtual reality'.
Some people object to the term "Virtual Reality", saying it is an oxymoron.
Other terms that have been used are Synthetic Environments, Cyberspace,
Artificial Reality, Simulator Technology, etc. VR is the most common and
sexiest. It has caught the attention of the media.
The applications being developed for VR run a wide spectrum, from games
to architectural and business planning. Many applications are worlds that
are very similar to our own, like CAD or architectural modeling. Some applications
provide ways of viewing from an advantageous perspective not possible in
the real world, like scientific simulators, telepresence systems and air
traffic control systems. Other applications are much different from anything
we have ever directly experienced before. These latter applications may
be the hardest, and the most interesting, systems: visualizing the ebb and flow
of the world's financial markets, or navigating a large corporate information base.
I.1. Types of VR Systems
A major distinction of VR systems is the mode with which they interface
to the user. This section describes some of the common modes used in VR systems.
I.1.1. Window on World Systems (WoW)
Some systems use a conventional computer monitor to display the visual
world. This is sometimes called Desktop VR or a Window on a World (WoW). This
concept traces its lineage back through the entire history of computer
graphics. In 1965, Ivan Sutherland laid out a research program for computer
graphics in a paper called "The Ultimate Display" that has driven the field
for the past nearly thirty years.
"One must look at a display screen," he said, "as a window through which
one beholds a virtual world. The challenge to computer graphics is to make
the picture in the window look real, sound real and the objects act real."
[quoted from Computer Graphics V26#3]
I.1.2. Video Mapping
A variation of the WoW approach merges a video input of the user's silhouette
with a 2D computer graphic. The user watches a monitor that shows his body's
interaction with the world. Myron Krueger has been a champion of this form
of VR since the late 60's. He has published two books on the subject: "Artificial
Reality" and "Artificial Reality II". At least one commercial system uses
this approach, the Mandala system. This system is based on a Commodore
Amiga with some added hardware and software. A version of the Mandala is
used by the cable TV channel Nickelodeon for a game show (Nick Arcade)
to put the contestants into what appears to be a large video game.
I.1.3. Immersive Systems
The ultimate VR systems completely immerse the user's personal viewpoint
inside the virtual world. These "immersive" VR systems are often equipped
with a Head Mounted Display (HMD). This is a helmet or a face mask that
holds the visual and auditory displays. The helmet may be free ranging,
tethered, or it might be attached to some sort of a boom armature.
A nice variation of the immersive systems uses multiple large projection
displays to create a 'Cave' or room in which the viewer(s) stand. An early
implementation was called "The Closet Cathedral" for its ability to create
the impression of an immense environment within a small physical space.
The Holodeck used in the television series "Star Trek: The Next Generation"
is a far-term extrapolation of this technology.
I.1.4. Telepresence
Telepresence is a variation on visualizing complete computer generated
worlds. This technology links remote sensors in the real world with the
senses of a human operator. The remote sensors might be located on a robot,
or they might be on the ends of WALDO-like tools. Fire fighters use remotely
operated vehicles to handle some dangerous conditions. Surgeons are using
very small instruments on cables to do surgery without cutting a major
hole in their patients. The instruments have a small video camera at the
business end. Robots equipped with telepresence systems have already changed
the way deep sea and volcanic exploration is done. NASA plans to use telerobotics
for space exploration. There is currently a joint US/Russian project researching
telepresence for space rover exploration.
I.1.5. Mixed Reality
Merging the Telepresence and Virtual Reality systems gives the Mixed Reality
or Seamless Simulation systems. Here the computer generated inputs are
merged with telepresence inputs and/or the user's view of the real world.
A surgeon's view of a brain surgery is overlaid with images from earlier
CAT scans and real-time ultrasound. A fighter pilot sees computer generated
maps and data displays inside his fancy helmet visor or on cockpit displays.
I.1.6. Fish Tank Virtual Reality
The phrase "fish tank virtual reality" was used to describe a Canadian
VR system reported in the 1993 InterCHI proceedings. It combines a stereoscopic
monitor display using LCD Shutter glasses with a mechanical head tracker.
The resulting system is superior to simple stereo-WoW systems due to the
motion parallax effects introduced by the head tracker. (see INTERCHI '93
Conference Proceedings, ACM Press/Addison-Wesley, ISBN 0-201-58884-6)
I.2. VR Hardware
There are a number of specialized types of hardware devices that have been
developed or used for Virtual Reality applications.
I.2.1. Image Generators
One of the most time consuming tasks in a VR system is the generation of
the images. Fast computer graphics opens a very large range of applications
aside from VR, so there has been a market demand for hardware acceleration
for a long while. There are currently a number of vendors selling image
generator cards for PC level machines, many of these are based on the Intel
i860 processor. These cards range in price from about $2,000 up to $6,000
or $10,000. Silicon Graphics Inc. has made a very profitable business of producing
graphics workstations. SGI boxes are some of the most common processors
found in VR laboratories and high end systems. SGI boxes range in price
from under $10,000 to over $100,000. The simulator market has produced
several companies that build special purpose computers designed expressly
for real time image generation. These computers often cost several hundreds
of thousands of dollars.
I.2.2. Manipulation and Control Devices
One key element for interaction with a virtual world, is a means of tracking
the position of a real world object, such as a head or hand. There are
numerous methods for position tracking and control. Ideally a technology
should provide 3 measures for position (X, Y, Z) and 3 measures of orientation
(roll, pitch, yaw). One of the biggest problems for position tracking is
latency, or the time required to make the measurements and preprocess them
before input to the simulation engine.
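To make the latency point concrete, a tracker report can be thought of as
six values plus a timestamp; the timestamp is what lets the simulation measure
how stale a sample is by the time it is used. A minimal sketch in C (the
names are illustrative, not from any particular tracker library):

    /* One sample from a hypothetical 6D position tracker. */
    typedef struct {
        float x, y, z;          /* position */
        float roll, pitch, yaw; /* orientation */
        long  timestamp_ms;     /* when the measurement was taken */
    } TrackerSample;

    /* Latency is the age of the sample when the simulation consumes it. */
    long tracker_latency_ms(const TrackerSample *s, long now_ms)
    {
        return now_ms - s->timestamp_ms;
    }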
The simplest control hardware is a conventional mouse, trackball or
joystick. While these are two dimensional devices, creative programming
can use them for 6D controls. There are a number of 3 and 6 dimensional
mice/trackball/joystick devices being introduced to the market at this
time. These add some extra buttons and wheels that are used to control
not just the XY translation of a cursor, but its Z dimension and rotations
in all three directions. The Global Devices 6D Controller is one such 6D
joystick. It looks like a racquetball mounted on a short stick. You can
pull and twist the ball in addition to the left/right & forward/back
of a normal joystick. Other 3D and 6D mice, joysticks and force balls are
available from Logitech, Mouse Systems Corp. and others.
One common VR device is the instrumented glove. The use of a glove to
manipulate objects in a computer is covered by a basic patent in the USA.
Such a glove is outfitted with sensors on the fingers as well as an overall
position/orientation tracker. There are a number of different types of
sensors that can be used. VPL (holders of the patent) made several DataGloves,
mostly using fiber optic sensors for finger bends and magnetic trackers
for overall position. Mattel manufactured the PowerGlove for use with the
Nintendo game system, for a short time. This device is easily adapted to
interface to a personal computer. It provides some limited hand location
and finger position data using strain gauges for finger bends and ultrasonic
position sensors. The gloves are getting rare, but some can still be found
at Toys R' Us and other discount stores. Anthony Clifton recently posted
this suggestion: "A very good resource for PowerGloves etc.: small children.
A friend's son had gotten a glove a couple years ago and almost NEVER used
it, so I bought it off the kid. Remember children like money more than
toys they never use."
The concept of an instrumented glove has been extended to other body
parts. Full body suits with position and bend sensors have been used for
capturing motion for character animation, control of music synthesizers,
etc. in addition to VR applications.
I.2.3. Position Tracking
Mechanical armatures can be used to provide fast and very accurate tracking.
Such armatures may look like a desk lamp (for basic position/orientation)
or they may be highly complex exoskeletons (for more detailed positions).
The drawbacks of mechanical sensors are the encumbrance of the device and
its restrictions on motion. Exos Systems builds one such exoskeleton for
hand control. It also provides force feedback. Shooting Star Technology makes
a low cost armature system for head tracking. Fake Space Labs and LEEP
Systems make much more expensive and elaborate armature systems for use
with their display systems.
Ultrasonic sensors can be used to track position and orientation. A
set of emitters and receivers are used with a known relationship between
the emitters and between the receivers. The emitters are pulsed in sequence
and the time lag to each receiver is measured. Triangulation gives the
position. Drawbacks to ultrasonics are low resolution, long lag times and
interference from echoes and other noises in the environment. Logitech
and Transition State are two companies that provide ultrasonic tracking systems.
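To make the triangulation step concrete, here is a sketch in C of one textbook
trilateration arrangement (not any particular vendor's method). Three receivers
sit at known positions R1=(0,0,0), R2=(D,0,0) and R3=(I,J,0); each time of
flight gives a distance, and the three sphere equations are solved for the
emitter position:

    #include <math.h>

    #define SPEED_OF_SOUND 343.0   /* metres/second, approximately */

    /* t1..t3 are measured times of flight in seconds.
       Returns the emitter position via *x, *y, *z. */
    void trilaterate(double D, double I, double J,
                     double t1, double t2, double t3,
                     double *x, double *y, double *z)
    {
        double d1 = t1 * SPEED_OF_SOUND;   /* distance to each receiver */
        double d2 = t2 * SPEED_OF_SOUND;
        double d3 = t3 * SPEED_OF_SOUND;

        *x = (d1*d1 - d2*d2 + D*D) / (2.0 * D);
        *y = (d1*d1 - d3*d3 + I*I + J*J - 2.0*I*(*x)) / (2.0 * J);
        *z = sqrt(d1*d1 - (*x)*(*x) - (*y)*(*y));
    }

A real system must also reject echoes and choose the correct sign for z
(the emitter is assumed to lie on one known side of the receiver plane).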
Magnetic trackers use sets of coils that are pulsed to produce magnetic
fields. The magnetic sensors determine the strength and angles of the fields.
Limitations of these trackers are a high latency for the measurement and
processing, range limitations, and interference from ferrous materials
within the fields. However, magnetic trackers seem to be one of the preferred
methods. The two primary companies selling magnetic trackers are Polhemus
and Ascension Technology.
Optical position tracking systems have been developed. One method uses
a ceiling grid of LEDs and a head mounted camera. The LEDs are pulsed in sequence
and the camera's image is processed to detect the flashes. Two problems
with this method are limited space (grid size) and lack of full motion
(rotations). Another optical method uses a number of video cameras to capture
simultaneous images that are correlated by high speed computers to track
objects. Processing time (and cost of fast computers) is a major limiting
factor here. One company selling an optical tracker is Origin Instruments.
Inertial trackers have been developed that are small and accurate enough
for VR use. However, these devices generally only provide rotational measurements.
They are also not accurate for slow position changes.
I.2.4. Stereo Vision
Stereo vision is often included in a VR system. This is accomplished by
creating two different images of the world, one for each eye. The images
are computed with the viewpoints offset by the equivalent distance between
the eyes. There are a large number of technologies for presenting these
two images. The images can be placed side-by-side and the viewer asked
(or assisted) to cross their eyes. The images can be projected through
differently polarized filters, with corresponding filters placed in front
of the eyes. Anaglyph images use red/blue glasses to provide a crude (no
color) stereo effect.
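Computing the two viewpoints is mostly a matter of offsetting the camera
along its horizontal axis by half the interocular distance in each direction,
as in this C sketch (the types and names are illustrative):

    typedef struct { float x, y, z; } Vec3;

    /* Compute left/right eye positions from the head position and a
       unit vector pointing to the viewer's right.  'iod' is the
       interocular distance. */
    void stereo_eyes(Vec3 head, Vec3 right, float iod,
                     Vec3 *left_eye, Vec3 *right_eye)
    {
        float h = iod * 0.5f;
        left_eye->x  = head.x - right.x * h;
        left_eye->y  = head.y - right.y * h;
        left_eye->z  = head.z - right.z * h;
        right_eye->x = head.x + right.x * h;
        right_eye->y = head.y + right.y * h;
        right_eye->z = head.z + right.z * h;
    }

The scene is then rendered once from each eye position.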
The two images can be displayed sequentially on a conventional monitor
or projection display. Liquid Crystal shutter glasses are then used to
shut off alternate eyes in synchronization with the display. When the brain
receives the images in rapid enough succession, it fuses the images into
a single scene and perceives depth. A fairly high display swapping rate
(a minimum of 60 Hz) is required to avoid perceived flicker. A number of companies
made low cost LC shutter glasses for use with TVs (Sega, Nintendo, Toshiba,
etc.). There are circuits and code for hooking these up to a computer available
on many of the On-line systems, BBSs and Internet FTP sites mentioned later.
However, locating the glasses themselves is getting difficult as none are
still being made or sold for their original use. Stereographics sells a
very nice commercial LC shutter system called CrystalEyes.
Another alternative method for creating stereo imagery on a computer
is to use one of several split screen methods. These divide the monitor
into two parts and display left and right images at the same time. One
method places the images side by side and conventionally oriented. It may
not use the full screen or may otherwise alter the normal display aspect
ratio. A special hood viewer is placed against the monitor which helps
position the eyes correctly and may contain a divider so each eye
sees only its own image. Most of these hoods, such as the one for the V5
of Rend386, use Fresnel lenses to enhance the viewing. An alternative split
screen method orients the images so the top of each points out the side
of the monitor. A special hood containing mirrors is used to correctly
orient the images. A very nice low cost (under $200) unit of this type
is the Cyberscope available from Simsalabim.
I.2.5. Head Mounted Display (HMD)
One hardware device closely associated with VR is the Head Mounted Display (HMD).
These use some sort of helmet or goggles to place small video displays
in front of each eye, with special optics to focus and stretch the perceived
field of view. Most HMDs use two displays and can provide stereoscopic
imaging. Others use a single larger display to provide higher resolution,
but without the stereoscopic vision.
Most lower cost HMDs ($3,000-$10,000 range) use LCD displays, while others
use small CRTs, such as those found in camcorders. The more expensive HMDs
use special CRTs mounted alongside the head, or optical fibers to pipe
the images from non-head-mounted displays ($60,000 and up). A HMD requires
a position tracker in addition to the helmet. Alternatively, the display
can be mounted on an armature for support and tracking (a Boom display).
I.2.6. Health Hazards from Stereoscopic Displays
There was an article supplement with CyberEdge Journal issue #17 entitled
"What's Wrong with your Head Mounted Display". It is a summary report on
the findings of a study done by the Edinburgh Virtual Environment Lab,
Dept. of Psychology, Univ. of Edinburgh on the eye strain effects of stereoscopic
Head Mounted Displays. There have been a number of anecdotal reports of
stress with HMDs and other stereoscopic displays, but few, if any, good
clinical studies. This study was done very carefully and the results are
a cause for some concern.
The basic test was to put 20 young adults on a stationary bicycle and
let them cycle around a virtual rural road setting using a HMD (VPL LX
EyePhone and a second HMD LEEP optic equipped system). After 10 minutes
of light exercise, the subjects were tested...
"The results were alarming: measures of distance vision , binocular
fusion and convergence displayed clear signs of binocular stress in a significant
number of the subjects. Over half the subjects also reported symptoms of
such stress, such as blurred vision."
The article goes on to describe the primary reason for the stress -
the difference between the image focal depth and the disparity. Normally,
when your eyes look at a close object they focus (accommodate) close
and also rotate inward (converge). When they accommodate on a far object,
the eyes also diverge. However, in a stereoscopic display the effective
focal plane (set by the optics) stays fixed while the disparity depth varies.
The eyes strain to decouple the two signals.
The article discusses some potential solutions, but notes that most
of them (dynamic focal/disparity) are difficult to implement. It mentions
monoscopic HMDs only to say that while they would seem to avoid the problems,
they were not tested. The article does not discuss non-HMD stereoscopic
devices at all, but I would extrapolate that they should show similar
problems. The full article is available from CyberEdge Journal for a small fee.
There has been a fair bit of ongoing discussion in the sci.virtual-worlds
newsgroup (check the Sept./Oct. 93 archives) about this and some other
studies. One contributor, Dipl.-Ing. Olaf H. Kelle, University of Wuppertal,
Germany, reported only 10% of his users showing eye strain. His system
is setup with a focal depth of 3m which seems to be a better, more comfortable
viewing distance. Others have noted that long duration monitor use often
leads to the user staring or not blinking. It is common for VDT users to
be cautioned to look away from the screen occasionally to adjust their
focal depth and to blink. Another contributor, John Nagle, provided the
following list of other potential problems with HMDs: electrical safety,
falling/tripping over real world objects, simulator sickness (disorientation
due to conflicting motion signals from eyes and inner ear), eye strain,
and induced post-HMD accidents ("in some flight simulators,
usually those for military fighter aircraft, it's been found necessary
to forbid simulator users to fly or drive for a period of time after flying
the simulator").
I.3. Levels of VR Hardware Systems
The following defines a number of levels of VR hardware systems. These
are not hard levels, especially towards the more advanced systems.
I.3.1. Entry VR (EVR)
The 'Entry Level' VR system takes a stock personal computer or workstation
and implements a WoW system. The system may be based on an IBM clone (MS-DOS/Windows)
machine or an Apple Macintosh, or perhaps a Commodore Amiga. The DOS type
machines (IBM PC clones) are the most prevalent. There are Mac based systems,
but few very fast rendering ones. Whatever the base computer, it includes
a graphic display, a 2D input device like a mouse, trackball or joystick,
the keyboard, hard disk & memory.
I.3.2. Basic VR (BVR)
The next step up from an EVR system adds some basic interaction and display
enhancements. Such enhancements would include a stereographic viewer (LCD
Shutter glasses) and an input/control device such as the Mattel PowerGlove
and/or a multidimensional (3D or 6D) mouse or joystick.
I.3.3. Advanced VR (AVR)
The next step up the VR technology ladder is to add a rendering accelerator
and/or frame buffer and possibly other parallel processors for input handling,
etc. The simplest enhancement in this area is a faster display card. For
the PC class machines, there are a number of new fast VGA and SVGA accelerator
cards. These can make a dramatic improvement in the rendering performance
of a desktop VR system. Other more sophisticated image processors based
on the Texas Instruments TI34020 or Intel i860 processor can make even
more dramatic improvements in rendering capabilities. The i860 in particular
is in many of the high end professional systems. The Silicon Graphics Reality
Engine uses a number of i860 processors in addition to the usual SGI workstation
hardware to achieve stunning levels of realism in real time animation.
An AVR system might also add a sound card to provide mono, stereo or
true 3D audio output. Some sound cards also provide voice recognition.
This would be an excellent additional input device for VR applications.
I.3.4. Immersion VR (IVR)
An Immersion VR system adds some type of immersive display system: a HMD,
a Boom, or multiple large projection type displays (Cave).
An IVR system might also add some form of tactile, haptic and touch
feedback interaction mechanisms. The area of Touch or Force Feedback (known
collectively as Haptics) is a very new research arena.
I.3.5. Cockpit Simulators
A common variation on VR is to use a Cockpit or Cab compartment to enclose
the user. The virtual world is viewed through some sort of view screen
and is usually either projected imagery or a conventional monitor. The
cockpit simulation is very well known in aircraft simulators, with a history
dating back to the early Link Flight Trainers (1929?). The cockpit is often
mounted on a motion platform that can give the illusion of a much larger
range of motion. Cabs are also used in driving simulators for ships, trucks,
tanks and 'battle mechs'. The latter are fictional walking robotic devices
(e.g. in the Star Wars films). The BattleTech location based entertainment
(LBE) centers use this type of system.
I.3.6. SIMNET, Defense Simulation Internet
One of the biggest VR projects is the Defense Simulation Internet. This
project is a standardization being pushed by the USA Defense Department
to enable diverse simulators to be interconnected into a vast network.
It is an outgrowth of the Defense Advanced Research Projects Agency
(DARPA) SIMNET project of the late 1980s. SIMNET was/is a collection of
tank simulators (Cab type) that are networked together to allow unit tactical
training. Simulators in Germany can operate in the same virtual world as
simulators in the USA, partaking of the same battle exercise.
The basic Distributed Interactive Simulation (DIS) protocol has been
defined by the Orlando Institute for Simulation & Training. It is the
basis for the next generation of SIMNET, the Defense Simulation Internet
(DSI). (love those acronyms!) An accessible, if somewhat dark, treatment
of SIMNET and DSI can be found in the premier issue of WIRED magazine (January
1993) entitled "War is Virtual Hell" by Bruce Sterling.
I.4. Available VR Software Systems
There are currently quite a number of different efforts to develop VR technology.
Each of these projects has different goals and approaches to the overall
VR technology. Large and small University labs have projects underway (UNC,
Cornell, U.Rochester, etc.). ARPA, NIST, the National Science Foundation and
other branches of the US Government are investing heavily in VR and other
simulation technologies. There are industry supported laboratories too,
like the Human Interface Technologies Laboratory (HITL) in Seattle and
the Japanese NTT project. Many existing and startup companies are also
building and selling world building tools (Autodesk, IBM, Sense8, VREAM).
There are two major categories for the available VR software: toolkits
and authoring systems. Toolkits are programming libraries, generally for
C or C++ that provide a set of functions with which a skilled programmer
can create VR applications. Authoring systems are complete programs with
graphical interfaces for creating worlds without resorting to detailed
programming. These usually include some sort of scripting language in which
to describe complex actions, so they are not really non-programming, just
much simpler programming. The programming libraries are generally more
flexible and have faster renderers than the authoring systems, but you must
be a very skilled programmer to use them.
I.5. Aspects of A VR Program
Just what is required of a VR program? The basic parts of the system can
be broken down into an Input Processor, a Simulation Processor, a Rendering
Process, and a World Database. All these parts must consider the time required
for processing. Every delay in response time degrades the feeling of 'presence'
and reality of the simulation.
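Put together, these parts form the familiar simulation loop: poll the inputs,
advance the simulation one time step, render the new world state, repeat.
A skeleton of that structure in C (all names here are illustrative, not from
any particular toolkit):

    extern void   read_input_devices(void);  /* trackers, gloves, mice */
    extern void   simulate_world(double dt); /* scripts, physics, triggers */
    extern void   render_world(void);        /* visual, audio, haptic output */
    extern double seconds_now(void);

    void vr_main_loop(void)
    {
        double last = seconds_now();
        for (;;) {
            double now = seconds_now();
            double dt  = now - last;   /* elapsed time for this tick */
            last = now;

            read_input_devices();
            simulate_world(dt);
            render_world();
            /* Any extra delay here lengthens the lag and degrades
               the feeling of presence. */
        }
    }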
I.5.1. Input Processes
The Input Processes of a VR program control the devices used to input information
to the computer. There are a wide variety of possible input devices: keyboard,
mouse, trackball, joystick, 3D & 6D position trackers (glove, wand,
head tracker, body suit, etc.). A networked VR system would add inputs
received from the network. A voice recognition system is also a good augmentation
for VR, especially if the user's hands are being used for other tasks.
Generally, the input processing of a VR system is kept simple. The object
is to get the coordinate data to the rest of the system with minimal lag
time. Some position sensor systems add some filtering and data smoothing
processing. Some glove systems add gesture recognition. This processing
step examines the glove inputs and determines when a specific gesture has
been made. Thus it can provide a higher level of input to the simulation.
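Such a gesture recognizer can be as simple as thresholding the finger bend
values. A sketch in C of a hypothetical 'fist' detector (the glove interface
shown is invented for illustration):

    #define NUM_FINGERS 5

    /* Finger bends normalized to 0.0 (straight) .. 1.0 (fully bent). */
    typedef struct { float bend[NUM_FINGERS]; } GloveState;

    /* Report a fist when every finger is bent past a threshold, so the
       simulation receives "FIST" rather than raw bend values. */
    int is_fist(const GloveState *g)
    {
        int i;
        for (i = 0; i < NUM_FINGERS; i++)
            if (g->bend[i] < 0.8f)
                return 0;
        return 1;
    }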
I.5.2. Simulation Process
The core of a VR program is the simulation system. This is the process
that knows about the objects and the various inputs. It handles the interactions,
the scripted object actions, simulations of physical laws (real or imaginary)
and determines the world status. This simulation is basically a discrete
process that is iterated once for each time step or frame. A networked
VR application may have multiple simulations running on different machines,
each with a different time step. Coordination of these can be a complex task.
It is the simulation engine that takes the user inputs along with any
tasks programmed into the world such as collision detection, scripts, etc.
and determines the actions that will take place in the virtual world.
I.5.3. Rendering Processes
The Rendering Processes of a VR program are those that create the sensations
that are output to the user. A network VR program would also output data
to other network processes. There would be separate rendering processes
for visual, auditory, haptic (touch/force), and other sensory systems.
Each renderer would take a description of the world state from the simulation
process or derive it directly from the World Database for each time step.
I.5.3.1. Visual Renderer
The visual renderer is the most common process and it has a long history
from the world of computer graphics and animation. The reader is encouraged
to become familiar with various aspects of this technology.
The major consideration of a graphic renderer for VR applications is
the frame generation rate. It is necessary to create a new frame every
1/20 of a second or faster. 20 frames per second (fps) is roughly the minimum
rate at which the human brain will merge a stream of still images and perceive
a smooth animation. 24fps is the standard rate for film, 25fps is PAL TV,
30fps is NTSC TV. 60fps is Showscan film rate. This requirement eliminates
a number of rendering techniques such as raytracing and radiosity. These
techniques can generate very realistic images but often take hours to generate
a single image.
Visual renderers for VR use other methods such as a 'painter's algorithm',
a Z-Buffer, or other Scanline oriented algorithm. There are many areas
of visual rendering that have been augmented with specialized hardware.
The Painter's algorithm is favored by many low end VR systems since it
is relatively fast, easy to implement and light on memory resources. However,
it has many visibility problems. For a discussion of this and other rendering
algorithms, see one of the computer graphics reference books listed in
a later section.
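As an illustration of the painter's approach, the renderer sorts polygons
from far to near and draws them in that order, letting nearer polygons simply
paint over farther ones. A minimal sketch in C (the Polygon type and
draw_polygon call are placeholders):

    #include <stdlib.h>

    typedef struct { float depth; /* plus geometry, color, ... */ } Polygon;

    extern void draw_polygon(const Polygon *p);

    /* qsort comparator: farthest polygons first. */
    static int farther_first(const void *a, const void *b)
    {
        float da = ((const Polygon *)a)->depth;
        float db = ((const Polygon *)b)->depth;
        return (da < db) - (da > db);
    }

    void painters_render(Polygon *polys, size_t n)
    {
        size_t i;
        qsort(polys, n, sizeof(Polygon), farther_first);
        for (i = 0; i < n; i++)
            draw_polygon(&polys[i]);   /* near polygons overwrite far ones */
    }

The visibility problems mentioned above arise because a single depth value
per polygon cannot resolve interpenetrating or cyclically overlapping polygons.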
The visual rendering process is often referred to as a rendering pipeline.
This refers to the series of sub-processes that are invoked to create each
frame. A sample rendering pipeline starts with a description of the world,
the objects, lighting and camera (eye) location in world space. A first
step would be to eliminate all objects that are not visible to the camera.
This can be quickly done by clipping the object bounding box or sphere
against the viewing pyramid of the camera. Then the remaining objects have
their geometries transformed into the eye coordinate system (eye point
at origin). Finally, the hidden surface algorithm and the actual pixel rendering
are performed.
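The quick visibility test can be done on the bounding sphere alone. A sketch
in C, assuming eye-space coordinates (camera at the origin looking down +Z)
and a 90 degree field of view; the names are illustrative:

    typedef struct { float x, y, z, radius; } BoundingSphere;

    /* Trivial rejection against two frustum planes; a full test would
       check all six planes the same way. */
    int sphere_maybe_visible(const BoundingSphere *s, float near_z)
    {
        /* Entirely behind the near plane? */
        if (s->z + s->radius < near_z)
            return 0;
        /* Entirely outside the right clip plane (x = z at 90 degrees)?
           (z - x) / sqrt(2) is the signed distance to that plane. */
        if ((s->z - s->x) * 0.7071f < -s->radius)
            return 0;
        return 1;
    }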
The pixel rendering is also known as the 'lighting' or 'shading' algorithm.
There are a number of different methods that are possible depending on
the realism and calculation speed available. The simplest method is called
flat shading and simply fills the entire area with the same color. The
next step up provides some variation in color across a single surface.
Beyond that is the possibility of smooth shading across surface boundaries,
adding highlights, reflections, etc.
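For instance, flat shading can be computed once per polygon as an ambient
term plus a diffuse (Lambertian) term, the cosine of the angle between the
surface normal and the light direction. A small sketch in C, assuming unit
vectors:

    typedef struct { float x, y, z; } Vec3;

    /* One intensity (0..1) for an entire polygon. */
    float flat_shade(Vec3 n, Vec3 light, float ambient)
    {
        float d = n.x*light.x + n.y*light.y + n.z*light.z;
        if (d < 0.0f)
            d = 0.0f;   /* polygon faces away from the light */
        return ambient + (1.0f - ambient) * d;
    }

The polygon's base color is then scaled by this intensity.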
An effective short cut for visual rendering is the use of "texture"
or "image" maps. These are pictures that are mapped onto objects in the
virtual world. Instead of calculating lighting and shading for the object,
the renderer determines which part of the texture map is visible at each
visible point of the object. The resulting image appears to have significantly
more detail than is otherwise possible. Some VR systems have special 'billboard'
objects that always face towards the user. By mapping a series of different
images onto the billboard, the user can get the appearance of moving around
a complex object.
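Keeping a billboard facing the user is a small computation: rotate the polygon
about the vertical axis so that it points back at the camera. A C sketch of
the usual atan2-based, yaw-only form (names illustrative):

    #include <math.h>

    /* Yaw angle (radians, about the vertical Y axis) that turns a
       billboard at (bx, bz) to face a camera at (cx, cz). */
    float billboard_yaw(float bx, float bz, float cx, float cz)
    {
        return (float)atan2(cx - bx, cz - bz);
    }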
I need to correct my earlier statement that radiosity cannot be used
for VR systems due to the time requirements. There have recently been at
least two radiosity renderers announced for walkthrough type systems -
Lightscape from Lightscape Graphics Software of Canada and Real Light from
Atma Systems of Italy. These packages compute the radiosity lighting in
a long, time consuming process beforehand. The user can interactively control
the camera view but cannot interact with the world. An executable demo
of the Atma product is available for SGI systems from ftp.iunet.it (220.127.116.11)
in the directory ftp/vendor/Atma.
I.5.3.2. Auditory Rendering
A VR system is greatly enhanced by the inclusion of an audio component.
This may produce mono, stereo or 3D audio. The latter is a fairly difficult
proposition. It is not enough to do stereo-pan effects as the mind tends
to locate these sounds inside the head. Research into 3D audio has shown
that there are many aspects of our head and ear shape that affect the recognition
of 3D sounds. It is possible to apply a rather complex mathematical function
(called a Head Related Transfer Function or HRTF) to a sound to produce
this effect. The HRTF is a very personal function that depends on the individual's
ear shape, etc. However, there has been significant success in creating
generalized HRTFs that work for most people and most audio placement. There
remain a number of problems, such as the 'cone of confusion' wherein sounds
behind the head are perceived to be in front of the head.
Sound has also been suggested as a means to convey other information,
such as surface roughness. Dragging your virtual hand over sand would sound
different than dragging it through gravel.
I.5.3.3. Haptic Rendering
Haptics is the generation of touch and force feedback information. This
area is a very new science and there is much to be learned. There have
been very few studies done on the rendering of true touch sense (such as
liquid, fur, etc.). Almost all systems to date have focused on force feedback
and kinesthetic senses. These systems can provide good clues to the body
regarding the touch sense, but are considered distinct from it. Many of
the haptic systems thus far have been exo-skeletons that can be used for
position sensing as well as providing resistance to movement or active force application.
I.5.3.4. Other Senses
The sense of balance and motion can be served to a fair degree in a VR
system by a motion platform. These are used in flight simulators and some
theaters to provide some motion cues that the mind integrates with other
cues to perceive motion. It is not necessary to recreate the entire motion
perfectly to fool the mind into a willing suspension of disbelief.
The sense of temperature has seen some technology developments. There
exist very small electrical heat pumps that can produce the sensation of
heat and cold in a localized area. These systems are fairly expensive.
Other senses such as taste, smell, pheromone, etc. are beyond our ability
to render rapidly and effectively. Sometimes, we just don't know enough
about the functioning of these other senses.
I.6. World Space
The virtual world itself needs to be defined in a 'world space'. By its
nature as a computer simulation, this world is necessarily limited. The
computer must put a numeric value on the locations of each point of each
object within the world. Usually these 'coordinates' are expressed in Cartesian
dimensions of X, Y, and Z (length, height, depth). It is possible to use
alternative coordinate systems such as spherical but Cartesian coordinates
are the norm for almost all applications. Conversions between coordinate
systems are fairly simple (if time consuming).
I.6.1. World Coordinates
A major limitation on the world space is the type of numbers used for the
coordinates. Some worlds use floating point coordinates. This allows a
very large range of numbers to be specified, with some precision lost on
large numbers. Other systems use fixed point coordinates, which provide
uniform precision over a more limited range of values. The choice of fixed
versus floating point is often based on speed as well as the desire for
a uniform coordinate field.
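A common fixed point choice is a 32-bit integer with 16 bits of fraction
('16.16'), which gives a uniform precision of 1/65536 of a unit everywhere
in the world. A sketch in C:

    /* 16.16 fixed point: 16 integer bits, 16 fraction bits. */
    typedef long fixed;               /* assumes a 32-bit long */

    #define FIX_ONE    (1L << 16)
    #define INT2FIX(i) ((fixed)(i) << 16)
    #define FIX2INT(f) ((long)((f) >> 16))

    /* Addition is ordinary integer addition.  Multiplication needs a
       wider intermediate to preserve the fraction bits; a double is
       used here for clarity, where a real system would use a 64-bit
       intermediate or assembly. */
    fixed fix_mul(fixed a, fixed b)
    {
        return (fixed)(((double)a * (double)b) / (double)FIX_ONE);
    }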
I.6.2. A World Divided: Separation of Environments
One method of dealing with the limitations on the world coordinate space
is to divide a virtual world up into multiple worlds and provide a means
of transiting between the worlds. This allows fewer objects to be computed
both for scripts and for rendering. There should be multiple stages (aka
rooms, areas, zones, worlds, multiverses, etc.) and a way to move between them.
I.7. World Database
The storage of information on objects and the world is a major part of
the design of a VR system. The primary things that are stored in the World
Database (or World Description Files) are the objects that inhabit the
world, scripts that describe actions of those objects or the user (things
that happen to the user), lighting, program controls, and hardware device mappings.
I.7.1. Storage Methods
There are a number of different ways the world information may be stored:
a single file, a collection of files, or a database. The multiple file
method is one of the more common approaches for VR development packages.
Each object has one or more files (geometry, scripts, etc.) and there is
some overall 'world' file that causes the other files to be loaded. Some
systems also include a configuration file that defines the hardware interface connections.
Sometimes the entire database is loaded during program startup, other
systems only read the currently needed files. A real database system helps
tremendously with the latter approach. An Object Oriented Database would
be a great fit for a VR system, but I am not aware of any projects currently using one.
The data files are most often stored as ASCII (human readable) text
files. However, in many systems these are replaced by binary computer files.
Some systems have all the world information compiled directly into the application program.
I.7.2. Objects
Objects in the virtual world can have geometry, hierarchy, scripts, and
other attributes. The capabilities of objects has a tremendous impact on
the structure and design of the system. In order to retain flexibility,
a list of named attribute/value pairs is often used. Thus attributes can
be added to the system without requiring changes to the object data structures.
These attribute lists would be addressable by name (i.e. cube.mass =>
mass of the cube object). They may be a scalar, vector, or expression value.
They may be addressable from within the scripts of their object. They might
be accessible from scripts in other objects.
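One simple realization of such an attribute list is a linked list of
name/value pairs searched by name, so that new attributes never require
changes to the object structure. A sketch in C (scalar values only, for
brevity; real systems would also support vectors and expressions):

    #include <string.h>
    #include <stddef.h>

    /* A named attribute; each object carries a linked list of these. */
    typedef struct Attribute {
        const char       *name;
        float             value;
        struct Attribute *next;
    } Attribute;

    /* Look up e.g. "mass" on an object's attribute list. */
    Attribute *find_attribute(Attribute *list, const char *name)
    {
        for (; list != NULL; list = list->next)
            if (strcmp(list->name, name) == 0)
                return list;
        return NULL;
    }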
I.7.2.1. Position and Orientation
An object is positionable and orientable. That is, it has a location and
orientation in space. Most objects can have these attributes modified by
applying translation and rotation operations. These operations are often
implemented using methods from vector and matrix algebra.
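As a small example, rotating an object about the Z axis and then translating
it amounts to applying the following to each of its points. A sketch in C:

    #include <math.h>

    typedef struct { float x, y, z; } Vec3;

    /* Rotate p about the Z axis by 'angle' radians, then translate by t. */
    Vec3 transform_point(Vec3 p, float angle, Vec3 t)
    {
        float c = (float)cos(angle);
        float s = (float)sin(angle);
        Vec3 out;
        out.x = c * p.x - s * p.y + t.x;
        out.y = s * p.x + c * p.y + t.y;
        out.z = p.z + t.z;
        return out;
    }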
I.7.2.2. Hierarchy
An object may be part of an object HIERARCHY, with parent, sibling,
and child objects. Such an object would inherit the transformations applied
to its parent object and pass them on to its own children.
Hierarchies are used to create jointed figures such as robots and animals.
They can also be used to model other things like the sun, planets
and moons in a solar system.
I.7.2.3. Bounding Volume
Additionally, an object should include a BOUNDING VOLUME. The simplest
bounding volume is the Bounding Sphere, specified by a center and radius.
Another simple alternative is the Bounding Cube. This data can be used
for rapid object culling during rendering and trigger analysis. Objects
whose bounding volume is completely outside the viewing area need not be
transformed or considered further during rendering. Collision detection
with bounding spheres is very rapid. It could be used alone, or as a method
for culling objects before more rigorous collision detection algorithms are applied.
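The sphere test itself is a single comparison: two objects can collide only
if the distance between their centers is less than the sum of their radii.
Comparing squared distances avoids a square root per test, as in this C sketch:

    typedef struct { float x, y, z, radius; } BoundingSphere;

    /* Returns 1 if the two bounding spheres overlap, 0 otherwise. */
    int spheres_collide(const BoundingSphere *a, const BoundingSphere *b)
    {
        float dx = a->x - b->x;
        float dy = a->y - b->y;
        float dz = a->z - b->z;
        float r  = a->radius + b->radius;
        return dx*dx + dy*dy + dz*dz <= r*r;
    }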
I.7.3. Object Geometry
The modeling of object shape and geometry is a large and diverse field.
Some approaches seek to very carefully model the exact geometry of real
world objects. Other methods seek to create simplified representations.
Most VR systems sacrifice detail and exactness for simplicity for the sake
of rendering speed.
The simplest objects are single dimensional points. Next come the two
dimensional vectors. Many CAD systems create and exchange data as 2D views.
This information is not very useful for VR systems, except for display
on a 2D surface within the virtual world. There are some programs that
can reconstruct a 3D model of an object, given a number of 2D views.
The sections below discuss a number of common geometric modeling methods.
The choice of method used is closely tied to the rendering process used.
Some renderers can handle multiple types of models, but most use only one,
especially for VR use. The modeling complexity is generally inversely proportional
to the rendering speed. As the model gets more complex and detailed, the
frame rate drops.
I.7.3.1. 3D PolyLines & PolyPoints
The simplest 3D objects are known as PolyPoints and PolyLines. A PolyPoint
is simply a collection of points in space. A Polyline is a set of vectors
that form a continuous line.
I.7.3.2. Polygons
The most common form of objects used in VR systems is based on flat polygons.
A polygon is a planar, closed multi-sided figure. Polygons may be convex or
concave, but some systems require convex polygons. The use of polygons
often gives objects a faceted look. This can be offset by more advanced
rendering techniques such as the use of smooth shading and texture mapping.
Some systems use simple triangles or quadrilaterals instead of more
general polygons. This can simplify the rendering process, as all surfaces
have a known shape. However, it can also increase the number of surfaces
that need to be rendered.
Polygon Mesh Format (aka Vertex Join Set) is a useful form of polygonal
object. For each object in a Mesh, there is a common pool of Points that
are referenced by the polygons for that object. Transforming these shared
points reduces the calculations needed to render the object. A point at
the corner of a cube is only processed once, rather than once for each of the
three polygons that reference it. The PLG format used by REND386 is
an example of a Polygonal Mesh, as is the BYU format used by the 'ancient'
Movie.BYU package.
The geometry format can support precomputed polygon and vertex normals.
Both Polygons and vertices should be allowed a color attribute. Different
renderers may use or ignore these and possibly more advanced surface characteristics.
Precomputed polygon normals are very helpful for backface polygon removal.
Vertices may also have texture coordinates assigned to support texture
or other image mapping techniques.
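In data structure terms, a mesh is a shared vertex pool plus polygons that
store indices into it, optionally carrying precomputed normals and colors.
A sketch in C with illustrative field names:

    typedef struct { float x, y, z; } Vec3;

    /* A polygon references shared vertices by index. */
    typedef struct {
        int   num_verts;
        int  *vert_index;   /* indices into the mesh vertex pool */
        Vec3  normal;       /* precomputed, e.g. for backface removal */
        long  color;
    } MeshPolygon;

    typedef struct {
        int          num_verts;
        Vec3        *verts;     /* shared pool: each point stored once */
        int          num_polys;
        MeshPolygon *polys;
    } Mesh;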
I.7.3.3. Primitives
Some systems provide only Primitive Objects, such as cubes, cones, and
spheres. Sometimes, these objects can be slightly deformed by the modeling
package to provide more interesting objects.
I.7.3.4. Solid Modeling & Boolean Operations
Solid Modeling (aka Computer Solid Geometry, CSG) is one form of geometric
modeling that uses primitive objects. It extends the concept by allowing
various addition, subtraction, Boolean and other operations between these
primitives. This can be very useful in modeling objects when you are concerned
with doing physical calculations, such as center of mass, etc. However,
this method does incur some significant calculations and is not very useful
for VR applications. It is possible to convert a CSG model into polygons.
Various complexity polygonal models (# polygons) could be made from a single
high resolution "metaobject" of a CSG type.
I.7.3.5. Curves & Patches
Another advanced form of geometric modeling is the use of curves and curved
surfaces (aka patches). These can be very effective in representing complex
shapes, like the curved surface of an automobile, ship or beer bottle.
However, there is significant calculation involved in determining the surface
location at each pixel, thus curve based modeling is not used directly
in VR systems. It is possible, however, to design an object using curves
and then compute a polygonal representation of those curved patches. Various
complexity polygonal models could be made from a single high resolution
curved "metaobject".
I.7.3.6. Dynamic Geometry (aka morphing)
It is sometimes desirable to have an object that can change shape. The
shape might simply be deformed, such as a bouncing ball or the squash/stretch
used in classical animation ('toons'), or it might actually undergo metamorphosis
into a completely different geometry. The latter effect is commonly known
as 'morphing' and has been extensively used in films, commercials and television
shows. Morphing can be done in the image domain (2D morph) or in the geometry
domain (3D morph). The latter is applicable to VR systems. The simplest
method of doing a 3D morph is to precompute the various geometries and
step through them as needed. A system with significant processing power
can handle real time object morphing.
I.7.3.7. Swept Objects & Surface of Revolution
A common method for creating objects is known as Sweeping and Surfaces
of Revolution. These methods use an outline or template curve and a backbone.
The template is swept along the backbone creating the object surface (or
rotated about a single axis to create a surface of revolution). This method
may be used to create either curved surfaces or polygonal objects. For VR
applications, the sweeping would most likely be performed during the object
modeling (creation) phase, and the resulting polygonal object stored for
real time use.
I.7.3.8. Texture Maps & Billboard Objects
As mentioned in the section on rendering, texture maps (images) can be
used to provide the appearance of more geometric complexity without the
geometric calculations. Using flat polygonal objects that maintain an orientation
towards the eye/camera (billboards) and multiple texture maps can extend
this trick even further. Texture maps, even without billboard objects,
are an excellent way to increase apparent scene complexity. Variations
on the image mapping concept are also used to simulate reflections, etc.
I.7.4. Lighting
Lighting is a very important part of a virtual world (if it is visually
rendered). Lights can be ambient (everywhere), or located. Located lights
have position and may have orientation, color, intensity and a cone of
illumination. The more complex the light source, the more computation is
required to simulate its effect on objects.
I.7.5. Cameras
Cameras or viewpoints may be described in the World Database. Generally,
each user has only one viewpoint at a time (ok, two closely spaced viewpoints
for stereoscopic systems). However, it may be useful to define alternative
cameras that can be used as needed. An example might be an overhead camera
that shows a schematic map of the virtual world and the user's location
within it (You Are Here.)
I.7.6. Scripts and Object Behavior
A virtual world consisting only of static objects is only of mild interest.
Many researchers and enthusiasts of VR have remarked that interaction is
the key to a successful and interesting virtual world. This requires some
means of defining the actions that objects take on their own and when the
user (or other objects) interact with them. This I refer to generically
as the World Scripting. I divide the scripts into three basic types: Motion
Scripts, Trigger Scripts and Connection Scripts.
Scripts may be textual or they might be actually compiled into the program
structure. The use of visual programming languages for world design was
pioneered by VPL Research with their Body Electric system. This Macintosh
based language used 2D blocks on the screen to represent inputs, objects
and functions. The programmer would connect the boxes to indicate data flow.
There is no common scripting language used in today's VR products. The
commercial authoring packages, such as VR Studio, VREAM and Superscape
all contain some form of scripting language. Autodesk's CDK has the "Cyberspace
Description Format" (CDF) and the Distributed Shared Cyberspace Virtual
Representation (DSCVR) database. These are only partially implemented in
the current release. They are derived from the Linda distributed programming
language/database system. ("Coordination Languages and their Significance",
David Gelernter and Nicholas Carriero, Communications of the ACM, Feb 1992
V35N2). On the homebrew/freeware side, some people are experimenting with
several Object Oriented interpretive languages such as BOB ("Your own tiny
Object-Oriented Language", David Betz, DrDobbs Journal Sept 1991). Object
Orientation, although perhaps not in the conventional class-inheritance
mechanism, is very nicely suited to world scripting. Interpretive languages
are faster for development, and often more accessible to 'non-programmers'.
I.7.6.1. Motion Scripts
Motion scripts modify the position, orientation or other attributes of
an object, light or camera based on the current system tick. A 'tick' is
one advancement of the simulation clock. Generally, this is equivalent
to a single frame of visual animation. (VR generally uses Discrete Simulation methods.)
For simplicity and speed, only one motion script should be active for
an object at any one instant. Motion scripting is a potentially powerful
feature, depending on how complex we allow these scripts to become. Care
must be exercised since the interpretation of these scripts will require
time, which impacts the frame and delay rates.
Additionally, a script might be used to attach or detach an object from
a hierarchy. For example, a script might attach the user to a CAR object
when he wishes to drive around the virtual world. Alternatively, the user
might 'pick up' or attach an object to himself.
I.7.6.2. Physical or Procedural Modeling and Simulation
A complex simulation could be used that models the interactions of the
real physical world. This is sometimes referred to as Procedural Modeling.
It can be a very complex and time consuming application. The mathematics
required to solve the physical interaction equations can also be fairly
complex. However, this method can provide a very realistic interaction
mechanism. (for more on Physical Simulation, see the book by Ronen Barzel
listed in the Computer Graphics Books section)
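Even the simplest case, an object falling under gravity, shows the shape of
such a simulation: integrate acceleration into velocity and velocity into
position once per tick. A sketch in C of one Euler integration step
(illustrative, and far short of solving general interaction equations):

    typedef struct { float x, y, z; } Vec3;
    typedef struct { Vec3 pos, vel; } Body;

    /* One Euler step: gravity -> velocity -> position. */
    void physics_step(Body *b, float dt)
    {
        const float g = -9.8f;     /* gravity, units per second^2 */
        b->vel.y += g * dt;
        b->pos.x += b->vel.x * dt;
        b->pos.y += b->vel.y * dt;
        b->pos.z += b->vel.z * dt;
    }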
I.7.6.3. Simple Animation
A simpler method of animation is to use simple formulas for the motion
of objects. A very simple example would be "Rotate about Z axis once every
4 seconds". This might also be represented as a fixed angular increment
about Z applied each tick.
A slightly more advanced method of animation is to provide a 'path'
for the object with controls on its speed at various points. These controls
are sometimes referred to as "slow in-out". They provide a much more realistic
motion than simple linear motion.
If the motion is fixed, some systems can precompute the motion and provide
a 'channel' of data that is evaluated at each time instance. This may be
a simple lookup table with exact values for each frame, or it may require
some sort of simple interpolation.
I.7.6.4. Trigger Scripts
Trigger Scripts are invoked when some trigger event occurs, such as collision,
proximity or selection. The VR system needs to evaluate the trigger parameters
at each TICK. For proximity detectors, this may be a simple distance check
from the object to the 3D eye or effector object (aka virtual human). Collision
detection is a more involved process. It is desirable, but may not be practical
without off-loading the rendering and some UI tasks from the main processor.
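A proximity trigger can therefore be just a squared-distance check run once
per tick, firing its script when the effector first enters the trigger radius.
A sketch in C (the callback arrangement is illustrative):

    typedef struct { float x, y, z; } Vec3;

    typedef struct {
        Vec3  center;
        float radius;
        void (*on_enter)(void);   /* the trigger script */
        int   inside;             /* fire only on the transition */
    } ProximityTrigger;

    /* Called once per simulation TICK with the effector position. */
    void check_trigger(ProximityTrigger *t, Vec3 effector)
    {
        float dx = effector.x - t->center.x;
        float dy = effector.y - t->center.y;
        float dz = effector.z - t->center.z;
        int   in = dx*dx + dy*dy + dz*dz <= t->radius * t->radius;

        if (in && !t->inside)
            t->on_enter();        /* effector entered the volume */
        t->inside = in;
    }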
I.7.6.5. Connection Scripts
Connection scripts control the connection of input and output devices to
various objects. For example, a connection script may be used to connect
a glove device to a virtual hand object. The glove movements and position
information is used to control the position and actions of the hand object
in the virtual world. Some systems build this function directly into the
program. Other systems are designed such that the VR program is almost
entirely a connection script.
I.7.7. Interaction Feedback
The user must be given some indication of interaction feedback when the
virtual cursor selects or touches an object. Crude systems have only the
visual feedback of seeing the cursor (virtual hand) penetrate an object.
The user can then grasp or otherwise select the object. The selected object
is then highlighted in some manner. Alternatively, an audio signal could
be generated to indicate a collision. Some systems use simple touch feedback,
such as a vibration in the joystick, to indicate collision, etc.
I.7.8. Graphical User Interface/Control Panels
A VR system often needs to have some sort of control panels available to
the user. The world database may contain information on these panels and
how they are integrated into the application. Alternatively, they may be
a part of the program code.
There are several ways to create these panels. There could be 2D menus
that surround a WoW display, or are overlaid onto the image. An alternative
is to place control devices inside the virtual world. The simulation system
must then note user interaction with these devices as providing control
over the world.
One primary area of user control is control of the viewpoint (moving
around within the virtual world). Some systems use the joystick or similar
device to move. Others use gestures from a glove, such as pointing, to
indicate a motion command.
The user interface to the VW might be restricted to direct interaction
in the 3D world. However, this is extremely limiting and requires lots
of 3D calculations. Thus it is desirable to have some form of 2D Graphical
user interface to assist in controlling the virtual world. These 'control
panels' would appear to occlude portions of the 3D world, or perhaps
the 3D world would appear as a window or viewport set in a 2D screen interface.
The 2D interactions could also be represented as a flat panel floating
in 3D space, with a 3D effector controlling them.
I.7.8.1. Two Dimensional Controls
There are four primary types of 2D controls and displays (controls cause
changes in the virtual world; displays show some measurement of the VW):
Buttons, Sliders, Gauges and Text. Buttons may be menu items with either
icons or text identifiers. Sliders are used for more analog control over
various attributes. A variation of a slider is the dial, but these are
harder to implement as 2D controls. Gauges are graphical depictions of
the value of some attribute(s) of the world. Text may be used for both
control and display. The user might enter text commands to some command
parser. The system may use text displays to show the various attributes
of the virtual world.
An additional type of 2D display might be a map or locator display.
This would provide a point of reference for navigating the virtual world.
The VR system needs a definition for how the 2D cursor effects these
areas. It may be desirable to have a notion of a 'current control' that
is the focus of the activity (button pressed, etc.) for the 2D effector.
Perhaps the arrow keys on the keyboard could be used to change the current
control, instead of using the mouse (which might be part of the 3D effector).
I.7.8.2. Three Dimensional Controls
Some systems place the controls inside the virtual world. These are often
implemented as a floating control panel object. This panel contains the
usual 2D buttons, gauges, menu items, etc. perhaps with a 3D representation
and interaction style.
There have also been some published articles on 3D control Widgets.
These are interaction methods for directly controlling the 3D objects.
One method implemented at Brown University attaches control handles to
the objects. These handles can be grasped, moved, twisted, etc. to cause
various effects on an object. For example, twisting one handle might rotate
the object, while a 'rack' widget would provide a number of handles that
can be used to deform the object by twisting its geometry.
I.7.9. Hardware Control & Connections
The world database may contain information on the hardware controls and
how they are integrated into the application. Alternatively, they may be
a part of the program code. Some VR systems put this information into a
configuration file. I consider this extra file simply another part of the world database.
The hardware mapping section would define the input/output ports, data
speeds, and other parameters for each device. It would also provide for
the logical connection of that device to some part of the virtual world.
For example, a position tracker might be associated with the viewer's head.
I.7.10. Room/Stage/Area Descriptions
If the system supports the division of the virtual world into different
areas, the world database would need multiple scene descriptions. Each
area description would give the names of objects in the scene and the stage
description (i.e. size, backgrounds, lighting, etc.). There would also be some method
of moving between the worlds, such as entering a doorway, etc., that would
most likely be expressed in object scripts.
I.8. World Authoring versus Playback
A virtual world can be created, modified and experienced. Some VR systems
may not distinguish between the creation and experiencing aspects. However,
there is currently a much larger body of experience to draw upon for designing
the world from the outside. This method may use techniques borrowed from
architectural and other forms of Computer Aided Design (CAD) systems. Also
the current technologies for immersive VR systems are fairly limiting in
resolution, latency, etc. They are not nearly as well developed as those
for more conventional computer graphics and interfaces.
For many VR systems, it makes a great deal of sense to have an Authoring
mode and a Playback mode. The authoring mode may be a standard text editor
and compiler system, or it may include 3D graphic and other tools. Such
a split mode system makes it easier to create a stand alone application
that can be delivered as a product.
An immersive authoring ability may be desirable for some applications
and some users. For example, an architect might have the ability to move
walls, etc. when immersed, while the clients with him, who are not as familiar
with the system, are limited to player status. That way they can't accidentally
rearrange the house by leaning on a wall.