32. 3D User Interfaces

by Doug A. Bowman

Ever since the advent of the computer mouse and the graphical user interface (GUI) based on the Windows, Icons, Menus, and Pointer (WIMP) paradigm, people have asked what the next paradigm shift in user interfaces will be (van Dam, 1997; Rekimoto, 1998). Mouse-based GUIs have proven remarkably flexible, robust, and general, but we are finally seeing a major sea change towards "natural" user interfaces (NUIs), not only in the research lab, but also in commercial products aimed at broad consumer audiences. Under the NUI umbrella, there are two broad categories of interfaces: those based on direct touch, such as multi-touch tablets (Wigdor & Wixon, 2011), and those based on three-dimensional spatial input (Bowman et al., 2005), such as motion-based games. It is this latter category, which we call three-dimensional user interfaces (3D UIs), that we focus on in this chapter.

32.1 What are 3D User Interfaces?

Like many high-level descriptive terms in our field (such as "virtual reality" and "multimedia"), it's surprisingly difficult to give a precise definition of the term "3D user interface." Although most practitioners and researchers would say, "I know one when I see one," stating exactly what constitutes a 3D UI and which interfaces should be included and excluded is tricky.

3D User Interfaces: Theory and Practice (Bowman et al., 2005) defines a 3D user interface as simply "a UI that involves 3D interaction." This simply delays the inevitable, as we now have to define 3D interaction. The book states that 3D interaction is "human-computer interaction in which the user's tasks are performed directly in a 3D spatial context."

One key word in this definition is "directly." There are some interactive computer systems that display a virtual 3D space, but the user only interacts indirectly with this space—e.g., by manipulating 2D widgets, entering coordinates, or choosing items from a menu. These are not 3D UIs.

The other key idea is that of a "3D spatial context." The book goes on to make it clear that this spatial context can be either physical or virtual, or both. The most prominent types of 3D UIs involve a physical 3D spatial context, used for input. The user provides input to the system by making movements in physical 3D space or manipulating tools, sensors, or devices in 3D space, without regard for what this input is used to do or control. Of course, all input/interaction is in some sense in a physical 3D spatial context (a mouse and keyboard exist in 3D physical space), but the intent here is that the user is giving spatial input that involves 3D position (x, y, z) and/or orientation (yaw, pitch, roll) and that this spatial input is meaningful to the system.

Thus, the key technological enabler of 3D UIs of this sort is spatial tracking (Meyer et al., 1992; Welch & Foxlin, 2002). The system must be able to track the user's position, orientation, and/or motion to enable this input to be used for 3D interaction. For example, the Microsoft Kinect tracks the 3D positions of multiple body parts to enable 3D UIs, while the Apple iPhone tracks its own 3D orientation, allowing 3D interaction. There are many different technologies used for spatial tracking; we describe some of these in a later section.

This tracked spatial input can be used for iconic gestures, direct pointing at menu items, controlling characters in a game, specifying 3D shapes, and many other uses. 3D UIs based on spatial input can be found in a variety of settings: gaming systems, modeling applications, virtual and augmented reality systems, large screen visualization setups, and art installations, just to name a few.

The other type of 3D UI involves direct interaction in a virtual 3D spatial context. In this type, the user may be using traditional (non-3D) input devices or movements as inputs, but if those inputs are transformed directly into actions in the virtual 3D space, we still consider it to be 3D interaction. For example, the user might drag the mouse across a 3D model in order to paint it a certain color, or the user might draw a path through a 3D world using touch input.

In this chapter, we are going to focus on the first type of 3D UI, which is based on 3D spatial input. While both types are important and have many applications, they involve different research issues and different technologies to a large degree. 3D spatial tracking has come of age recently, and based on this technological driver, 3D UI applications with spatial input have exploded. We discuss a few of these applications in more detail in the next section.

32.2 Applications of 3D UIs

Why is it important to understand and study 3D UIs? For many years, the primary application of 3D UIs was in high-end virtual reality (VR) and augmented reality (AR) systems. Since users in these systems were generally standing up, walking around, and limited in their view of the real world, traditional mouse- and keyboard-based interaction was impractical. Since such systems were already using spatial tracking of the user's head to present the correct view of the virtual world, it was natural to also design UIs that took advantage of spatial tracking. As we indicated above, however, recent years have seen an explosion of spatial input in consumer-level systems such as game consoles and smartphones. Thus, the principles of good 3D UI design are now more important to understand than ever.

To further motivate the importance of 3D UI research, let's look in a bit more detail at some important technology areas where 3D UIs are making an impact on real-world applications.

32.2.1 Video Gaming

As we've already mentioned, most people today are aware of 3D UIs because of the great success of "motion gaming" systems like the Nintendo Wii, the Microsoft Kinect, and the Sony Move. All of these systems use spatial tracking to allow users to interact with games through pointing, gestures, and most importantly, natural movements, rather than with buttons and joysticks. For example, in an archery game a user can hold two tracked devices—one for the handle of the bow and the other for the arrow and string—and can pull back the arrow, aim, and release using motions very similar to archery in the real world.

The Wii and Move both use tracked handheld devices that also provide buttons and joysticks, while the Kinect tracks the user's body directly. There's a clear tradeoff here. Buttons and joysticks are still useful for discrete actions like confirming a selection, firing a weapon, or changing the view. On the other hand, removing encumbrances from the user can make the experience seem even more natural.

3D UIs are a great fit for video gaming (LaViola, 2008; Wingrave et al., 2010), because the emphasis is on a compelling experience, which can be enhanced with natural actions that make the player feel as if he is part of the action, rather than just indirectly controlling the actions of a remote character.

32.2.2 Very Large Displays

Recent years have seen an explosion in the size, resolution, and ubiquity of displays. So-called "display walls" are found in shopping malls, conference rooms, and even people's homes. Many of these displays are passive, simply presenting canned information to viewers, but more and more of them are interactive.

So how should one interact with these large displays? The traditional mouse and keyboard still work, but they are difficult to use in this context because users want to move about in front of the display, and because such large displays invite multiple users (Ball and North, 2005). Touch screens are another option, but that means that to interact with the display one has to stand within arm's reach, limiting the amount of the display that can be seen.

3D interaction is a natural choice for large display contexts. A tracked handheld device, the hand itself, or the whole body can be used as portable input that works from any location and makes sense for multiple users. The simplest example is distal pointing, where the user points directly at a location on the display (as with a laser pointer) to interact with it (Vogel & Balakrishnan, 2005; Kopper et al., 2010), but other techniques such as full-body gestures or viewpoint-dependent display can also be used.
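As a rough illustration of distal pointing, the sketch below intersects a pointing ray with a flat display. It is a minimal example under stated assumptions, not the method used in the cited work: the display is assumed to lie in the plane z = 0 of a hypothetical room coordinate system, and the function name `point_on_display` is invented for this example.

```python
def point_on_display(origin, direction, display_z=0.0):
    """Intersect a pointing ray with a display lying in the plane z = display_z.

    origin and direction are (x, y, z) tuples in the same (assumed) room
    coordinate system. Returns the (x, y) hit point on the display plane, or
    None if the ray is parallel to the display or points away from it.
    """
    ox, oy, oz = origin
    dx, dy, dz = direction
    if abs(dz) < 1e-9:
        return None  # ray runs parallel to the display plane
    t = (display_z - oz) / dz  # ray parameter at the plane
    if t <= 0:
        return None  # the display is behind the pointing device
    return (ox + t * dx, oy + t * dy)


# A user standing 2 m from the display, pointing slightly to the right.
hit = point_on_display(origin=(0.0, 1.5, 2.0), direction=(0.25, 0.0, -1.0))
```

Because the hit point depends only on the tracked position and direction of the hand or device, the same mapping works from any location in front of the display. A real system would also clip the hit point to the display bounds and filter hand tremor, both of which are omitted here.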

32.2.3 Mobile Applications

Today's mobile devices, such as smartphones and tablets, are an interaction designer's playground, not only because of the rich design space for multi-touch input, but also because these devices incorporate some fairly powerful sensors for 3D spatial input. The combination of accelerometers, gyroscopes, and a compass gives these devices the ability to track their own orientation quite accurately. Position information based on GPS and accelerometers is less accurate, but still present. These devices offer a key opportunity for 3D interaction design, however, because they are ubiquitous, they have their own display, and they can do spatial input without the need for any external tracking infrastructure (cameras, base stations, etc.).

Many mobile games are using these capabilities. Driving games, for example, use the "tilt to steer" metaphor. Music games can sense when the user is playing a virtual drum. And golf games can incorporate a player's real swing.

But "serious" applications can take advantage of 3D input for mobile devices as well. Everyone is familiar with the idea of tilting the device to change the interface from portrait to landscape mode, but this is only the tip of the iceberg. A tool for amateur astronomers can use GPS and orientation information to help the user identify stars and planets they point the device towards. Camera applications can not only record the location at which a photo was taken, but also track the movement of the camera to aid in the reconstruction of a 3D scene.

Perhaps the most prominent example of mobile device 3D interaction is in mobile AR. In mobile AR, the smartphone becomes a window through which the user can see not only the real world, but virtual objects and information as well (Höllerer et al., 1999; Ashley, 2008). Thus, the user can browse information simply by moving the device to view a different part of the real world scene. Mobile AR is being used for applications in entertainment, navigation, social networking, tourism, and many more domains. Students can learn about the history of an area; friends can find restaurants surrounding them and link to reviews; and tourists can follow a virtual path to the nearest subway station. Prominent projects like MIT's SixthSense (Mistry & Maes, 2009) and Google's Project Glass (Google, 2012) have made mobile AR highly visible. Good 3D UI design is critical to realizing these visions.

32.3 3D UI Technologies

As we discussed above, spatial tracking technologies are intimately connected to 3D UIs. In order to design usable 3D UIs, then, a basic understanding of spatial tracking is necessary. In addition, other input technologies and display devices play a major role in 3D UI design.

32.3.1 Tracking Systems and Sensors

Spatial tracking systems sense the position, orientation, linear or angular velocity, and/or linear or angular acceleration of one or more objects. Traditionally, 3D UIs have been based on six-degree-of-freedom (6-DOF) position trackers, which detect the absolute 3D position (location in a fixed XYZ coordinate system) and orientation (roll, pitch, and yaw in the fixed coordinate system) of the object, which is typically mounted on the head or held in the hand.

These 6-DOF position trackers can be based on many different technologies, such as those using electromagnetic fields (e.g., Polhemus Liberty), optical tracking (e.g., NaturalPoint OptiTrack), or hybrid ultrasonic/inertial tracking (e.g., Intersense IS900). All of these, however, share the limitation that some external fixed reference, such as a base station, a camera array, a set of visible markers, or an emitter grid, must be used. Because of this, absolute 6-DOF position tracking can typically only be done in prepared spaces.

Inertial tracking systems, on the other hand, can be self-contained and require no external reference. They use technologies such as accelerometers, gyros, magnetometers (compasses), or video cameras to sense their own motion—their change in position or orientation. Because they measure relative position and orientation, inertial systems can't tell you their absolute location, and errors in the measurements tend to accumulate over time, producing drift.
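To see why errors accumulate, consider a minimal sketch of position estimation from an accelerometer alone. The sample rate and the constant bias are assumed values chosen for illustration, and `integrate_position` is a hypothetical helper, not part of any tracking library.

```python
def integrate_position(accel_samples, dt):
    """Double-integrate 1D acceleration samples (m/s^2) into a position (m)."""
    velocity = 0.0
    position = 0.0
    for a in accel_samples:
        velocity += a * dt         # first integration: acceleration -> velocity
        position += velocity * dt  # second integration: velocity -> position
    return position


DT = 0.01    # assumed 100 Hz sample rate
BIAS = 0.05  # assumed constant sensor bias in m/s^2

# The device sits perfectly still for 10 s, so every reading is pure bias.
stationary = [BIAS] * 1000
drift_after_10s = integrate_position(stationary, DT)
```

Even this small bias makes the estimated position wander by about 2.5 m after ten seconds, although the device never moved. The error grows with the square of elapsed time, which is why purely inertial position estimates are usually corrected with some absolute reference.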

The "holy grail" of spatial tracking is a self-contained 6-DOF system that can track its own absolute position and orientation with high levels of accuracy and precision. We are getting closer to this vision. For instance, a smartphone can use its accelerometers, gyros, and magnetometer to track its absolute orientation (relative to gravity and the earth's magnetic field), and its GPS receiver to track its 2D position on the surface of the earth. However, GPS position is only accurate to within a few feet at best, and the height (altitude) of the phone cannot currently be tracked with any accuracy. For now, then, smartphones on their own cannot be used as a general-purpose 6-DOF input device.

A 6-DOF tracker with minimal setup requirements is the Sony Move system. Designed as a "motion controller" (although it really senses position) for the PlayStation game console, the Move uses the typical accelerometers and gyros to sense 3D orientation, and a single camera to track the 3D position of a glowing ball atop the device. This works surprisingly well, coming near to the accuracy of much more expensive and complex tracking systems, but does have the limitation that the user must be facing the camera and not blocking the camera's view of the ball. In addition, accuracy in the depth dimension is worse than in the horizontal and vertical dimensions.

Probably the best candidate for self-contained 6-DOF tracking is inside-out vision-based tracking, in which the tracked object uses a camera to view the world, and analyzes the changes in this view over time to understand its own motion (translations and rotations). Although this approach is inherently relative, such systems can keep track of "feature points" in the scene to give a sort of absolute tracking in a fixed coordinate system connected with the scene. Algorithms such as parallel tracking and mapping (PTAM) (Klein & Murray, 2007) are getting closer to making this a reality.

Three recent tracking developments deserve special mention, as they are bringing many new designers and researchers into the realm of 3D UIs. The first is the Nintendo Wii Remote. This gaming peripheral does not offer 6-DOF tracking, but does include several inertial sensors in addition to a simple optical tracker that can be used to move a cursor on the screen. Wingrave and colleagues (Wingrave et al., 2010) presented a nice discussion of how the Wii Remote differs from traditional trackers, and how it can be used in 3D UIs.

Second, the Microsoft Kinect (Figure 32.1) delivers tracking in a very different way. Rather than tracking a handheld device or a single point on the user's head, it uses a depth camera to track the user's entire body (a skeleton of about 20 points). The 3-DOF position of each point is measured, but orientation is not detected. And since it tracks the body directly, no "controller" is needed. Researchers have designed some interesting 3D interactions with Kinect (e.g., Wilson & Benko, 2010), but they are necessarily quite different from those based on single-point 6-DOF tracking.

Figure 32.1: 3D interaction with Microsoft Kinect

Third, the Leap Motion device, which has been announced but is not available at the time of this writing, promises to deliver very precise 3D tracking of hands, fingers, and tools in a small workspace. It has the potential to make 3D interaction a standard part of the desktop computing experience, but we will have to wait and see how best to design interaction techniques for this device. It will share many of the benefits and drawbacks of the Kinect, and although it is designed to support "natural" interaction, naturalism is not always possible, and not always the best solution (as we will discuss below).

For 3D interaction, spatial trackers are most often used inside handheld devices. These devices typically include other inputs such as buttons, joysticks, or trackballs, making them something like a "3D mouse." Like desktop mice, these can then be used for pointing, manipulating objects, selecting menu items, and the like. Trackers are also used to measure the user's head position and orientation. Head tracking is useful for modifying the view of a 3D environment in a natural way.

The type of spatial tracker used in a 3D UI can have a major impact on its usability, and different trackers may require different UI designs. For example, a tracker with higher latency might not be appropriate for precise object manipulation tasks, and an interface using a 3-DOF orientation tracker requires additional methods for translating the viewpoint in the 3D environment, since it does not track the user's position.

This short section can't do justice to the complex topic of spatial tracking. An older, but very good, overview of tracking technologies and issues can be found in Welch's paper (Welch & Foxlin, 2002).

32.3.2 Other Input Devices

While spatial tracking is the fundamental input device for 3D UIs, it is usually not sufficient on its own. As noted above, most handheld trackers include other sorts of input, because it's difficult to map all interface actions to position, orientation, or motion of the tracker. For example, to confirm a selection action, a discrete event or command is needed, and a button is much more appropriate for this than a hand motion. The Intersense IS900 wand is typical of such handheld trackers; it includes four standard buttons, a "trigger" button, and a 2-DOF analog joystick (which is also a button) in a handheld form factor. The Kinect, because of its "controller-less" design, suffers from the lack of discrete inputs such as buttons.

Generalizing this idea, we can see that almost any sort of input device can be made into a spatial input device by tracking it. Usually this requires adding some hardware to the device, such as optical tracking markers. This extends the capability and expressiveness of the tracker, and allows the input from the device to be interpreted differently depending on its position and orientation. For example, in my lab we have experimented with tracking multi-touch smartphones and combining the multi-touch input with the spatial input for complex object manipulation interfaces (Wilkes et al., 2012). Other interesting devices, such as bend-sensitive tape, can be tracked to provide additional degrees of freedom (Balakrishnan et al., 1999).

Gloves (or finger trackers) are another type of input device that is frequently combined with spatial trackers. Pinch gloves detect contacts between the fingers, while data gloves and finger trackers measure joint angles of the fingers. Combining these with trackers allows for interesting, natural, and expressive use of hand gestures, such as in-air typing (Bowman et al., 2002), writing (Ni et al., 2011), or sign language input (Fels & Hinton, 1997).

32.3.3 Displays

Much of the early work on 3D UIs was done in the context of interaction with VR systems, which use some form of "immersive" display, such as head-mounted displays (HMDs), surround-screen displays (e.g., CAVEs), or wall-sized stereoscopic displays. Increasingly, however, 3D interaction is taking place with TVs or even desktop monitors, due to the use of consumer-level tracking devices meant for gaming. Differences in display configuration and characteristics can have a major impact on the design and usability of 3D UIs.

HMDs (Figure 32.2) provide a full 360-degree surround (when combined with head tracking) and can block out the user's view of the real world, or enhance the view of the real world when used in AR systems. When used for VR, HMDs keep users from seeing their own hands or other parts of their bodies, meaning that devices must be usable eyes-free, and that users may be hesitant to move around in the physical environment. HMDs also vary widely in field of view (FOV). When the FOV is low, 3D UI designers must use the limited screen real estate sparingly.

Figure 32.2: Using a 3D UI while wearing a head-mounted display. The TV in the background shows the image displayed in the HMD.

CAVE-like displays (Cruz-Neira et al., 1993) may provide a full surround, but more often use two to four screens to partially surround the user. Among other considerations, for 3D UIs this means that the designer must provide a way for the user to rotate the world. The mixture of physical and virtual viewpoint rotation can be confusing and can reduce performance on tasks like visual search (McMahan, 2011).

3D UIs on smaller displays like TVs also pose some interesting challenges. With HMDs and CAVEs, the software field of view (the FOV of the virtual camera) is usually matched to the physical FOV of the display so that the view is realistic, as if looking through a window to the virtual world. With desktop monitors and TVs, however, we may not know the size of the display or the user's position relative to it, so determining the appropriate software FOV is difficult. This in turn may influence the user's ability to understand the scale of objects being displayed.
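When the display size and viewing distance are known, the matched software FOV follows from simple trigonometry: a screen of extent w viewed from distance d subtends an angle of 2·atan(w / 2d). The sketch below assumes the viewer is centered in front of the screen and looking straight at it; the function name `matched_fov_degrees` is hypothetical.

```python
import math


def matched_fov_degrees(screen_extent, viewing_distance):
    """Angle (in degrees) subtended by a screen for a centered viewer.

    screen_extent is the screen's width (for horizontal FOV) or height (for
    vertical FOV); both arguments must use the same unit of length.
    """
    half_angle = math.atan(screen_extent / (2.0 * viewing_distance))
    return math.degrees(2.0 * half_angle)


# A 1 m wide screen viewed from 0.5 m and from 2 m.
near_fov = matched_fov_degrees(1.0, 0.5)
far_fov = matched_fov_degrees(1.0, 2.0)
```

The same screen subtends a far smaller angle from across the room than from close up, so a single fixed software FOV cannot match both viewing positions. This is the difficulty with TVs and desktop monitors, where neither input to the formula is known in advance.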

Finally, we know that display characteristics can affect 3D interaction performance. Prior research in my lab has shown, for example, that stereoscopic display can improve performance on difficult manipulation tasks (Narayan et al., 2005) but not on simpler manipulation tasks (McMahan et al., 2006).

32.4 Designing Usable 3D UIs

As a serious topic in HCI, 3D interaction has not been around very long. The seminal papers in the field were only written in the mid- to late-1990s, the most-cited book in the field was published in 2005, and the IEEE Symposium on 3D User Interfaces didn't begin until 2006.

Because of this, the maturity of 3D UI design principles lags behind that of the principles for standard GUIs. There is no standard 3D UI (and it's not clear that there could be, given the diversity of input devices, displays, and interaction techniques), and there are few well-established guidelines for 3D UI design. While general HCI principles such as Nielsen's heuristics (Nielsen & Molich, 1990) still apply, they are not sufficient for understanding how to design a usable 3D UI.

Thus, it's important to have specific design principles for 3D interaction. While the 3D UI book (Bowman et al., 2005) and several other works (Kulik, 2009; Gabbard, 1997; Kaur, 1999) have extensive lists of guidelines, here I've tried to distill what I feel are the most important lessons about good 3D UI design.

32.4.1 Understand the design space

Despite the youth of the field, there is a very large number of existing 3D interaction techniques for the so-called "universal tasks" of travel, selection, manipulation, and system control. In many cases, these techniques can be reused directly or with slight modifications in new applications. The lists of techniques in the 3D UI book (Bowman et al., 2005) are a good place to start; more recent techniques can be found in the proceedings of IEEE 3DUI and VR, ACM CHI and UIST, and other major conferences.

When existing techniques are not sufficient, new techniques can sometimes be generated by combining existing technique components. Taxonomies of technique components (Bowman et al., 2001) can be used as design spaces for this purpose.

32.4.2 There is still room to innovate

A wide variety of techniques already exists, but this does not mean that it is impossible to innovate in 3D UI design. On one hand, most of the primary metaphors for the universal tasks have probably been invented already. On the other hand, there are several reasons to believe that new metaphors, radically different from those we currently have, remain to be discovered.

First, we know the design space of 3D interaction is very large due to the number of devices and mappings available. Second, 3D interaction design can be magical, limited only by the designer's imagination. Third, new technologies (such as the Leap Motion device) with the potential for new forms of interaction are constantly appearing. For example, in a recent project in our lab, students used a combination of recent technologies (multi-touch tablet, 3D reconstruction, marker-based AR tracking, and stretch sensors) to enable "AR Angry Birds," a novel form of physical interaction with both real and virtual objects in AR (Figure 32.3). Finally, techniques can be designed specifically for specialized tasks in various application domains. For example, we designed domain-specific interaction techniques for object cloning in the architecture and construction domain (Chen and Bowman, 2009).

Figure 32.3: AR Angry Birds prototype with a physical slingshot and "destruction" of real-world objects, an example of innovation in 3D UI design.

32.4.3 Be careful with mappings and DOFs

One of the most common problems in 3D UI design is the use of inappropriate mappings between input devices and actions in the interface. Zhai & Milgram (1993) showed, for instance, that elastic sensors (e.g., a joystick) and isometric sensors (e.g., a SpaceBall) map well to rate-controlled movements, where the displacement or force measured by the sensor is mapped to velocity of an object (including the viewpoint) in the virtual world, while isotonic sensors (e.g., a position tracker) map well to position-controlled movements, where the position measured by the sensor is mapped to the position of an object. When this principle is violated, performance suffers.
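The difference between the two mappings can be sketched in a few lines. This is an illustrative one-dimensional example only; the function names, gains, and time step are assumptions, not taken from the cited study.

```python
def position_control(tracker_position, gain=1.0):
    """Isotonic device: the sensed position maps directly to object position."""
    return gain * tracker_position


def rate_control(object_position, deflection, dt, gain=1.0):
    """Elastic or isometric device: the sensed deflection (or force) sets the
    object's velocity, which is integrated over one time step of length dt."""
    return object_position + gain * deflection * dt


# Position control: moving the tracker to 0.3 m puts the object at 0.3 m.
placed = position_control(0.3)

# Rate control: holding a joystick at half deflection for 2 s
# (200 steps of 10 ms) moves the object steadily at 0.5 units per second.
x = 0.0
for _ in range(200):
    x = rate_control(x, deflection=0.5, dt=0.01)
```

Under rate control the object keeps moving for as long as the device is deflected and stops when the device returns to center, which suits self-centering devices. Under position control the object stops as soon as the hand stops, which suits freely moving trackers.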

Similarly, there are often problems with the mappings of input DOFs to actions. When a high-DOF input is used for a task that requires a lower number of DOFs, task performance can be unnecessarily difficult. For example, selecting a menu item is inherently a one-dimensional task. If users need to position their virtual hands within a menu item to select it (a 3-DOF input), the interface requires too much effort.

Another DOF problem is the misuse of integral and separable DOFs. Jacob & Sibert (1992) showed that input devices with integral DOFs (those that are controlled all together, as in a 6-DOF tracker) should be mapped to tasks that users perceive as integral (such as 6-DOF object manipulation), while input devices with separable DOFs (those that can be controlled independently, such as a set of sliders) should be mapped to tasks that have sub-tasks users perceive as separable (such as setting the hue, saturation, and value of a color). A violation of this concept, for example, would be to use a six-DOF tracker to simultaneously control the 3D position of an object and the volume of an audio clip, since those tasks cannot be integrated by the user.

In general, 3D UI designers should seek to reduce the number of DOFs the user is required to control. This can be done by using lower-DOF input devices, by ignoring some of the input DOFs, or by using physical or virtual constraints. For example, placing a virtual 2D interface on a physical tablet prop (Schmalstieg et al., 1999) provides a constraint allowing users to easily use 6-DOF tracker input for 2D interaction.
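One way to ignore input DOFs in software is to project the tracked position onto a surface and discard everything else. The sketch below is a hypothetical example of such a virtual constraint (the tablet prop in the cited work provides the constraint physically). It assumes the tablet's origin and two orthogonal unit axes are known in tracker coordinates, and the names `dot` and `to_tablet_coords` are invented for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def to_tablet_coords(stylus_pos, tablet_origin, u_axis, v_axis):
    """Reduce a tracked 3D stylus position to 2D coordinates on a tablet.

    u_axis and v_axis are assumed to be orthogonal unit vectors spanning the
    tablet surface. The component of the stylus position along the tablet's
    normal, and all orientation information, is simply discarded.
    """
    offset = tuple(p - o for p, o in zip(stylus_pos, tablet_origin))
    return (dot(offset, u_axis), dot(offset, v_axis))


# A tablet lying flat (normal along y); the stylus hovers 0.5 m above it.
uv = to_tablet_coords(
    stylus_pos=(1.25, 0.5, 2.5),
    tablet_origin=(1.0, 0.0, 2.0),
    u_axis=(1.0, 0.0, 0.0),
    v_axis=(0.0, 0.0, 1.0),
)
```

The user now controls only two DOFs even though the tracker reports six, so ordinary 2D widgets such as buttons and sliders can be driven by the projected coordinates.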

32.4.4 Keep it simple

Although 3D UIs can be very expressive and can support complex tasks, not all tasks in a 3D UI need to use fully general interaction techniques. When the user's goal is simple, designers should provide simple and effortless techniques. For example, there are many general-purpose travel techniques that allow users to control the position and orientation of the viewpoint continuously, but if the user simply wants to move to a known landmark, a simple target-based technique (e.g., point at the landmark object) will be much more usable.

Reducing the number of DOFs, as described above, is another way to simplify 3D UIs. For instance, travel techniques can require only two DOFs if terrain following is enabled.
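A minimal sketch of terrain following, under assumed conditions: the user supplies only a 2D movement direction, and the system derives the viewpoint height from the ground. The sloped `terrain_height` function, the eye height, and the function names are all hypothetical stand-ins for whatever a real application provides.

```python
EYE_HEIGHT = 1.7  # assumed eye height above the ground, in meters


def terrain_height(x, z):
    """Hypothetical terrain: a plane that rises gently along x and z."""
    return 0.1 * x + 0.05 * z


def travel_step(position_xz, move_xz, speed, dt):
    """Advance the viewpoint using only two user-controlled DOFs.

    position_xz is the current ground-plane position and move_xz is the
    user's 2D movement direction. The height is never controlled by the
    user; it is looked up from the terrain at the new location.
    """
    x = position_xz[0] + move_xz[0] * speed * dt
    z = position_xz[1] + move_xz[1] * speed * dt
    y = terrain_height(x, z) + EYE_HEIGHT
    return (x, y, z)


# Move along +x at 2 m/s for half a second, starting at the origin.
viewpoint = travel_step((0.0, 0.0), (1.0, 0.0), speed=2.0, dt=0.5)
```

The returned viewpoint has three position components, but the user only ever steers two of them.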

Finally, when using physical buttons or gestures to map to commands/functions, avoid the tendency to add another button or gesture for each new command. Users typically can't remember a large number of gestures, and remembering the mapping between buttons and functions becomes difficult after only 2-3 buttons are used.

32.4.5 Design for the hardware

In traditional UIs, we usually try to design without regard for the display or the input device (i.e., display- and device-independence). UIs should be just as usable no matter whether you are using a large monitor or a small laptop, with a mouse or a trackpad. This is not always strictly true—when you have a very large multi-monitor setup, for example. But in 3D UIs, what works on one display or with one device very rarely works exactly the same way on different systems.

We call this the migration issue. When migrating to a different display or device, the UI and interaction techniques often need to be modified. In other words, we need display- and device-specific 3D UIs.

For example, the World-in-Miniature (WIM) technique (Stoakley et al., 1995), which allows users to move virtual objects in a full-scale virtual environment by manipulating small "dollhouse" representations of those objects, was originally designed for an HMD with two handheld trackers for input. When we tried to migrate WIM to a CAVE (Bowman et al., 2007), we found performance to be significantly worse, probably because users found it difficult to fuse the stereo imagery when the virtual WIM was held close to their eyes. In addition, we had to add controls for rotating the world due to the missing back wall of the CAVE. More recently, we tried to migrate WIM to use the Kinect, and were not able to find any reasonable mapping that allowed users to easily manipulate both the WIM and the virtual hand with six DOFs.

32.4.6 You may still have to train users, but a little training can go a long way

3D interaction is often thought of as "natural," but for many novice users, effective operation of 3D UIs is anything but natural. Users in HMDs don't want to turn their heads, much less move their bodies. Moving a hand in two dimensions (parallel to a screen) is fine, but moving a hand towards or away from the screen doesn't come naturally. When using 3D travel techniques, users don't take advantage of the ability to fly, or to move sideways, or to walk through virtual walls (Bowman et al., 1999).

Because of this, we find that we often have to train our users before they become proficient at using even well designed 3D UIs. In most of the HCI community, the need for training or instruction is seen as a sign of bad design, but in the examples mentioned above, effective use requires users to go against their instincts and intuitions. If a minimal (one-minute) training session allows users to improve their performance significantly, we see that as both practical and positive.

32.4.7 Always evaluate

Finally, we suggest that all 3D UI designs should undergo formative, empirical usability evaluation with members of the target user population. While this guideline probably applies to all UIs, 3D UIs in particular are difficult to design well based on theory, principles, and intuition alone. Many usability problems don't become clear until users try the 3D UI. Evaluate early and often.

32.5 Current 3D UI Research

In this final section, I want to highlight two of the interesting problems 3D UI researchers are addressing today.

32.5.1 Realism vs. Magic - The Question of Interaction Fidelity

One of the fundamental issues in 3D UI design is the tension between realistic and magical interaction. Many feel that 3D interaction should be as "natural" as possible, reusing and reproducing interactions from the real world so that users can take advantage of their existing skills, knowing what to do and how to do it. On the other hand, 3D UIs primarily allow users to interact with virtual objects and environments, whose only constraints are due to the skill of the programmer and the limits of the technology. Thus, "magic" interaction is possible, enabling the user to transcend the limitations of human perception and action, to reduce or eliminate the need for physical effort and lengthy operations, and even to perform tasks that are impossible in the real world.

This question is related to the concept of interaction fidelity, which we define as the objective degree to which the actions (characterized by movements, forces, body parts in use, etc.) used for a task in the UI correspond to the actions used for that task in the real world (Bowman et al., 2012). By talking about the degree of fidelity, we emphasize that we are not just talking about "realistic" and "non-realistic" interactions, but a continuum of realism, which itself has several different dimensions.

Consider an example. For the task of moving a virtual book from one location on a desk to another, we could, among many other options: a) map the movements of the user's real hand and fingers exactly, requiring exact placement, grasping, and releasing, b) position a 3D cursor over the book, press a button, move the cursor to the target position, and release the button, or c) choose "move" from a menu, and then use a laser pointer to indicate the book and the target location. Clearly, option a) is the most natural, option b) uses a natural metaphor but leaves out some of the less necessary details of the real-world interaction, and option c) has very low interaction fidelity. Option a) is probably the easiest for a novice user to learn and use, provided that the designer can replicate the actions and perceptual cues from the real world well enough, although option b) is the simplest and may be just as effective.

Some tasks are very difficult (or impossible) to do in the real world. What if I want to remove a building from a city? A highly natural 3D UI would require the user to obtain some virtual explosives or a virtual crane with a wrecking ball, and operate these over a long period of time. Here a "magic" technique, such as allowing the user to "erase" the building, or selecting the building and invoking a "delete" command by voice, is clearly more practical and effective.

In many cases of difficult tasks, the question is not whether we should use a natural or magical 3D UI, because the purely natural technique wouldn't be practical. Instead, the question is whether to use a natural metaphor. For example, in the real world I cannot pick up objects that are beyond arm's reach, but in the virtual world I can. Should I do this with a reaching and grasping metaphor, as in the Go-Go technique (Poupyrev et al., 1996), which extends the user's virtual hand far into the environment based on natural movements? Or should I pick up the object by pointing to it using a laser pointer metaphor, as in the HOMER technique (Bowman & Hodges, 1997)? In this case, the less natural laser pointer metaphor is more effective in terms of user performance, but enhanced natural metaphors are easy to learn and highly usable in many situations.
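To make the reaching metaphor concrete, here is a minimal sketch of the kind of non-linear arm extension that Go-Go uses: within a threshold distance of the body the virtual hand follows the real hand one-to-one, and beyond it the virtual hand moves progressively farther out. The function names, the threshold of 0.4 m, and the gain `k` are assumptions chosen for illustration, not values from the original paper or from any particular implementation.

```python
import math

def gogo_virtual_distance(real_dist, threshold, k):
    """Map the real hand-to-torso distance to a virtual distance.

    Inside `threshold` the mapping is one-to-one; beyond it a quadratic
    term extends the virtual reach. With distances in meters, `k` has
    units of 1/m. All parameter values here are illustrative.
    """
    if real_dist < threshold:
        return real_dist
    return real_dist + k * (real_dist - threshold) ** 2

def gogo_hand_position(torso, hand, threshold=0.4, k=10.0):
    """Place the virtual hand along the torso-to-hand direction."""
    offset = [h - t for h, t in zip(hand, torso)]
    real_dist = math.sqrt(sum(c * c for c in offset))
    if real_dist == 0.0:
        return tuple(torso)  # hand at the torso: nothing to extend
    scale = gogo_virtual_distance(real_dist, threshold, k) / real_dist
    return tuple(t + c * scale for t, c in zip(torso, offset))

# A hand held 0.3 m from the torso stays where it is; a hand stretched
# to 0.6 m reaches roughly 1.0 m into the virtual environment.
near = gogo_virtual_distance(0.3, 0.4, 10.0)
far = gogo_virtual_distance(0.6, 0.4, 10.0)
```

Because the mapping is continuous at the threshold, the user experiences ordinary reaching close to the body and only notices the extension when stretching the arm.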

Because techniques like Go-Go use natural metaphors to extend users' abilities beyond what's possible in the real world, we refer to them as hyper-natural. There is not a single answer to the question of whether to choose natural, hyper-natural, or non-natural magic techniques, but overall, research has shown significant benefits for the natural and hyper-natural design approaches (Bowman et al., 2012).

32.5.2 Increasing Precision

A major disadvantage of 3D UIs based on spatial tracking systems is the difficulty of providing precise 3D spatial input. The modern mouse is a highly precise, accurate, and responsive 2D spatial input device—users can point at on-screen elements, even individual pixels, quickly and accurately. 3D spatial tracking systems are far behind the mouse in terms of precision (jitter), accuracy of reported values, and responsiveness (latency), making it problematic to use them for tasks requiring precision (Teather et al., 2009).

But even if 3D spatial tracking systems improve their specifications to be comparable with today's mouse, 3D UIs will still have a precision problem, for the following reasons:

  • 3D interaction is performed in the air, not on a surface. There is no friction or physical support to make movements more controlled and precise.
  • Humans have a natural hand tremor that causes in-air movements to be jittery.
  • Interfaces based on 3D pointing using ray-casting (i.e., laser pointer metaphor) amplify this hand tremor so that it becomes worse the farther out along the ray you go.
  • 3D spatial trackers are not "parkable" like the mouse—the user cannot let go of them and be assured that they will stay in the same position.
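The amplification of tremor along a ray follows from simple geometry: a small angular wobble at the hand sweeps out a lateral displacement that grows in proportion to the distance along the ray. The sketch below illustrates this; the function name and the 0.5-degree tremor figure are assumptions for the sake of the arithmetic, not measured values.

```python
import math

def ray_jitter(distance_m, tremor_deg):
    """Lateral displacement of the ray's end point caused by an angular
    wobble of `tremor_deg` degrees, at `distance_m` meters from the hand."""
    return distance_m * math.tan(math.radians(tremor_deg))

# The same (assumed) 0.5-degree wobble moves the ray's end point by just
# under a centimeter at 1 m, and ten times as far at 10 m.
jitter_near = ray_jitter(1.0, 0.5)
jitter_far = ray_jitter(10.0, 0.5)
```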

So is there any hope of 3D UIs that can be used for precise work? A partial solution is to filter the output of 3D spatial trackers to reduce noise, but filtering can cause other problems, such as increased latency. Current research is addressing the precision problem using several different strategies.
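The trade-off between noise reduction and latency can be seen in even the simplest filter. The following sketch uses exponential smoothing, chosen here only as an illustrative example rather than as the filter any particular tracking system uses; the function name and the smoothing factor `alpha` are assumptions.

```python
def exponential_smooth(samples, alpha):
    """Smooth a sequence of one-dimensional tracker readings.

    `alpha` lies in (0, 1]: smaller values suppress more jitter but make
    the output lag further behind the true position.
    """
    smoothed = []
    estimate = None
    for sample in samples:
        if estimate is None:
            estimate = sample  # initialize with the first reading
        else:
            estimate = alpha * sample + (1 - alpha) * estimate
        smoothed.append(estimate)
    return smoothed

# A sudden, intentional movement from 0 to 1 is reported only gradually:
# the filtered value is still short of 1 five samples after the move.
step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
lagged = exponential_smooth(step, alpha=0.5)
```

The same averaging that removes frame-to-frame jitter also delays the response to real movements, which is the increased latency mentioned above.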

One approach is to modify the control/display (C/D) ratio. The simple idea here is to use an N:1 mapping between movements of the input device (control) and movements in the system (display), where N is greater than one. In other words, if the C/D ratio is five, then a five-centimeter movement (or five-degree rotation) of the tracker would result in a one-centimeter movement (or one-degree rotation) in the virtual world. This gives users greater levels of control and the ability to achieve more precision, but at the cost of increased physical effort and time. Some techniques (e.g., Frees et al., 2007) dynamically modify the C/D ratio so that precision is only added when necessary (e.g., when the user is moving slowly).
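As a minimal sketch of this idea, the code below applies a fixed C/D ratio to a tracker movement and then shows one possible speed-dependent ratio. The speed thresholds, the linear blend between them, and the function names are assumptions made for illustration; this is not the published algorithm of Frees et al.

```python
def apply_cd_ratio(device_delta, cd_ratio):
    """Scale a tracker movement (control) down to a virtual movement (display)."""
    return device_delta / cd_ratio

def dynamic_cd_ratio(hand_speed, slow_speed=0.05, fast_speed=0.30, max_ratio=5.0):
    """Choose a C/D ratio from hand speed (m/s, hypothetical thresholds).

    Fast movements map one-to-one; slow movements, which suggest the user
    is trying to be precise, are scaled down by up to `max_ratio`.
    """
    if hand_speed >= fast_speed:
        return 1.0
    if hand_speed <= slow_speed:
        return max_ratio
    # Blend linearly between the two regimes.
    t = (hand_speed - slow_speed) / (fast_speed - slow_speed)
    return max_ratio + t * (1.0 - max_ratio)

# With a C/D ratio of five, a 5 cm tracker movement yields a 1 cm
# movement in the virtual world.
virtual_cm = apply_cd_ratio(5.0, 5.0)
```

A speed-based rule like this one keeps coarse, fast movements effortless while adding precision only when the user slows down.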

A second strategy is to ensure that the user is not required to be more precise than absolutely necessary. For example, if the user is selecting a very small object in a sparse environment, there is no need to make the user touch or point to the object precisely. Rather, the cursor can have area or volume (e.g., a circle or sphere) instead of being a point (e.g., Liang & Green, 1994), or the cursor can snap to the nearest object (e.g., de Haan et al., 2005).
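Both variants of this strategy reduce to simple distance tests. The sketch below assumes objects are represented by a single 3D point each, stored in a dictionary from name to position; that representation and the function names are hypothetical simplifications, not taken from the cited techniques.

```python
import math

def objects_in_sphere(cursor, radius, objects):
    """Volume cursor: return every object whose position lies inside a
    sphere of the given radius centered on the cursor."""
    return [name for name, pos in objects.items()
            if math.dist(cursor, pos) <= radius]

def snap_to_nearest(cursor, objects):
    """Snapping cursor: return the name of the object closest to the
    cursor, or None if the scene is empty."""
    if not objects:
        return None
    return min(objects, key=lambda name: math.dist(cursor, objects[name]))

scene = {"a": (0.0, 0.0, 0.0), "b": (1.0, 0.0, 0.0), "c": (5.0, 5.0, 5.0)}
touched = objects_in_sphere((0.2, 0.0, 0.0), 0.5, scene)  # only "a" is inside
snapped = snap_to_nearest((0.9, 0.0, 0.0), scene)         # "b" is closest
```

In a sparse environment either approach lets the user select a small object without ever placing the cursor exactly on it.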

Finally, a promising approach called progressive refinement spreads out the interaction over time rather than requiring a single precise action. A series of rough, imprecise actions can be used to achieve a precise result, without a great deal of effort on the part of the user. For instance, the SQUAD technique (Kopper et al., 2011) allows users to select small objects in cluttered environments by first doing a volume selection, then refining the set of selected objects with a series of rapid menu selections. In very difficult cases, this technique was even faster than ray-casting, which uses a single precise selection, and in all cases, SQUAD resulted in fewer selection errors. This progressive refinement approach should be broadly applicable to many sorts of difficult 3D interaction tasks.
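The general shape of progressive refinement can be sketched in a few lines: start with a rough candidate set, repeatedly divide it into a handful of groups, and let the user pick the group that contains the target. This is a loose illustration of the idea rather than a reimplementation of SQUAD; the four-way split, the round-robin grouping, and the `choose` callback standing in for the user's menu selection are all assumptions.

```python
def refine_selection(candidates, choose, groups=4):
    """Narrow a rough selection down to a single object.

    `candidates` is the imprecise initial selection (e.g., everything
    caught by a volume selection). `choose` receives the list of groups
    and returns the index of the group containing the target.
    Returns the selected object (or None) and the number of steps taken.
    """
    remaining = list(candidates)
    steps = 0
    while len(remaining) > 1:
        # Deal the remaining objects round-robin into up to `groups` groups.
        buckets = [remaining[i::groups] for i in range(groups)]
        buckets = [b for b in buckets if b]
        remaining = buckets[choose(buckets)]
        steps += 1
    return (remaining[0] if remaining else None), steps

# Simulate a user who always picks the group containing object 11.
target = 11
pick = lambda buckets: next(i for i, b in enumerate(buckets) if target in b)
selected, steps = refine_selection(range(16), pick)  # 16 objects, 2 coarse picks
```

Each pick requires only a coarse choice among a few large regions, so the number of steps grows slowly (logarithmically) with the size of the initial selection.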

32.6 For Further Reading

  • For an overview of the field of 3D UIs, and a comprehensive survey of devices and interaction techniques, see 3D User Interfaces: Theory and Practice (Bowman et al., 2005).
  • The best current research in the field can be found in the proceedings of the IEEE Symposium on 3D User Interfaces.
  • For more on how to use realism and magic in 3D UI design, see a recent tutorial in IEEE Computer Graphics & Applications (Kulik, 2009).
  • Wolfgang Stuerzlinger provides a set of practical guidelines from his years of experience in 3D UI design in a recent survey paper (Bowman et al., 2008).
  • To learn more about experimental results on the effects of interaction fidelity in 3D UIs, see my recent Communications of the ACM paper (Bowman et al., 2012).

32.7 References