Publication statistics

Pub. period: 1996-2008
Pub. count: 23
Number of co-authors: 30



Co-authors

Number of publications with 3 favourite co-authors:

Rebecca Lunsford: 9
Rachel Coulston: 6
Matt Wesson: 3

 

 

Productive colleagues

Sharon Oviatt's 3 most productive colleagues in number of publications:

James A. Landay: 91
Terry Winograd: 59
Philip R. Cohen: 23
 
 
 


Sharon Oviatt

Ph.D.

Has also published under the name of:
"S. Oviatt"

Personal Homepage:
http://www.incaadesigns.org

SPECIALIZED PROFESSIONAL COMPETENCE
Human-centered design; Educational interfaces; Interfaces for universal access, lifespan use, and diverse populations (e.g., students varying in ability); Adaptive interfaces; Cognitive modeling and low-load interfaces; Mobile & ubiquitous interfaces; Multimodal interfaces; Pen-based and spoken language interfaces; Collaborative teamwork interfaces; Communication models and modality effects; Empirically-based interface design, evaluation and methodology

EDUCATION
1979 - Ph.D., Experimental Psychology, University of Toronto
1974 - M.A., Experimental Psychology, University of Toronto
1972 - B.A., Psychology, with Highest Honors

 

Publications by Sharon Oviatt (bibliography)

2008
 

Oviatt, Sharon, Swindells, Colin and Arthur, Alex (2008): Implicit user-adaptive system engagement in speech and pen interfaces. In: Proceedings of ACM CHI 2008 Conference on Human Factors in Computing Systems April 5-10, 2008. pp. 969-978. Available online

As emphasis is placed on developing mobile, educational, and other applications that minimize cognitive load on users, it is becoming more essential to explore interfaces based on implicit engagement techniques so users can remain focused on their tasks. In this research, data were collected with 12 pairs of students who solved complex math problems using a tutorial system that they engaged over 100 times per session entirely implicitly via speech amplitude or pen pressure cues. Results revealed that users spontaneously, reliably, and substantially adapted these forms of communicative energy to designate and repair an intended interlocutor in a computer-mediated group setting. Furthermore, this behavior was harnessed to achieve system engagement.

© All rights reserved Oviatt et al. and/or ACM Press
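
The entry above describes engaging a system implicitly through speech amplitude and pen pressure rather than explicit commands. As a purely illustrative sketch of that general idea (not code from the paper; the names, baseline handling, and thresholds below are assumptions), an implicit-engagement check might look like the following in Python:

    from dataclasses import dataclass

    @dataclass
    class InputEvent:
        """One user input: amplitude in dB for speech, pressure in [0, 1] for pen."""
        modality: str            # "speech" or "pen"
        amplitude_db: float = 0.0
        pen_pressure: float = 0.0

    def is_system_directed(event: InputEvent,
                           speech_baseline_db: float,
                           amp_margin_db: float = 3.0,
                           pressure_threshold: float = 0.7) -> bool:
        """Hypothetical implicit-engagement rule: speech noticeably louder than the
        speaker's usual level, or a firm pen stroke, is treated as addressed to the
        system; everything else is treated as partner-directed."""
        if event.modality == "speech":
            return event.amplitude_db >= speech_baseline_db + amp_margin_db
        if event.modality == "pen":
            return event.pen_pressure >= pressure_threshold
        return False

    # Example with made-up numbers: a 62 dB utterance against a 57 dB baseline engages the system.
    print(is_system_directed(InputEvent("speech", amplitude_db=62.0), speech_baseline_db=57.0))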

 

Cohen, Phil, Swindells, Colin, Oviatt, Sharon and Arthur, Alex (2008): A high-performance dual-wizard infrastructure for designing speech, pen, and multimodal interfaces. In: Proceedings of the 2008 International Conference on Multimodal Interfaces 2008. pp. 137-140. Available online

The present paper reports on the design and performance of a novel dual-Wizard simulation infrastructure that has been used effectively to prototype next-generation adaptive and implicit multimodal interfaces for collaborative groupwork. This high-fidelity simulation infrastructure builds on past development of single-wizard simulation tools for multiparty multimodal interactions involving speech, pen, and visual input [1]. In the new infrastructure, a dual-wizard simulation environment was developed that supports (1) real-time tracking, analysis, and system adaptivity to a user's speech and pen paralinguistic signal features (e.g., speech amplitude, pen pressure), as well as the semantic content of their input. This simulation also supports (2) transparent user training to adapt their speech and pen signal features in a manner that enhances the reliability of system functioning, i.e., the design of mutually-adaptive interfaces. To accomplish these objectives, this new environment also is capable of handling (3) dynamic streaming digital pen input. We illustrate the performance of the simulation infrastructure during longitudinal empirical research in which a user-adaptive interface was designed for implicit system engagement based exclusively on users' speech amplitude and pen pressure [2]. While using this dual-wizard simulation method, the wizards responded successfully to over 3,000 user inputs with 95-98% accuracy and a joint wizard response time of less than 1.0 second during speech interactions and 1.65 seconds during pen interactions. Furthermore, the interactions they handled involved naturalistic multiparty meeting data in which high school students were engaged in peer tutoring, and all participants believed they were interacting with a fully functional system. This type of simulation capability enables a new level of flexibility and sophistication in multimodal interface design, including the development of implicit multimodal interfaces that place minimal cognitive load on users during mobile, educational, and other applications.

© All rights reserved Cohen et al. and/or their publisher

2006
 

Oviatt, Sharon, Arthur, Alex and Cohen, Julia (2006): Quiet interfaces that help students think. In: Proceedings of the ACM Symposium on User Interface Software and Technology 2006. pp. 191-200. Available online

As technical as we have become, modern computing has not permeated many important areas of our lives, including mathematics education which still involves pencil and paper. In the present study, twenty high school geometry students varying in ability from low to high participated in a comparative assessment of math problem solving using existing pencil and paper work practice (PP), and three different interfaces: an Anoto-based digital stylus and paper interface (DP), pen tablet interface (PT), and graphical tablet interface (GT). Cognitive Load Theory correctly predicted that as interfaces departed more from familiar work practice (GT > PT > DP), students would experience greater cognitive load such that performance would deteriorate in speed, attentional focus, meta-cognitive control, correctness of problem solutions, and memory. In addition, low-performing students experienced elevated cognitive load, with the more challenging interfaces (GT, PT) disrupting their performance disproportionately more than higher performers. The present results indicate that Cognitive Load Theory provides a coherent and powerful basis for predicting the rank ordering of users' performance by type of interface. In the future, new interfaces for areas like education and mobile computing could benefit from designs that minimize users' load so performance is more adequately supported.

© All rights reserved Oviatt et al. and/or ACM Press

 

Barthelmess, Paulo, Kaiser, Edward, Lunsford, Rebecca, McGee, David, Cohen, Philip and Oviatt, Sharon (2006): Human-centered collaborative interaction. In: Proceedings of the 2006 ACM International Workshop on Human-Centered Multimedia 2006. pp. 1-8. Available online

Recent years have witnessed an increasing shift in interest from single user multimedia/multimodal interfaces towards support for interaction among groups of people working closely together, e.g. during meetings or problem solving sessions. However, the introduction of technology to support collaborative practices has not been devoid of problems. It is not uncommon that technology meant to support collaboration may introduce disruptions and reduce group effectiveness. Human-centered multimedia and multimodal approaches hold a promise of providing substantially enhanced user experiences by focusing attention on human perceptual and motor capabilities, and on actual user practices. In this paper we examine the problem of providing effective support for collaboration, focusing on the role of human-centered approaches that take advantage of multimodality and multimedia. We show illustrative examples that demonstrate human-centered multimodal and multimedia solutions that provide mechanisms for dealing with the intrinsic complexity of human-human interaction support.

© All rights reserved Barthelmess et al. and/or ACM Press

 

Lunsford, Rebecca and Oviatt, Sharon (2006): Human perception of intended addressee during computer-assisted meetings. In: Proceedings of the 2006 International Conference on Multimodal Interfaces 2006. pp. 20-27. Available online

Recent research aims to develop new open-microphone engagement techniques capable of identifying when a speaker is addressing a computer versus human partner, including during computer-assisted group interactions. The present research explores: (1) how accurately people can judge whether an intended interlocutor is a human versus computer, (2) which linguistic, acoustic-prosodic, and visual information sources they use to make these judgments, and (3) what type of systematic errors are present in their judgments. Sixteen participants were asked to determine a speaker's intended addressee based on actual videotaped utterances matched on illocutionary force, which were played back as: (1) lexical transcriptions only, (2) audio-only, (3) visual-only, and (4) audio-visual information. Perhaps surprisingly, people's accuracy in judging human versus computer addressees did not exceed chance levels with lexical-only content (46%). As predicted, accuracy improved significantly with audio (58%), visual (57%), and especially audio-visual information (63%). Overall, accuracy in detecting human interlocutors was significantly worse than judging computer ones, and specifically worse when only visual information was present because speakers often looked at the computer when addressing peers. In contrast, accuracy in judging computer interlocutors was significantly better whenever visual information was present than with audio alone, and it yielded the highest accuracy levels observed (86%). Questionnaire data also revealed that speakers' gaze, peers' gaze, and tone of voice were considered the most valuable information sources. These results reveal that people rely on cues appropriate for interpersonal interactions in determining computer- versus human-directed speech during mixed human-computer interactions, even though this degrades their accuracy. Future systems that process actual rather than expected communication patterns potentially could be designed that perform better than humans.

© All rights reserved Lunsford and Oviatt and/or their publisher

 

Arthur, Alexander M., Lunsford, Rebecca, Wesson, Matt and Oviatt, Sharon (2006): Prototyping novel collaborative multimodal systems: simulation, data collection and analysis tools for the next decade. In: Proceedings of the 2006 International Conference on Multimodal Interfaces 2006. pp. 209-216. Available online

To support research and development of next-generation multimodal interfaces for complex collaborative tasks, a comprehensive new infrastructure has been created for collecting and analyzing time-synchronized audio, video, and pen-based data during multi-party meetings. This infrastructure needs to be unobtrusive and to collect rich data involving multiple information sources of high temporal fidelity to allow the collection and annotation of simulation-driven studies of natural human-human-computer interactions. Furthermore, it must be flexibly extensible to facilitate exploratory research. This paper describes both the infrastructure put in place to record, encode, playback and annotate the meeting-related media data, and also the simulation environment used to prototype novel system concepts.

© All rights reserved Arthur et al. and/or their publisher

 

Lunsford, Rebecca, Oviatt, Sharon and Arthur, Alexander M. (2006): Toward open-microphone engagement for multiparty interactions. In: Proceedings of the 2006 International Conference on Multimodal Interfaces 2006. pp. 273-280. Available online

There currently is considerable interest in developing new open-microphone engagement techniques for speech and multimodal interfaces that perform robustly in complex mobile and multiparty field environments. State-of-the-art audio-visual open-microphone engagement systems aim to eliminate the need for explicit user engagement by processing more implicit cues that a user is addressing the system, which results in lower cognitive load for the user. This is an especially important consideration for mobile and educational interfaces due to the higher load required by explicit system engagement. In the present research, longitudinal data were collected with six triads of high-school students who engaged in peer tutoring on math problems with the aid of a simulated computer assistant. Results revealed that amplitude was 3.25dB higher when users addressed a computer rather than human peer when no lexical marker of intended interlocutor was present, and 2.4dB higher for all data. These basic results were replicated for both matched and adjacent utterances to computer versus human partners. With respect to dialogue style, speakers did not direct a higher ratio of commands to the computer, although such dialogue differences have been assumed in prior work. Results of this research reveal that amplitude is a powerful cue marking a speaker's intended addressee, which should be leveraged to design more effective microphone engagement during computer-assisted multiparty interactions.

© All rights reserved Lunsford et al. and/or their publisher

2005
 

Oviatt, Sharon, Lunsford, Rebecca and Coulston, Rachel (2005): Individual differences in multimodal integration patterns: what are they and why do they exist?. In: Proceedings of ACM CHI 2005 Conference on Human Factors in Computing Systems 2005. pp. 241-249. Available online

Techniques for information fusion are at the heart of multimodal system design. To develop new user-adaptive approaches for multimodal fusion, the present research investigated the stability and underlying cause of major individual differences that have been documented between users in their multimodal integration pattern. Longitudinal data were collected from 25 adults as they interacted with a map system over six weeks. Analyses of 1,100 multimodal constructions revealed that everyone had a dominant integration pattern, either simultaneous or sequential, which was 95-96% consistent and remained stable over time. In addition, coherent behavioral and linguistic differences were identified between these two groups. Whereas performance speed was comparable, sequential integrators made only half as many errors and excelled during new or complex tasks. Sequential integrators also had more precise articulation (e.g., fewer disfluencies), although their speech rate was no slower. Finally, sequential integrators more often adopted terse and direct command-style language, with a smaller and less varied vocabulary, which appeared focused on achieving error-free communication. These distinct interaction patterns are interpreted as deriving from fundamental differences in reflective-impulsive cognitive style. Implications of these findings are discussed for the design of adaptive multimodal systems with substantially improved performance characteristics.

© All rights reserved Oviatt et al. and/or ACM Press
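
The simultaneous-versus-sequential distinction used in this and several other entries on this page can be made concrete with a small sketch. The code below is a schematic illustration under assumed timestamp inputs, not the authors' analysis scripts:

    def classify_integration(speech_start, speech_end, pen_start, pen_end):
        """Label one multimodal construction: 'simultaneous' if the speech and pen
        signals overlap in time, 'sequential' if one signal ends before the other
        begins (the gap between them is the inter-modal lag)."""
        overlaps = speech_start < pen_end and pen_start < speech_end
        return "simultaneous" if overlaps else "sequential"

    def dominant_pattern(labels):
        """A user's dominant integration pattern is whichever label occurs most often;
        the study above found this pattern to be 95-96% consistent per user."""
        return max(set(labels), key=labels.count)

    # Hypothetical timestamps in seconds for three constructions by one user.
    labels = [classify_integration(2.0, 3.1, 1.2, 1.6),    # sequential (pen first, 0.4 s lag)
              classify_integration(5.0, 6.0, 5.2, 5.8),    # simultaneous (signals overlap)
              classify_integration(9.0, 9.8, 10.0, 10.5)]  # sequential (speech first)
    print(dominant_pattern(labels))   # -> 'sequential'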

 

Lunsford, Rebecca, Oviatt, Sharon and Coulston, Rachel (2005): Audio-visual cues distinguishing self- from system-directed speech in younger and older adults. In: Proceedings of the 2005 International Conference on Multimodal Interfaces 2005. pp. 167-174. Available online

In spite of interest in developing robust open-microphone engagement techniques for mobile use and natural field contexts, there currently are no reliable techniques available. One problem is the lack of empirically-grounded models as guidance for distinguishing how users' audio-visual activity actually differs systematically when addressing a computer versus human partner. In particular, existing techniques have not been designed to handle high levels of user self talk as a source of "noise," and they typically assume that a user is addressing the system only when facing it while speaking. In the present research, data were collected during two related studies in which adults aged 18-89 interacted multimodally using speech and pen with a simulated map system. Results revealed that people engaged in self talk prior to addressing the system over 30% of the time, with no decrease in younger adults' rate of self talk compared with elders. Speakers' amplitude was lower during 96% of their self talk, with a substantial 26 dBr amplitude separation observed between self- and system-directed speech. The magnitude of speaker's amplitude separation ranged from approximately 10-60 dBr and diminished with age, with 79% of the variance predictable simply by knowing a person's age. In contrast to the clear differentiation of intended addressee revealed by amplitude separation, gaze at the system was not a reliable indicator of speech directed to the system, with users looking at the system over 98% of the time during both self- and system-directed speech. Results of this research have implications for the design of more effective open-microphone engagement for mobile and pervasive systems.

© All rights reserved Lunsford et al. and/or their publisher
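
The age effect reported above (a roughly 10-60 dBr separation that shrinks with age) suggests that an open-microphone filter might scale its self-talk cutoff by the speaker's age. The sketch below is speculative: the linear interpolation across the reported 18-89 age range and the half-separation cutoff are assumptions for illustration, not a model published in the paper.

    def expected_separation_dbr(age):
        """Assumed linear trend through the reported extremes: about 60 dBr of
        self-/system-directed amplitude separation for the youngest adults (18)
        falling to about 10 dBr for the oldest (89). The paper reports this range
        and that age explains 79% of the variance, not this exact function."""
        age = min(max(age, 18), 89)
        return 60.0 + (age - 18) * (10.0 - 60.0) / (89 - 18)

    def looks_like_self_talk(utterance_db, system_directed_db, age):
        """Hypothetical filter: treat an utterance as self talk when it is quieter
        than the speaker's system-directed speech by at least half the separation
        expected for that speaker's age."""
        return (system_directed_db - utterance_db) >= expected_separation_dbr(age) / 2

    # Made-up example: a 40 dB aside from a 30-year-old whose system-directed speech averages 70 dB.
    print(looks_like_self_talk(utterance_db=40.0, system_directed_db=70.0, age=30))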

2004
 

Oviatt, Sharon and Seneff, Stephanie (2004): Introduction to mobile and adaptive conversational interfaces. In ACM Transactions on Computer-Human Interaction, 11 (3) pp. 237-240. Available online

 

Oviatt, Sharon, Darves, Courtney and Coulston, Rachel (2004): Toward adaptive conversational interfaces: Modeling speech convergence with animated personas. In ACM Transactions on Computer-Human Interaction, 11 (3) pp. 300-328. Available online

The design of robust interfaces that process conversational speech is a challenging research direction largely because users' spoken language is so variable. This research explored a new dimension of speaker stylistic variation by examining whether users' speech converges systematically with the text-to-speech (TTS) heard from a software partner. To pursue this question, a study was conducted in which twenty-four 7 to 10-year-old children conversed with animated partners that embodied different TTS voices. An analysis of children's amplitude, durational features, and dialogue response latencies confirmed that they spontaneously adapt several basic acoustic-prosodic features of their speech to converge with the TTS voice of their software partner.

© All rights reserved Oviatt et al. and/or ACM Press

 

Oviatt, Sharon, Coulston, Rachel and Lunsford, Rebecca (2004): When do we interact multimodally?: cognitive load and multimodal communication patterns. In: Proceedings of the 2004 International Conference on Multimodal Interfaces 2004. pp. 129-136. Available online

Mobile usage patterns often entail high and fluctuating levels of difficulty as well as dual tasking. One major theme explored in this research is whether a flexible multimodal interface supports users in managing cognitive load. Findings from this study reveal that multimodal interface users spontaneously respond to dynamic changes in their own cognitive load by shifting to multimodal communication as load increases with task difficulty and communicative complexity. Given a flexible multimodal interface, users' ratio of multimodal (versus unimodal) interaction increased substantially from 18.6% when referring to established dialogue context to 77.1% when required to establish a new context, a +315% relative increase. Likewise, the ratio of users' multimodal interaction increased significantly as the tasks became more difficult, from 59.2% during low difficulty tasks, to 65.5% at moderate difficulty, 68.2% at high and 75.0% at very high difficulty, an overall relative increase of about +27%.

© All rights reserved Oviatt et al. and/or their publisher
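
The percentages in this abstract are relative increases; the small computation below reproduces the reported +315% figure for the context manipulation and gives roughly +27% for the shift across task difficulty, consistent with the figures quoted above:

    def relative_increase(before_pct, after_pct):
        """Percent change of the multimodal-interaction ratio, relative to its starting value."""
        return (after_pct - before_pct) / before_pct * 100

    print(round(relative_increase(18.6, 77.1)))   # 315  (new vs. established dialogue context)
    print(round(relative_increase(59.2, 75.0)))   # 27   (very high vs. low task difficulty)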

2003
 

Oviatt, Sharon (2003): Flexible and robust multimodal interfaces for universal access. In Universal Access in the Information Society, 2 (2) pp. 91-95. Available online

Multimodal interfaces are inherently flexible, which is a key feature that makes them suitable for both universal access and next-generation mobile computing. Recent studies also have demonstrated that multimodal architectures can improve the performance stability and overall robustness of the recognition-based component technologies they incorporate (e.g., speech, vision, pen input). This paper reviews data from two recent studies in which a multimodal architecture suppressed errors and stabilized system performance for accented speakers and during mobile use. It concludes with a discussion of key issues in the design of future multimodal interfaces for diverse user groups.

© All rights reserved Oviatt and/or Springer Verlag

 

Oviatt, Sharon, Coulston, Rachel, Tomko, Stefanie, Xiao, Benfang, Lunsford, Rebecca, Wesson, Matt and Carmichael, Lesley (2003): Toward a theory of organized multimodal integration patterns during human-computer interaction. In: Proceedings of the 2003 International Conference on Multimodal Interfaces 2003. pp. 44-51. Available online

As a new generation of multimodal systems begins to emerge, one dominant theme will be the integration and synchronization requirements for combining modalities into robust whole systems. In the present research, quantitative modeling is presented on the organization of users' speech and pen multimodal integration patterns. In particular, the potential malleability of users' multimodal integration patterns is explored, as well as variation in these patterns during system error handling and tasks varying in difficulty. Using a new dual-wizard simulation method, data was collected from twelve adults as they interacted with a map-based task using multimodal speech and pen input. Analyses based on over 1600 multimodal constructions revealed that users' dominant multimodal integration pattern was resistant to change, even when strong selective reinforcement was delivered to encourage switching from a sequential to simultaneous integration pattern, or vice versa. Instead, both sequential and simultaneous integrators showed evidence of entrenching further in their dominant integration patterns (i.e., increasing either their inter-modal lag or signal overlap) over the course of an interactive session, during system error handling, and when completing increasingly difficult tasks. In fact, during error handling these changes in the co-timing of multimodal signals became the main feature of hyper-clear multimodal language, with elongation of individual signals either attenuated or absent. Whereas Behavioral/Structuralist theory cannot account for these data, it is argued that Gestalt theory provides a valuable framework and insights into multimodal interaction. Implications of these findings are discussed for the development of a coherent theory of multimodal integration during human-computer interaction, and for the design of a new class of adaptive multimodal interfaces.

© All rights reserved Oviatt et al. and/or their publisher

 

Xiao, Benfang, Lunsford, Rebecca, Coulston, Rachel, Wesson, Matt and Oviatt, Sharon (2003): Modeling multimodal integration patterns and performance in seniors: toward adaptive processing of individual differences. In: Proceedings of the 2003 International Conference on Multimodal Interfaces 2003. pp. 265-272. Available online

Multimodal interfaces are designed with a focus on flexibility, although very few currently are capable of adapting to major sources of user, task, or environmental variation. The development of adaptive multimodal processing techniques will require empirical guidance from quantitative modeling on key aspects of individual differences, especially as users engage in different types of tasks in different usage contexts. In the present study, data were collected from fifteen 66- to 86-year-old healthy seniors as they interacted with a map-based flood management system using multimodal speech and pen input. A comprehensive analysis of multimodal integration patterns revealed that seniors were classifiable as either simultaneous or sequential integrators, like children and adults. Seniors also demonstrated early predictability and a high degree of consistency in their dominant integration pattern. However, greater individual differences in multimodal integration generally were evident in this population. Perhaps surprisingly, during sequential constructions seniors' intermodal lags were not longer in average or maximum duration than those of younger adults, although both of these groups had longer maximum lags than children. However, an analysis of seniors' performance did reveal lengthy latencies before initiating a task, and high rates of self talk and task-critical errors while completing spatial tasks. All of these behaviors were magnified as the task difficulty level increased. Results of this research have implications for the design of adaptive processing strategies appropriate for seniors' applications, especially for the development of temporal thresholds used during multimodal fusion. The long-term goal of this research is the design of high-performance multimodal systems that adapt to a full spectrum of diverse users, supporting tailored and robust future systems.

© All rights reserved Xiao et al. and/or their publisher

2001
 

Oviatt, Sharon (2001): Designing robust multimodal systems for universal access. In: Proceedings of the 2001 EC/NSF Workshop on Universal Accessibility of Ubiquitous Computing 2001. pp. 71-74. Available online

Multimodal interfaces are being developed that permit our highly skilled and coordinated communicative behavior to control system interactions in a more transparent and flexible interface experience than ever before. As applications become more complex, a single modality alone does not permit varied users to interact effectively across different tasks and usage environments [11]. However, a flexible multimodal interface offers people the choice to use a combination of modalities, or to switch to a better-suited modality, depending on the specifics of their abilities, the task, and the usage conditions. This paper will begin by summarizing some of the primary advantages of multimodal interfaces. In particular, it will discuss the inherent flexibility of multimodal interfaces, which is a key feature that makes them suitable for universal access and mobile computing. It also will discuss the role of multimodal architectures in improving the robustness and performance stability of recognition-based systems. Data will be reviewed from two recent studies in which a multimodal architecture suppressed errors and stabilized system performance for accented nonnative speakers and during mobile use. The paper will conclude by discussing the implications of this research for designing multimodal interfaces for the elderly, as well as the need for future work in this area.

© All rights reserved Oviatt and/or ACM Press

2000
 

Oviatt, Sharon (2000): Multimodal System Processing in Mobile Environments. In: Ackerman, Mark S. and Edwards, Keith (eds.) Proceedings of the 13th annual ACM symposium on User interface software and technology November 06 - 08, 2000, San Diego, California, United States. pp. 21-30. Available online

 

Oviatt, Sharon, Cohen, Philip R., Wu, Lizhong, Duncan, Lisbeth, Suhm, Bernhard, Bers, Josh, Holzman, Thomas C., Winograd, Terry, Landay, James A., Larson, Jim and Ferro, David (2000): Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions. In Human-Computer Interaction, 15 (4) pp. 263-322.

The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, be usable by a broader spectrum of the average population, and function more reliably under realistic and challenging usage conditions. In this article, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches, and the new hybrid symbolic-statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error-handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multiperson use. Before this new class of systems can proliferate, toolkits also will be needed to promote software development for both simulated and functioning systems.

© All rights reserved Oviatt et al. and/or Taylor and Francis

1999
 

Oviatt, Sharon (1999): Mutual Disambiguation of Recognition Errors in a Multimodal Architecture. In: Altom, Mark W. and Williams, Marian G. (eds.) Proceedings of the ACM CHI 99 Human Factors in Computing Systems Conference May 15-20, 1999, Pittsburgh, Pennsylvania. pp. 576-583. Available online

As a new generation of multimodal/media systems begins to define itself, researchers are attempting to learn how to combine different modes into strategically integrated whole systems. In theory, well designed multimodal systems should be able to integrate complementary modalities in a manner that supports mutual disambiguation (MD) of errors and leads to more robust performance. In this study, over 2,000 multimodal utterances by both native and accented speakers of English were processed by a multimodal system, and then logged and analyzed. The results confirmed that multimodal systems can indeed support significant levels of MD, and also higher levels of MD for the more challenging accented users. As a result, although speech recognition as a stand-alone performed far more poorly for accented speakers, their multimodal recognition rates did not differ from those of native speakers. Implications are discussed for the development of future multimodal architectures that can perform in a more robust and stable manner than individual recognition technologies. Also discussed is the design of interfaces that support diversity in tangible ways, and that function well under challenging real-world usage conditions.

© All rights reserved Oviatt and/or ACM Press
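
Mutual disambiguation is described above in terms of results rather than mechanism. The sketch below is a schematic illustration, not the architecture of the system in the study: two hypothetical n-best lists are jointly re-ranked so that a semantically compatible speech/gesture pair can win even though the speech hypothesis involved was not ranked first on its own.

    # Hypothetical n-best lists as (hypothesis, recognizer score), best first.
    speech_nbest = [("zoom out", 0.52), ("zone out area", 0.31), ("tune out", 0.17)]
    gesture_nbest = [("drawn area", 0.55), ("scribble", 0.38), ("point", 0.07)]

    # Assumed domain knowledge: which speech commands accept which gesture interpretations.
    compatible_pairs = {("zone out area", "drawn area")}

    def fuse(speech, gestures, compatible):
        """Late-fusion re-ranking: keep only semantically compatible speech/gesture pairs
        and return the one with the highest combined score. A correct but lower-ranked
        hypothesis in one mode can be pulled up by the other mode, which is the essence
        of mutual disambiguation."""
        candidates = [(s_score * g_score, s, g)
                      for s, s_score in speech
                      for g, g_score in gestures
                      if (s, g) in compatible]
        return max(candidates, default=None)

    # The top speech hypothesis "zoom out" pairs with no gesture here, so the second-ranked
    # "zone out area" plus the drawn-area gesture wins.
    print(fuse(speech_nbest, gesture_nbest, compatible_pairs))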

1997
 

Oviatt, Sharon, DeAngeli, Antonella and Kuhn, Karen (1997): Integration and Synchronization of Input Modes during Multimodal Human-Computer Interaction. In: Pemberton, Steven (ed.) Proceedings of the ACM CHI 97 Human Factors in Computing Systems Conference March 22-27, 1997, Atlanta, Georgia. pp. 415-422. Available online

Our ability to develop robust multimodal systems will depend on knowledge of the natural integration patterns that typify people's combined use of different input modes. To provide a foundation for theory and design, the present research analyzed multimodal interaction while people spoke and wrote to a simulated dynamic map system. Task analysis revealed that multimodal interaction occurred most frequently during spatial location commands, and with intermediate frequency during selection commands. In addition, microanalysis of input signals identified sequential, simultaneous, point-and-speak, and compound integration patterns, as well as data on the temporal precedence of modes and on inter-modal lags. In synchronizing input streams, the temporal precedence of writing over speech was a major theme, with pen input conveying location information first in a sentence. Linguistic analysis also revealed that the spoken and written modes consistently supplied complementary semantic information, rather than redundant. One long-term goal of this research is the development of predictive models of natural modality integration to guide the design of emerging multimodal architectures.

© All rights reserved Oviatt et al. and/or ACM Press

 

Oviatt, Sharon and Wahlster, Wolfgang (1997): Introduction to This Special Issue on Multimodal Interfaces. In Human-Computer Interaction, 12 (1) pp. 1-5.

 

Oviatt, Sharon (1997): Multimodal Interactive Maps: Designing for Human Performance. In Human-Computer Interaction, 12 (1) pp. 93-129.

Dynamic interactive maps with powerful interface capabilities are beginning to emerge for a variety of geographical information systems, including ones situated on portables for travelers, students, business and service people, and others working in field settings. In part through the design of more expressive and flexible input capabilities, these map systems can provide new capabilities not supported by conventional interfaces of the past. In this research, interfaces supporting spoken, pen-based, and multimodal input were analyzed for their effectiveness in interacting with map systems. Input modality and map display format were varied as people completed realistic tasks with a simulated map system. The results identified a constellation of performance difficulties with speech-only map interactions, including elevated performance errors, lengthier task completion time, and more complex and disfluent input -- problems that declined substantially when people could interact multimodally. These difficulties also mirrored a strong user preference to interact multimodally. The error-proneness and unacceptability of speech-only input to maps was traced to people's difficulty articulating spatially oriented descriptions. Analyses also indicated that map displays can be structured to minimize performance errors and disfluencies effectively. Implications of this research are discussed for the design of high-performance multimodal interfaces for future map systems.

© All rights reserved Oviatt and/or Taylor and Francis

1996
 

Oviatt, Sharon (1996): Multimodal Interfaces for Dynamic Interactive Maps. In: Tauber, Michael J., Bellotti, Victoria, Jeffries, Robin, Mackinlay, Jock D. and Nielsen, Jakob (eds.) Proceedings of the ACM CHI 96 Human Factors in Computing Systems Conference April 14-18, 1996, Vancouver, Canada. pp. 95-102. Available online

Dynamic interactive maps with transparent but powerful human interface capabilities are beginning to emerge for a variety of geographical information systems, including ones situated on portables for travelers, students, business and service people, and others working in field settings. In the present research, interfaces supporting spoken, pen-based, and multimodal input were analyzed for their potential effectiveness in interacting with this new generation of map systems. Input modality (speech, writing, multimodal) and map display format (highly versus minimally structured) were varied in a within-subject factorial design as people completed realistic tasks with a simulated map system. The results identified a constellation of performance difficulties associated with speech-only map interactions, including elevated performance errors, spontaneous disfluencies, and lengthier task completion time -- problems that declined substantially when people could interact multimodally with the map. These performance advantages also mirrored a strong user preference to interact multimodally. The error-proneness and unacceptability of speech-only input to maps was attributed in large part to people's difficulty generating spoken descriptions of spatial location. Analyses also indicated that map display format can be used to minimize performance errors and disfluencies, and map interfaces that guide users' speech toward brevity can nearly eliminate disfluencies. Implications of this research are discussed for the design of high-performance multimodal interfaces for future map systems.

© All rights reserved Oviatt and/or ACM Press

 

Page Information

Page maintainer: The Editorial Team
URL: http://www.interaction-design.org/references/authors/sharon_oviatt.html