Number of co-authors: 58
3 favourite co-authors (by number of joint publications): Jay Pittman, Michael Johnston, Ira A. Smith
Philip R. Cohen's 3 most productive colleagues (by number of publications): James A. Landay (91), Steven K. Feiner (76), Terry Winograd (59)
Philip R. Cohen
Publications by Philip R. Cohen (bibliography)
Cohen, Philip R. (2008): Natural interfaces in the field: the case of pen and paper. In: Digalakis, Vassilios, Potamianos, Alexandros, Turk, Matthew, Pieraccini, Roberto and Ivanov, Yuri (eds.) Proceedings of the 10th International Conference on Multimodal Interfaces - ICMI 2008 October 20-22, 2008, Chania, Crete, Greece. pp. 1-2. Available online
Cohen, Philip R., Swindells, Colin, Oviatt, Sharon L. and Arthur, Alexander M. (2008): A high-performance dual-wizard infrastructure for designing speech, pen, and multimodal interfaces. In: Digalakis, Vassilios, Potamianos, Alexandros, Turk, Matthew, Pieraccini, Roberto and Ivanov, Yuri (eds.) Proceedings of the 10th International Conference on Multimodal Interfaces - ICMI 2008 October 20-22, 2008, Chania, Crete, Greece. pp. 137-140. Available online
Barthelmess, Paulo, Kaiser, Edward C., Huang, Xiao, McGee, David and Cohen, Philip R. (2006): Collaborative multimodal photo annotation over digital paper. In: Quek, Francis K. H., Yang, Jie, Massaro, Dominic W., Alwan, Abeer A. and Hazen, Timothy J. (eds.) Proceedings of the 8th International Conference on Multimodal Interfaces - ICMI 2006 November 2-4, 2006, Banff, Alberta, Canada. pp. 131-132. Available online
Barthelmess, Paulo, Kaiser, Edward C., Huang, Xiao, McGee, David and Cohen, Philip R. (2006): Collaborative multimodal photo annotation over digital paper. In: Quek, Francis K. H., Yang, Jie, Massaro, Dominic W., Alwan, Abeer A. and Hazen, Timothy J. (eds.) Proceedings of the 8th International Conference on Multimodal Interfaces - ICMI 2006 November 2-4, 2006, Banff, Alberta, Canada. pp. 4-11. Available online
Cohen, Philip R. and McGee, David (2004): Tangible multimodal interfaces for safety-critical applications. In Communications of the ACM, 47 (1) pp. 41-46. Available online
Reeves, Leah, Lai, Jennifer C., Larson, James A., Oviatt, Sharon L., Balaji, T. S., Buisine, Stephanie, Collings, Penny, Cohen, Philip R., Kraal, Ben, Martin, Jean-Claude, McTear, Michael F., Raman, T. V., Stanney, Kay M., Su, Hui and Wang, Qian Ying (2004): Guidelines for multimodal user interface design. In Communications of the ACM, 47 (1) pp. 57-59. Available online
Kumar, Sanjeev, Cohen, Philip R. and Coulston, Rachel (2004): Multimodal interaction under exerted conditions in a natural field setting. In: Sharma, Rajeev, Darrell, Trevor, Harper, Mary P., Lazzari, Gianni and Turk, Matthew (eds.) Proceedings of the 6th International Conference on Multimodal Interfaces - ICMI 2004 October 13-15, 2004, State College, PA, USA. pp. 227-234. Available online
This paper evaluates the performance of a multimodal interface under exerted conditions in a natural field setting. The subjects in the present study engaged in a strenuous activity while multimodally performing map-based tasks using handheld computing devices. This activity made the users breathe heavily and become fatigued during the course of the study. We found that the performance of both speech and gesture recognizers degraded as a function of exertion, while the overall multimodal success rate was stable. This stabilization is accounted for by the mutual disambiguation of modalities, which increases significantly with exertion. The system performed better for subjects with a greater level of physical fitness, as measured by their running speed, with more stable multimodal performance and a later degradation of speech and gesture recognition as compared with subjects who were less fit. The findings presented in this paper have a significant impact on design decisions for multimodal interfaces targeted towards highly mobile and exerted users in field environments.
© All rights reserved Kumar et al. and/or their publisher
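The mutual disambiguation reported above can be pictured as a late-fusion step over the recognizers' n-best lists: a semantically compatible pairing lets one modality rescue a lower-ranked hypothesis in the other. The sketch below is a minimal, hypothetical illustration of that idea; the function, commands, and scores are invented for exposition and do not reproduce the authors' system.

```python
# Hypothetical late-fusion sketch of "mutual disambiguation": each recognizer
# emits an n-best list of (hypothesis, score) pairs, and the fused result is
# the semantically compatible speech/gesture pair with the highest joint
# score, so a lower-ranked hypothesis in one modality can be rescued by
# evidence from the other.
def fuse(speech_nbest, gesture_nbest, compatible):
    best = None
    for s_hyp, s_score in speech_nbest:
        for g_hyp, g_score in gesture_nbest:
            if compatible(s_hyp, g_hyp):
                joint = s_score * g_score  # naive independence assumption
                if best is None or joint > best[2]:
                    best = (s_hyp, g_hyp, joint)
    return best

# Toy data: the top speech hypothesis ("zoom") has no compatible gesture,
# so fusion falls through to the second-ranked "move" command.
speech = [("zoom", 0.6), ("move", 0.5)]
gesture = [("point:unitA", 0.8)]
compatible = lambda s, g: (s, g) == ("move", "point:unitA")
print(fuse(speech, gesture, compatible))  # ('move', 'point:unitA', 0.4)
```

Under exertion, per-modality scores degrade, but as long as compatible pairs survive somewhere in the n-best lists, the fused interpretation can remain stable, which matches the pattern the paper reports.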
Kaiser, Edward C., Olwal, Alex, McGee, David, Benko, Hrvoje, Corradini, Andrea, Li, Xiaoguang, Cohen, Philip R. and Feiner, Steven K. (2003): Mutual disambiguation of 3D multimodal interaction in augmented and virtual reality. In: Oviatt, Sharon L., Darrell, Trevor, Maybury, Mark T. and Wahlster, Wolfgang (eds.) Proceedings of the 5th International Conference on Multimodal Interfaces - ICMI 2003 November 5-7, 2003, Vancouver, British Columbia, Canada. pp. 12-19. Available online
McGee, David R., Cohen, Philip R., Wesson, R. Matthews and Horman, Sheilah (2002): Comparing paper and tangible, multimodal tools. In: Terveen, Loren (ed.) Proceedings of the ACM CHI 2002 Conference on Human Factors in Computing Systems Conference April 20-25, 2002, Minneapolis, Minnesota. pp. 407-414.
Corradini, Andrea, Wesson, Richard M. and Cohen, Philip R. (2002): A Map-Based System Using Speech and 3D Gestures for Pervasive Computing. In: 4th IEEE International Conference on Multimodal Interfaces - ICMI 2002 14-16 October, 2002, Pittsburgh, PA, USA. pp. 191-196. Available online
We describe an augmentation of Quickset, a multimodal voice/pen system that allows users to create and control map-based, collaborative, interactive simulations. In this paper, we report on our extension of the graphical pen input mode from stylus/mouse to 3D hand movements. To do this, the map is projected onto a virtual plane in space, specified by the operator before the start of the interactive session. We then use our geometric model to compute the intersection of hand movements with the virtual plane, translating these into map coordinates on the appropriate system. The goal of this research is the creation of a body-centered, multimodal architecture employing both speech and 3D hand gestures, which seamlessly and unobtrusively supports distributed interaction. The augmented system, built on top of an existing architecture, also provides an improved visualization, management and awareness of a shared understanding. Potential applications of this work include tele-medicine, battlefield management and any kind of collaborative decision-making during which users may wish to be mobile.
© All rights reserved Corradini et al. and/or their publisher
Cohen, Philip R., Coulston, Rachel and Krout, Kelly (2002): Multimodal Interaction During Multiparty Dialogues: Initial Results. In: 4th IEEE International Conference on Multimodal Interfaces - ICMI 2002 14-16 October, 2002, Pittsburgh, PA, USA. pp. 448-453. Available online
Groups of people involved in collaboration on a task often incorporate the objects in their mutual environment into their discussion. With this comes physical reference to these 3-D objects, including gesture, gaze, haptics, and possibly other modalities, over and above the speech we commonly associate with human-human communication. From a technological perspective, this human style of communication not only poses the challenge for researchers to create multimodal systems capable of integrating input from various modalities, but also to do it well enough that it supports, but does not interfere with, the primary goal of the collaborators, which is their own human-human interaction. This paper offers a first step towards building such multimodal systems for supporting face-to-face collaborative work by providing both qualitative and quantitative analyses of multiparty multimodal dialogues in a field setting.
© All rights reserved Cohen et al. and/or their publisher
McGee, David R., Pavel, Misha and Cohen, Philip R. (2001): Context Shifts: Extending the Meanings of Physical Objects with Language. In Human-Computer Interaction, 16 (2) pp. 351-362.
The influence that language has on contextual interpretations cannot be ignored by computer systems that strive to be context aware. Rather, once systems are designed to perceive language and other forms of human action, these interpretative processes will of necessity be context dependent. As an example, we illustrate how people simply and naturally create new contexts by naming and referring. We then describe Rasa, a mixed-reality system that observes and understands how users in a military command post create such contexts as part of the process of maintaining situational awareness. In such environments, commander's maps are covered with Post-it notes. These paper artifacts are contextually transformed to represent units in the field by the application of multimodal language. Rasa understands this language, thereby allowing paper-based tools to become the basis for digital interaction. Finally, we argue that architectures for such context-aware systems will need to be built to process the inherent ambiguity and uncertainty of human input in order to be effective.
© All rights reserved McGee et al. and/or Taylor and Francis
McGee, David R. and Cohen, Philip R. (2001): Creating Tangible Interfaces by Augmenting Physical Objects with Multimodal Language. In: International Conference on Intelligent User Interfaces 2001 January 14-17, 2001, Santa Fe, New Mexico, USA. pp. 113-119. Available online
Rasa is a tangible augmented reality environment that digitally enhances the existing paper-based command and control capability in a military command post. By observing and understanding the users' speech, pen, and touch-based multimodal language, Rasa computationally augments the physical objects on a command post map, linking these items to digital representations of the same: for example, linking a paper map to the world and Post-it notes to military units. Herein, we give a thorough account of Rasa's underlying multiagent framework, and its recognition, understanding, and multimodal integration components. Moreover, we examine five properties of language (generativity, comprehensibility, compositionality, referentiality, and, at times, persistence) that render it suitable as an augmentation approach, contrasting these properties to those of other augmentation methods. It is these properties of language that allow users of Rasa to augment physical objects, transforming them into tangible interfaces.
© All rights reserved McGee and Cohen and/or ACM Press
Oviatt, Sharon, Cohen, Philip R., Wu, Lizhong, Duncan, Lisbeth, Suhm, Bernhard, Bers, Josh, Holzman, Thomas C., Winograd, Terry, Landay, James A., Larson, Jim and Ferro, David (2000): Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions. In Human-Computer Interaction, 15 (4) pp. 263-322.
The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, be usable by a broader spectrum of the average population, and function more reliably under realistic and challenging usage conditions. In this article, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches, and the new hybrid symbolic-statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error-handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multiperson use. Before this new class of systems can proliferate, toolkits also will be needed to promote software development for both simulated and functioning systems.
© All rights reserved Oviatt et al. and/or Taylor and Francis
Oviatt, Sharon L. and Cohen, Philip R. (2000): Multimodal Interfaces That Process What Comes Naturally. In Communications of the ACM, 43 (3) pp. 45-53. Available online
Cohen, Philip R., McGee, David, Oviatt, Sharon L., Wu, Lizhong, Clow, Josh, King, Rob, Julier, Simon and Rosenblum, Lawrence J. (1999): Multimodal Interaction for 2D and 3D Environments. In IEEE Computer Graphics and Applications, 19 (4) pp. 10-13. Available online
Cohen, Philip R., Johnston, Michael, McGee, David, Oviatt, Sharon L., Pittman, Jay, Smith, Ira A., Chen, Liang and Clow, Josh (1997): QuickSet: Multimodal Interaction for Distributed Applications. In: ACM Multimedia 1997 1997. pp. 31-40. Available online
Cohen, Philip R. (1992): The Role of Natural Language in a Multimodal Interface. In: Mackinlay, Jock D. and Green, Mark (eds.) Proceedings of the 5th annual ACM symposium on User interface software and technology November 15-18, 1992, Monterey, California, United States. pp. 143-149. Available online
Although graphics and direct manipulation are effective interface technologies for some classes of problems, they are limited in many ways. In particular, they provide little support for identifying objects not on the screen, for specifying temporal relations, for identifying and operating on large sets and subsets of entities, and for using the context of interaction. On the other hand, these are precisely the strengths of natural language. This paper presents an interface that blends natural language processing and direct manipulation technologies, using each for its characteristic advantages. Specifically, the paper shows how to use natural language to describe objects and temporal relations, and how to use direct manipulation to overcome hard natural language problems involving the establishment and use of context and pronominal reference. This work has been implemented in SRI's Shoptalk system, a prototype information and decision-support system for manufacturing.
© All rights reserved Cohen and/or ACM Press
Cohen, Philip R. (1991): Computer Dialogue Laboratory, SRI International. In: Robertson, Scott P., Olson, Gary M. and Olson, Judith S. (eds.) Proceedings of the ACM CHI 91 Human Factors in Computing Systems Conference April 28 - June 5, 1991, New Orleans, Louisiana. pp. 469-470. Available online
Cohen, Philip R., Dalrymple, Mary, Moran, Douglas B., Pereira, Fernando C. N., Sullivan, Joseph W., Gargan Jr, Robert A., Schlossberg, Jon L. and Tyler, Sherman W. (1989): Synergistic Use of Direct Manipulation and Natural Language. In: Bice, Ken and Lewis, Clayton H. (eds.) Proceedings of the ACM CHI 89 Human Factors in Computing Systems Conference April 30 - June 4, 1989, Austin, Texas. pp. 227-233.
This paper shows how the integration of natural language with direct manipulation produces a multimodal interface that overcomes limitations of these techniques when used separately. Natural language helps direct manipulation in being able to specify objects and actions by description, while direct manipulation enables users to learn which objects and actions are available in the system. Furthermore, graphical rendering and manipulation of context provides a partial solution to difficult problems of natural language anaphora.
© All rights reserved Cohen et al. and/or ACM Press