Bernhard Suhm
Current place of employment:
AVOKE Call Center Analytics / BBN Technologies
Bernhard Suhm is the Director of Professional Services in the AVOKE Call Center Analytics group at BBN Technologies. He has over 14 years of experience in speech recognition and interface usability. He recently rejoined BBN from Enterprise Integration Group (EIG), where he worked as a Sr. Consultant for two years. The Call Center Analytics group at BBN offers tools and services that allow call centers to optimize the caller experience comprehensively by analyzing calls end-to-end. The data-driven analysis methods that he co-developed mine end-to-end calls in a statistically rigorous way for opportunities to optimize IVR and speech dialogs, as well as how agents handle calls. Prior to working at BBN, he spent several years with Interactive Systems Laboratories at Carnegie Mellon and Karlsruhe Universities on research in multilingual speech translation and multimodal interfaces. He received his Ph.D. in Computer Science in 1998 from Karlsruhe University (Germany) for his research on multimodal error correction. Bernhard has co-authored several patents and published papers on design and deployment methodologies, speech interface usability, multimodal interaction, speech-to-speech translation, and statistical language modeling.
Publications by Bernhard Suhm (bibliography)
» 2009 «
Suhm, Bernhard and Peterson, Pat (2009): Call browser: a system to improve the caller experience by analyzing live calls end-to-end. In: Proceedings of ACM CHI 2009 Conference on Human Factors in Computing Systems 2009. pp. 1313-1322.
This paper describes a system that empowers practitioners to substantially improve the user experience with call center automation and agents. Unlike other approaches we analyze the caller experience in live calls end-to-end, from dialing to hangup. A web-based solution, the Call Browser provides access to hundreds or thousands of live end-to-end calls, and empowers usability practitioners and call-center analysts to systematically and efficiently evaluate the caller experience and identify usability issues. Case studies from our consulting practice illustrate how this approach reveals issues that remain hidden to traditional methods, such as log analyses, lab user studies, focus groups, and design guidelines.
Copyrights may apply
» 2002 «
Suhm, Bernhard, Bers, Josh, McCarthy, Dan, Freeman, Barbara, Getty, David, Godfrey, Katherine and Peterson, Pat (2002): A comparative study of speech in the call center: natural language call routing vs. touch-tone menus. In: Terveen, Loren (ed.) Proceedings of the ACM CHI 2002 Conference on Human Factors in Computing Systems April 20-25, 2002, Minneapolis, Minnesota. pp. 283-290.
» 2001 «
Suhm, Bernhard, Myers, Brad A. and Waibel, Alex (2001): Multimodal error correction for speech user interfaces. In ACM Transactions on Computer-Human Interaction, 8 (1) pp. 60-98
Although commercial dictation systems and speech-enabled telephone voice user interfaces have become readily available, speech recognition errors remain a serious problem in the design and implementation of speech user interfaces. Previous work hypothesized that switching modality could speed up interactive correction of recognition errors. This article presents multimodal error correction methods that allow the user to correct recognition errors efficiently without keyboard input. Correction accuracy is maximized by novel recognition algorithms that use context information for recognizing correction input. Multimodal error correction is evaluated in the context of a prototype multimodal dictation system. The study shows that unimodal repair is less accurate than multimodal error correction. On a dictation task, multimodal correction is faster than unimodal correction by respeaking. The study also provides empirical evidence that system-initiated error correction (based on confidence measures) may not expedite error correction. Furthermore, the study suggests that recognition accuracy determines user choice between modalities: while users initially prefer speech, they learn to avoid ineffective correction modalities with experience. To extrapolate results from this user study, the article introduces a performance model of (recognition-based) multimodal interaction that predicts input speed including time needed for error correction. Applied to interactive error correction, the model predicts the impact of improvements in recognition technology on correction speeds, and the influence of recognition accuracy and correction method on the productivity of dictation systems. This model is a first step toward formalizing multimodal interaction.
» 2000 «
Oviatt, Sharon, Cohen, Philip R., Wu, Lizhong, Duncan, Lisbeth, Suhm, Bernhard, Bers, Josh, Holzman, Thomas C., Winograd, Terry, Landay, James A., Larson, Jim and Ferro, David (2000): Designing the User Interface for Multimodal Speech and Pen-Based Gesture Applications: State-of-the-Art Systems and Future Research Directions. In Human-Computer Interaction, 15 (4) pp. 263-322
The growing interest in multimodal interface design is inspired in large part by the goals of supporting more transparent, flexible, efficient, and powerfully expressive means of human-computer interaction than in the past. Multimodal interfaces are expected to support a wider range of diverse applications, be usable by a broader spectrum of the average population, and function more reliably under realistic and challenging usage conditions. In this article, we summarize the emerging architectural approaches for interpreting speech and pen-based gestural input in a robust manner, including early and late fusion approaches, and the new hybrid symbolic-statistical approach. We also describe a diverse collection of state-of-the-art multimodal systems that process users' spoken and gestural input. These applications range from map-based and virtual reality systems for engaging in simulations and training, to field medic systems for mobile use in noisy environments, to web-based transactions and standard text-editing applications that will reshape daily computing and have a significant commercial impact. To realize successful multimodal systems of the future, many key research challenges remain to be addressed. Among these challenges are the development of cognitive theories to guide multimodal system design, and the development of effective natural language processing, dialogue processing, and error-handling techniques. In addition, new multimodal systems will be needed that can function more robustly and adaptively, and with support for collaborative multiperson use. Before this new class of systems can proliferate, toolkits also will be needed to promote software development for both simulated and functioning systems.
» 1999 «
Suhm, Bernhard, Waibel, Alex and Myers, Brad A. (1999): Model-Based and Empirical Evaluation of Multimodal Interactive Error Correction. In: Altom, Mark W. and Williams, Marian G. (eds.) Proceedings of the ACM CHI 99 Human Factors in Computing Systems Conference May 15-20, 1999, Pittsburgh, Pennsylvania. pp. 584-591.
Our research addresses the problem of error correction in speech user interfaces. Previous work hypothesized that switching modality could speed up interactive correction of recognition errors (so-called multimodal error correction). We present a user study that compares, on a dictation task, multimodal error correction with conventional interactive correction, such as speaking again, choosing from a list, and keyboard input. Results show that multimodal correction is faster than conventional correction without keyboard input, but slower than correction by typing for users with good typing skills. Furthermore, while users initially prefer speech, they learn to avoid ineffective correction modalities with experience. To extrapolate results from this user study we developed a performance model of multimodal interaction that predicts input speed including time needed for error correction. We apply the model to estimate the impact of recognition technology improvements on correction speeds and the influence of recognition accuracy and correction method on the productivity of dictation systems. Our model is a first step towards formalizing multimodal (recognition-based) interaction.
Changes to this page (author)
25 Feb 2010: Enabled abstracts to be shown on Bernhard Suhm's author page.
09 May 2009: Author was edited
28 Apr 2003: Added the author to the bibliography