Publication statistics

Pub. period: 2004-2012
Pub. count: 14
Number of co-authors: 28


Number of publications with 3 favourite co-authors:

Larry S. Davis:
Bo Xie:
Boris Katz:



Productive colleagues

Tom Yeh's 3 most productive colleagues in number of publications:

Benjamin B. Bederson: 70
Robert C. Miller: 42
Trevor Darrell: 38





Tom Yeh


Publications by Tom Yeh (bibliography)


Xie, Bo, Yeh, Tom, Walsh, Greg, Watkins, Ivan and Huang, Man (2012): Co-designing an e-health tutorial for older adults. In: Proceedings of the 2012 iConference. pp. 240-247.

Older adults' ability to access and use electronic health information is generally low, requiring innovative approaches for improvement. An integrated e-tutorial overlays instructions onto Websites. The literature suggests integrated e-tutorials are more effective than paper or video-based tutorials for younger people, but little is known about their effectiveness for older adults. This study explores the applicability of an integrated e-health tutorial for older adults. An integrated e-tutorial, the Online Tutorial Overlay Presenter (OnTOP), added an instructional overlay to the Website. Overlay features were examined in seven participatory design sessions with seven older adults. Participatory design techniques were used to elicit participants' preferences for tutorial features. Three themes emerged: 1) using contextual cues; 2) tailoring to the learner's literacy level; and 3) enhancing interfaces with multimedia cues. These findings improved the design features of OnTOP. They also generated empirical evidence about the effects of multimedia learning among older adults.

© All rights reserved Xie et al. and/or their publisher


Pedro, Jose San, Yeh, Tom and Oliver, Nuria (2012): Leveraging user comments for aesthetic aware image search reranking. In: Proceedings of the 2012 International Conference on the World Wide Web. pp. 439-448.

The increasing number of images available online has created a growing need for efficient ways to search for relevant content. Text-based query search is the most common approach to retrieve images from the Web. In this approach, the similarity between the input query and the metadata of images is used to find relevant information. However, as the amount of available images grows, the number of relevant images also increases, all of them sharing very similar metadata but differing in other visual characteristics. This paper studies the influence of visual aesthetic quality in search results as a complementary attribute to relevance. By considering aesthetics, a new ranking parameter is introduced aimed at improving the quality at the top ranks when large amounts of relevant results exist. Two strategies for aesthetic rating inference are proposed: one based on visual content, another based on the analysis of user comments to detect opinions about the quality of images. The results of a user study with 58 participants show that the comment-based aesthetic predictor outperforms the visual content-based strategy, and reveals that aesthetic-aware rankings are preferred by users searching for photographs on the Web.

© All rights reserved Pedro et al. and/or ACM Press
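The core reranking idea in the entry above can be sketched as a weighted blend of a relevance score and an aesthetic score. This is a hypothetical illustration, not the paper's actual model: the function name `rerank`, the weight `alpha`, and the scores are all invented for the example.

```python
# Hypothetical sketch of aesthetic-aware reranking: blend each result's
# text relevance with an aesthetic score (e.g. inferred from user comments)
# and sort by the combined value. Weights and scores are illustrative only.

def rerank(results, alpha=0.7):
    """results: list of (doc_id, relevance, aesthetic), scores in [0, 1].
    Returns doc_ids sorted by a weighted blend of the two scores."""
    blended = [(alpha * rel + (1 - alpha) * aes, doc_id)
               for doc_id, rel, aes in results]
    return [doc_id for _, doc_id in sorted(blended, reverse=True)]

hits = [("img1", 0.90, 0.20),   # highly relevant but unattractive
        ("img2", 0.85, 0.95),   # slightly less relevant, high quality
        ("img3", 0.88, 0.60)]
print(rerank(hits))  # → ['img2', 'img3', 'img1']
```

With many near-equally relevant results, the aesthetic term breaks ties in favour of higher-quality images, which is the effect the paper's user study measures.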


Yeh, Tom, White, Brandyn, Pedro, Jose San, Katz, Boris and Davis, Larry S. (2011): A case for query by image and text content: searching computer help using screenshots and keywords. In: Proceedings of the 2011 International Conference on the World Wide Web. pp. 775-784.

The multimedia information retrieval community has dedicated extensive research effort to the problem of content-based image retrieval (CBIR). However, these systems find their main limitation in the difficulty of creating pictorial queries. As a result, few systems offer the option of querying by visual examples, and rely on automatic concept detection and tagging techniques to provide support for searching visual content using textual queries. This paper proposes and studies a practical multimodal web search scenario, where CBIR fits intuitively to improve the retrieval of rich information queries. Many online articles contain useful know-how knowledge about computer applications. These articles tend to be richly illustrated by screenshots. We present a system to search for such software know-how articles that leverages the visual correspondences between screenshots. Users can naturally create pictorial queries simply by taking a screenshot of the application to retrieve a list of articles containing a matching screenshot. We build a prototype comprising 150k articles that are classified into walkthrough, book, gallery, and general categories, and provide a comprehensive evaluation of this system, focusing on technical (accuracy of CBIR techniques) and usability (perceived system usefulness) aspects. We also consider the study of added value features of such a visual-supported search, including the ability to perform cross-lingual queries. We find that the system is able to retrieve matching screenshots for a wide variety of programs, across language boundaries, and provide subjectively more useful results than keyword-based web and image search engines.

© All rights reserved Yeh et al. and/or ACM Press


Yeh, Tom, Chang, Tsung-Hsiang, Xie, Bo, Walsh, Greg, Watkins, Ivan, Wongsuphasawat, Krist, Huang, Man, Davis, Larry S. and Bederson, Benjamin B. (2011): Creating contextual help for GUIs using screenshots. In: Proceedings of the 2011 ACM Symposium on User Interface Software and Technology. pp. 145-154.

Contextual help is effective for learning how to use GUIs by showing instructions and highlights on the actual interface rather than in a separate viewer. However, end-users and third-party tech support typically cannot create contextual help to assist other users because it requires programming skill and source code access. We present a creation tool for contextual help that allows users to apply common computer skills-taking screenshots and writing simple scripts. We perform pixel analysis on screenshots to make this tool applicable to a wide range of applications and platforms without source code access. We evaluated the tool's usability with three groups of participants: developers, instructors, and tech support. We further validated the applicability of our tool with 60 real tasks supported by the tech support of a university campus.

© All rights reserved Yeh et al. and/or ACM Press


Chang, Tsung-Hsiang, Yeh, Tom and Miller, Robert C. (2011): Associating the visual representation of user interfaces with their internal structures and metadata. In: Proceedings of the 2011 ACM Symposium on User Interface Software and Technology. pp. 245-256.

Pixel-based methods are emerging as a new and promising way to develop new interaction techniques on top of existing user interfaces. However, in order to maintain platform independence, other available low-level information about GUI widgets, such as accessibility metadata, was neglected intentionally. In this paper, we present a hybrid framework, PAX, which associates the visual representation of user interfaces (i.e. the pixels) and their internal hierarchical metadata (i.e. the content, role, and value). We identify challenges to building such a framework. We also develop and evaluate two new algorithms for detecting text at arbitrary places on the screen, and for segmenting a text image into individual word blobs. Finally, we validate our framework in implementations of three applications. We enhance an existing pixel-based system, Sikuli Script, and preserve the readability of its script code at the same time. Further, we create two novel applications, Screen Search and Screen Copy, to demonstrate how PAX can be applied to development of desktop-level interactive systems.

© All rights reserved Chang et al. and/or ACM Press
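One of the two algorithms mentioned in the PAX entry above, segmenting a text image into individual word blobs, can be illustrated with a toy version: split a binarized text line into words wherever a run of empty pixel columns is wider than a gap threshold. This is an invented sketch, not PAX's actual algorithm; `word_blobs` and `min_gap` are hypothetical names.

```python
# Toy sketch of word-blob segmentation (not PAX's real algorithm): given
# per-column foreground pixel counts for one line of text, close a blob
# whenever a run of empty columns reaches min_gap, the assumed word gap.

def word_blobs(columns, min_gap=3):
    """columns: foreground-pixel count per column of a binarized text line.
    Returns (start, end) column spans of word blobs, end exclusive."""
    blobs, start, gap = [], None, 0
    for x, count in enumerate(columns):
        if count > 0:
            if start is None:
                start = x          # a new blob begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:     # gap wide enough: this was a word break
                blobs.append((start, x - gap + 1))
                start, gap = None, 0
    if start is not None:          # close the final blob
        blobs.append((start, len(columns) - gap))
    return blobs

# Two "words" separated by three empty columns:
print(word_blobs([2, 3, 0, 4, 0, 0, 0, 5, 6]))  # → [(0, 4), (7, 9)]
```

Note the single empty column inside the first span is treated as intra-word spacing because it is narrower than `min_gap`.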


Chang, Tsung-Hsiang, Yeh, Tom and Miller, Robert C. (2010): GUI testing using computer vision. In: Proceedings of ACM CHI 2010 Conference on Human Factors in Computing Systems. pp. 1535-1544.

Testing a GUI's visual behavior typically requires human testers to interact with the GUI and to observe whether the expected results of interaction are presented. This paper presents a new approach to GUI testing using computer vision for testers to automate their tasks. Testers can write a visual test script that uses images to specify which GUI components to interact with and what visual feedback to be observed. Testers can also generate visual test scripts by demonstration. By recording both input events and screen images, it is possible to extract the images of components interacted with and the visual feedback seen by the demonstrator, and generate a visual test script automatically. We show that a variety of GUI behavior can be tested using this approach. Also, we show how this approach can facilitate good testing practices such as unit testing, regression testing, and test-driven development.

© All rights reserved Chang et al. and/or their publisher


Bigham, Jeffrey P., Jayant, Chandrika, Ji, Hanjie, Little, Greg, Miller, Andrew, Miller, Robert C., Tatarowicz, Aubrey, White, Brandyn, White, Samuel and Yeh, Tom (2010): VizWiz: nearly real-time answers to visual questions. In: Proceedings of the 2010 International Cross-Disciplinary Conference on Web Accessibility (W4A). p. 24.

Visual information pervades our environment. Vision is used to decide everything from what we want to eat at a restaurant and which bus route to take to whether our clothes match and how long until the milk expires. Individually, the inability to interpret such visual information is a nuisance for blind people who often have effective, if inefficient, work-arounds to overcome them. Collectively, however, they can make blind people less independent. Specialized technology addresses some problems in this space, but automatic approaches cannot yet answer the vast majority of visual questions that blind people may have. VizWiz addresses this shortcoming by using the Internet connections and cameras on existing smartphones to connect blind people and their questions to remote paid workers' answers. VizWiz is designed to have low latency and low cost, making it both competitive with expensive automatic solutions and much more versatile.

© All rights reserved Bigham et al. and/or their publisher


Bigham, Jeffrey P., Jayant, Chandrika, Ji, Hanjie, Little, Greg, Miller, Andrew, Miller, Robert C., Miller, Robin, Tatarowicz, Aubrey, White, Brandyn, White, Samuel and Yeh, Tom (2010): VizWiz: nearly real-time answers to visual questions. In: Proceedings of the 2010 ACM Symposium on User Interface Software and Technology. pp. 333-342.

The lack of access to visual information like text labels, icons, and colors can cause frustration and decrease independence for blind people. Current access technology uses automatic approaches to address some problems in this space, but the technology is error-prone, limited in scope, and quite expensive. In this paper, we introduce VizWiz, a talking application for mobile phones that offers a new alternative to answering visual questions in nearly real-time -- asking multiple people on the web. To support answering questions quickly, we introduce a general approach for intelligently recruiting human workers in advance called quikTurkit so that workers are available when new questions arrive. A field deployment with 11 blind participants illustrates that blind people can effectively use VizWiz to cheaply answer questions in their everyday lives, highlighting issues that automatic approaches will need to address to be useful. Finally, we illustrate the potential of using VizWiz as part of the participatory design of advanced tools by using it to build and evaluate VizWiz::LocateIt, an interactive mobile tool that helps blind people solve general visual search problems.

© All rights reserved Bigham et al. and/or their publisher
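The quikTurkit idea described above, recruiting human workers in advance so answers arrive in near real time, can be sketched as a pool kept "warm" and topped back up after every assignment. This is a toy illustration under invented names (`WorkerPool`, `top_up`, `assign`), not quikTurkit's real implementation.

```python
# Toy sketch of advance recruiting (not quikTurkit itself): keep a pool of
# workers waiting so a new question is assigned immediately, and recruit a
# replacement right after each assignment to keep latency low.

from collections import deque

class WorkerPool:
    def __init__(self, target_size, recruit):
        self.target = target_size
        self.recruit = recruit        # callable that recruits one worker
        self.pool = deque()
        self.top_up()                 # pre-recruit before any question

    def top_up(self):
        while len(self.pool) < self.target:
            self.pool.append(self.recruit())

    def assign(self, question):
        worker = self.pool.popleft()  # a worker is already waiting
        self.top_up()                 # replace them in advance
        return (worker, question)

ids = iter(range(100))                # stand-in for a recruiting backend
pool = WorkerPool(3, lambda: next(ids))
print(pool.assign("What color is this shirt?"))  # → (0, 'What color is this shirt?')
```

The trade-off the paper names is visible even here: pre-recruiting costs idle workers, but removes recruiting latency from the question's critical path.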


Yeh, Tom and Katz, Boris (2009): Searching documentation using text, OCR, and image. In: Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval 2009. pp. 776-777.

We describe a mixed-modality method to index and search software documentation in three ways: plain text, OCR text of embedded figures, and visual features of these figures. Using a corpus of 102 computer books with a total of 62,943 pages and 75,800 figures, we empirically demonstrate that our method achieves better precision/recall than do alternatives based on single modalities.

© All rights reserved Yeh and Katz and/or their publisher


Yeh, Tom, Chang, Tsung-Hsiang and Miller, Robert C. (2009): Sikuli: using GUI screenshots for search and automation. In: Proceedings of the ACM Symposium on User Interface Software and Technology 2009. pp. 183-192.

We present Sikuli, a visual approach to search and automation of graphical user interfaces using screenshots. Sikuli allows users to take a screenshot of a GUI element (such as a toolbar button, icon, or dialog box) and query a help system using the screenshot instead of the element's name. Sikuli also provides a visual scripting API for automating GUI interactions, using screenshot patterns to direct mouse and keyboard events. We report a web-based user study showing that searching by screenshot is easy to learn and faster to specify than keywords. We also demonstrate several automation tasks suitable for visual scripting, such as map navigation and bus tracking, and show how visual scripting can improve interactive help systems previously proposed in the literature.

© All rights reserved Yeh et al. and/or their publisher
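The screenshot-matching idea at the heart of the Sikuli entry above can be illustrated with a toy exact template match: find a small pixel pattern (the "screenshot" of a GUI element) inside a larger screen image. Real systems use fuzzy matching over actual bitmaps; this sketch, with the invented helper `find_pattern`, uses plain 2D integer grids.

```python
# Toy sketch of screenshot-based GUI matching (not Sikuli's real matcher):
# exact template matching of a small pattern inside a larger 2D pixel grid.
# A click or assertion could then be targeted at the returned coordinates.

def find_pattern(screen, pattern):
    """Return (row, col) of the first match of pattern in screen, or None."""
    ph, pw = len(pattern), len(pattern[0])
    sh, sw = len(screen), len(screen[0])
    for r in range(sh - ph + 1):
        for c in range(sw - pw + 1):
            if all(screen[r + i][c + j] == pattern[i][j]
                   for i in range(ph) for j in range(pw)):
                return (r, c)
    return None

screen = [[0, 0, 0, 0],
          [0, 1, 2, 0],
          [0, 3, 4, 0]]
button = [[1, 2],
          [3, 4]]     # the "screenshot" of the GUI element
print(find_pattern(screen, button))  # → (1, 1)
```

Querying by screenshot sidesteps naming entirely: the user never needs to know what the toolbar button is called, only what it looks like.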


Yeh, Tom and Darrell, Trevor (2008): Multimodal question answering for mobile devices. In: Proceedings of the 2008 International Conference on Intelligent User Interfaces. pp. 405-408.

This paper introduces multimodal question answering, a new interface for community-based question answering services. By offering users an extra modality -- photos -- in addition to the text modality to formulate queries, multimodal question answering overcomes the limitations of text-only input methods when the users ask questions regarding visually distinctive objects. Such interface is especially useful when users become curious about an interesting object in the environment and want to know about it -- simply by taking a photo and asking a question in a situated (from a mobile device) and intuitive (without describing the object in words) manner. We propose a system architecture for multimodal question answering, describe an algorithm for searching the database, and report on the findings of two prototype studies.

© All rights reserved Yeh and Darrell and/or ACM Press


Yeh, Tom, Lee, John J. and Darrell, Trevor (2008): Photo-based question answering. In: El-Saddik, Abdulmotaleb, Vuong, Son, Griwodz, Carsten, Bimbo, Alberto Del, Candan, K. Selcuk and Jaimes, Alejandro (eds.) Proceedings of the 16th International Conference on Multimedia 2008 October 26-31, 2008, Vancouver, British Columbia, Canada. pp. 389-398.


Yeh, Tom, Grauman, Kristen, Tollmar, Konrad and Darrell, Trevor (2005): A picture is worth a thousand keywords: image-based object search on a mobile platform. In: Proceedings of ACM CHI 2005 Conference on Human Factors in Computing Systems. pp. 2025-2028.

Finding information based on an object's visual appearance is useful when specific keywords for the object are not known. We have developed a mobile image-based search system that takes images of objects as queries and finds relevant web pages by matching them to similar images on the web. Image-based search works well when matching full scenes, such as images of buildings or landmarks, and for matching objects when the boundary of the object in the image is available. We demonstrate the effectiveness of a simple interactive paradigm for obtaining a segmented object boundary, and show how a shape-based image matching algorithm can use the object outline to find similar images on the web.

© All rights reserved Yeh et al. and/or ACM Press


Tollmar, Konrad, Yeh, Tom and Darrell, Trevor (2004): IDeixis - Searching the Web with Mobile Images for Location-Based Information. In: Brewster, Stephen A. and Dunlop, Mark D. (eds.) Mobile Human-Computer Interaction - Mobile HCI 2004 - 6th International Symposium September 13-16, 2004, Glasgow, UK. pp. 288-299.


Page Information

Page maintainer: The Editorial Team