James R. Lewis
Has also published under the name of:
"J. R. Lewis"
Current place of employment:
IBM
Began college studies as a music major. Graduated 1975 with BM in music theory and composition, 1978 with MM in music composition. Switched to experimental psychology, graduating with BA in 1978 and MA (engineering psychology) in 1982 (all degrees from New Mexico State University). Began work at IBM in 1981 as human factors engineer, primary focus on input methods (keyboards, mice, touchscreens, joysticks). Started work on speech input/output in early 1990s. Graduated with PhD in experimental psychology (psycholinguistics) in 1996 (from Florida Atlantic University). In addition to scholarly publications, has over 50 patents issued by the US Patent Office -- designated an IBM Master Inventor in 2003.
Publications by James R. Lewis (bibliography)
Sauro, Jeff and Lewis, James R. (2011): When designing usability questionnaires, does it hurt to be positive? In: Proceedings of ACM CHI 2011 Conference on Human Factors in Computing Systems 2011. pp. 2215-2224.
When designing questionnaires there is a tradition of including items with both positive and negative wording to minimize acquiescence and extreme response biases. Two disadvantages of this approach are respondents accidentally agreeing with negative items (mistakes) and researchers forgetting to reverse the scales (miscoding). The original System Usability Scale (SUS) and an all positively worded version were administered in two experiments (n=161 and n=213) across eleven websites. There was no evidence for differences in the response biases between
© All rights reserved Sauro and Lewis and/or their publisher
Sauro, Jeff and Lewis, James R. (2010): Average task times in usability tests: what to report? In: Proceedings of ACM CHI 2010 Conference on Human Factors in Computing Systems 2010. pp. 2347-2350.
The distribution of task time data in usability studies is positively skewed. Practitioners who are aware of this positive skew tend to report the sample median. Monte Carlo simulations using data from 61 large-sample usability tasks showed that the sample median is a biased estimate of the population median. Using the geometric mean to estimate the center of the population will, on average, have 13% less error and 22% less bias than the sample median. Other estimates of the population center (trimmed, harmonic and Winsorized means) had worse performance than the sample median.
© All rights reserved Sauro and Lewis and/or their publisher
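As a rough illustration of the estimators compared in the abstract above, the following sketch computes the geometric mean of a set of task times alongside the sample median and arithmetic mean. The task times are hypothetical values chosen to show a positive skew; they are not data from the paper, and the function name is an illustrative choice.

```python
import math
import statistics


def geometric_mean(times):
    """Geometric mean: exponentiate the arithmetic mean of the log times."""
    if not times or any(t <= 0 for t in times):
        raise ValueError("task times must be a non-empty list of positive numbers")
    return math.exp(sum(math.log(t) for t in times) / len(times))


# Hypothetical task times in seconds (illustrative only, positively skewed).
task_times = [20.0, 25.0, 40.0, 50.0, 160.0]

gm = geometric_mean(task_times)
median = statistics.median(task_times)
mean = statistics.mean(task_times)

print(f"median={median:.1f}  geometric mean={gm:.1f}  arithmetic mean={mean:.1f}")
```

For skewed samples like this one, the geometric mean typically falls between the median and the arithmetic mean, because the log transform dampens the influence of the long right tail.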
Sauro, Jeff and Lewis, James R. (2009): Correlations among prototypical usability metrics: evidence for the construct of usability. In: Proceedings of ACM CHI 2009 Conference on Human Factors in Computing Systems 2009. pp. 1609-1618.
Correlations between prototypical usability metrics from 90 distinct usability tests were strong when measured at the task-level (r between .44 and .60). Using test-level satisfaction ratings instead of task-level ratings attenuated the correlations (r between .16 and .24). The method of aggregating data from a usability test had a significant effect on the magnitude of the resulting correlations. The results of principal components and factor analyses on the prototypical usability metrics provided evidence for an underlying construct of general usability with objective and subjective factors.
© All rights reserved Sauro and Lewis and/or ACM Press
Lewis, James R. (2006): Sample sizes for usability tests: mostly math, not magic. In Interactions, 13 (6) pp. 29-33.
Lewis, James R. and Sauro, Jeff (2006): When 100% Really Isn't 100%: Improving the Accuracy of Small-Sample Estimates of Completion Rates. In Journal of Usability Studies, 1 (3) pp. 136-150.
Small sample sizes are a fact of life for most usability practitioners. This can lead to serious measurement problems, especially when making binary measurements such as successful task completion rates (p). The computation of confidence intervals helps by establishing the likely boundaries of measurement, but there is still a question of how to compute the best point estimate, especially for extreme outcomes. In this paper, we report the results of investigations of the accuracy of different estimation methods for two hypothetical distributions and one empirical distribution of p. If a practitioner has no expectation about the value of p, then the Laplace method ((x+1)/(n+2)) is the best estimator. If practitioners are reasonably sure that p will range between .5 and 1.0, then they should use the Wilson method if the observed value of p is less than .5, Laplace when p is greater than .9, and maximum likelihood (x/n) otherwise.
© All rights reserved Lewis and Sauro and/or Usability Professionals Association
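The selection rule in the abstract above can be sketched as a small function. The Laplace and maximum likelihood formulas are taken from the abstract. The Wilson point estimate is not spelled out there; the form used here, (x + z²/2)/(n + z²) with z = 1.96, is an assumption based on the usual midpoint of the Wilson interval. All function and parameter names are hypothetical.

```python
def mle(x, n):
    """Maximum likelihood estimate: observed successes over attempts."""
    return x / n


def laplace(x, n):
    """Laplace estimate, (x + 1) / (n + 2), as given in the abstract."""
    return (x + 1) / (n + 2)


def wilson(x, n, z=1.96):
    """Wilson point estimate.

    Assumption: the midpoint of the Wilson interval, (x + z^2/2) / (n + z^2),
    with z = 1.96 for 95% confidence. The abstract does not state the formula.
    """
    return (x + z * z / 2) / (n + z * z)


def best_estimate(x, n, expect_high=True):
    """Pick an estimator following the rule described in the abstract.

    expect_high=True means the practitioner is reasonably sure that the
    true completion rate lies between .5 and 1.0.
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    if not expect_high:
        return laplace(x, n)
    p = x / n
    if p < 0.5:
        return wilson(x, n)
    if p > 0.9:
        return laplace(x, n)
    return mle(x, n)


# Five of five participants completed the task: report 6/7 rather than 100%.
print(round(best_estimate(5, 5), 3))
```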
Lewis, James R. (2001): Current Issues in Usability Evaluation. In International Journal of Human-Computer Interaction, 13 (4) pp. 343-349.
In this introduction to the special issue of the International Journal of Human-Computer Interaction, I discuss some current topics in usability evaluation and indicate how the contributions to the issue relate to these topics. The contributions cover a wide range of topics in usability evaluation, including a discussion of usability science, how to evaluate usability evaluation methods, the effect and control of certain biases in the selection of evaluative tasks, a lack of reliability in problem detection across evaluators, how to adjust estimates of problem-discovery rates computed from small samples, and the effects of perception of hedonic and ergonomic quality on user ratings of a product's appeal.
© All rights reserved Lewis and/or Lawrence Erlbaum Associates
Lewis, James R. (2001): Evaluation of Procedures for Adjusting Problem-Discovery Rates Estimated From Small Samples. In International Journal of Human-Computer Interaction, 13 (4) pp. 445-479.
There are 2 excellent reasons to compute usability problem-discovery rates. First, an estimate of the problem-discovery rate is a key component for projecting the required sample size for a usability study. Second, practitioners can use this estimate to calculate the proportion of discovered problems for a given sample size. Unfortunately, small-sample estimates of the problem-discovery rate suffer from a serious overestimation bias. This bias can lead to serious underestimation of required sample sizes and serious overestimation of the proportion of discovered problems. This article contains descriptions and evaluations of a number of methods for adjusting small-sample estimates of the problem-discovery rate to compensate for this bias. A series of Monte Carlo simulations provided evidence that the average of a normalization procedure and Good-Turing (Jelinek, 1997; Manning & Schutze, 1999) discounting produces highly accurate estimates of usability problem-discovery rates from small sample sizes.
© All rights reserved Lewis and/or Lawrence Erlbaum Associates
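A minimal sketch of the combined adjustment described in the abstract above follows. The abstract names the two components but does not give their formulas, so the forms used here are assumptions based on how the adjustment is commonly stated: a normalization term (p_est - 1/n)(1 - 1/n) and a Good-Turing term p_est / (1 + N1/N), where N1 is the number of problems seen by exactly one participant and N is the number of distinct problems observed. The detection matrix is hypothetical data.

```python
def adjusted_discovery_rate(matrix):
    """Return (p_est, p_adj) from a participants-by-problems 0/1 matrix.

    Assumed formulas (not stated in the abstract):
      normalization: (p_est - 1/n) * (1 - 1/n)
      Good-Turing:   p_est / (1 + N1/N)
    p_adj is the average of the two adjusted values.
    """
    n = len(matrix)  # number of participants
    col_sums = [sum(col) for col in zip(*matrix)]
    observed = [c for c in col_sums if c > 0]  # problems seen at least once
    if n == 0 or not observed:
        raise ValueError("need at least one participant and one observed problem")
    total = len(observed)
    p_est = sum(observed) / (n * total)  # unadjusted small-sample estimate
    singletons = sum(1 for c in observed if c == 1)
    p_norm = (p_est - 1 / n) * (1 - 1 / n)
    p_gt = p_est / (1 + singletons / total)
    return p_est, (p_norm + p_gt) / 2


# Hypothetical results: 4 participants (rows) by 5 problems (columns).
detections = [
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
]

p_est, p_adj = adjusted_discovery_rate(detections)
print(f"unadjusted p = {p_est:.3f}, adjusted p = {p_adj:.3f}")
```

In this example the adjusted rate is well below the unadjusted one, which is the direction of correction the abstract describes for small samples.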
Wang, H. and Lewis, James R. (2001): Intelligibility and Acceptability of Short Phrases Generated by Embedded Text-to-Speech Engines. In: Proceedings of the Ninth International Conference on Human-Computer Interaction 2001. pp. 144-148.
Lewis, James R. (2001): Psychometric Properties of the Mean Opinion Scale. In: Proceedings of the Ninth International Conference on Human-Computer Interaction 2001. pp. 149-153.
Lewis, James R. (1999): Tradeoffs in the Design of the IBM Computer Usability Satisfaction Questionnaires. In: Bullinger, Hans-Jörg (ed.) HCI International 1999 - Proceedings of the 8th International Conference on Human-Computer Interaction August 22-26, 1999, Munich, Germany. pp. 1023-1027.
Lewis, James R. (1995): IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and Instructions for Use. In International Journal of Human-Computer Interaction, 7 (1) pp. 57-78.
This article describes recent research in subjective usability measurement at IBM, focused on evaluating the psychometric properties of questionnaires designed for use in scenario-based usability evaluation. The questionnaires address evaluation at both a global overall system level and at a more detailed scenario level. The primary goals of this article are to (a) discuss the psychometric characteristics of IBM questionnaires that measure user satisfaction with computer system usability, and (b) provide the questionnaires, with administration and scoring instructions. For scenario-level measurement, the 3-item After-Scenario Questionnaire (ASQ) has excellent internal consistency, with coefficient alphas across a set of scenarios ranging from .90 to .96. For more global assessment, the Post-Study System Usability Questionnaire (PSSUQ) also has excellent internal consistency, with an overall coefficient alpha of .97. Preliminary principal factor analysis of 48 PSSUQ questionnaires suggested the presence of three factors named, after varimax rotation, System Usefulness, Information Quality, and Interface Quality, with corresponding coefficient alphas of .96, .91, and .91. Evaluation of 377 PSSUQ questionnaires (modified to allow mailing to respondents in their offices and referred to as the Computer System Usability Questionnaire, or CSUQ) confirmed the structure of the preliminary principal factor analysis. Consequently, usability practitioners can use these questionnaires to help them measure users' satisfaction with the usability of computer systems in the context of scenario-based usability studies.
© All rights reserved Lewis and/or Lawrence Erlbaum Associates
Lewis, James R. (1993): Multipoint Scales: Mean and Median Differences and Observed Significance Levels. In International Journal of Human-Computer Interaction, 5 (4) pp. 383-392.
Researchers in human-computer interaction (HCI) often use discrete multipoint scales (such as 5- or 7-point scales) to measure user satisfaction and preference. Many knowledgeable authors state that the median is the appropriate measure of central tendency for such ordinal scales, although others challenge this assertion. This article introduces a new point of view, based on a human factors consideration. When decision makers read a usability report or attend a briefing, they may make decisions based on the magnitude of the difference between the measures of central tendency for key dependent variables. A major criterion that should affect the choice of presenting means or medians is the strength of the relationship between this difference and the observed significance levels of appropriate statistical tests. The results from two series of "real-world" usability studies showed that the mean difference correlated more than the median difference with the observed significance levels (both parametric and nonparametric) for discrete multipoint scale data. Therefore, for these scales in this measurement context, the mean can be a better measure of central tendency than the median. The results also provided evidence that mean differences for 7-point scales correlate more strongly with observed significance levels than those for 5-point scales.
© All rights reserved Lewis and/or Lawrence Erlbaum Associates
Lewis, James R. (1993): Problem Discovery in Usability Studies: A Model Based on the Binomial Probability Formula. In: Proceedings of the Fifth International Conference on Human-Computer Interaction 1993. pp. 666-671.
Product developers want their products to be as easy to use as possible, but must consider constraints such as cost and schedule. The primary goal of many usability studies is to discover design problems. After discovery, designers can take steps to eliminate or minimize problem impact. This paper shows that problem discovery in usability studies is consistent with the binomial probability formula. The problem discovery curves from two recent studies lend empirical support to this problem discovery model. One practical application of the model is to help estimate appropriate sample sizes for problem discovery usability studies. This model can help usability researchers simultaneously consider cost (minimized by running as small a sample as possible) and risk (minimized by running as large a sample as possible) to maximize the efficiency of a study.
© All rights reserved Lewis and/or Elsevier Science
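The binomial model mentioned in the abstract above is usually written as 1 - (1 - p)^n, the expected proportion of problems found at least once by n participants when each problem has detection probability p. The sketch below assumes that form; the function names and the example values of p and the discovery goal are hypothetical.

```python
def proportion_discovered(p, n):
    """Expected proportion of problems seen at least once by n participants."""
    if not 0 <= p <= 1 or n < 0:
        raise ValueError("need 0 <= p <= 1 and n >= 0")
    return 1 - (1 - p) ** n


def sample_size_for(p, goal):
    """Smallest n whose expected discovery proportion reaches the goal.

    Equivalent to rounding up ln(1 - goal) / ln(1 - p); a simple loop is
    used here to avoid floating-point edge cases in the logarithm ratio.
    """
    if not 0 < p <= 1 or not 0 <= goal < 1:
        raise ValueError("need 0 < p <= 1 and 0 <= goal < 1")
    n = 1
    while proportion_discovered(p, n) < goal:
        n += 1
    return n


# Hypothetical planning values: p = 0.25 per problem, goal of 85% discovery.
n_needed = sample_size_for(0.25, 0.85)
print(n_needed, round(proportion_discovered(0.25, n_needed), 4))
```

Running more participants lowers the risk of missing problems but raises cost, which is the tradeoff the abstract describes.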
Lewis, James R. (1993): Problem Discovery in Usability Studies: A Model Based on the Binomial Probability Formula. In: Smith, Michael J. and Salvendy, Gavriel (eds.) HCI International 1993 - Proceedings of the Fifth International Conference on Human-Computer Interaction - Volume 1 August 8-13, 1993, Orlando, Florida, USA. pp. 666-671.
Lewis, James R. (1992): Psychometric Evaluation of the Post-Study System Usability Questionnaire: The PSSUQ. In: Proceedings of the Human Factors Society 36th Annual Meeting 1992. pp. 1259-1263.
Usability evaluators used an 18-item, post-study questionnaire in three related usability tests. I conducted an exploratory factor analysis to investigate statistical justification to combine items into subscales. The factor analysis indicated that three factors accounted for 87 percent of the total variance. Coefficient alpha analyses showed that the reliability of the overall summative scale was .97, and ranged from .91 to .96 for the three subscales. In the sensitivity analyses, the overall scale and all three subscales detected significant differences among the user groups; and one subscale indicated a significant system effect. Correlation analyses support the validity of the scales. The overall scale correlated highly with the sum of the After-Scenario Questionnaire ratings that participants gave after each scenario. The overall scale also correlated moderately with the percentage of successful scenario completion. These results are consistent with the hypothesis that these alternative measurements tap into a common underlying construct. This construct is probably usability, based on the content of the questionnaire items and the measurement context.
© All rights reserved Lewis and/or Human Factors Society
Lewis, James R. (1991): Psychometric Evaluation of an After-Scenario Questionnaire for Computer Usability Studies: The ASQ. In ACM SIGCHI Bulletin, 23 (1) pp. 78-81.
A three-item after-scenario questionnaire was used in three related usability tests in different areas of the United States. The studies had eight scenarios in common. After participants finished a scenario, they completed the After-Scenario Questionnaire (the ASQ). A factor analysis of the responses to the ASQ items revealed that an eight-factor solution explained 94 percent of the variability of the 24 (eight scenarios by three items per scenario) items. The varimax-rotated factor pattern showed that these eight were clearly associated with the eight scenarios. The benefit of this research to system designers is that this three-item questionnaire has acceptable psychometric properties of reliability, sensitivity, and concurrent validity, and may be used with confidence in other, similar usability studies.
© All rights reserved Lewis and/or ACM Press
Lewis, James R. (1991): An After-Scenario Questionnaire for Usability Studies: Psychometric Evaluation Over Three Trials. In ACM SIGCHI Bulletin, 23 (4) p. 79.
Loricchio, David F. and Lewis, James R. (1991): User Assessment of Standard and Reduced-Size Numeric Keypads. In: Proceedings of the Human Factors Society 35th Annual Meeting 1991. pp. 251-252.
As technology improves, portable computers become smaller and more compact. A clear design challenge is to provide a system that is as compact as possible without degrading system usability. The keyboard is still the primary input device for compact computers. Previous research has indicated that reduced key spacing adversely affects skilled typing. Therefore, a portable computer system should provide a keyboard with full-sized keys in the primary typing area. The purpose of this study was to determine if reducing key size and spacing adversely affects the usability of a numeric keypad. Skilled keypad operators compared a standard-size numeric keypad to two keypads that had reduced center-to-center key spacing. One of these keypads achieved its reduction primarily by reducing the key spacing. The other reduced both key size and spacing. (Note that the small changes in key size and spacing have little effect on the overall device dimensions of a numeric keypad.) Operators typed numbers faster with and preferred the standard keypad over the keypad with both reduced key size and key spacing. If a numeric keypad is offered as part of a portable computer, every effort should be made to provide full-sized keys. If reduced key spacing is unavoidable, wide keys are preferable to narrow keys.
© All rights reserved Loricchio and Lewis and/or Human Factors Society
Lewis, James R. (1991): A Rank-Based Method for the Usability Comparison of Competing Products. In: Proceedings of the Human Factors Society 35th Annual Meeting 1991. pp. 1312-1316.
Lewis, James R., Henry, Suzanne C. and Mack, Robert L. (1990): Integrated Office Software Benchmarks: A Case Study. In: Diaper, Dan, Gilmore, David J., Cockton, Gilbert and Shackel, Brian (eds.) INTERACT 90 - 3rd IFIP International Conference on Human-Computer Interaction August 27-31, 1990, Cambridge, UK. pp. 337-343.
In this paper we present a case study of a benchmark evaluation of integrated office systems. The case study includes developing scenarios, benchmark measures, and quantitative and qualitative analysis of user performance and user problems. We studied two systems, one loosely integrated windowing environment and one more tightly integrated (with respect to consistent graphical interface style). Multivariate analyses showed that significant differences were attributable to performance/analytical variables and to patterns of error impact classifications, but not to subjective ratings. Somewhat surprisingly, users experienced serious problems with the seemingly more integrated (consistent) system largely because of a handful of serious problems. This was taken as evidence that improvement of the poorer performing system should be based primarily on an analysis of errors. Some examples are presented to indicate the potential diagnostic value of analyzing problems and the development of testable behavioral objectives from benchmark measures.
© All rights reserved Lewis et al. and/or North-Holland
Lewis, James R. (1990): The Iowa Silent Reading Test's Comprehension Section: Local Norms and Predictive Validity for Usability Studies. In: Woods, D. and Roth, E. (eds.) Proceedings of the Human Factors Society 34th Annual Meeting 1990, Santa Monica, USA. pp. 922-926.
Lewis, James R. (1989): Pairs of Latin Squares to Counterbalance Sequential Effects and Pairing of Conditions and Stimuli. In: Proceedings of the Human Factors Society 33rd Annual Meeting 1989. pp. 1223-1227.
This paper discusses methods with which one can simultaneously counterbalance immediate sequential effects and pairing of conditions and stimuli in a within-subjects design using pairs of Latin squares. Within-subjects (repeated measures) experiments are common in human factors research. The designer of such an experiment must develop a scheme to ensure that the conditions and stimuli are not confounded, or randomly order stimuli and conditions. While randomization ensures balance in the long run, it is possible that a specific random sequence may not be acceptable. An alternative to randomization is to use Latin squares. The usual Latin square design ensures that each condition appears an equal number of times in each column of the square. Latin squares have been described which have the effect of counterbalancing immediate sequential effects. The objective of this work was to extend these earlier efforts by developing procedures for designing pairs of Latin squares which ensure complete counter-balancing of immediate sequential effects for both conditions and stimuli, and also ensure that conditions and stimuli are paired in the squares an equal number of times.
© All rights reserved Lewis and/or Human Factors Society
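The abstract above refers to earlier Latin squares that counterbalance immediate sequential effects. The sketch below builds one such square for an even number of conditions using the standard construction (first row 0, 1, n-1, 2, n-2, ..., then each later row shifted by one). It illustrates only that building block; it does not reproduce the paper's procedure for pairing two squares across conditions and stimuli, and the function names are hypothetical.

```python
def balanced_latin_square(n):
    """Latin square in which each condition immediately precedes every
    other condition exactly once. This construction requires even n."""
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction needs an even n >= 2")
    first = [0]
    for j in range(1, n):
        # Alternate low and high values: 1, n-1, 2, n-2, ...
        first.append((j + 1) // 2 if j % 2 else n - j // 2)
    # Each later row adds 1 (mod n) to every entry of the previous row.
    return [[(c + r) % n for c in first] for r in range(n)]


def immediate_pairs(square):
    """All ordered (previous, next) condition pairs across the rows."""
    return [(row[i], row[i + 1]) for row in square for i in range(len(row) - 1)]


square = balanced_latin_square(4)
for row in square:
    print(row)
```

Each row is the condition order for one participant (or group). With four conditions there are 12 possible ordered pairs of distinct conditions, and each appears exactly once as an immediate sequence in the square.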
Changes to this page (author)
05 Jul 2011: Author was edited
18 Nov 2010: Author was edited
02 Nov 2010: Author was edited
24 Feb 2010: Enabled abstracts to be shown on James R. Lewis's author page.
04 Jun 2009: Author was edited
04 Jun 2009: Author was edited
09 May 2009: Author was edited
03 Sep 2007: Added a picture of James R. Lewis
29 Jun 2007: Author was edited
28 Jun 2007: Author was edited
27 Jun 2007: Author was edited
27 Jun 2007: Author was added to the bibliography
26 Jun 2007: Author was edited
26 Jun 2007: Author was edited
26 Jun 2007: Author was edited
26 Jun 2007: Author was edited
26 Jun 2007: Author was edited
23 Jun 2007: Author was edited
23 Jun 2007: Author was edited
28 Apr 2003: Added the author to the bibliography
Page Information
Page maintainer:
The Editorial Team
How to cite/reference this page
URL: http://www.interaction-design.org/references/authors/james_r__lewis.html