Jeff Sauro

Personal Homepage

Jeff is a Six Sigma-trained statistical analyst and a pioneer in quantifying the user experience. He is founding principal of Measuring Usability LLC, a quantitative user research firm based in Denver, CO. He is the author of four books, including Quantifying the User Experience: Practical Statistics for User Research. He has worked for GE, Intuit, PeopleSoft and Oracle, and has consulted with dozens of Fortune 500 companies including Walmart, PayPal, Autodesk and McGraw Hill. Jeff received his Master's degree from Stanford University and maintains the Measuring Usability website. You can follow him on Twitter: @MsrUsability.

Sauro, Jeff (2006): The user is in the numbers. In Interactions, 13 (6) pp. 22-25.

Lewis, James R., Sauro, Jeff (2006): When 100% Really Isn't 100%: Improving the Accuracy of Small-Sample Estimates of Completion Rates. In Journal of Usability Studies, 1 (3) pp. 136-150.

Sauro, Jeff (2006): Quantifying usability. In Interactions, 13 (6) pp. 20-21.

Sauro, Jeff (2004): Premium usability: getting the discount without paying the price. In Interactions, 11 (4) pp. 30-37.

Sauro, Jeff (2011). Measuring User Interface Disasters. Retrieved 2013-09-26.

Sauro, Jeff (2012). How Effective are Heuristic Evaluations. Retrieved 2014-02-09 from Measuring Usability.

Sauro, Jeff (n.d.). Measuring Usability. Retrieved 2013-10-09.

Sauro, Jeff, Lewis, James R. (2012): Quantifying the User Experience: Practical Statistics for User Research. Morgan Kaufmann.

Sauro, Jeff (2010): A Practical Guide to Measuring Usability: 72 Answers to the Most Common Questions about Quantifying the Usability of Websites and Software. CreateSpace Independent Publishing Platform.

Sauro, Jeff, Lewis, James R. (2011): When designing usability questionnaires, does it hurt to be positive?. In: Proceedings of ACM CHI 2011 Conference on Human Factors in Computing Systems. pp. 2215-2224.

22.12 Commentary by Jeff Sauro

What is it, how do we use it, where did it come from, and how do we interpret the results? That's what you want to know when using a method like card sorting. Hudson delivers succinct points and comprehensive coverage of this essential UX method. He accurately articulates how card sorting generates both qualitative and quantitative data, and illustrates how interpreting one of the signature graphs of card sorting (the dendrogram) involves both data and judgment. Here are a few more points to consider when quantifying the results of a card sort.

22.12.1 Confidence Intervals

Card sorts, like most user research methods, involve working with a sample (often a small one) of the larger user population. With any sample comes uncertainty about how stable the numbers are. One of the most effective strategies is to add confidence intervals around the sample statistics. A confidence interval tells us the most plausible range for the unknown population percentage.

For example, let’s assume 20 out of 26 users (77%) successfully found Strawberries under the “Soft Fruit” category (Figure 22.13). Even without measuring all users, we can be 95% confident that between 58% and 89% of all users would successfully locate strawberries (assuming our sample is reasonably representative).

The margin of error around our percentage is approximately +/- 16%. The lower boundary of the confidence interval tells us we can be 95% confident that 58% or more of users would find the location of Strawberries. If our rudimentary goal is for most users to find the fruit, then we have evidence of achieving it.
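The 58% to 89% interval above can be reproduced with the adjusted-Wald (Agresti-Coull) procedure, which behaves well for small samples. A minimal sketch in Python (the function name is my own):

```python
import math

def adjusted_wald_ci(successes, n, z=1.96):
    """Adjusted-Wald (Agresti-Coull) confidence interval for a proportion.

    Adds z^2/2 successes and z^2 trials before applying the normal
    approximation, which keeps the interval accurate for small samples.
    """
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

# 20 of 26 users found Strawberries under "Soft Fruit"
low, high = adjusted_wald_ci(20, 26)
print(f"{low:.0%} to {high:.0%}")  # 58% to 89%
```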

We can apply the same method to quantifying the percentages of cards placed into a category. For example, let’s assume 70 participants conducted the card sort shown in Figure 22.10. We see that 46% placed “Getting a New Person Started” in the “Joining” category but 39% placed this card in the “Hiring New People” category.

The 95% confidence interval for the “Joining” category is between 35% and 58%, and between 28% and 50% for the “Hiring New People” category (see the figure above). The substantial overlap in the confidence intervals means we shouldn’t have much confidence in this difference. An online calculator is available to make the computations.

Due to the large overlap in the intervals, we cannot distinguish the 7 percentage point difference from sampling error. If we need to pick one, we should go with “Joining,” but we should consider both categories as viable options.
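A quick sketch of this comparison using the simple Wald interval on the observed percentages (an adjusted-Wald interval is preferable for small samples; the function name is my own):

```python
import math

def wald_ci(p, n, z=1.96):
    """Simple (Wald) 95% confidence interval for an observed proportion."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - margin), min(1.0, p + margin)

# 70 participants: 46% chose "Joining", 39% chose "Hiring New People"
joining = wald_ci(0.46, 70)   # roughly (0.34, 0.58)
hiring = wald_ci(0.39, 70)    # roughly (0.28, 0.50)

# The intervals overlap heavily, so the difference may be sampling error
overlaps = joining[0] < hiring[1] and hiring[0] < joining[1]
print(overlaps)  # True
```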

22.12.2 Sample Sizes

As with most evaluations involving users, one of the first questions asked is “How many users do I need?” Surprisingly, there is little guidance on determining your sample size other than the 2004 Tullis and Wood article. Tullis and Wood performed a resampling study on one large card sort involving 168 users and found the cluster results would have been very similar (correlations above .93) at sample sizes between 20 and 30.

This sample size is based on the particulars of a single study (agreement in card placement and 46 cards) and on viewing the dendrogram, so the results are most appropriate if your study is similar to theirs.

Another approach to sample size planning is based on the percentage of users who place cards in each category (Figure 22.10) or correctly select the right path in tree testing (Figure 22.13). This approach works backwards from confidence intervals like those generated in the previous section. In the first example we had a margin of error of +/- 16% around the percentage of users who would correctly locate strawberries.

If we wanted a more precise estimate and cut our margin of error in half to +/- 8%, we work backwards from the confidence interval and get a required sample size of 147. The following table shows the expected margins of error for 95% confidence intervals at different sample sizes.

[Table: Sample Size vs. Margin of Error (+/-) for 95% confidence intervals; the table's values were not preserved in this copy.]
The computations are explained in Chapter 3 and Chapter 6 of Quantifying the User Experience.
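The missing table values can be approximated with the standard normal-approximation formula at the conservative planning value p = 0.5; this is a sketch, and the published values may differ slightly if they were computed with an adjusted-Wald interval:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Normal-approximation margin of error for a 95% confidence interval.

    p = 0.5 is the conservative planning value that yields the widest interval.
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (20, 30, 50, 100, 147, 200, 500):
    print(f"n = {n:3d}: +/- {margin_of_error(n):.1%}")
# n = 147 gives roughly +/- 8%, consistent with the worked example above
```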

Webb, Erika Noll, Matsil, Ray, Sauro, Jeff (2011): Benefit analysis of user assistance improvements. In: Proceedings of ACM CHI 2011 Conference on Human Factors in Computing Systems. pp. 841-850.

Sauro, Jeff, Lewis, James R. (2010): Average task times in usability tests: what to report?. In: Proceedings of ACM CHI 2010 Conference on Human Factors in Computing Systems. pp. 2347-2350.

Sauro, Jeff, Lewis, James R. (2009): Correlations among prototypical usability metrics: evidence for the construct of usability. In: Proceedings of ACM CHI 2009 Conference on Human Factors in Computing Systems. pp. 1609-1618.

Sauro, Jeff, Dumas, Joseph S. (2009): Comparison of three one-question, post-task usability questionnaires. In: Proceedings of ACM CHI 2009 Conference on Human Factors in Computing Systems. pp. 1599-1608.

Sauro, Jeff, Kindlund, Erika (2005): A method to standardize usability metrics into a single score. In: Proceedings of ACM CHI 2005 Conference on Human Factors in Computing Systems. pp. 401-409.

Sauro, Jeff (2013). What UX Methods to Use and When to Use Them. Retrieved 2014-02-09 from Measuring Usability.