22. Card Sorting
The term card sorting applies to a wide variety of activities involving the grouping and/or naming of objects or concepts. These may be represented on physical cards; virtual cards on computer screens; or photos in either physical or computer form. Occasionally, objects themselves may be sorted. The results can be expressed in a number of ways, with the primary focus being which items were most frequently grouped together by participants and the names given to the resulting categories.
For the purpose of interaction design, the sorting process — usually performed by potential users of an interactive solution — provides:
We can use this information to decide which items should be grouped together in displays; how menu contents should be organized and labelled; and perhaps most fundamentally, what words we should employ to describe the objects of our users' attention.
22.1 A practical example
Imagine that you are responsible for the information architecture of computerized touch-screen scales of the kind increasingly common in large supermarkets, shown in Figure 1. The screen displays 12 images and captions at a time. There have been some complaints that customers are spending a long time at the scales and are frustrated by how the categories are organized. Table 1 shows a list of sample items that customers need to find. These have been printed on cards with bar codes for easy data capture (see Figure 2 and the Syntagm web site). Figure 3 shows an example of the cards organized into groups. Since this is an 'open' sort, users make up their own groups and names for them. This particular grouping represents the current solution implemented in the scales, referred to as a 'reference sort', discussed later in this chapter.
Copyright status: Unknown (pending investigation). See section "Exceptions" in the copyright terms below.
Broccoli / Calabrese
Courgettes / Zucchini
Squash / Marrows
Swede / Rutabaga
Take a moment to consider how you might organize these items yourself. For most people there are at least two groups — fruit and vegetables. But in a large supermarket two groups would contain very long lists of items which would not be helpful without further subdivision. Also, there may be some terms that are unfamiliar to you. Courgette is the French name for the long, green marrow (squash) seen in British supermarkets, while zucchini is the Italian name found in the US. Conversely, what is known as a rutabaga in the US is called a swede in the UK as it was introduced to Scotland by the Swedes. Where simple language differences like these are known in advance, listing the alternatives on a single card is probably a satisfactory solution. However, in novel problem domains or in multicultural/multilingual situations where terminology is a larger issue, it may be better for participants to sort photographs or even the objects themselves (with a barcode label attached).
Whatever you are sorting, you will end up with some things (items) arranged in groups, ideally with group names. The next challenge is how to make sense of these, particularly when you have tens or hundreds of participants. No matter how the analysis is done, there are at least two things we want to know:
Be careful to note that these are two separate sets of information. That grapefruit and oranges were always grouped together in the sample study is not affected by the fact that several different group names were used. Also, not surprisingly, other items were grouped with grapefruit and oranges — but the nature of these items varied with the approach taken by participants. If the group was called simply 'fruit' it contained apples, pears and other fruits as well as grapefruit and oranges. If it was called 'citrus', the only addition was lemons. So, to get a good idea of what the sort is telling us, we use different kinds of analysis. The first two correspond to the things we wanted to know:
22.1.1 Items by groups chart
You can produce simple versions of the charts yourself with pencil and paper or a spreadsheet and printer. First, the items by groups chart:
22.1.2 Items by items chart
The items by items chart is a little more challenging to produce:
22.2 What the analyses mean
While it is tempting to think that a card sorting project is going to immediately provide a navigation hierarchy, this is rarely the case. The results inform a design process; they do not provide a packaged solution. The sample fruit-and-vegetable project described here provides a realistic case in point — the results are far from conclusive.
What do we know for sure from the analyses? Refer back to Figure 5 and Figure 8 and see what conclusions you can draw before proceeding.
Both charts include the results of a cluster analysis that divides the items into four groups. The items by groups chart (Figure 5) shows that the most popular names for the four groups were 'Fruit', 'Spices', 'Vegetables' and 'Root Veg'. 'Citrus Fruit' was a strong contender for grapefruit, oranges and lemons, while some participants (about a third) did not distinguish between 'Root Veg' and 'Vegetables'.
Anything else? What about fennel? In both charts it should be possible to see that fennel has been grouped with a wide variety of other items. Although the cluster analysis placed it in the group called 'Spices', almost 20% of participants sorted it into the 'Vegetables' group. There may be nothing we can do about this other than providing access to fennel from both groups, which is easily done on a computerized scale or web site.
Focussing on the items by items chart for a moment, we see an important feature of the items themselves — independent of group names. Very few participants attempted to group the fruits with any of the vegetables. This shows a clear understanding of and distinction between these two main categories that we certainly should build on when designing a suitable information hierarchy. In contrast, the charts show a good deal of participant ambiguity over onions and leeks. These were frequently grouped with root vegetables, but the items by items chart shows an affinity — particularly for onions — with the group most commonly referred to as 'spices'.
What conclusions can we draw from this example? The first is that while we have learned a great deal about our participants' appreciation of the terminology, categories and concepts, the exercise was too limited for the results to be applied to a larger information space. Specifically, the small number of fruits provided in the example encouraged participants to place them in a single group. This may not be realistic in practice, although we do have some suggestions for refinement: 'Citrus Fruits' and 'Berries' from Figure 5. One solution would be to provide participants with a larger range of fruits, including what are called exemplars (representative types) of the categories we expect. An alternative approach would be to brief and monitor participants more closely. This is difficult to do in an online sorting activity: even if the briefing is very detailed, participants may fail to see it, read it or act on it. Most of these issues can be overcome in face-to-face sorting. If facilitators see participants producing too few categories, they can simply cajole them to create more.
So far we have touched on two popular methods of analysing card sorts — there are others which will be discussed later. But first a little background...
22.3 The History of card sorting
Card sorting has a surprisingly long history, especially if the concept of categorization is included. The ancient Greeks are credited with the early development of categories, with Aristotle providing the foundations for the categorization scheme that we use today for plants and animals (Sachs 2002). The practice of sorting cards in the social sciences is somewhat more recent, but still well over 100 years old. Initially, printed playing cards were used for a variety of experiments in the nascent field of psychology (Jastrow 1886), but these were joined relatively quickly by blank cards on which researchers would write words to be categorized by subjects (Bergström 1893). Early card sorting activities were primarily concerned with establishing characteristics of the subjects — the speed of sorting used as an indicator of mental processes and reaction time (Jastrow 1886; Jastrow 1898); memory function (Bergström 1893; Bergström 1894) and imagination — using inkblots on cards (Dearborn 1898). Some of these experiments developed into what is now considered to be a standard test for neurological damage in patients who have suffered head injuries, the Wisconsin Card Sorting Test (Eling et al. 2008). In fact, card sorting was so well received in psychology that an article appeared in Science as early as 1914 espousing the virtues of various types of card-based activities (Kline and Kellogg 1914).
Card sorting also made its way into other fields: criminology (Galton 1891), market research (Dubois 1949), semantics (Miller 1969) and as a standard qualitative tool in the social sciences (Weller and Romney 1988; Bernard and Ryan 2009). However, it was not until the emergence of the World Wide Web in the early 1990s that card sorting was applied to the task of organizing information spaces (Nielsen and Sano 1995), with the rare exception that Tom Tullis applied card sorting to the design of menus for an operating system in the early 1980s (Tullis 1985).
22.3.1 Card sorting and the design of interactive products
Despite the popularity of the web, card sorting remains an under-used tool in the design of interactive products. In a survey of 217 attendees of Usability Week 2008, Nielsen Norman Group reported that the average number of card sorts conducted per year was 2. While this is twice as frequent as eye-tracking studies in the survey (average 1 per year), this is a surprisingly low number given that there are no large up-front investments required. In fact, card sorting has had only a peripheral role in interactive product design since its inception — perhaps reflecting the limited uptake of user-centred design methods in general. Peter Morville and Louis Rosenfeld devote only a few pages to card sorting in their seminal work, Information Architecture — now in its third edition (Morville and Rosenfeld 2006). And at the time of writing, there is only one book available on the topic of card sorting for interactive systems design, Donna Spencer's Card Sorting: Designing Usable Categories (Spencer 2009), which tends to be fairly conservative in terms of analysis.
22.4 Benefits of card sorting
For interaction design, customer research or research in the social sciences, few investigative techniques are as effective as card sorting in dealing with large numbers of concepts. In face-to-face settings, handling and annotating physical cards is a fairly natural and unintimidating process: observing users engaged in this process can result in many insights for researchers and provide a fertile source of questions and conversations about the problem domain being studied and, of course, users themselves. These outcomes and opportunities are hard to obtain through interviews, questionnaires and usability evaluations, although each of these alternatives has its strengths for more limited scopes of investigation. For example, it is relatively easy to discover that a single menu item is mislabelled in a usability study, but prohibitively expensive for several dozen items.
22.5 Qualitative versus quantitative outcomes
At one extreme, card sorts can be conducted on a one-to-one basis as a tool for discovery (knowledge elicitation) and a means of generating meaningful discussion between participants and researchers (Weller and Romney 1988; Bernard and Ryan 2009). The outcomes here are generally a better understanding of the problem domain from a user's perspective with terms, relationships and categories expressed in the resulting groups. At the other extreme, it is very easy to organize online sorts with hundreds of participants to discover whether the terminology and concepts presented are well understood across a large user population (Fincher and Tenenberg 2005). While results in the one-to-one approach are primarily qualitative, those of the large-scale online studies are mostly quantitative. (Note that it is not impossible to obtain qualitative information from online studies; there simply are not as many opportunities to persuade or allow online participants to provide useful feedback.)
22.6 What to sort
Not surprisingly, the choice of what to have participants sort depends largely on what a researcher, information architect or interaction designer is trying to discover. For 'green-field' projects — those that lack any constraints imposed by prior work — a first priority would be to establish a vocabulary. In this context, users could be presented with objects, images or descriptions of items and asked to name them. Once named, they could be grouped, with the groups in turn also named. This is fairly easy to do in face-to-face settings, where numbered or bar-coded labels can be applied to objects or photos (see, for example, the card sorting templates for Microsoft Word at the Syntagm web site). Note that some web-based sorting packages, such as websort.net, do allow photos to be sorted, but provide no means for users to apply names to the items depicted.
22.7 How to do a card sort
22.7.1 Choosing an approach
Face-to-face sorting methods are generally better for qualitative research, while online methods (web-based or desktop) are more appropriate for quantitative results. However, this is not always true; for example, it would be possible to sit with a participant or share their desktop while they conducted an online sort. This could result in good qualitative data, but it would be more intimidating for participants and much harder work for the facilitator. Remote desktop sharing can also be technically challenging, especially in the presence of corporate firewalls and security policies.
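Researchers or interaction designers can also choose between open sorting, in which participants make up their own groups and group names, and closed sorting, in which participants place items into groups that have been defined in advance.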
For most purposes, open sorting is the best choice, although supplying some predefined categories is always helpful to participants and is supported by most sorting and analysis tools. Closed sorting can be used when trying to establish changes required to an existing structure, particularly with analysis tools that provide comparisons between a 'reference sort' (such as an existing or proposed solution) and participants' results — see Figure 9.
In this items by groups chart, Figure 9, the current solution is shown with black squares in a cell. So while most participants chose to group all fruit together, the computerized scales used two unusual groups: 'Grapes & Citrus' and 'Exotic Fruit'. However, there were some areas of correspondence: many participants agreed with the current design for the 'root veg' group towards the bottom-centre of the chart.
22.7.2 Recruiting and briefing participants
As with any other form of user-centred design, participants of a card sorting activity should be representative of the users envisaged for the solution. However, given the difficulties that some members of the population may have with technology (older users for example) it is often beneficial to over-sample these groups to ensure that the resulting design is effective for as broad an audience as possible. Where possible, try to use participants who are motivated to participate by interests that are more than purely monetary — existing users or customers for example.
When briefing participants for a sorting activity, it does not pay to be too vague in stating the requirements. In navigation design the number of categories needed for a set of items is not a complete mystery. There is usually a balance to be struck between the number and size of groups (Kiger 1984). Consequently, it is important to provide participants with adequate information about the number and level of groups you require. If you are trying to devise menus for our computerized produce scale, which has space for 12 items on the screen, do not be shy about letting participants know that. Similarly, horizontal menu bars on websites or desktop applications rarely have space for more than 6 or 8 items. Allowing participants to generate 20 or 30 categories in these cases is potentially a waste of their time and yours.
Similarly, if you have group names that you know, or at least strongly suspect you need, provide those to participants. This can be done in both face-to-face and online settings. But do encourage participants to make up their own group names if they prefer.
Participants should also be advised on how to deal with items they do not understand. Some researchers or interaction designers suggest that all items should be sorted, leaving participants simply to guess at those they do not recognize, but this can lead to spurious groupings. Consider asking users simply not to sort items they do not recognize, or create a specific 'unknown' group to receive them. These can then be excluded from the results. Most online sorting tools now allow items to remain unsorted. However, make sure that analysis results are based on the number of participants rather than the number of times that an item was sorted.
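The difference between the two bases can be shown with a small calculation. The sketch below uses invented data in which one of four participants left fennel unsorted; the data layout (one dictionary per participant, mapping group name to items) and the function name are assumptions made for illustration, not the format of any particular tool.

```python
# Hypothetical data: each participant's sort maps group name -> items placed in it.
# Items a participant did not recognise are simply absent from their sort.
sorts = [
    {"Vegetables": ["fennel", "leeks"], "Fruit": ["apples"]},
    {"Spices": ["fennel"], "Vegetables": ["leeks"], "Fruit": ["apples"]},
    {"Vegetables": ["leeks"], "Fruit": ["apples"]},  # fennel left unsorted
    {"Spices": ["fennel"], "Vegetables": ["leeks"], "Fruit": ["apples"]},
]


def placement_share(sorts, item, group):
    """Return (share of all participants, share of those who sorted the item)."""
    placed = sum(item in s.get(group, []) for s in sorts)
    sorted_at_all = sum(
        any(item in items for items in s.values()) for s in sorts
    )
    by_participants = placed / len(sorts)
    by_times_sorted = placed / sorted_at_all if sorted_at_all else 0.0
    return by_participants, by_times_sorted


by_participants, by_times_sorted = placement_share(sorts, "fennel", "Spices")
print(f"Based on participants: {by_participants:.0%}")
print(f"Based on times sorted: {by_times_sorted:.0%}")
```

With this data, two of the four participants placed fennel in 'Spices', but two of the three who sorted it at all did so. The second figure overstates agreement because it hides the participant who did not recognize the item.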
22.7.3 Time to sort
The amount of time required to perform a sort can vary considerably from person to person, but is largely dependent on the number of items to be sorted:
However, other factors include how familiar the terms and concepts are to participants and how motivated they are to provide results conscientiously. Also, it is possible to sort up to 150 cards in single sessions, but higher quality results might be obtained by splitting such a large project into smaller parts.
22.7.4 Preparing a sort
For face-to-face (paper-based) sorting, getting items and group names onto cards can be a tedious undertaking. Happily, standard mail-merge software can be used to make this task easier, meaning that items can be printed either directly onto cards or self-adhesive labels. Free mail-merge templates for Microsoft Word can be found on the Syntagm web site for both North American and European paper sizes. These also include bar codes that can be used to simplify data collection: instead of typing in an item name or number, the bar codes allow them to be read directly using a simple USB scanner. This is both quicker and less error-prone than manual entry — it makes it relatively easy to process 120 cards or more per minute (full instructions are included on the web page referred to).
Preparation for online sorting is relatively straightforward, requiring only lists of the items and group names (if any) to be uploaded.
However, regardless of the method of sorting, be aware that superficial similarities in the names used can produce unhelpful results. Consider these menu item names from an intranet:
If faced with a large number of items to sort, participants may simply group similar names together. This is called a superficial match. To overcome this, consider modifying the item names:
In the first two items the word 'manage' was not an essential part of the name. Removing it or using a synonym prevents unwanted grouping.
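Before a sort is launched, a quick check can flag words that recur across item names and might invite superficial matches. The following is a minimal sketch: the item names are invented for illustration, and the stop-word list and function name are assumptions.

```python
from collections import defaultdict

# Words too common to be worth flagging (an illustrative, incomplete list).
STOP_WORDS = {"a", "an", "and", "of", "the", "to", "for", "your"}


def shared_words(names, min_items=2):
    """Map each non-trivial word to the item names that contain it."""
    seen = defaultdict(set)
    for name in names:
        for raw in name.lower().split():
            word = raw.strip(".,-&()")
            if word and word not in STOP_WORDS:
                seen[word].add(name)
    return {w: sorted(ns) for w, ns in seen.items() if len(ns) >= min_items}


# Hypothetical intranet menu items.
names = ["Manage expenses", "Manage training", "Book travel", "Travel policy"]
for word, items in shared_words(names).items():
    print(f"'{word}' appears in: {', '.join(items)}")
```

A flagged word is only a prompt for review. Some shared words are an essential part of the names and should stay; others, like 'manage' in the example above, can be removed or replaced with a synonym.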
22.7.5 Choosing names
Apart from the issue of superficial similarities mentioned above, be careful to choose names that are in common use, especially where interactive solutions are being designed for a broad range of abilities. This is not just common sense, but also a requirement of disability discrimination legislation in many countries. Put simply, language should be no more complex than needed to convey the required information. In English, longer words (measured in syllables) are used much less frequently than shorter ones (Klare 1963). And even though participants in a card sort might suggest unusual names for items or groups, such as 'brassicas', most people will go into their local supermarket or greengrocer's asking for cabbage rather than use its Latin genus. If in doubt, consult a reference on common words such as the Corpus of Contemporary American English, the British National Corpus or similar sources for other languages.
22.8 How to understand the results
For very small projects, just leafing through the sorted cards or listing of online results can provide useful insights into groupings. However, larger projects will require some form of analysis, ranging from simple tabulation through to cluster analysis. Note that while cluster analysis is potentially a very complex subject (Romesburg 2004; Bernard and Ryan 2009), most card sorting tools use a fairly simple form of cluster analysis that could easily be replicated manually. It is known as 'hierarchical cluster analysis'. The 'hierarchy' in this case refers to the way in which smaller clusters are aggregated to form larger ones until all are included.
22.8.1 Simple analysis
Simple tabulation of items by groups can be performed manually (as described above) or by using a spreadsheet package such as can be found at Boxes and Arrows. However, online sorting tools will do this analysis for you. For printed cards using the Microsoft Word mail-merge templates described earlier, SynCaps V2 and later will produce items by items, items by groups and dendrogram analyses.
Figure 10 is an items by groups chart showing an alternative presentation to that of Figure 5. In both cases, the items are listed down the left-hand side of the chart with the group names across the top. A cluster analysis has been performed to determine which items are most closely related, producing an item ordering that moves from one cluster to the next. The only significant difference between the two figures is that Figure 5 uses shading to show the relative strength of each relationship (figures are available by clicking on a cell) while Figure 10 presents the percentage figures with blue shading only to highlight the most significant results.
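The tabulation behind an items by groups chart is simple enough to sketch in a few lines. The example below assumes each participant's result is available as a dictionary mapping group name to items, and it treats group names that differ only in case or surrounding spaces as the same name. Both are assumptions for illustration, and the data is invented.

```python
from collections import Counter, defaultdict

# Hypothetical results: one dict per participant, mapping group name -> items.
sorts = [
    {"Fruit": ["oranges", "lemons"], "Vegetables": ["fennel"]},
    {"Citrus": ["oranges", "lemons"], "Spices": ["fennel"]},
    {"fruit": ["oranges", "lemons"], "Spices": ["fennel"]},
    {"Fruit": ["oranges", "lemons"], "Spices": ["fennel"]},
]


def items_by_groups(sorts):
    """Return {item: {group name: percentage of participants}}."""
    counts = defaultdict(Counter)
    for sort in sorts:
        for group, items in sort.items():
            label = group.strip().lower()  # merge 'Fruit' and 'fruit'
            for item in set(items):
                counts[item][label] += 1
    total = len(sorts)
    return {
        item: {g: 100 * c / total for g, c in groups.items()}
        for item, groups in counts.items()
    }


chart = items_by_groups(sorts)
for item in sorted(chart):
    row = ", ".join(f"{g}: {pct:.0f}%" for g, pct in sorted(chart[item].items()))
    print(f"{item:10} {row}")
```

Percentages are based on the number of participants, so a row can sum to less than 100% when some participants leave an item unsorted.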
22.8.2 Cluster analysis
The type of cluster analysis performed by most card sorting tools is 'hierarchical cluster analysis' or HCA. The usual result is a graphical display called a dendrogram (sometimes spelled 'dendogram'), a name that has its roots (literally) in the Greek word for 'tree', 'dendron'.
Figure 11 shows a hierarchical cluster analysis in the form of a dendrogram. The example is taken from an intranet navigation sorting activity. The hierarchical nature of the dendrogram is related to the strength of the relationships between items, as measured by how frequently they appeared in the same groups. And as in real trees, shorter branches are stronger. In Figure 11, the six items at the bottom all include the word 'leave'. However, participants have primarily grouped 'Adoption -', 'Parental -' and 'Special' Leave as being closely related, but were less consistent with 'Maternity & Paternity - ', 'Annual -' and 'Sick' Leave. Finally, 'Work Breaks' was sometimes grouped with the leave items, but this relationship is fairly weak compared with the others. If you wanted to know why the work breaks item relationship is weaker, you would need to consult an items by items chart, an items by groups chart or the raw proximity matrix if available — the latter simply showing the number of times each pairing of items appeared together in the same groups.
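The raw proximity matrix described above can be built by counting, for every pair of items, how many participants placed both in the same group. The sketch below uses three invented participants and a few of the item names from the intranet example; the data layout and function name are assumptions.

```python
from collections import Counter
from itertools import combinations


def proximity_counts(sorts):
    """Count how many times each pair of items shared a group."""
    counts = Counter()
    for sort in sorts:
        for items in sort.values():
            # Sort the names so each pair is always stored in the same order.
            for a, b in combinations(sorted(set(items)), 2):
                counts[(a, b)] += 1
    return counts


# Hypothetical results: one dict per participant, mapping group name -> items.
sorts = [
    {"Leave": ["Adoption Leave", "Parental Leave", "Special Leave", "Work Breaks"]},
    {"Family": ["Adoption Leave", "Parental Leave", "Special Leave"],
     "Time": ["Work Breaks"]},
    {"A": ["Adoption Leave", "Parental Leave"],
     "B": ["Special Leave", "Work Breaks"]},
]

counts = proximity_counts(sorts)
for (a, b), n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{a} + {b}: {n} of {len(sorts)}")
```

Group names play no part in this count, which is why the items by items chart and the dendrogram derived from it say nothing about what the groups were called.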
The dendrogram also gives some insight to the way the cluster analysis works. The method used is called 'agglomerative clustering', meaning simply that we build the clusters from the bottom up. So in the intranet example, the first cluster would have started with the last three 'leave' items — they have the shortest branches — with 'Maternity & Paternity Leave' subsequently subsumed. Then, looking again at Figure 11, the next strongest relationship appears towards the top of the chart, as 'Resignation' and 'Survey of leavers policy'.
As items are agglomerated into clusters, an average score (based again on the number of times pairs of items appeared in the same groups) is calculated. This is shown in the dendrogram by how far the vertical connecting lines are from the labels. As mentioned above, the resulting branches reflect stronger relationships when they are shorter — that is, when the vertical connecting lines are closer to the labels, as for the bottom three items in Figure 11.
In a dendrogram, clusters are joined together into branches until all items have been included. This means that the weakest relationships — between dissimilar clusters — can be found furthest from the item labels. Although Figure 11 does not show a complete dendrogram, it does include three long branches that are continued off to the right. These represent three dissimilar clusters; each will require their own category labels (which could be derived from an items by groups chart). Note that dendrograms take no account of group names; it may well be that even though ‘Adoption -’, ‘Parental -’ and ‘Special Leave’ were grouped together frequently, participants may have applied a wide variety of names to that grouping. Also be aware that in a dendrogram, items can appear in only one place. Therefore, if an item was split equally by participants between two different groups, it would appear only as a weak relationship in one of them. You would need to visit the items by groups chart to notice this.
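The bottom-up process can be sketched as follows. This is a minimal illustration that assumes average linkage (the score between two clusters is the mean similarity of all cross-cluster item pairs). The text describes an average score but does not name the exact linkage rule, and individual tools may differ. The similarity values are invented and represent the fraction of participants who grouped each pair together.

```python
from itertools import combinations


def average_linkage(items, similarity):
    """Merge clusters bottom-up; return merges as (cluster_a, cluster_b, score)."""
    clusters = [(item,) for item in items]
    merges = []

    def score(a, b):
        # Mean similarity over every pairing of an item in a with an item in b.
        pairs = [(x, y) for x in a for y in b]
        return sum(similarity.get(frozenset(p), 0.0) for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # The strongest remaining relationship is merged first.
        a, b = max(combinations(clusters, 2), key=lambda pair: score(*pair))
        merges.append((a, b, score(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


items = ["Adoption Leave", "Parental Leave", "Special Leave", "Work Breaks"]
# Hypothetical fractions of participants who placed each pair in the same group.
similarity = {
    frozenset({"Adoption Leave", "Parental Leave"}): 0.9,
    frozenset({"Adoption Leave", "Special Leave"}): 0.8,
    frozenset({"Parental Leave", "Special Leave"}): 0.8,
    frozenset({"Adoption Leave", "Work Breaks"}): 0.2,
    frozenset({"Parental Leave", "Work Breaks"}): 0.3,
    frozenset({"Special Leave", "Work Breaks"}): 0.1,
}

merges = average_linkage(items, similarity)
for a, b, s in merges:
    print(f"{s:.2f}  {' + '.join(a)}  <->  {' + '.join(b)}")
```

Each merge corresponds to one vertical connecting line in a dendrogram. Merges with high scores would be drawn close to the labels, and the final, weakest merge (here the one that brings in 'Work Breaks') would be drawn furthest away.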
22.9 Advanced analysis
In trying to make sense of card sorting results, there are two problems that frequently recur. The first is that not all participants have the same motivation, experience or needs. This means that we may have participants whose sort results are simply 'noise' — particularly for online sorts with an attractive incentive. In other cases we may believe we have one relatively homogenous group of participants, when in fact we have multiple. This can be due to general factors such as experience — in which case we need to accommodate these multiple groups in our designs; or it may be due to different contexts of use. In the latter case we should try to understand the differences and to decide whether separate designs are warranted. Unfortunately, traditional card sorting analysis tools are not much help here. But some of this information can be obtained manually — by examining the number and size of groups produced by each participant, for example: those in a hurry tend to have fewer groups and a large number of items in unhelpful categories such as 'don't know' or 'miscellaneous', while those who have a substantially different view of the problem domain may produce an unusual number of groups (relative to the average). Optimal Workshop has added some participant-oriented results to their web-based service. Fairly detailed participant and item spreadsheets can be found in all versions of SynCaps.
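The manual screening described above can be partly automated. The sketch below summarizes each participant by the number of groups they created and the share of items placed in catch-all groups, then flags anyone who stands out. The thresholds, the list of catch-all names and the data are all assumptions chosen for illustration; flagged participants should be reviewed, not discarded automatically.

```python
from statistics import mean, pstdev

# Group names treated as unhelpful catch-alls (an illustrative list).
CATCH_ALL = {"don't know", "miscellaneous", "unknown"}


def participant_summary(sorts):
    """Summarize each participant's sort: group count and catch-all share."""
    rows = []
    for pid, sort in sorts.items():
        total = sum(len(items) for items in sort.values())
        catch_all = sum(
            len(items) for g, items in sort.items()
            if g.strip().lower() in CATCH_ALL
        )
        rows.append({
            "participant": pid,
            "groups": len(sort),
            "catch_all_share": catch_all / total if total else 0.0,
        })
    return rows


def flag_unusual(rows, max_catch_all=0.3, z_limit=1.5):
    """Flag participants with many catch-all items or an unusual group count."""
    counts = [r["groups"] for r in rows]
    avg, sd = mean(counts), pstdev(counts)
    flagged = []
    for r in rows:
        z = (r["groups"] - avg) / sd if sd else 0.0
        if r["catch_all_share"] > max_catch_all or abs(z) > z_limit:
            flagged.append(r["participant"])
    return flagged


typical = {"Fruit": ["a", "b"], "Vegetables": ["c", "d"],
           "Root veg": ["e"], "Spices": ["f"]}
# Hypothetical participants; P4 rushed and dumped most items in one group.
sorts = {
    "P1": typical, "P2": typical, "P3": typical,
    "P4": {"Fruit": ["a"], "Miscellaneous": ["b", "c", "d", "e", "f"]},
    "P5": typical,
}

rows = participant_summary(sorts)
print(flag_unusual(rows))
```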
The second recurring problem is related to the basic principle of cluster analysis: every item is assigned to exactly one cluster. To a certain extent, this can be worked around by careful inspection of the items by items and items by groups analyses. For example, an item such as a cucumber might be split equally between 'green vegetables' and 'salad vegetables'. It will appear in the dendrogram in either of these groups, with a fairly weak relationship (the choice will be arbitrary if the split is exactly 50:50). However, the weakness of the relationship is not because participants were confused about where it should go; they just did not agree. The items by items and items by groups charts would show this clearly. Because of this limitation of cluster analysis, some researchers have explored other advanced statistical techniques, most notably factor analysis (Capra 2005; Giovannini 2012). A more detailed account of card sorting analysis methods can be found in Corter (1996) and Coxon (1999).
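Items that are split between groups can be picked out of an items by groups tabulation directly. The sketch below flags any item whose two most popular groups are both well supported and close in share. The thresholds and the percentages are invented for illustration, and the input layout ({item: {group: percent}}) is an assumption.

```python
def split_items(chart, min_share=30.0, max_gap=15.0):
    """Return items whose top two groups are both popular and close in share."""
    flagged = {}
    for item, shares in chart.items():
        top = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)[:2]
        if (len(top) == 2
                and top[1][1] >= min_share
                and top[0][1] - top[1][1] <= max_gap):
            flagged[item] = top
    return flagged


# Hypothetical items by groups percentages.
chart = {
    "cucumber": {"green vegetables": 50.0, "salad vegetables": 50.0},
    "carrots": {"root veg": 85.0, "vegetables": 15.0},
}

for item, top in split_items(chart).items():
    print(item, "is split between", " and ".join(g for g, _ in top))
```

An item flagged in this way is a candidate for being made available from both groups, as suggested earlier for fennel.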
22.9.1 Multilevel sorting
The primary method of sorting discussed in this chapter can be described as single-level or 'flat'. Participants are given a set of items which they should sort into a single level of groups. So while it might be tempting to nest groups — 'leaf vegetables' within 'green vegetables' within 'vegetables', for example — there are two issues to be aware of:
22.10 Tree Sorting
Tree sorting (also called 'tree testing' and 'reverse card sorting') is a concept related to card sorting, but in many respects quite different. In essence it is a simulation of a navigation tree that would be found in a software application or web site. Online participants are presented with goals and then asked to navigate using the tree simulation. Figure 13 illustrates the process across several screens (step 1 is the first screen; step 2 is the second and so on). In step 1, the participant has chosen 'Fruit', while in step 2 'Soft Fruit' was selected. If the wrong selections are made, participants will need to back-track to find a more appropriate menu. A large number of tasks can be made available, with only a random subset displayed to each participant if required.
On completion of a project, researchers and designers can be presented with success rates, error rates and time taken (or related variations). While closed card sorting can be of some help in validating a navigation design, tree sorting is a more effective approach in most cases. (See plainframe.com and optimalworkshop.com)
22.11 Where to learn more
Aside from the references listed below, and particularly Donna Spencer's book on card sorting (Spencer 2009), there are a number of helpful web resources:
22.12 Commentary by Jeff Sauro
What is it, how do we use it, where did it come from and how do we interpret the results? That’s what you want to know when using a method like card sorting. Hudson delivers succinct points and comprehensive coverage on this essential UX method. He accurately articulates how card sorting generates both qualitative and quantitative data and illustrates how interpreting one of the signature graphs of card sorting (the dendrogram) involves both data and judgment. Here are a few more points to consider when quantifying the results of a card sort.
22.12.1 Confidence Intervals
Card sorts, like most user research methods, involve working with a sample (often a small one) of the larger user population. With any sample comes uncertainty as to how stable the numbers are. One of the most effective strategies is to add confidence intervals around the sample statistics. A confidence interval tells us the most plausible range for the unknown population percentages.
For example, let’s assume 20 out of 26 users (77%) were able to successfully find Strawberries under the “Soft Fruit” category (Figure 22.13). Even without measuring all users, we can then be 95% confident that between 58% and 89% of all users would successfully locate strawberries (assuming our sample is reasonably representative).
The margin of error around our percentage is +/- ~16%. The lower boundary of the confidence interval tells us that we can be 95% confident 58% or more of users would find the location of Strawberries. If we have as a rudimentary goal to have most users find the fruit then we have evidence of achieving this goal.
We can apply the same method to qualifying the percentages of cards placed into a category. For example, let’s assume 70 participants conducted the card-sort shown in Figure 22.10. We see that 46% placed “Getting a New Person Started” in the “Joining” category but 39% placed this card in the “Hiring New People” category.
The 95% confidence interval for the “Joining” category is between 35% and 58%, and between 28% and 50% for the “Hiring New People” category (see the figure above). The substantial overlap in the confidence intervals means we shouldn’t have much confidence in this difference. An online calculator is available at http://www.measuringusability.com/wald.htm to make the computations.
Due to the large overlap in the intervals we cannot distinguish the 7 percentage point difference (46% versus 39%) from sampling error. If we need to pick one we should go with “Joining” but we should consider both categories as viable options.
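The overlap check can be sketched in a few lines. The counts below (32 and 27 out of 70) are assumptions back-calculated from the 46% and 39% figures, and the adjusted Wald formula is again assumed rather than confirmed by the text, so the bounds may differ by a point from those quoted.

```python
import math

def adjusted_wald(successes, n, z=1.96):
    """Adjusted Wald interval (low, high) for a proportion."""
    n_adj = n + z ** 2
    p_adj = (successes + z ** 2 / 2) / n_adj
    margin = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - margin, p_adj + margin

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

n = 70
larger_share = adjusted_wald(32, n)   # assumed count: 32/70 is about 46%
smaller_share = adjusted_wald(27, n)  # assumed count: 27/70 is about 39%

overlap = intervals_overlap(larger_share, smaller_share)
print(overlap)
```

When the intervals overlap this much, the observed gap between the two placements is within what sampling error alone could produce.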
22.12.2 Sample Sizes
As with most evaluations involving users, one of the first questions asked is “How many users do I need?” Surprisingly, there is little guidance on determining your sample size other than the 2004 Tullis and Wood article. Tullis and Wood performed a resampling study with one large card sort involving 168 users and found the cluster results would have been very similar (correlations above .93) at sample sizes between 20 and 30.
This sample size is based on the particulars of a single study (agreement in card placement and 46 cards) and on viewing the dendrogram so the results are most appropriate if your study is similar to theirs.
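To make the idea of a resampling study concrete, here is a rough sketch. It is not Tullis and Wood's procedure. It uses made-up data, and it correlates pairwise co-occurrence proportions as a simple stand-in for comparing cluster results. The items, the 80% agreement rate and all function names are assumptions.

```python
import random
from itertools import combinations

def cooccurrence(sorts, items):
    """Fraction of participants who put each pair of items in the same group."""
    pairs = list(combinations(items, 2))
    counts = dict.fromkeys(pairs, 0)
    for sort in sorts:                      # each sort maps item -> group label
        for a, b in pairs:
            if sort[a] == sort[b]:
                counts[(a, b)] += 1
    return [counts[p] / len(sorts) for p in pairs]

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def resample_correlation(sorts, items, size, rng):
    """Correlate a random subsample's pair similarities with the full sample's."""
    full = cooccurrence(sorts, items)
    sub = cooccurrence(rng.sample(sorts, size), items)
    return pearson(full, sub)

# Synthetic data: 168 participants sorting six items into two groups.
rng = random.Random(0)
typical = {"apple": "fruit", "pear": "fruit", "plum": "fruit",
           "carrot": "veg", "leek": "veg", "onion": "veg"}
items = list(typical)

def noisy_sort(rng):
    # Each item keeps its typical group 80% of the time (made-up rate).
    return {i: g if rng.random() < 0.8 else rng.choice(["fruit", "veg"])
            for i, g in typical.items()}

sorts = [noisy_sort(rng) for _ in range(168)]
for size in (10, 20, 30):
    print(size, round(resample_correlation(sorts, items, size, rng), 2))
```

Repeating the subsampling many times at each size, and averaging the correlations, would show where additional participants stop changing the picture for a given data set.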
Another approach to sample size planning is based on the percentage of users who place cards in each category (the chart in Figure 22.10) or who select the right path in tree testing (Figure 22.13). This approach is based on working backwards from confidence intervals like those generated in the previous section. In the first example we had a margin of error of 16% around the percentage of users who would correctly locate strawberries.
If we wanted to generate a more precise estimate and cut our margin of error in half to +/- 8%, we would work backwards from the confidence interval and get a required sample size of 147. The following table shows the expected margins of error for 95% confidence intervals at different sample sizes.
|Sample Size|Margin of Error (+/-)|
The computations are explained in Chapter 3 and Chapter 6 of Quantifying the User Experience.
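One way to sketch the working-backwards step is shown below. It assumes the adjusted Wald margin and the most conservative proportion of 50%. Both are assumptions, and the book's own computation may differ in detail. The function names are illustrative.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Adjusted Wald margin for sample size n; p is the (adjusted) proportion."""
    # p = 0.5 gives the widest interval, so it is the safe planning value.
    return z * math.sqrt(p * (1 - p) / (n + z ** 2))

def required_sample_size(target_margin, p=0.5, z=1.96):
    """Smallest n whose margin of error does not exceed target_margin."""
    n = 1
    while margin_of_error(n, p, z) > target_margin:
        n += 1
    return n

print(required_sample_size(0.08))
# prints: 147
```

Under these assumptions the sketch arrives at the 147 participants mentioned above for a margin of +/- 8%. `margin_of_error(n)` can be evaluated for any planned sample size to see the precision to expect.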
22.13 Commentary by David Travis
William Hudson writes knowledgeably and expertly about card sorting — as you would expect from someone who has been practising the technique for well over a decade. William’s chapter in the encyclopaedia will be a great help to those people new to card sorting who need a step-by-step tutorial through the technique.
For people who already have some experience with card sorting, I wanted to add a few words about dealing with some of the problems that come up when you do open and closed card sorting in practice. First: with an open card sort, how do you deal with a very large web site where you may have hundreds of items that need to be sorted? And second: with a closed card sort, how can you present the results back to clients in such a way that they understand the complex, quantitative data you have collected?
22.13.1 An open card sort with a very large web site
A few years ago, I worked with an auction web site to help them revise their online help system. There was a large number of help pages (over 850) and these had grown in an ad hoc manner. To ensure the new help system achieved its expected business benefits, the client needed to structure and organise the content before it was integrated into the new interface. However, even the most dedicated user won’t be happy sorting 850 cards of content, so we first had to do something to make the task manageable.
We began with a content inventory of the on-line help system. This was an important first step in describing the relationships between the different pages since it allowed us to answer questions like ‘Which help pages are most commonly accessed?’, ‘What search terms are most common?’ and ‘How many help pages does the typical user view in a session?’ Answers to these questions helped us classify the content into ‘critical’ and ‘secondary’ content. We also weeded out the ‘ROT’: content that was Redundant, Outdated or Trivial. These steps helped us reduce the sheer amount of content to something that was a bit more manageable.
Our next step was to examine the content and see if there were any obvious, stand-out topics or groups. At this point, we did in fact subject a couple of people (I was one) to the entire inventory sort to see if we could spot any obvious categories. With this approach we were able to find clusters of cards that we thought most people would place together. For example, imagine a corporate intranet that has dozens of HR policies (travel policy, environment policy, maternity policy etc). It’s self-evident that most people will place these policies in the same group, so there is little to be gained by asking people to sort every policy when instead you can use a small handful of exemplars of each group in the card sort.
These two techniques helped us reduce the number of items to around 100, an acceptable number for a card sort.
As a result of our work, the new information architecture reduced the number of support enquiries from users who were unable to find or understand content. Users were now able to solve issues themselves, which indirectly increased the number of listings, sales and registrations.
22.13.2 Presenting the data from a closed card sort
Last year, I worked with the intranet design team in the Royal Bank of Scotland. The bank has over 150,000 employees and the design team had embarked on a major overhaul of the intranet, which contained around half a million pages. The design team wanted to check if staff could find important content in the new structure, which had close to 1000 nodes.
We carried out a closed card sort much along the lines that William describes in his article. However, we wanted to make sure that we canvassed opinions from employees in several countries, including the US, the UK and India. Because of this, we decided to use a remote, unmoderated closed card sort. We asked a representative sample of bank employees to visit a web site that contained the intranet’s top-level navigation terms arranged in a tree structure (this helped us focus on navigation without the distractions of aesthetics). The participants’ task was to choose the right link for various tasks, such as “Find an expenses claim form”. Over 200 participants took part in the study.
The challenge with a study like this is presenting the results back to the design team in such a way that they can make an informed decision on the data. There are some obvious statistics to use — such as the number of participants who succeeded in the task — but equally useful for design is an understanding of the incorrect paths chosen by participants.
Figure 1 shows an example (for one task) of the way we chose to present the results. Note the following features of the graphic:
- The ‘tube map’ diagram shows the main paths participants took to find the answer. The green line shows the correct path and the red lines show commonly taken incorrect paths. A red circle indicates a node where people chose the wrong path.
- ‘Success rate’ shows the percentage of participants who found the correct answer. The error bars show the 95% confidence interval.
- ‘Success rate — detailed breakdown’ provides more background on the success rate measure, showing how many participants needed to backtrack to find the answer (“indirect success”).
- ‘Directness’ is the percentage of participants who didn't backtrack up the tree at any point during the task. The higher this score is, the more confident we can be that participants were sure of their answers (even if the answer is wrong). The error bars show the 95% confidence interval.
- ‘Time taken’ shows the median time taken by participants. The error bar shows the upper quartile. You can think of time taken as a measure of hesitation when completing the task.
- We also included a qualitative judgement on how the design performed on this task based on the measured success rate (“Very poor” through to “Excellent”) and a section that interprets the findings and provides suggestions for improvement.
Other than the tube map visualisation, we were able to extract most of these metrics from the online tool we used to collect the data (Treejack). This made the analysis and presentation relatively straightforward. (Many thanks to Rebecca Shipp, RBS Group, for permission to describe this case study).
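For readers who collect tree-testing data without such a tool, the measures listed above can be sketched as follows. The records are invented for illustration, and the field layout and function name are assumptions, not Treejack's export format.

```python
from statistics import median, quantiles

# Hypothetical records for one task: (found_correct_answer, backtracked, seconds).
results = [
    (True, False, 12.0),
    (True, True, 30.5),
    (False, False, 18.0),
    (True, False, 9.5),
    (False, True, 41.0),
    (True, False, 15.0),
    (True, True, 27.0),
    (False, False, 22.0),
]

def summarise(results):
    """Compute success, directness and time measures for one task."""
    n = len(results)
    direct = sum(1 for ok, back, _ in results if ok and not back)
    indirect = sum(1 for ok, back, _ in results if ok and back)
    times = [t for _, _, t in results]
    return {
        "success_rate": (direct + indirect) / n,
        "direct_success": direct / n,
        "indirect_success": indirect / n,
        # Directness counts everyone who never backtracked, right or wrong.
        "directness": sum(1 for _, back, _ in results if not back) / n,
        "median_time": median(times),
        "upper_quartile_time": quantiles(times, n=4)[2],
    }

summary = summarise(results)
for name, value in summary.items():
    print(name, value)
```

Confidence intervals for the two percentage measures would be added separately. The quartile here uses the default method of Python's `statistics.quantiles`, and other tools may compute it slightly differently.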
22.14 Commentary by Chris Rourke
One of the cruel ironies of the web is that the more information there is on your website, the harder it is to find any one single piece of information. There is more haystack to sort through to find your needle. Well, that trend is not always true, and you can at least do your best to fight that tendency by doing a very good job of organising it all. Putting things into neat, well labelled groups, and using nested hierarchies will add sense to an otherwise overwhelming mass of information.
In the UX designer's toolbox, Card Sorting is the sharpest tool for creating a sensible hierarchical structure for your information. Its cousin Tree Testing is the best for checking the robustness of that structure. Used together they are essential tools for creating a usable information architecture that is the best possible organisation to let people find their information.
William Hudson has earned a reputation as a leading thinker and practitioner in the field of card sorting, and his SynCaps software has proven very useful (and time saving) for capturing and analysing card sorting results for me and many others in the UX field.
William's Card Sorting chapter is comprehensive and educational, supported by several helpful images and a simple context that all readers will understand - the world of fruit and veg. With that as the domain, he proceeds to clearly explain
It is the most comprehensible and readable explanation of card sorting I have read, and will be a key learning source (along with Donna Spencer's publication which was also referenced).
In particular it provides excellent visuals to explain the outputs from card sorting. Thankfully it goes beyond presenting the tree diagram (dendrogram) which unfortunately some practitioners are tempted to take, turn 90 degrees, and exclaim: TaDah! There's my new site map, I'm all done!
More experienced practitioners will know there is a lot more that needs to be done to interpret the tree diagram, and I was especially grateful that he clearly explained that the tree diagram alone does not always tell the clearest story. For instance, an item that could have strong affinities to two distinct groups could end up having an apparently moderate or weak relationship to them, if the tested people were split down the middle on which they associated it with. It is a clear case where good old fashioned qualitative information from talking to people is needed to make the best decision.
In my experience, how to moderate the sessions is important and can impact the results. For instance one tip I often employ (which is helpful in the situation described above where a card has two or more natural homes) is to ask the participant to place the printed card for the item where they feel it belongs most, but if they feel that it could very comfortably fit into other groups, they can take a blank card, write that item name on it, and place it in other groups they expect it could be. All copies of the card would be processed during data capture, with SynCaps V2 splitting the item between the selected groups. As William mentions, dendrograms only support a single location for each item but the split will be apparent in the items by items and items by groups charts. The split results can be considered by the practitioner in the development of the Information Architecture, perhaps as decent locations for cross links (such as 'see more' type links that take the visitor to related content in other sections).
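The effect of allowing duplicate cards on an items by groups table can be illustrated with a small sketch. The equal split between copies is an assumption made here for illustration. The exact weighting SynCaps V2 applies is not stated above, and the example placements (including "Tomatoes") are hypothetical.

```python
from collections import defaultdict

def items_by_groups(placements):
    """Build item -> group -> weight from one dict per participant.

    Each participant dict maps an item to the list of groups it was placed in.
    """
    matrix = defaultdict(lambda: defaultdict(float))
    for participant in placements:
        for item, groups in participant.items():
            share = 1 / len(groups)   # assumption: copies share one vote equally
            for group in groups:
                matrix[item][group] += share
    return matrix

# Three hypothetical participants; the second used a blank card for Tomatoes.
placements = [
    {"Tomatoes": ["Salad"], "Strawberries": ["Soft Fruit"]},
    {"Tomatoes": ["Salad", "Fruit"], "Strawberries": ["Soft Fruit"]},
    {"Tomatoes": ["Fruit"], "Strawberries": ["Soft Fruit"]},
]

matrix = items_by_groups(placements)
for item, groups in matrix.items():
    print(item, dict(groups))
```

An item whose weight is spread across two groups, as Tomatoes is here, is a natural candidate for a cross link between sections.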
Another moderating point to consider is the amount of verbal feedback the person is to provide during the session. The core UX method of usability testing relies on a verbal stream of consciousness from the participant as they go through their journey on a website. Personally, I feel that is not appropriate for card sorting, although I recognise verbal feedback is important, especially for understanding the category names and what items are easy or difficult to sort. I usually recommend that participants spread out the cards to get a bird's eye view of what they are to sort, and that the moderator then leave them undisturbed as they see the patterns and "get in the zone", creating their own strategy for solving this particular Information Architecture conundrum. Only after they have sorted about half the cards and applied a few labels do I try to intervene with a gentle "how's it going?" type probe. The moderator should have the sixth sense that a hairdresser ideally has, to be able to tell whether the participant feels like talking, and should not force them to talk if they don't. Once all the cards are sorted (perhaps some in a "don't know" pile) then by all means a comprehensive debrief should be encouraged.
William gave some explanation of what I find the hardest part of card sorting: choosing what to sort when your information domain is a website with hundreds of items. Inevitably some consolidating of the items is needed, selecting only 1 or 2 representative items from what is obviously a clear group of wider items. This in itself often ends up being controversial or distracting, and always carries the risk of being used as a reason to play down the results of the card sorting (Oh yes, but you didn't include these 3 items in the sort, it could have been very different if you had...).
Finally, my preference is always to try to apply the top-down method of tree testing (or reverse card sorting, or category testing) to balance the bottom-up method of card sorting. I find tree testing to be at least as useful a method to get clients to see the importance of a good, user-centred information architecture. After all, the process of tree testing is far more similar to the way people actually forage for information while navigating on a website. Furthermore, the quantitative and statistical data that comes from it is very compelling, especially when it can be done before and after a revision to the Information Architecture ("previously this topic was found by 50% of people without any errors, now that is up to 75%"). It can also be done remotely, and other resources in addition to the ones mentioned in the chapter include NaviewApp and UserZoom. William's core area is card sorting, but if more could be presented on the top-down method perhaps it could be re-titled Card Sorting and Information Architecture research.
William Hudson's chapter is nonetheless comprehensive and meets the needs of those new to card sorting and those with some experience. It will definitely be a valuable reference to those looking to implement this research to improve their site navigation and Information Architecture.