22. Card Sorting
The term card sorting applies to a wide variety of activities involving the grouping and/or naming of objects or concepts. These may be represented on physical cards; virtual cards on computer screens; or photos in either physical or computer form. Occasionally, objects themselves may be sorted. The results can be expressed in a number of ways, with the primary focus being which items were most frequently grouped together by participants and the names given to the resulting categories.
For the purpose of interaction design, the sorting process — usually performed by potential users of an interactive solution — provides:
- Terminology (what people call things)
- Relationships (proximity, similarity)
- Categories (groups and their names)
We can use this information to decide which items should be grouped together in displays; how menu contents should be organized and labelled; and perhaps most fundamentally, what words we should employ to describe the objects of our users' attention.
22.1 A practical example
Imagine that you are responsible for the information architecture of computerized touch-screen scales of the kind increasingly common in large supermarkets, shown in Figure 1. The screen displays 12 images and captions at a time. There have been some complaints that customers are spending a long time at the scales and are frustrated by how the categories are organized. Table 1 shows a list of sample items that customers need to find. These have been printed on cards with bar codes for easy data capture (see Figure 2 and the Syntagm web site). Figure 3 shows an example of the cards organized into groups. Since this is an 'open' sort, users make up their own groups and names for them. This particular grouping represents the current solution implemented in the scales, referred to as a 'reference sort', discussed later in this chapter.
Broccoli / Calabrese
Courgettes / Zucchini
Squash / Marrows
Swede / Rutabaga
Take a moment to consider how you might organize these items yourself. For most people there are at least two groups — fruit and vegetables. But in a large supermarket two groups would contain very long lists of items which would not be helpful without further subdivision. Also, there may be some terms that are unfamiliar to you. Courgette is the French name for the long, green marrow (squash) seen in British supermarkets, while zucchini is the Italian name found in the US. Conversely, what is known as a rutabaga in the US is called a swede in the UK as it was introduced to Scotland by the Swedes. Where simple language differences like these are known in advance, listing the alternatives on a single card is probably a satisfactory solution. However, in novel problem domains or in multicultural/multilingual situations where terminology is a larger issue, it may be better for participants to sort photographs or even the objects themselves (with a barcode label attached).
Whatever you are sorting, you will end up with some things (items) arranged in groups, ideally with group names. The next challenge is how to make sense of these, particularly when you have tens or hundreds of participants. No matter how the analysis is done, there are at least two things we want to know:
- What were the groups called and what was in each?
- Which items were grouped together most often?
Be careful to note that these are two separate sets of information. That grapefruit and oranges were always grouped together in the sample study is not affected by the fact that several different group names were used. Also, not surprisingly, other items were grouped with grapefruit and oranges — but the nature of these items varied with the approach taken by participants. If the group was called simply 'fruit' it contained apples, pears and other fruits as well as grapefruit and oranges. If it was called 'citrus', the only addition was lemons. So, to get a good idea of what the sort is telling us, we use different kinds of analysis. The first two correspond to the things we wanted to know:
- An items by groups chart shows what the groups were called and what was in each
- An items by items chart shows which items were grouped together most often
22.1.1 Items by groups chart
You can produce simple versions of the charts yourself with pencil and paper or a spreadsheet and printer. First, the items by groups chart:
- List all of the items that were sorted down the left-hand side of the page. As this needs to be done so that you can find each item quickly, alphabetic order is probably best (a word processing or spreadsheet package can help with sorting).
- Scanning through the sort results, for each new group write its name as a column heading. Place a mark in the cell of each item contained within the group. So if the first group is called 'Citrus Fruit', we would write this as a column heading and then mark the cells for oranges, lemons and grapefruit. Figure 4 shows this example.
- If another participant uses the same group name (or if it is a 'closed' sort where you have provided all of the group names), you will only need to write the column headings once. However, for open sorts, be prepared for many variations in spelling and wording. For example, 'soft fruit' versus 'berries'. It is generally best to keep such different terms separate during data capture and decide whether to merge the results at a later stage.
- If we were to reorder the items using cluster analysis (discussed later), a chart similar to that shown in Figure 5 would result. This has the same layout as the worksheet in Figure 4 — items are listed down the left-hand side and groups across the top. In the body of the chart, the square cells represent the number of times each item appeared in the named group, expressed as a shade of a chosen colour — this corresponds to the number of marks you would have made in your hand-generated version. (Percentage values are available in the application by clicking on a cell; the figure shows the result for ‘Carrots’ in the ‘Root Veg’ group.) Table 2 provides more details of the shading used.
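The tallying steps above can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `sorts` data is invented, and the names `items_by_groups` and `print_chart` are hypothetical rather than taken from any sorting tool. Group names are kept exactly as participants wrote them, so variants such as 'soft fruit' and 'berries' would stay in separate columns.

```python
from collections import Counter, defaultdict

# Hypothetical results: one dict per participant, mapping group name -> items.
sorts = [
    {"Citrus Fruit": ["Grapefruit", "Lemons", "Oranges"], "Root Veg": ["Carrots", "Swede"]},
    {"Fruit": ["Grapefruit", "Lemons", "Oranges"], "Vegetables": ["Carrots", "Swede"]},
    {"Citrus Fruit": ["Grapefruit", "Lemons", "Oranges"], "Vegetables": ["Carrots", "Swede"]},
]


def items_by_groups(sorts):
    """Count how many participants placed each item in each named group."""
    counts = defaultdict(Counter)
    for sort in sorts:
        for group, items in sort.items():
            for item in items:
                counts[item][group] += 1  # one "mark" in the item/group cell
    return counts


def print_chart(counts, n_participants):
    """Print one row per item (alphabetical) with the share of participants per group."""
    groups = sorted({group for row in counts.values() for group in row})
    for item in sorted(counts):
        cells = ", ".join(
            f"{group}: {100 * counts[item][group] / n_participants:.0f}%"
            for group in groups
            if counts[item][group]
        )
        print(f"{item:<12}{cells}")


counts = items_by_groups(sorts)
print_chart(counts, len(sorts))
```

The percentages printed here play the role of the shading in a computer-generated chart: the more participants who used a cell, the higher the value.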
22.1.2 Items by items chart
The items by items chart is a little more challenging to produce:
- List all of the items your participants sorted down the left-hand side of the page in alphabetic order. Repeat the list in the same order across the top of the page. You now have a matrix of items. To avoid confusion and duplicated effort, draw a line through the diagonal — from the top-left to the bottom-right, where each item meets itself, and decide which half of the matrix you are going to use. Then shade the other half. This is so you are forced to put ‘Oranges’ x ‘Grapefruit’ into the same place as ‘Grapefruit’ x ‘Oranges’. You should end up with something similar to the worksheet shown in Figure 6. The top-right of the matrix has been greyed-out and will not be used.
- Using the sorted cards, place a mark in each cell for every pair of items that appears in the same group. For example, if we came across a group called 'citrus' we would probably find it contained grapefruit, oranges and lemons, so we would mark the cells grapefruit x oranges, grapefruit x lemons and oranges x lemons. This is a simple case; for larger groups of n items there are many marks to make: (n² − n) / 2. This is because we want all possible pairings (n²) excluding items paired with themselves (−n), plus we don't need to distinguish between the order of pairs, so apples x pears is the same as pears x apples. This allows us to halve the matrix and consequently the number of marks to be made (/2 in the formula). So if you have a group of 8 items, sharpen your pencil and get ready to make marks in 28 cells. Twelve items yield 66 marks, as shown in Figure 7. (Bear in mind that these are the values for a single participant. Either keep a running total in each cell or add additional marks as you process subsequent participants. Alternatively, use a single sheet for each participant and simply add the results together at the end. This approach has the distinct advantage of allowing you to find and fix errors as well as making visual comparisons of participants’ sorting methods.)
- Repeat for all participants. When completed, the number of marks in each cell represents how often participants grouped item pairs together. This is called an 'items by items' (or 'pairs') chart. Figure 8 shows a computer-generated version, with the items reordered using cluster analysis. Rather than labelling the rows and columns separately, the item names are shown on the diagonal. Note that because we have removed half the matrix, most items are folded at the diagonal. For example, 'Carrots' starts as a row on the left and then continues as a column running down the page at the diagonal. The dashed lines in the figure separate the clusters — based on the average number of groups created by participants (four).
- As for the items by groups chart (Figure 5), the square cells represent the percentages of participants as a shade of the selected colour as detailed in Table 2. In the items x items chart, however, each cell represents a pair of items that were placed together in the same group.
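The pair-marking procedure above can be sketched in Python as follows. The sample `sorts` data and the function names `pair_counts` and `marks_for_group` are assumptions made for illustration. Each pair is stored with its two items in alphabetical order, which has the same effect as using only one half of the matrix.

```python
from collections import Counter
from itertools import combinations

# Hypothetical results: one dict per participant, mapping group name -> items.
sorts = [
    {"Citrus": ["Grapefruit", "Oranges", "Lemons"], "Root Veg": ["Carrots", "Swede"]},
    {"Fruit": ["Apples", "Pears", "Grapefruit", "Oranges"],
     "Vegetables": ["Carrots", "Swede", "Leeks"]},
    {"Citrus": ["Grapefruit", "Oranges", "Lemons"], "Veg": ["Carrots", "Leeks", "Swede"]},
]


def marks_for_group(n):
    """Number of distinct pairs in a group of n items: (n^2 - n) / 2."""
    return (n * n - n) // 2


def pair_counts(sorts):
    """Count how many participants placed each pair of items in the same group."""
    pairs = Counter()
    for sort in sorts:
        for items in sort.values():
            # Sorting the names means ('Grapefruit', 'Oranges') and
            # ('Oranges', 'Grapefruit') always land in the same cell.
            for a, b in combinations(sorted(set(items)), 2):
                pairs[(a, b)] += 1
    return pairs


pairs = pair_counts(sorts)
print(marks_for_group(8), marks_for_group(12))
print(pairs[("Grapefruit", "Oranges")], "of", len(sorts), "participants")
```

Dividing each count by the number of participants gives the percentages that the shaded cells represent.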
22.2 What the analyses mean
While it is tempting to think that a card sorting project is going to immediately provide a navigation hierarchy, this is rarely the case. The results inform a design process; they do not provide a packaged solution. The sample fruit-and-vegetable project described here provides a realistic case in point — the results are far from conclusive.
What do we know for sure from the analyses? Refer back to Figure 5 and Figure 8 and see what conclusions you can draw before proceeding.
Both charts include the results of a cluster analysis that divides the items into four groups. The items by groups chart (Figure 5) shows that the most popular names for the four groups were 'Fruit', 'Spices', 'Vegetables' and 'Root Veg'. 'Citrus Fruit' was a strong contender for grapefruit, oranges and lemons, while some participants (about a third) did not distinguish between 'Root Veg' and 'Vegetables'.
Anything else? What about fennel? In both charts it should be possible to see that fennel has been grouped with a wide variety of other items. Although the cluster analysis placed it in the group called 'Spices', almost 20% of participants sorted it into the 'Vegetables' group. There may be nothing we can do about this other than providing access to fennel from both groups, which is easily done on a computerized scale or web site.
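Items like fennel can also be picked out of the items by groups counts automatically. The sketch below is one possible approach, not a feature of any particular tool: the counts are invented for illustration (they are not the study's data), and the function name `split_items` and the 15% threshold are assumptions.

```python
# Hypothetical counts: item -> {group name: number of participants}.
counts = {
    "Fennel": {"Spices": 12, "Vegetables": 4, "Root Veg": 2},
    "Carrots": {"Root Veg": 14, "Vegetables": 7},
    "Apples": {"Fruit": 20, "Other": 1},
}
N_PARTICIPANTS = 21  # assumed number of participants


def split_items(counts, n_participants, threshold=0.15):
    """Flag items whose second most popular group attracted a sizeable share."""
    flagged = {}
    for item, by_group in counts.items():
        ranked = sorted(by_group.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) > 1 and ranked[1][1] / n_participants >= threshold:
            # Keep the two most popular groups with their shares of participants.
            flagged[item] = [(group, n / n_participants) for group, n in ranked[:2]]
    return flagged


flagged = split_items(counts, N_PARTICIPANTS)
for item, top_two in flagged.items():
    print(item, [(group, f"{share:.0%}") for group, share in top_two])
```

Flagged items are candidates for being reachable from more than one category, but the decision still needs a human reading of the charts.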
Focussing on the items by items chart for a moment, we see an important feature of the items themselves — independent of group names. Very few participants attempted to group the fruits with any of the vegetables. This shows a clear understanding of and distinction between these two main categories that we certainly should build on when designing a suitable information hierarchy. In contrast, the charts show a good deal of participant ambiguity over onions and leeks. These were frequently grouped with root vegetables, but the items by items chart shows an affinity — particularly for onions — with the group most commonly referred to as 'spices'.
What conclusions can we draw from this example? The first is that while we have learned a great deal about our participants' appreciation of the terminology, categories and concepts, the exercise was too limited for the results to be applied to a larger information space. Specifically, the small number of fruits provided in the example encouraged participants to place them in a single group. This may not be realistic in practice, although we do have some suggestions for refinement: 'Citrus Fruits' and 'Berries' from Figure 5. One solution would be to provide participants with a larger range of fruits, including what are called exemplars (representative types) of the categories we expect. An alternative approach would be to brief and monitor participants more closely. This is difficult to do in an online sorting activity: even if the briefing is very detailed, participants may fail to see it, read it or act on it. Most of these issues can be overcome in face-to-face sorting. If facilitators see participants producing too few categories, they can simply cajole them to create more.
So far we have touched on two popular methods of analysing card sorts — there are others which will be discussed later. But first a little background...
22.3 The history of card sorting
Card sorting has a surprisingly long history, especially if the concept of categorization is included. The ancient Greeks are credited with the early development of categories, with Aristotle providing the foundations for the categorization scheme that we use today for plants and animals (Sachs 2002). The practice of sorting cards in the social sciences is somewhat more recent, but still well over 100 years old. Initially, printed playing cards were used for a variety of experiments in the nascent field of psychology (Jastrow 1886), but these were joined relatively quickly by blank cards on which researchers would write words to be categorized by subjects (Bergström 1893). Early card sorting activities were primarily concerned with establishing characteristics of the subjects — the speed of sorting used as an indicator of mental processes and reaction time (Jastrow 1886; Jastrow 1898); memory function (Bergström 1893; Bergström 1894) and imagination — using inkblots on cards (Dearborn 1898). Some of these experiments developed into what is now considered to be a standard test for neurological damage in patients who have suffered head injuries, the Wisconsin Card Sorting Test (Eling et al. 2008). In fact, card sorting was so well received in psychology that an article appeared in Science as early as 1914 espousing the virtues of various types of card-based activities (Kline and Kellogg 1914).
Card sorting also made its way into other fields: criminology (Galton 1891), market research (Dubois 1949), semantics (Miller 1969) and as a standard qualitative tool in the social sciences (Weller and Romney 1988; Bernard and Ryan 2009). However, it was not until the emergence of the World Wide Web in the early 1990s that card sorting was applied to the task of organizing information spaces (Nielsen and Sano 1995), with the rare exception that Tom Tullis applied card sorting to the design of menus for an operating system in the early 1980s (Tullis 1985).
22.3.1 Card sorting and the design of interactive products
Despite the popularity of the web, card sorting remains an under-used tool in the design of interactive products. In a survey of 217 attendees of Usability Week 2008, Nielsen Norman Group reported that the average number of card sorts conducted per year was 2. While this is twice as frequent as eye-tracking studies in the survey (average 1 per year), this is a surprisingly low number given that there are no large up-front investments required. In fact, card sorting has had only a peripheral role in interactive product design since its inception — perhaps reflecting the limited uptake of user-centred design methods in general. Peter Morville and Louis Rosenfeld devote only a few pages to card sorting in their seminal work, Information Architecture — now in its third edition (Morville and Rosenfeld 2006). And at the time of writing, there is only one book available on the topic of card sorting for interactive systems design, Donna Spencer's Card Sorting: Designing Usable Categories (Spencer 2009), which tends to be fairly conservative in terms of analysis.
22.4 Benefits of card sorting
For interaction design, customer research or research in the social sciences, few investigative techniques are as effective as card sorting in dealing with large numbers of concepts. In face-to-face settings, handling and annotating physical cards is a fairly natural and unintimidating process: observing users engaged in this process can result in many insights for researchers and provide a fertile source of questions and conversations about the problem domain being studied and, of course, users themselves. These outcomes and opportunities are hard to obtain through interviews, questionnaires and usability evaluations, although each of these alternatives has its strengths for more limited scopes of investigation. For example, it is relatively easy to discover that a single menu item is mislabelled in a usability study, but prohibitively expensive for several dozen items.
22.5 Qualitative versus quantitative outcomes
At one extreme, card sorts can be conducted on a one-to-one basis as a tool for discovery (knowledge elicitation) and a means of generating meaningful discussion between participants and researchers (Weller and Romney 1988; Bernard and Ryan 2009). The outcomes here are generally a better understanding of the problem domain from a user's perspective with terms, relationships and categories expressed in the resulting groups. At the other extreme, it is very easy to organize online sorts with hundreds of participants to discover whether the terminology and concepts presented are well understood across a large user population (Fincher and Tenenberg 2005). While results in the one-to-one approach are primarily qualitative, those of the large-scale online studies are mostly quantitative. (Note that it is not impossible to obtain qualitative information from online studies; there simply are not as many opportunities to persuade or allow online participants to provide useful feedback.)
22.6 What to sort
Not surprisingly, the choice of what to have participants sort depends largely on what a researcher, information architect or interaction designer is trying to discover. For 'green-field' projects — those that lack any constraints imposed by prior work — a first priority would be to establish a vocabulary. In this context, users could be presented with objects, images or descriptions of items and asked to name them. Once named, they could be grouped, with the groups in turn also named. This is fairly easy to do in face-to-face settings, where numbered or bar-coded labels can be applied to objects or photos (see, for example, the card sorting templates for Microsoft Word at the Syntagm web site). Note that some web-based sorting packages, such as websort.net, do allow photos to be sorted, but provide no means for users to apply names to the items depicted.
- Fixed Items: If terminology is already established and immutable (such as product names), then basic research as described above is unnecessary. The primary goal of a sorting activity would be to discover which items should be grouped together and what these groups should be called. This is a relatively straightforward undertaking for either face-to-face or online approaches. The choice would largely be determined by whether qualitative feedback is desired (for which face-to-face sorting with paper cards would be most appropriate) or whether quantitative feedback using larger numbers of participants would be beneficial. Good quality results can be obtained from 15-30 participants in a face-to-face context (Nielsen 2004; Tullis and Wood 2004) while online sorts can be conducted for hundreds of participants at no additional cost except for recruitment. Also, large-scale studies can be useful for increasing engagement within an organization or ensuring that a diverse collection of users have a similar understanding of a problem domain.
- User Goals: Card sorting is frequently applied to navigation design. However, simply listing the names of documents, pages or features that will be present in a solution does not guarantee that users will be able to reach their goals, even if they are organized optimally. Starting with user goals helps to ensure that navigation design is effective. So rather than asking participants to sort items such as “Employee Manual”, “Staff Policies” and “HR Guide” (all of which confusingly overlap), consider instead the goals that users have in accessing these documents: “Find holiday entitlement”, “Can I work at home?”, “How much time can I take off for a new baby?” and so on. (Tom Tullis employed user goals in his design of operating system menus; see Tullis 1985.) Server logs, particularly search phrases; content audits; and user research can be used to build a list of user goals, with card sorting providing grouping and category names.
- Multilevel Hierarchies: Most sorting and analysis tools do not support the kind of multiple-level hierarchies found in all but the simplest interactive solutions. Even the produce scales used in the sample card sort could use a multilevel hierarchy. For example, a top-level category called 'Fruit' might lead to 'Citrus Fruit', 'Apples and Pears', 'Exotic Fruit' and so on. However, the lack of analysis support for multilevel hierarchies is not an insurmountable problem. In fact, multilevel hierarchies at the analysis stage can increase the complexity of a sorting activity substantially, thereby making it a daunting undertaking for many participants. Instead, conduct multiple single-level sorting activities. Focus on the lowest levels (the 'leaves' of the navigation tree) since category names provided by participants often vary considerably in their levels of abstraction, as we saw in the example. Participants' category names included 'Fruit', 'Soft fruit' and 'Berries'. Each of these could be appropriate for higher-level navigation headings. (Multilevel sorting is discussed in more detail under Section 9, Advanced analysis.)
22.7 How to do a card sort
22.7.1 Choosing an approach
Face-to-face sorting methods are generally better for qualitative research, while online methods (web-based or desktop) are more appropriate for quantitative results. However, this is not always true; for example, it would be possible to sit with a participant or share their desktop while they conducted an online sort. This could result in good qualitative data, but it would be more intimidating for participants and much harder work for the facilitator. Remote desktop sharing can also be technically challenging, especially in the presence of corporate firewalls and security policies.
Researchers or interaction designers can also choose between:
- open sorting, where users make up their own categories
- closed sorting, where categories are predefined
- hybrid sorting, which is some combination of the two
For most purposes, open sorting is the best choice, although supplying some predefined categories is always helpful to participants and is supported by most sorting and analysis tools. Closed sorting can be used when trying to establish changes required to an existing structure, particularly with analysis tools that provide comparisons between a 'reference sort' (such as an existing or proposed solution) and participants' results — see Figure 9.
In this items by groups chart, Figure 9, the current solution is shown with black squares in a cell. So while most participants chose to group all fruit together, the computerized scales used two unusual groups: 'Grapes & Citrus' and 'Exotic Fruit'. However, there were some areas of correspondence: many participants agreed with the current design for the 'root veg' group towards the bottom-centre of the chart.
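A comparison with a reference sort can be approximated without relying on group names at all. The sketch below takes every pair of items that the reference puts in the same group and reports the share of participants who did likewise. This is an assumed, simplified measure and not necessarily what the tool behind Figure 9 computes; the data and the function names `grouped_pairs` and `agreement_with_reference` are hypothetical.

```python
from itertools import combinations

# Hypothetical reference sort (the existing design) and participant results.
reference = {
    "Grapes & Citrus": ["Grapes", "Oranges", "Lemons"],
    "Root Veg": ["Carrots", "Swede"],
}
sorts = [
    {"Fruit": ["Grapes", "Oranges", "Lemons"], "Root Veg": ["Carrots", "Swede"]},
    {"Citrus": ["Oranges", "Lemons"], "Other fruit": ["Grapes"], "Veg": ["Carrots", "Swede"]},
    {"Citrus": ["Oranges", "Lemons"], "Berries": ["Grapes"],
     "Root Veg": ["Carrots"], "Other": ["Swede"]},
]


def grouped_pairs(sort):
    """Return the set of unordered item pairs that share a group in one sort."""
    return {frozenset(pair) for items in sort.values()
            for pair in combinations(set(items), 2)}


def agreement_with_reference(reference, sorts):
    """For each pair grouped in the reference, the share of participants who agreed."""
    participant_pairs = [grouped_pairs(sort) for sort in sorts]
    result = {}
    for pair in grouped_pairs(reference):
        hits = sum(1 for pairs in participant_pairs if pair in pairs)
        result[tuple(sorted(pair))] = hits / len(sorts)
    return result


agreement = agreement_with_reference(reference, sorts)
for pair, share in sorted(agreement.items()):
    print(pair, f"{share:.0%}")
```

Pairs with low agreement point to places where the existing structure departs from how participants think about the items.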
22.7.2 Recruiting and briefing participants
As with any other form of user-centred design, participants of a card sorting activity should be representative of the users envisaged for the solution. However, given the difficulties that some members of the population may have with technology (older users for example) it is often beneficial to over-sample these groups to ensure that the resulting design is effective for as broad an audience as possible. Where possible, try to use participants who are motivated to participate by interests that are more than purely monetary — existing users or customers for example.
When briefing participants for a sorting activity, it does not pay to be too vague in stating the requirements. In navigation design the number of categories needed for a set of items is not a complete mystery. There is usually a balance to be struck between the number and size of groups (Kiger 1984). Consequently, it is important to provide participants with adequate information about the number and level of groups you require. If you are trying to devise menus for our computerized produce scale, which has space for 12 items on the screen, do not be shy about letting participants know that. Similarly, horizontal menu bars on websites or desktop applications rarely have space for more than 6 or 8 items. Allowing participants to generate 20 or 30 categories in these cases is potentially a waste of their time and yours.
Similarly, if you have group names that you know, or at least strongly suspect you need, provide those to participants. This can be done in both face-to-face and online settings. But do encourage participants to make up their own group names if they prefer.
Participants should also be advised on how to deal with items they do not understand. While some researchers or interaction designers suggest that all items should be sorted (leaving participants simply to guess at those they do not recognize), this can lead to spurious groupings. Consider asking users simply not to sort items they do not recognize, or create a specific 'unknown' group to receive them. These can then be excluded from the results. Most online sorting tools now allow items to remain unsorted. However, make sure that analysis results are based on the number of participants rather than the number of times that an item was sorted.
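The difference between the two denominators is easy to see with a small example. In the sketch below, five hypothetical participants sort fennel: two place it in 'Spices', one in 'Vegetables' and two leave it in an 'Unknown' group. The data, the group name `UNKNOWN` and the function `group_share` are all assumptions for illustration.

```python
UNKNOWN = "Unknown"  # hypothetical name for the group that receives unrecognized items

# Hypothetical results: one dict per participant, mapping group name -> items.
sorts = [
    {"Spices": ["Fennel", "Ginger"]},
    {"Spices": ["Fennel"], "Vegetables": ["Leeks"]},
    {"Vegetables": ["Fennel", "Leeks"]},
    {UNKNOWN: ["Fennel"], "Vegetables": ["Leeks"]},
    {UNKNOWN: ["Fennel"], "Spices": ["Ginger"]},
]


def group_share(sorts, item, group):
    """Return (share of all participants, share of those who actually sorted the item)."""
    n_participants = len(sorts)
    placed = sum(1 for sort in sorts if item in sort.get(group, []))
    sorted_by = sum(
        1 for sort in sorts
        if any(item in items for name, items in sort.items() if name != UNKNOWN)
    )
    of_participants = placed / n_participants
    of_sorted = placed / sorted_by if sorted_by else 0.0
    return of_participants, of_sorted


of_participants, of_sorted = group_share(sorts, "Fennel", "Spices")
print(f"{of_participants:.0%} of participants, {of_sorted:.0%} of those who sorted it")
```

Reporting only the second figure would overstate how well fennel fits 'Spices', because it hides the participants who did not recognize the item at all.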
22.7.3 Time to sort
The amount of time required to perform a sort can vary considerably from person to person, but is largely dependent on the number of items to be sorted:
- Approximately 20 minutes for 30 items
- 30 minutes for 50 items
- 60 minutes for 100 items
However, other factors include how familiar the terms and concepts are to participants and how motivated they are to provide results conscientiously. Also, it is possible to sort up to 150 cards in single sessions, but higher quality results might be obtained by splitting such a large project into smaller parts.
22.7.4 Preparing a sort
For face-to-face (paper-based) sorting, getting items and group names onto cards can be a tedious undertaking. Happily, standard mail-merge software can be used to make this task easier, meaning that items can be printed either directly onto cards or self-adhesive labels. Free mail-merge templates for Microsoft Word can be found on the Syntagm web site for both North American and European paper sizes. These also include bar codes that can be used to simplify data collection: instead of typing in an item name or number, the bar codes allow them to be read directly using a simple USB scanner. This is both quicker and less error-prone than manual entry — it makes it relatively easy to process 120 cards or more per minute (full instructions are included on the web page referred to).
Preparation for online sorting is relatively straightforward, requiring only lists of the items and group names (if any) to be uploaded.
However, regardless of the method of sorting, be aware that superficial similarities in the names used can produce unhelpful results. Consider these menu item names from an intranet:
- Manage absence and holidays
- Manage difficult colleagues
- Change management
If faced with a large number of items to sort, participants may simply group similar names together. This is called a superficial match. To overcome this, consider modifying the item names:
- Absence and holidays
- Coping with difficult colleagues
- Change management
In the first two items the word 'manage' was not an essential part of the name. Removing it or using a synonym prevents unwanted grouping.
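Before running a sort, a list of item names can be screened for this kind of overlap. The sketch below flags any word stem shared by more than one item name. It uses a deliberately crude stem (the first six letters of each word) and a small stop-word list, both of which are assumptions chosen for this example rather than an established method.

```python
import re
from collections import defaultdict

# Assumed list of words too common to count as a superficial match.
STOPWORDS = {"a", "and", "for", "of", "the", "to", "with"}


def shared_stems(names, stem_length=6):
    """Map each crude word stem to the item names that share it (two or more)."""
    seen = defaultdict(set)
    for name in names:
        for word in re.findall(r"[a-z]+", name.lower()):
            if word not in STOPWORDS:
                seen[word[:stem_length]].add(name)
    return {stem: sorted(found) for stem, found in seen.items() if len(found) > 1}


original = ["Manage absence and holidays", "Manage difficult colleagues", "Change management"]
revised = ["Absence and holidays", "Coping with difficult colleagues", "Change management"]

print(shared_stems(original))
print(shared_stems(revised))
```

With the original names, all three items are flagged under the stem 'manage'; with the revised names nothing is flagged. A shared stem is only a prompt to review the wording, since some repeated words are an essential part of the names.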
22.7.5 Choosing names
Apart from the issue of superficial similarities mentioned above, be careful to choose names that are in common use, especially where interactive solutions are being designed for a broad range of abilities. This is not just common sense, but also a requirement of disability discrimination legislation in many countries. Put simply, language should be no more complex than needed to convey the required information. In English, longer words (measured in syllables) are used much less frequently than shorter ones (Klare 1963). And even though participants in a card sort might suggest unusual names for items or groups, such as 'brassicas', most people will go into their local supermarket or greengrocer's asking for cabbage rather than use its Latin genus. If in doubt, consult a reference on common words such as the Corpus of Contemporary American English, the British National Corpus or similar sources for other languages.
22.8 How to understand the results
For very small projects, just leafing through the sorted cards or listing of online results can provide useful insights into groupings. However, larger projects will require some form of analysis, ranging from simple tabulation through to cluster analysis. Note that while cluster analysis is potentially a very complex subject (Romesburg 2004; Bernard and Ryan 2009), most card sorting tools use a fairly simple form of cluster analysis that could easily be replicated manually. It is known as 'hierarchical cluster analysis'. The 'hierarchy' in this case refers to the way in which smaller clusters are aggregated to form larger ones until all are included.
22.8.1 Simple analysis
Simple tabulation of items by groups can be performed manually (as described above) or by using a spreadsheet package such as can be found at Boxes and Arrows. However, online sorting tools will do this analysis for you. For printed cards using the Microsoft Word mail-merge templates described earlier, SynCaps V2 and later will produce items by items, items by groups and dendrogram analyses.
Figure 10 is an items by groups chart showing an alternative presentation to that of Figure 5. In both cases, the items are listed down the left-hand side of the chart with the group names across the top. A cluster analysis has been performed to determine which items are most closely related, producing an item ordering that moves from one cluster to the next. The only significant difference between the two figures is that Figure 5 uses shading to show the relative strength of each relationship (figures are available by clicking on a cell) while Figure 10 presents the percentage figures with blue shading only to highlight the most significant results.
22.8.2 Cluster analysis
The type of cluster analysis performed by most card sorting tools is 'hierarchical cluster analysis' or HCA. The usual result is a graphical display called a dendrogram, a name that has its roots (literally) in 'dendron', the Greek word for 'tree'.
Figure 11 shows a hierarchical cluster analysis in the form of a dendrogram. The example is taken from an intranet navigation sorting activity. The hierarchical nature of the dendrogram is related to the strength of the relationships between items, as measured by how frequently they appeared in the same groups. And as in real trees, shorter branches are stronger. In Figure 11, the six items at the bottom all include the word 'leave'. However, participants have primarily grouped 'Adoption -', 'Parental -' and 'Special' Leave as being closely related, but were less consistent with 'Maternity & Paternity - ', 'Annual -' and 'Sick' Leave. Finally, 'Work Breaks' was sometimes grouped with the leave items, but this relationship is fairly weak compared with the others. If you wanted to know why the work breaks item relationship is weaker, you would need to consult an items by items chart, an items by groups chart or the raw proximity matrix if available — the latter simply showing the number of times each pairing of items appeared together in the same groups.
The dendrogram also gives some insight into the way the cluster analysis works. The method used is called 'agglomerative clustering', meaning simply that we build the clusters from the bottom up. So in the intranet example, the first cluster would have started with the last three 'leave' items (they have the shortest branches), with 'Maternity & Paternity Leave' subsequently subsumed. Then, looking again at Figure 11, the next strongest relationship appears towards the top of the chart, between 'Resignation' and 'Survey of leavers policy'.
As items are agglomerated into clusters, an average score (based again on the number of times pairs of items appeared in the same groups) is calculated. This is shown in the dendrogram by how far the vertical connecting lines are from the labels. As mentioned above, the resulting branches reflect stronger relationships when they are shorter — that is, when the vertical connecting lines are closer to the labels, as for the bottom three items in Figure 11.
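The bottom-up process described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm used by any particular card sorting tool: the participant data is invented, the function names are hypothetical, and average linkage over raw co-occurrence counts is assumed. Note that the sketch works with similarity (higher means closer), whereas a dendrogram draws stronger relationships as shorter branches.

```python
from itertools import combinations

def co_occurrence(sorts):
    """Count how often each pair of items shared a group across participants."""
    counts = {}
    for groups in sorts:                      # one participant's sort
        for group in groups:                  # one group of item names
            for a, b in combinations(sorted(group), 2):
                counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts

def average_similarity(c1, c2, counts):
    """Mean pairwise co-occurrence between two clusters (average linkage)."""
    total = sum(counts.get(tuple(sorted((a, b))), 0) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))

def agglomerate(items, counts):
    """Repeatedly merge the two most similar clusters; return the merge log."""
    clusters = [frozenset([item]) for item in items]
    merges = []
    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: average_similarity(pair[0], pair[1], counts))
        score = average_similarity(c1, c2, counts)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), score))
    return merges

# Hypothetical sorts from four participants (invented for illustration).
sorts = [
    [{"Adoption Leave", "Parental Leave", "Special Leave"}, {"Sick Leave", "Work Breaks"}],
    [{"Adoption Leave", "Parental Leave", "Special Leave", "Sick Leave"}, {"Work Breaks"}],
    [{"Adoption Leave", "Parental Leave"}, {"Special Leave", "Sick Leave", "Work Breaks"}],
    [{"Adoption Leave", "Parental Leave", "Special Leave"}, {"Sick Leave"}, {"Work Breaks"}],
]
items = ["Adoption Leave", "Parental Leave", "Special Leave", "Sick Leave", "Work Breaks"]

counts = co_occurrence(sorts)
merges = agglomerate(items, counts)
for left, right, score in merges:
    print(f"{left} + {right}: average co-occurrence {score:.2f}")
```

The merge log is the information a dendrogram visualises: early merges with high scores correspond to short branches, and the final merge with the lowest score corresponds to the longest.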
In a dendrogram, clusters are joined together into branches until all items have been included. This means that the weakest relationships (those between dissimilar clusters) can be found furthest from the item labels. Although Figure 11 does not show a complete dendrogram, it does include three long branches that are continued off to the right. These represent three dissimilar clusters; each will require its own category label (which could be derived from an items by groups chart). Note that dendrograms take no account of group names; it may well be that even though ‘Adoption -’, ‘Parental -’ and ‘Special Leave’ were grouped together frequently, participants applied a wide variety of names to that grouping. Also be aware that in a dendrogram, items can appear in only one place. Therefore, if an item was split equally by participants between two different groups, it would appear only as a weak relationship in one of them. You would need to visit the items by groups chart to notice this.
22.9 Advanced analysis
In trying to make sense of card sorting results, there are two problems that frequently recur. The first is that not all participants have the same motivation, experience or needs. This means that we may have participants whose sort results are simply 'noise', particularly for online sorts with an attractive incentive. In other cases we may believe we have one relatively homogeneous group of participants when in fact we have several. This can be due to general factors such as experience, in which case we need to accommodate these multiple groups in our designs, or it may be due to different contexts of use. In the latter case we should try to understand the differences and decide whether separate designs are warranted. Unfortunately, traditional card sorting analysis tools are not much help here. But some of this information can be obtained manually, for example by examining the number and size of groups produced by each participant: those in a hurry tend to have fewer groups and a large number of items in unhelpful categories such as 'don't know' or 'miscellaneous', while those who have a substantially different view of the problem domain may produce an unusual number of groups (relative to the average). Optimal Workshop has added some participant-oriented results to its web-based service. Fairly detailed participant and item spreadsheets can be found in all versions of SynCaps.
The second recurring problem is related to the basic principle of cluster analysis: every item is assigned to exactly one cluster. To a certain extent, this can be worked around by careful inspection of the items by items and items by groups analyses. For example, an item such as a cucumber might be split equally between 'green vegetables' and 'salad vegetables'. It will appear in the dendrogram in one of these groups (the choice will be arbitrary if the split is exactly 50:50) with a fairly weak relationship. However, the weakness of the relationship is not because participants were confused about where it should go; they just did not agree. The items by items and items by groups charts would show this clearly. Because of this limitation of cluster analysis, some researchers have explored other advanced statistical techniques, most notably factor analysis (see Capra 2005 and Giovannini 2012). A more detailed account of card sorting analysis methods can be found in Corter 1996 and Coxon 1999.
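The cucumber split described above is easy to see once placements are tabulated per item. The following is a minimal sketch of an items by groups tabulation, assuming that group names have already been standardized across participants; the data and the `items_by_groups` function name are invented for illustration and do not come from any specific tool.

```python
from collections import Counter, defaultdict

def items_by_groups(sorts):
    """Return, for each item, the share of participants who placed it in each group."""
    placements = defaultdict(Counter)
    for groups in sorts:                       # one participant: {group name: [items]}
        for name, members in groups.items():
            for item in members:
                placements[item][name] += 1
    n = len(sorts)
    return {item: {name: count / n for name, count in counter.items()}
            for item, counter in placements.items()}

# Hypothetical results from four participants (invented for illustration).
sorts = [
    {"green vegetables": ["cucumber", "broccoli"], "salad vegetables": ["lettuce"]},
    {"green vegetables": ["cucumber", "broccoli"], "salad vegetables": ["lettuce"]},
    {"green vegetables": ["broccoli"], "salad vegetables": ["cucumber", "lettuce"]},
    {"green vegetables": ["broccoli"], "salad vegetables": ["cucumber", "lettuce"]},
]

table = items_by_groups(sorts)
for item, shares in sorted(table.items()):
    print(item, {name: f"{share:.0%}" for name, share in shares.items()})
```

An even split such as the one for cucumber shows disagreement between participants rather than a weak association, which a dendrogram alone cannot distinguish.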
22.9.1 Multilevel sorting
The primary method of sorting discussed in this chapter can be described as single-level or 'flat'. Participants are given a set of items which they should sort into a single level of groups. So while it might be tempting to nest groups — 'leaf vegetables' within 'green vegetables' within 'vegetables', for example — there are two issues to be aware of:
- Limitations of analysis: The most common methods of analysis use a single measure of closeness or proximity of related items. This is based on how frequently items were placed together by participants. It is not practical to perform a cluster analysis on multiple group levels, but it is relatively straightforward to apply weightings to item proximities according to whether they appeared in the same group, a sub-group, a sub-sub-group and so on. Items that appear together in the same group would receive the highest weighting, pairs split between immediate sub-groups a slightly lower weighting, and those split between second-order sub-groups lower still (and so on). For example, cucumber and courgette/zucchini would receive the maximum weighting if they both appeared in a 'Green Vegetables' group but a lower weighting if courgettes/zucchini appeared in a group named 'Green Vegetables' and cucumber in a sub-group named 'Salad Vegetables' (illustrated in Figure 12 using a maximum weighting of 2). This is the approach taken by the (now defunct) EZsort/Usort (Dong et al. 2001) and the free SynCaps V1 packages in their anonymous single-level sub-groups implementation. (Anonymous sub-groups are simply unnamed.) This has been extended to multiple levels by packages such as UXsort (uxsort.com) and SynCaps V3 (Syntagm Ltd). SynCaps V3 also provides an analysis of sub-group names used at each level. See Harloff 2005 for a further discussion of weighted multilevel sorts.
- Scale and complexity: One of the biggest challenges with multilevel card sorting is the considerable increase in the number of items to be sorted and the resulting solutions (Wood and Wood 2008). Consequently, it would be inadvisable to give participants the entire navigation hierarchy of a large intranet or e-commerce site and ask them to organize these as they see fit. Participants in card sorts are users, not information architects. Multilevel card sorting is much more likely to be effective when the potential solutions are partially defined or constrained. Even then, researchers and designers may get more useful information from a series of single-level sorting activities where this is practical.
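The weighting idea in the first point above can be sketched as follows. This is one possible scheme, assumed for illustration only, and is not the documented behaviour of EZsort, SynCaps or UXsort: each item's position is written as a path of group names from the top level down, and the weight drops by one for every level that separates the pair below their shared group. The `pair_weight` function and the sample sort are hypothetical.

```python
from itertools import combinations

def pair_weight(path_a, path_b, max_weight=2):
    """Weight a pair of items by how deep in the hierarchy they are separated.

    Each path is a tuple of group names from the top level down, for example
    ("Green Vegetables", "Salad Vegetables"). Assumed scheme: the full weight
    when both items sit directly in the same group, one less per level of
    separation, and zero when they share no top-level group.
    """
    shared = 0
    for group_a, group_b in zip(path_a, path_b):
        if group_a != group_b:
            break
        shared += 1
    if shared == 0:
        return 0
    levels_apart = max(len(path_a), len(path_b)) - shared
    return max(0, max_weight - levels_apart)

# One hypothetical participant's multilevel sort.
sort = {
    "courgette": ("Green Vegetables",),
    "broccoli": ("Green Vegetables",),
    "cucumber": ("Green Vegetables", "Salad Vegetables"),
    "apple": ("Fruit",),
}

weights = {(a, b): pair_weight(sort[a], sort[b])
           for a, b in combinations(sorted(sort), 2)}
for pair, weight in weights.items():
    print(pair, weight)
```

Summing these weights over all participants gives a weighted proximity matrix that can be fed into the same single-level cluster analysis described earlier.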
22.10 Tree Sorting
Tree sorting (also called 'tree testing' and 'reverse card sorting') is a concept related to card sorting, but in many respects quite different. In essence it is a simulation of a navigation tree that would be found in a software application or web site. Online participants are presented with goals and then asked to navigate using the tree simulation. Figure 13 illustrates the process across several screens (step 1 is the first screen; step 2 is the second and so on). In step 1, the participant has chosen 'Fruit', while in step 2 'Soft Fruit' was selected. If the wrong selections are made, participants will need to back-track to find a more appropriate menu. A large number of tasks can be made available, with only a random subset displayed to each participant if required.
On completion of a project, researchers and designers can be presented with success rates, error rates and time taken (or related variations). While closed card sorting can be of some help in validating a navigation design, tree sorting is a more effective approach in most cases. (See plainframe.com and optimalworkshop.com.)
22.11 Where to learn more
Aside from the references listed below, and particularly Donna Spencer's book on card sorting (Spencer 2009), there are a number of helpful web resources:
22.12 Commentary by Jeff Sauro
What is it, how do we use it, where did it come from and how do we interpret the results? That’s what you want to know when using a method like card sorting. Hudson delivers succinct points and comprehensive coverage on this essential UX method. He accurately articulates how card sorting generates both qualitative and quantitative data and illustrates how interpreting one of the signature graphs of card sorting (the dendrogram) involves both data and judgment. Here are a few more points to consider when quantifying the results of a card sort.
22.12.1 Confidence Intervals
Card sorts, like most user research methods, involve working with a sample (often a small one) of the larger user population. With any sample comes uncertainty as to how stable the numbers are. One of the most effective strategies is to add confidence intervals around the sample statistics. A confidence interval tells us the most plausible range for the unknown population percentages.
For example, let’s assume 20 out of 26 users (77%) were able to successfully find Strawberries under the “Soft Fruit” category (Figure 22.13). Even without measuring all users, we can then be 95% confident that between 58% and 89% of all users would successfully locate strawberries (assuming our sample is reasonably representative).
The margin of error around our percentage is approximately +/- 16%. The lower boundary of the confidence interval tells us that we can be 95% confident 58% or more of users would find the location of Strawberries. If our rudimentary goal is for most users to find the fruit, then we have evidence of achieving it.
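The interval quoted above can be reproduced with a short calculation. The sketch below assumes the adjusted Wald (Agresti-Coull) method, which is an assumption on our part: the commentary does not name the formula, but this method does yield the 58% to 89% range for 20 successes out of 26. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def adjusted_wald(successes, n, confidence=0.95):
    """Adjusted Wald (Agresti-Coull) confidence interval for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    n_adj = n + z ** 2                                   # add z^2 pseudo-trials
    p_adj = (successes + z ** 2 / 2) / n_adj             # half of them successes
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin)

low, high = adjusted_wald(20, 26)
print(f"95% confidence interval: {low:.0%} to {high:.0%}")
print(f"margin of error: +/- {(high - low) / 2:.0%}")
```

The adjustment pulls the estimate slightly towards 50%, which is why the interval is not symmetric around the observed 77%.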
We can apply the same method to quantifying the percentages of cards placed into a category. For example, let’s assume 70 participants conducted the card sort shown in Figure 22.10. We see that 46% placed “Getting a New Person Started” in the “Joining” category but 39% placed this card in the “Hiring New People” category.
The 95% confidence interval for the “Joining” category is between 35% and 58%, and for the “Hiring New People” category it is between 28% and 50% (see the figure above). The substantial overlap in the confidence intervals means we shouldn’t have much confidence in this difference. An online calculator is available at http://www.measuringusability.com/wald.htm to make the computations.
Due to the large overlap in the intervals we cannot distinguish the 7 percentage point difference from sampling error. If we need to pick one we should go with “Joining”, but we should consider both categories as viable options.
22.12.2 Sample Sizes
As with most evaluations involving users, one of the first questions asked is “How many users do I need?” Surprisingly, there is little guidance on determining your sample size other than the 2004 Tullis and Wood article. Tullis and Wood performed a resampling study with one large card sort involving 168 users and found the cluster results would have been very similar (correlations above .93) at sample sizes between 20 and 30.
This sample size is based on the particulars of a single study (agreement in card placement and 46 cards) and on viewing the dendrogram so the results are most appropriate if your study is similar to theirs.
Another approach to sample size planning is based on the percentage of users who place cards in each category (Figure 22.10) or who select the correct path in tree testing (Figure 22.13). This approach is based on working backwards from confidence intervals like those generated in the previous section. In the first example we had a margin of error of 16% around the percentage of users who would correctly locate strawberries.
If we wanted a more precise estimate, cutting our margin of error in half to +/- 8%, we would work backwards from the confidence interval and get a required sample size of 147. The following table shows the expected margins of error for 95% confidence intervals at different sample sizes.
|Sample Size|Margin of Error (+/-)|
The computations are explained in Chapter 3 and Chapter 6 of Quantifying the User Experience.
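As a rough illustration of working backwards from the margin of error, the sketch below assumes an adjusted Wald interval and the worst-case proportion of 50%. Both are assumptions rather than something stated in the commentary, although together they reproduce the sample size of 147 quoted for a margin of +/- 8%. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)   # two-sided 95% critical value, about 1.96

def margin_of_error(n):
    """Worst-case (p = 0.5) margin of an adjusted Wald 95% interval for n users."""
    return Z * sqrt(0.25 / (n + Z ** 2))

def required_sample_size(margin):
    """Smallest n whose worst-case margin of error does not exceed `margin`."""
    return ceil(Z ** 2 * 0.25 / margin ** 2 - Z ** 2)

n_needed = required_sample_size(0.08)
print(f"users needed for +/- 8%: {n_needed}")
for n in (20, 50, 100, n_needed):
    print(f"n = {n:>3}: margin of error +/- {margin_of_error(n):.1%}")
```

Because the margin shrinks with the square root of the sample size, halving it requires roughly four times as many participants.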
22.13 Commentary by David Travis
William Hudson writes knowledgeably and expertly about card sorting — as you would expect from someone who has been practising the technique for well over a decade. William’s chapter in the encyclopaedia will be a great help to those people new to card sorting who need a step-by-step tutorial through the technique.
For people who already have some experience with card sorting, I wanted to add a few words about dealing with some of the problems that come up when you do open and closed card sorting in practice. First: with an open card sort, how do you deal with a very large web site where you may have hundreds of items that need to be sorted? And second: with a closed card sort, how can you present the results back to clients in such a way that they understand the complex, quantitative data you have collected?
22.13.1 An open card sort with a very large web site
A few years ago, I worked with an auction web site to help them revise their online help system. There was a large number of help pages (over 850) and these had grown in an ad hoc manner. To ensure the new help system achieved its expected business benefits, the client needed to structure and organise the content before it was integrated into the new interface. However, even the most dedicated user won’t be happy sorting 850 cards of content, so we first had to do something to make the task manageable.
We began with a content inventory of the on-line help system. This was an important first step in describing the relationships between the different pages since it allowed us to answer questions like ‘Which help pages are most commonly accessed?’, ‘What search terms are most common?’ and ‘How many help pages does the typical user view in a session?’ Answers to these questions helped us classify the content into ‘critical’ and ‘secondary’ content. We also weeded out the ‘ROT’: content that was Redundant, Outdated or Trivial. These steps helped us reduce the sheer amount of content to something that was a bit more manageable.
Our next step was to examine the content and see if there were any obvious, stand-out topics or groups. At this point, we did in fact subject a couple of people (I was one) to the entire inventory sort to see if we could spot any obvious categories. With this approach we were able to find clusters of cards that we thought most people would place together. For example, imagine a corporate intranet that has dozens of HR policies (travel policy, environment policy, maternity policy etc). It’s self-evident that most people will place these policies in the same group, so there is little to be gained by asking people to sort every policy when instead you can use a small handful of exemplars of each group in the card sort.
These two techniques helped us reduce the number of items to around 100, an acceptable number for a card sort.
As a result of our work, the new information architecture reduced the number of support enquiries from users who were unable to find or understand content. Users were now able to solve issues themselves, which indirectly increased the number of listings, sales and registrations.
22.13.2 Presenting the data from a closed card sort
Last year, I worked with the intranet design team in the Royal Bank of Scotland. The bank has over 150,000 employees and the design team had embarked on a major overhaul of the intranet, which contained around half a million pages. The design team wanted to check if staff could find important content in the new structure, which had close to 1000 nodes.
We carried out a closed card sort much along the lines that William describes in his article. However, we wanted to make sure that we canvassed opinions from employees in several countries, including the US, the UK and India. Because of this, we decided to use a remote, unmoderated closed card sort. We asked a representative sample of bank employees to visit a web site that contained the intranet’s top-level navigation terms arranged in a tree structure (this helped us focus on navigation without the distractions of aesthetics). The participants’ task was to choose the right link for various tasks, such as “Find an expenses claim form”. Over 200 participants took part in the study.
The challenge with a study like this is presenting the results back to the design team in such a way that they can make an informed decision on the data. There are some obvious statistics to use — such as the number of participants who succeeded in the task — but equally useful for design is an understanding of the incorrect paths chosen by participants.
Figure 1 shows an example (for one task) of the way we chose to present the results. Note the following features of the graphic:
- The ‘tube map’ diagram shows the main paths participants took to find the answer. The green line shows the correct path and the red lines show commonly taken incorrect paths. A red circle indicates a node where people chose the wrong path.
- ‘Success rate’ shows the percentage of participants who found the correct answer. The error bars show the 95% confidence interval.
- ‘Success rate — detailed breakdown’ provides more background on the success rate measure, showing how many participants needed to backtrack to find the answer (“indirect success”).
- ‘Directness’ is the percentage of participants who didn't backtrack up the tree at any point during the task. The higher this score is, the more confident we can be that participants were sure of their answers (even if the answer is wrong). The error bars show the 95% confidence interval.
- ‘Time taken’ shows the median time taken by participants. The error bar shows the upper quartile. You can think of time taken as a measure of hesitation when completing the task.
- We also included a qualitative judgement on how the design performed on this task based on the measured success rate (“Very poor” through to “Excellent”) and a section that interprets the findings and provides suggestions for improvement.
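The quantitative measures in the list above can be derived from simple per-participant records. The sketch below is illustrative only: the `Attempt` record, its field names and the sample data are hypothetical and are not Treejack's export format.

```python
from dataclasses import dataclass
from statistics import median, quantiles

@dataclass
class Attempt:
    success: bool       # did the participant end on the correct node?
    backtracked: bool   # did they move back up the tree at any point?
    seconds: float      # time taken for the task

def summarise(attempts):
    """Compute success, directness and time measures for one task."""
    n = len(attempts)
    direct = sum(a.success and not a.backtracked for a in attempts)
    indirect = sum(a.success and a.backtracked for a in attempts)
    times = [a.seconds for a in attempts]
    return {
        "success_rate": (direct + indirect) / n,
        "direct_success": direct / n,
        "indirect_success": indirect / n,
        "directness": sum(not a.backtracked for a in attempts) / n,
        "median_seconds": median(times),
        "upper_quartile_seconds": quantiles(times, n=4)[2],
    }

# Hypothetical results for one task (invented for illustration).
attempts = [
    Attempt(success=True, backtracked=False, seconds=10),
    Attempt(success=True, backtracked=True, seconds=30),
    Attempt(success=False, backtracked=False, seconds=20),
    Attempt(success=False, backtracked=True, seconds=40),
]

summary = summarise(attempts)
for name, value in summary.items():
    print(f"{name}: {value}")
```

Note that directness counts everyone who did not backtrack, including those who confidently chose a wrong answer, which matches the definition given in the list.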
Other than the tube map visualisation, we were able to extract most of these metrics from the online tool we used to collect the data (Treejack). This made the analysis and presentation relatively straightforward. (Many thanks to Rebecca Shipp, RBS Group, for permission to describe this case study).
22.14 Commentary by Chris Rourke
One of the cruel ironies of the web is that the more information there is on your website, the harder it is to find any one single piece of information. There is more haystack to sort through to find your needle. Well, that trend is not always true, and you can at least do your best to fight that tendency by doing a very good job of organising it all. Putting things into neat, well-labelled groups and using nested hierarchies will add sense to an otherwise overwhelming mass of information.
In the UX designer's toolbox, Card Sorting is the sharpest tool for creating a sensible hierarchical structure for your information. Its cousin Tree Testing is the best for checking the robustness of that structure. Used together they are essential tools for creating a usable information architecture that is the best possible organisation to let people find their information.
William Hudson has earned a reputation as a leading thinker and practitioner in the field of card sorting, and his SynCaps software has proven very useful (and time saving) for capturing and analysing card sorting results for me and many others in the UX field.
William's Card Sorting chapter is comprehensive and educational, supported by several helpful images and a simple context that all readers will understand: the world of fruit and veg. With that as the domain, he proceeds to clearly explain:
- The need for card sorting
- The types of dilemmas card sorting planners and participants encounter (e.g. the same fruit called 2 different things)
- The process for performing card sorting
- Ways to analyse the data
It is the most comprehensible and readable explanation of card sorting I have read, and will be a key learning source (along with Donna Spencer's publication which was also referenced).
In particular it provides excellent visuals to explain the outputs from card sorting. Thankfully it goes beyond presenting the tree diagram (dendrogram) which unfortunately some practitioners are tempted to take, turn 90 degrees, and exclaim: TaDah! There's my new site map, I'm all done!
More experienced practitioners will know there is a lot more that needs to be done to interpret the tree diagram, and I was especially grateful that he clearly explained that the tree diagram alone does not always tell the clearest story. For instance, an item that could have strong affinities to two distinct groups could end up having an apparently moderate or weak relationship to them if the tested people were split down the middle on which they associated it with. It is a clear case where good old-fashioned qualitative information from talking to people is needed to make the best decision.
In my experience, how to moderate the sessions is important and can impact the results. For instance one tip I often employ (which is helpful in the situation described above where a card has two or more natural homes) is to ask the participant to place the printed card for the item where they feel it belongs most, but if they feel that it could very comfortably fit into other groups, they can take a blank card, write that item name on it, and place it in other groups they expect it could be. All copies of the card would be processed during data capture, with SynCaps V2 splitting the item between the selected groups. As William mentions, dendrograms only support a single location for each item but the split will be apparent in the items by items and items by groups charts. The split results can be considered by the practitioner in the development of the Information Architecture, perhaps as decent locations for cross links (such as 'see more' type links that take the visitor to related content in other sections).
Another moderating point to consider is the amount of verbal feedback the person is to provide during the session. The core UX method of usability testing relies on a verbal stream of consciousness from the participant as they go through their journey on a website. Personally, I feel that is not appropriate for card sorting, although I recognise verbal feedback is important, especially for understanding the category names and what items are easy or difficult to sort. I usually recommend that participants spread out the cards to get a bird's-eye view of what they are to sort, and that the moderator then does not disturb them as they see the patterns and "get in the zone", creating their own strategy for solving this particular Information Architecture conundrum. Only after they have sorted about half the cards and applied a few labels do I try to intervene with a gentle "how's it going?" type probe. The moderator should have that sixth sense that a hairdresser ideally has, to be able to tell whether the participant feels like talking or not, and should not force them to if they don't. Once all the cards are sorted (perhaps some in a "don't know" pile) then by all means a comprehensive debrief should be encouraged.
William gave some explanation of what I find the hardest part of card sorting: choosing what to sort when your information domain is a website with hundreds of items. Inevitably some consolidation of the items is needed, selecting only 1 or 2 representative items from what is obviously a clear group of wider items. This in itself often ends up being controversial or distracting, and always carries the risk of being used as a reason to play down the results of the card sorting (Oh yes, but you didn't include these 3 items in the sort, it could have been very different if you had...).
Finally, my preference is always to try to apply the top-down method of tree testing (or reverse card sorting, or category testing) to balance the bottom-up method of card sorting. I find tree testing to be at least as useful a method to get clients to see the importance of a good, user-centred information architecture. After all, the process of tree testing is far more similar to the way people actually forage for information while navigating on a website. Furthermore, the quantitative and statistical data that comes from it is very compelling, especially when it can be done before and after a revision to the Information Architecture ("previously this topic was found by 50% of people without any errors, now that is up to 75%"). It can also be done remotely, and other resources in addition to the ones mentioned in the chapter include NaviewApp and UserZoom. William's core area is card sorting, but if more could be presented on the top-down method perhaps it could be re-titled Card Sorting and Information Architecture research.
William Hudson's chapter is nonetheless comprehensive and meets the need of those new to card sorting and those with some experience. It will definitely be a valuable reference to those looking to implement this research to improve their site navigation and Information Architecture.