Publication statistics

Pub. period: 2008-2012
Pub. count: 5
Number of co-authors: 11


Number of publications with 3 favourite co-authors:

Jeffrey Pierce:
Philip J. Guo:
Joseph M. Hellerstein:



Productive colleagues

Sean Kandel's 3 most productive colleagues in number of publications:

Hector Garcia-Molina: 47
Andreas Paepcke: 43
Jeffrey Heer: 27

Sean Kandel


Publications by Sean Kandel (bibliography)

Kandel, Sean, Parikh, Ravi, Paepcke, Andreas, Hellerstein, Joseph M. and Heer, Jeffrey (2012): Profiler: integrated statistical analysis and visualization for data quality assessment. In: Proceedings of the 2012 International Conference on Advanced Visual Interfaces 2012. pp. 547-554.

Data quality issues such as missing, erroneous, extreme and duplicate values undermine analysis and are time-consuming to find and fix. Automated methods can help identify anomalies, but determining what constitutes an error is context-dependent and so requires human judgment. While visualization tools can facilitate this process, analysts must often manually construct the necessary views, requiring significant expertise. We present Profiler, a visual analysis tool for assessing quality issues in tabular data. Profiler applies data mining methods to automatically flag problematic data and suggests coordinated summary visualizations for assessing the data in context. The system contributes novel methods for integrated statistical and visual analysis, automatic view suggestion, and scalable visual summaries that support real-time interaction with millions of data points. We present Profiler's architecture -- including modular components for custom data types, anomaly detection routines and summary visualizations -- and describe its application to motion picture, natural disaster and water quality data sets.

© All rights reserved Kandel et al. and/or ACM Press

Kandel, Sean, Paepcke, Andreas, Hellerstein, Joseph and Heer, Jeffrey (2011): Wrangler: interactive visual specification of data transformation scripts. In: Proceedings of ACM CHI 2011 Conference on Human Factors in Computing Systems 2011. pp. 3363-3372.

Though data analysis tools continue to improve, analysts still expend an inordinate amount of time and effort manipulating data and assessing data quality issues. Such "data wrangling" regularly involves reformatting data values or layout, correcting erroneous or missing values, and integrating multiple data sources. These transforms are often difficult to specify and difficult to reuse across analysis tasks, teams, and tools. In response, we introduce Wrangler, an interactive system for creating data transformations. Wrangler combines direct manipulation of visualized data with automatic inference of relevant transforms, enabling analysts to iteratively explore the space of applicable operations and preview their effects. Wrangler leverages semantic data types (e.g., geographic locations, dates, classification codes) to aid validation and type conversion. Interactive histories support review, refinement, and annotation of transformation scripts. User study results show that Wrangler significantly reduces specification time and promotes the use of robust, auditable transforms instead of manual editing.

© All rights reserved Kandel et al. and/or their publisher

Robson, Christine, Kandel, Sean, Heer, Jeffrey and Pierce, Jeffrey (2011): Data collection by the people, for the people. In: Proceedings of ACM CHI 2011 Conference on Human Factors in Computing Systems 2011. pp. 25-28.

Data Collection by the People, for the People is a CHI 2011 workshop that explores data from the crowd, bringing together mobile crowdsourcing and participatory urbanism researchers with data analysis and visualization researchers. The workshop is a two-day event that begins with a day of field work in the city of Vancouver, during which participants try out mobile crowdsourcing applications and data analysis tools. Participants are encouraged to contribute applications and tools they wish to share. Our goal is to provoke discussion and brainstorming so that both data collection researchers and data manipulation/analysis researchers can benefit from lessons learned on both sides about crowdsourced data.

© All rights reserved Robson et al. and/or their publisher

Guo, Philip J., Kandel, Sean, Hellerstein, Joseph M. and Heer, Jeffrey (2011): Proactive wrangling: mixed-initiative end-user programming of data transformation scripts. In: Proceedings of the 2011 ACM Symposium on User Interface Software and Technology 2011. pp. 65-74.

Analysts regularly wrangle data into a form suitable for computational tools through a tedious process that delays more substantive analysis. While interactive tools can assist data transformation, analysts must still conceptualize the desired output state, formulate a transformation strategy, and specify complex transforms. We present a model to proactively suggest data transforms which map input data to a relational format expected by analysis tools. To guide search through the space of transforms, we propose a metric that scores tables according to type homogeneity, sparsity and the presence of delimiters. When compared to "ideal" hand-crafted transformations, our model suggests over half of the needed steps; in these cases the top-ranked suggestion is preferred 77% of the time. User study results indicate that suggestions produced by our model can assist analysts' transformation tasks, but that users do not always value proactive assistance, instead preferring to maintain the initiative. We discuss some implications of these results for mixed-initiative interfaces.

© All rights reserved Guo et al. and/or ACM Press

Kandel, Sean, Paepcke, Andreas, Theobald, Martin, Garcia-Molina, Hector and Abelson, Eric (2008): PhotoSpread: a spreadsheet for managing photos. In: Proceedings of ACM CHI 2008 Conference on Human Factors in Computing Systems April 5-10, 2008. pp. 1749-1758.

PhotoSpread is a spreadsheet system for organizing and analyzing photo collections. It extends the current spreadsheet paradigm in two ways: (a) PhotoSpread accommodates sets of objects (e.g., photos) annotated with tags (attribute-value pairs), and formulas can manipulate object sets and refer to tags; (b) photos can be reorganized (their tags and locations changed) by drag-and-drop operations on the spreadsheet. The PhotoSpread design was driven by the needs of field biologists who have large collections of annotated photos. The paper describes the PhotoSpread functionality and the design choices made.

© All rights reserved Kandel et al. and/or ACM Press

