Journal of Latex Class Files, Vol. 14, No. 8, August 2015
Scatterplots are a common tool for exploring multidimensional datasets, especially in the form of scatterplot matrices
(SPLOMs). However, scatterplots suffer from overplotting when categorical variables are mapped to one or both axes, or when the same continuous variable is used for both axes. Previous methods for these cases, such as histograms and violin plots, aggregate marks, which makes brushing and linking difficult. To address this, we propose gatherplots, an extension of scatterplots that manages overplotting for categorical data while preserving individual object identities. In gatherplots, all data points that map to the same position coalesce to form a stacked entity, making it easier to see an overview of the data groupings. The size and aspect ratio of data points can also be changed dynamically to make it easier to compare the composition of different groups. For the case of a categorical variable vs. a categorical variable, we propose a heuristic to decide bin sizes for optimal space usage. This means that gatherplots make better use of visual
space to show the overall distribution. To validate our work, we conducted a crowdsourced user study, which shows that gatherplots enable users to judge the relative proportion of subgroups more quickly and more accurately than jittered scatterplots do.
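
The gathering idea described above can be illustrated with a minimal sketch. It assumes both axes are categorical and that each point is given as a pair of category labels. The function name `gather_layout`, its parameters, and the column-count rule are hypothetical choices made for illustration; in particular, the rule is not the bin-size heuristic proposed in this work. Each point keeps its own slot in its cell's grid, so individual identities survive for brushing and linking.

```python
import math
from collections import defaultdict


def gather_layout(points, cell_width=1.0, cell_height=1.0):
    """Assign every point its own grid slot inside its categorical cell.

    points: list of (x_category, y_category) tuples.
    Returns a list, in input order, of (x_category, y_category, col, row).
    """
    # Collect the indices of all points that fall into the same cell.
    groups = defaultdict(list)
    for idx, (x_cat, y_cat) in enumerate(points):
        groups[(x_cat, y_cat)].append(idx)

    layout = [None] * len(points)
    for (x_cat, y_cat), members in groups.items():
        n = len(members)
        # Assumed rule (illustrative only): choose the column count so the
        # stacked grid roughly matches the aspect ratio of the cell.
        n_cols = max(1, math.ceil(math.sqrt(n * cell_width / cell_height)))
        # Fill the grid row by row; no two points share a slot.
        for slot, idx in enumerate(members):
            layout[idx] = (x_cat, y_cat, slot % n_cols, slot // n_cols)
    return layout


points = [("a", "x")] * 4 + [("b", "x")]
layout = gather_layout(points)
```

Here the four points in cell `("a", "x")` are arranged in a 2 × 2 grid instead of being drawn on top of one another, while the single point in cell `("b", "x")` occupies slot `(0, 0)`.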