Monday, January 21, 2013

Small Multiple Scatter Plots with Marginal Densities

An example data visualization in R, inspired by Tufte's example on pp. 118-121 of Beautiful Evidence; source below the fold.

Here are a few of the design elements:
  • Small multiples: put the opportunity for multiple comparisons within one eye-span
  • Use the alpha channel (transparency) to convey an indication of the underlying probability density: the more data points overlap, the darker the region appears
  • Use transparency to lighten the "heavy grid prison" of the scatter plot axes. The dark parts of the figure are the density lines and the tightly clustered data points themselves.
  • Display less redundant information than the default matrix style multiple scatter plot, and spill less ink on the 'bureaucratic' parts of the figure (on this see more below)
  • Use a bullet point list to describe the graphic ;-)
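
The transparency ideas in the list above can be sketched for a single panel in base R. This is only a rough illustration, not the script used for the figure in this post: the simulated data, the alpha value of 0.15, the gray level for the axes, and the choice to scale each marginal density to 20% of the panel are all assumptions made for the example.

```r
# Illustrative data only: two correlated variables (not the post's data set).
set.seed(1)
x <- rnorm(500)
y <- 0.6 * x + rnorm(500, sd = 0.8)

# Semi-transparent black: overlapping points accumulate into darker regions.
pt_col   <- rgb(0, 0, 0, alpha = 0.15)
# Light gray for the axes and frame, so the data carry the visual weight.
axis_col <- gray(0.7)

# Draw to a PDF device (which supports alpha) in a temporary file.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 4, height = 4)
par(mar = c(3, 3, 1, 1), mgp = c(1.8, 0.5, 0),
    fg = axis_col, col.axis = axis_col)
plot(x, y, pch = 16, col = pt_col, bty = "l")

# Marginal density of x, rescaled to the bottom 20% of the plot region.
usr <- par("usr")  # c(xmin, xmax, ymin, ymax) of the plot region
dx  <- density(x)
lines(dx$x, usr[3] + 0.2 * (usr[4] - usr[3]) * dx$y / max(dx$y),
      col = "black")

# Marginal density of y, drawn sideways along the left edge.
dy <- density(y)
lines(usr[1] + 0.2 * (usr[2] - usr[1]) * dy$y / max(dy$y), dy$x,
      col = "black")
dev.off()
```

The density curves are drawn in opaque black while the frame and tick labels are gray, which is one way to make the data the darkest thing on the page.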

Default Data Frame Scatter Plot

The default method of plotting the data in a data frame spends lots of ink on the little frames around each plot, and has too much margin between figures and around the edges for my taste. The field is small--use it all!

The default approach plots every variable in the data frame versus every other. The upper triangular portion of the matrix is simply the transpose of the lower triangular portion, so only half the plot is actually conveying unique information. For my purposes, plotting the inputs versus the outputs is more useful. With this tabular approach each panel conveys a unique relationship (though I'm plotting only 6 panels rather than the 10 unique pairs of the default approach).
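
To make the 6-versus-10 comparison concrete, here is a minimal sketch (not the post's actual script) that draws both layouts for a made-up data frame. The split into 2 inputs and 3 outputs, the column names `in1`, `in2`, `out1` to `out3`, and the margin settings are assumptions chosen so that the panel counts match the numbers quoted above.

```r
# Hypothetical data frame: 2 inputs and 3 outputs (names are made up).
set.seed(2)
n  <- 300
df <- data.frame(in1 = runif(n), in2 = rnorm(n))
df$out1 <- df$in1 + rnorm(n, sd = 0.2)
df$out2 <- df$in2^2 + rnorm(n, sd = 0.5)
df$out3 <- df$in1 * df$in2 + rnorm(n, sd = 0.3)

inputs  <- c("in1", "in2")
outputs <- c("out1", "out2", "out3")

# Unique variable pairs in the full matrix vs. input-output panels.
n_unique_pairs <- choose(ncol(df), 2)
n_io_panels    <- length(inputs) * length(outputs)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 6)

# Page 1: the default scatter plot matrix (every variable vs. every other).
pairs(df)

# Page 2: one row per output, one column per input, with tight margins.
par(mfrow = c(length(outputs), length(inputs)),
    mar = c(0.3, 0.3, 0.3, 0.3), oma = c(3, 3, 1, 1))
for (o in outputs) {
  for (i in inputs) {
    plot(df[[i]], df[[o]], pch = 16, col = rgb(0, 0, 0, 0.15),
         axes = FALSE, xlab = "", ylab = "")
    box(col = gray(0.8))  # faint frame instead of a heavy one
    # Label rows on the left edge and columns along the bottom only.
    if (i == inputs[1]) mtext(o, side = 2, line = 1)
    if (o == outputs[length(outputs)]) mtext(i, side = 1, line = 1)
  }
}
dev.off()
```

With five variables, `choose(5, 2)` gives the 10 unique pairs of the full matrix, while the input-by-output grid needs only 2 × 3 = 6 panels, none of which duplicates another.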

The R script to generate the multi-scatter plot graphic is shown below.

1 comment:

  1. I just found seaborn, a Python visualization library whose pairplot function produces some pretty nice pairwise scatter plots with the univariate densities down the diagonal of the plot grid. I like it for quick and easy visualization when getting familiar with a data set.