A good figure should:
Here we’ll mostly look at lots of examples.
Color is one of the most frequently used aesthetics and is easy to misuse.
There are three types of color scales.
Qualitative scales are non-monotonic sets of colors.
Useful for displaying categorical variables with few levels.
Sequential scales are monotonic sets of colors spanning a color gradient.
Useful for continuous variables.
Example sequential color scale
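To make "monotonic" concrete, here is a minimal sketch that builds a sequential scale by holding hue fixed and decreasing lightness in equal steps. The function name `sequential_scale` and its default hue, saturation, and lightness endpoints are illustrative assumptions, not values taken from any particular plotting library.

```python
import colorsys


def sequential_scale(n, hue=0.6, light_start=0.9, light_end=0.25, saturation=0.6):
    """Return n hex colors of a single hue whose lightness falls monotonically.

    All default values are assumptions chosen for illustration.
    """
    if n < 2:
        raise ValueError("need at least two colors")
    colors = []
    for i in range(n):
        t = i / (n - 1)  # position along the gradient, from 0 to 1
        lightness = light_start + t * (light_end - light_start)
        r, g, b = colorsys.hls_to_rgb(hue, lightness, saturation)
        colors.append(
            "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))
        )
    return colors


palette = sequential_scale(5)  # light to dark, suitable for low-to-high values
```

Equal steps in HLS lightness are only roughly even to the eye; purpose-built perceptual scales are usually a better choice for a finished figure.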
Diverging scales are sequential scales centered at a neutral color.
Useful for continuous variables with a ‘natural’ center.
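A diverging scale can be sketched as two sequential ramps joined at the neutral color. The function `diverging_color`, the helper `lerp_color`, and the blue, near-white, and red endpoint colors below are assumptions for illustration; interpolating linearly in RGB is a simplification.

```python
def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB triples, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def diverging_color(value, vmin, vmid, vmax,
                    low=(33, 102, 172), neutral=(247, 247, 247), high=(178, 24, 43)):
    """Map a value to an RGB color on a scale centered at vmid.

    The endpoint colors are hypothetical defaults, not a standard palette.
    """
    if not vmin < vmid < vmax:
        raise ValueError("expected vmin < vmid < vmax")
    value = min(max(value, vmin), vmax)  # clamp to the scale's domain
    if value <= vmid:
        return lerp_color(neutral, low, (vmid - value) / (vmid - vmin))
    return lerp_color(neutral, high, (value - vmid) / (vmax - vmid))


# A percentage with a natural center at 50:
center = diverging_color(50, vmin=0, vmid=50, vmax=100)  # the neutral color
```

Passing the midpoint explicitly matters: the neutral color should sit at the meaningful center of the data, not simply at the middle of the observed range.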
Common mistakes:
Avoid encoding more than 5 categories using color
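One way to stay within that limit is to keep the most frequent levels and lump the rest together before mapping to color. This is a minimal sketch; the function name `lump_levels` and the "Other" label are assumptions, and whether lumping is appropriate depends on the data.

```python
from collections import Counter


def lump_levels(values, max_levels=5, other="Other"):
    """Keep the most frequent levels and replace the rest with a catch-all label.

    The result has at most max_levels distinct levels, counting the catch-all.
    """
    counts = Counter(values)
    if len(counts) <= max_levels:
        return list(values)
    keep = {level for level, _ in counts.most_common(max_levels - 1)}
    return [v if v in keep else other for v in values]


# Hypothetical categorical variable with seven levels
raw = list("aaaabbbccdefg")
lumped = lump_levels(raw)
```

If the rare levels are the ones of interest, faceting or direct labeling may serve better than lumping.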
The color scale doesn’t match the data well, since the rainbow scale emphasizes arbitrary data values. In addition, colors here are too intense.
A diverging scale is appropriate here because 50% is a natural midpoint in context.
Color vision deficiency (CVD) or colorblindness refers to difficulty distinguishing specific colors.
Some color scales retain visible contrast under different types of CVD.
Here is a simulation (for those without CVD).
Color scale shown for different types of colorblindness using CVD simulator
Other scales get muddled.
When in doubt, use a CVD simulator to check figures
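A quick numeric check can complement a simulator. The sketch below is not a CVD simulation: it only flags pairs of colors whose relative luminance is nearly equal, so that they would be hard to tell apart if hue information were lost. The luminance formula is the standard sRGB one; the function names and the threshold of 1.5 are assumptions for illustration.

```python
from itertools import combinations


def relative_luminance(hex_color):
    """Relative luminance of an sRGB color given as '#rrggbb'."""
    h = hex_color.lstrip("#")
    channels = [int(h[i:i + 2], 16) / 255 for i in (0, 2, 4)]
    # Undo the sRGB gamma encoding before weighting the channels
    r, g, b = [c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
               for c in channels]
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(a, b):
    """Luminance contrast ratio between two colors (1 means identical luminance)."""
    la, lb = relative_luminance(a), relative_luminance(b)
    return (max(la, lb) + 0.05) / (min(la, lb) + 0.05)


def low_contrast_pairs(palette, threshold=1.5):
    """Return pairs whose luminance contrast falls below a hypothetical threshold."""
    return [(a, b) for a, b in combinations(palette, 2)
            if contrast_ratio(a, b) < threshold]
```

Passing this check does not guarantee a palette works for every type of CVD, so it is still worth viewing the figure in a simulator.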
When possible, use ‘redundant coding’ – map the same variable to color and one other aesthetic.
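As a minimal sketch of redundant coding, the function below assigns each level of a categorical variable both a color and a marker shape. The name `redundant_encoding` and the shape names are assumptions; the colors are the first five of the Okabe-Ito palette.

```python
# First five colors of the Okabe-Ito palette
COLORS = ["#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2"]
# Shape names are placeholders; use whatever your plotting library accepts
SHAPES = ["circle", "square", "triangle", "diamond", "cross"]


def redundant_encoding(levels):
    """Map each distinct level to a color AND a shape so either one identifies it."""
    distinct = sorted(set(levels))
    if len(distinct) > len(COLORS):
        raise ValueError("too many levels to encode with color and shape")
    return {
        level: {"color": color, "shape": shape}
        for level, color, shape in zip(distinct, COLORS, SHAPES)
    }


encoding = redundant_encoding(["b", "a", "c", "a"])
```

The resulting dictionary can be passed to a plotting library's color and shape scales, so the groups stay distinguishable even when color is lost.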
Redundancy provides a failsafe against any circumstance that might compromise the effectiveness of color:
You’ve already made a faceted plot.
Notice the redundant use of color!
Facets are another way to encode categorical variables when side-by-side comparisons are of interest.
The most common blunders with faceting are:
Often a big panel of scatterplots can be a useful exploratory graphic.
The figure shows a lot:
Example of facets with different y axes
The differing y axes suggest, misleadingly, that Education declined by the same amount as social science and history.
Same as before, with common fixed axis scales.
One axis is fixed, one is free.
A figure from HW2
The variable of interest, Gap, is still comparable across facets. So only one axis needs to be fixed.
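The difference between fixed and free scales comes down to how axis limits are computed. Here is a minimal sketch with hypothetical facet data; the function name `axis_limits` is an assumption, and plotting libraries do this internally, usually with some padding.

```python
def axis_limits(facets, fixed=True):
    """Return (low, high) axis limits for each facet.

    facets maps a facet name to the list of values plotted on that axis.
    With fixed=True every facet shares the overall range; otherwise each
    facet gets limits from its own values.
    """
    per_facet = {name: (min(vals), max(vals)) for name, vals in facets.items()}
    if not fixed:
        return per_facet
    low = min(lo for lo, _ in per_facet.values())
    high = max(hi for _, hi in per_facet.values())
    return {name: (low, high) for name in facets}


# Hypothetical values for two facets
data = {"A": [1, 4, 9], "B": [20, 25, 60]}
shared = axis_limits(data, fixed=True)
free = axis_limits(data, fixed=False)
```

With shared limits, positions are comparable across facets; with free limits, each panel fills its own space but positions no longer mean the same thing from panel to panel.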
What would it look like if all axis scales were fixed? Would comparisons be easier or harder?
The most common blunders with regard to labels are:
For sizing, it’s important to pay attention to the balance of labels, whitespace, and graphical elements.
Usually figure defaults look fine in your IDE but render too small when graphics are exported.
These will be illegible in slide presentations, reports, etc.
These labels are legible, but still too small – they take up a minimum of space in the figure.
Unbalanced text/graphic/whitespace
Use larger labels than you think you’ll need.
Balanced
Note also the mark size is increased a bit.
Don’t overdo it.
Unbalanced again
If the figure will be reproduced in a scaled-down size, increase all sizes in proportion.
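The arithmetic is simple: if the figure is shown at a fraction of its original size, divide every size by that fraction. A minimal sketch follows, assuming all sizes are linear dimensions such as font sizes in points; the function name `scale_sizes` and the setting names are hypothetical.

```python
def scale_sizes(sizes, display_scale):
    """Enlarge linear sizes so they look unchanged after the figure is scaled.

    display_scale is the fraction of the original size at which the figure
    will be reproduced (for example 0.5 for half size).
    """
    if display_scale <= 0:
        raise ValueError("display_scale must be positive")
    factor = 1 / display_scale
    return {name: value * factor for name, value in sizes.items()}


# Hypothetical settings for a figure that will be shown at half size
settings = {"title_font": 16, "tick_font": 12, "line_width": 2}
enlarged = scale_sizes(settings, display_scale=0.5)
```

Sizes specified as areas (some libraries size point marks this way) would need the square of the factor, so check how your library defines each setting.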
Series from NYC Life Expectancy Dropped 4.6 Years in 2020
Positive:
Negative:
Series from NYC Life Expectancy Dropped 4.6 Years in 2020
Positive:
Negative:
Series from NYC Life Expectancy Dropped 4.6 Years in 2020
Positive:
Negative:
Series from NYC Life Expectancy Dropped 4.6 Years in 2020
Positive:
Negative:
Remark:
Positive:
Negative:
Suggestions:
Graphics should avoid conflating data semantics.
In addition, they should avoid conflating observed quantities with inferred ones.
The starting plot in lab 3 is actually a bad plot because all years are shown together, so observational units (countries) are not clearly distinguished.
This is tidy, because within facets:
In data exploration, it’s more important to generate lots of figures quickly than to put a lot of care into details.
In developing presentation graphics, details matter.
Here is my approach to presenting a graphic. I use this for both written and oral presentations.