What do you consider in choosing a baseline figure for the vertical amount scale of a graph? In The Visual Display of Quantitative Information (second edition), pages 68 and 74-75, I noticed that you chose nonzero baselines.
-- John Holm (email)
In general, in a time-series, use a baseline that shows the data, not the zero point. If the zero point reasonably occurs in plotting the data, fine. But don't spend a lot of empty vertical space trying to reach down to the zero point at the cost of hiding what is going on in the data line itself. (The book, How to Lie With Statistics, is wrong on this point.)
For examples, all over the place, of absent zero points in time-series, take a look at any major scientific research publication. The scientists want to show their data, not zero.
The urge to contextualize the data is a good one, but context does not come from empty vertical space reaching down to zero, a number which does not even occur in a good many data sets. Instead, for context, show more data horizontally!
-- Edward Tufte
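As a rough illustration of the advice above (not code from the thread), the following Python sketch derives y-axis limits from the data range rather than from zero, then reports how much of the axis height the data occupies under each choice. The function names, the 5% padding, and the readings are all assumptions made up for the example.

```python
def data_limits(values, pad_fraction=0.05):
    """Return (ymin, ymax) framing the data itself, with a small pad."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Flat series: pad by the value's magnitude (or 1.0 if it is zero)
        span = abs(lo) or 1.0
    pad = span * pad_fraction
    return lo - pad, hi + pad


def axis_share(values, ymin, ymax):
    """Fraction of the vertical axis that the data range occupies."""
    return (max(values) - min(values)) / (ymax - ymin)


# Hypothetical readings, invented for illustration only
readings = [98.2, 98.6, 99.1, 100.4, 99.8]

ymin, ymax = data_limits(readings)
share_data_baseline = axis_share(readings, ymin, ymax)
share_zero_baseline = axis_share(readings, 0.0, max(readings))

print(f"data-driven limits: {ymin:.2f} to {ymax:.2f}")
print(f"axis share with data baseline: {share_data_baseline:.0%}")
print(f"axis share with zero baseline: {share_zero_baseline:.0%}")
```

The resulting pair could be handed to a plotting library's y-limit setting, for example matplotlib's `Axes.set_ylim`. With a zero baseline, almost all of the vertical space in this example would sit empty below the data line.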
Sometimes using a zero baseline makes no sense at all. For example, a graph of the variations in a patient's temperature over time is useful only if the baseline is slightly below the normal temperature of 97.3 degrees F in order to readily reveal slight changes and the trend.
-- Loren R. Needles (email)
The New York Times regularly publishes graphs depicting newsworthy changes in the stock prices of selected publicly traded companies. In one regular feature in its Financial Section, stock-price-change graphs for a dozen or so companies are shown in a single-panel, small-multiples format, but each graph was, until recently, constructed with its own baseline and y-axis scale, so the relative extent of price variation was not clearly revealed.
The practice of showing many graphs with different scales in juxtaposition has always been vexing to me since my eye tends to be drawn to notice and unconsciously compare the magnitude of price change depicted in the trend line of each graph without adjusting for variations in the y-axis scale. If, OTOH, I try to consciously think through the significance of the depicted change from graph to graph by mentally adjusting for the observable differences on the y-axis, I find I am working way too hard and the supposed value of the visual information goes negative.
Fortunately, the NYT recently reconsidered its designs and now chooses baselines for its cluster of multiples so that the magnitude of the change depicted is proportional from graph to graph. In other words, a $1 change in a $10 share price is shown as twice as great as a $1 change in a $20 share price.
Economists usually compare changes in long economic time series by using log scales, with all the data lines shown on a single graph, to ensure that proportional change among the various series is properly revealed. However, general-interest audiences are not comfortable with that method.
-- Loren R. Needles (email)
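One way proportional scaling of small multiples could be computed is sketched below in Python. This is an assumption about how such limits might be chosen, not a description of the NYT's actual method, which the post does not spell out. Each panel gets linear y-limits spanning the same max/min ratio, so equal percentage moves take up roughly equal height in equal-sized panels. The function names, the padding factor, and the prices are hypothetical.

```python
import math


def proportional_limits(series_by_name, pad=1.05):
    """Give each panel linear y-limits spanning the same max/min ratio.

    With equal-height panels, equal percentage moves then take up roughly
    equal vertical space. All values must be positive.
    """
    # The most volatile series (largest high/low ratio) sets the shared ratio
    shared_ratio = pad * max(max(v) / min(v) for v in series_by_name.values())
    half = math.sqrt(shared_ratio)
    limits = {}
    for name, v in series_by_name.items():
        center = math.sqrt(min(v) * max(v))  # geometric midpoint of the series
        limits[name] = (center / half, center * half)
    return limits


def height_share(change, ylim):
    """Fraction of panel height taken by a price change of the given size."""
    return change / (ylim[1] - ylim[0])


def log_distance(start, end):
    """Vertical distance between two prices on a log-scaled axis."""
    return math.log(end / start)


# Hypothetical share prices, invented for illustration only
prices = {"A": [10.0, 10.5, 11.0], "B": [20.0, 21.0, 22.0]}
limits = proportional_limits(prices)

share_a = height_share(1.0, limits["A"])  # $1 move on the ~$10 stock
share_b = height_share(1.0, limits["B"])  # $1 move on the ~$20 stock
print(f"$1 on A uses {share_a / share_b:.2f}x the height of $1 on B")

# On a log axis, the same $1 move is also larger for the cheaper stock
print(log_distance(10.0, 11.0) > log_distance(20.0, 21.0))
```

The `log_distance` helper reflects the economists' approach mentioned above: on a log scale, vertical distance depends only on the ratio of the two prices, so a single shared axis shows proportional change directly.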
I think that the general answer is, as ET stated, to select a baseline and scale that accurately highlight the information you need to convey. The value of the baseline isn't nearly so important as the information conveyed in the rest of the plot. You might do well to remove axis ticks and labels when initially creating your figures, then add them back in at the end of the process.
Below are some slightly more specific examples from my own work.
For control charts of in-control processes, typical published graphs have a baseline that places the lower control limit about one-fifth of the total scale up from the baseline, and a scale that places the upper control limit about one-fifth of the total scale from the top of the y-axis.
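The one-fifth rule of thumb can be turned into arithmetic: if the control limits sit one-fifth in from each end, they span the middle three-fifths of the axis. A minimal Python sketch follows, assuming the control limits are already known. The function name, the `margin` parameter, and the example limits are hypothetical.

```python
def control_chart_limits(lcl, ucl, margin=0.2):
    """Return (ymin, ymax) placing lcl and ucl `margin` in from each axis end."""
    if ucl <= lcl:
        raise ValueError("ucl must be greater than lcl")
    if not 0 <= margin < 0.5:
        raise ValueError("margin must be in [0, 0.5)")
    # The control band covers the remaining (1 - 2 * margin) of the axis
    total = (ucl - lcl) / (1 - 2 * margin)
    return lcl - margin * total, ucl + margin * total


# Hypothetical control limits, invented for illustration only
ymin, ymax = control_chart_limits(lcl=7.0, ucl=13.0)
print(f"y-axis from {ymin:.1f} to {ymax:.1f}")
```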
For the kind of data I often work with, I find it convenient to set vertical scales to the total expected (or acceptable) measurable range, even though that is typically rather larger than the range of the current data. This way I can see the variation of the current data set within the context of the total range that it might vary over. For instance, there may be a "floor" value of 10 and a "ceiling" (or "cut-off") value of 15, and the current data set might actually vary from 12 to 13. I can see what the variation looks like and where it falls within the overall limits. I would set the baseline to the "floor" value. There are some obvious limitations to this approach. Data that varies by less than about 10% of the total range will look artificially flat, for instance, though such a case may be ideal for a small multiple with one plot scaled to the data's range and the other plot scaled to the total range. This is rather similar to Loren Needles' example of body temperature.
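The floor/ceiling approach and its 10% caveat can be sketched in Python as follows. The function name, the returned dictionary keys, and the sample values are assumptions for illustration; only the floor of 10, the ceiling of 15, the 12-to-13 data range, and the rough 10% threshold come from the description above.

```python
def range_scale_report(values, floor, ceiling, flat_threshold=0.10):
    """Scale the axis to the total range and flag data that will look flat."""
    if ceiling <= floor:
        raise ValueError("ceiling must be greater than floor")
    data_span = max(values) - min(values)
    fraction = data_span / (ceiling - floor)
    return {
        "ylim": (floor, ceiling),          # baseline set to the floor value
        "fraction_of_range": fraction,     # share of the axis the data spans
        "looks_flat": fraction < flat_threshold,
    }


# Data varying from 12 to 13 inside a 10-to-15 range, as in the example above
report = range_scale_report([12.0, 12.4, 13.0, 12.7], floor=10.0, ceiling=15.0)

# Hypothetical narrower data set that falls under the 10% threshold
narrow = range_scale_report([12.1, 12.3, 12.4], floor=10.0, ceiling=15.0)

print(report["looks_flat"], narrow["looks_flat"])
```

When `looks_flat` is true, that may be the cue to pair the total-range plot with a second panel scaled to the data's own range.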
In other cases, I try to ensure a baseline and scale that highlights patterns in the data, in a manner similar to the example in Visual Display of Quantitative Information of sunspot activity (if I remember correctly) scaled to highlight the sinusoidal nature of the variation, or to Loren Needles' example from the New York Times. Depending on the audience and medium, I might have to back the baseline way off from the data, or set it to the minimum data point's y-value.
-- Tom Hopper (email)