Sparkline theory and practice
Edward Tufte
A sparkline is a small, intense, simple, word-sized graphic with typographic resolution. Sparklines mean that graphics are no longer cartoonish special occasions with captions and boxes; rather, sparkline graphics can be everywhere a word or number can be: embedded in a sentence, table, headline, map, spreadsheet, graphic. From Edward Tufte's book Beautiful Evidence.
Diluting Perceptual Cluster/Streak Bias:
Informal, Inline, Interocular Trauma Tests
When people look at random number tables, they see all kinds of clusters
and streaks (in a completely random set of data). Similarly, when people are
asked to generate a random series of bits, they generate too few long streaks
(such as 6 identical bits in a row), because their model of what is random
greatly underestimates the amount of streakiness in truly random data.
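The streakiness of random bits can be checked by simulation. The sketch below is an illustration only: the sequence length of 100, the number of trials, and the function names are assumptions, while the threshold of 6 identical bits comes from the text above. It estimates how often a fair random sequence contains at least one such streak.

```python
import random

def longest_run(bits):
    """Length of the longest run of identical consecutive values."""
    best = current = 0
    previous = None
    for b in bits:
        current = current + 1 if b == previous else 1
        best = max(best, current)
        previous = b
    return best

def share_with_long_run(n_bits=100, min_run=6, trials=2000, seed=0):
    """Fraction of random n_bits sequences containing a run >= min_run."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        bits = [rng.randint(0, 1) for _ in range(n_bits)]
        if longest_run(bits) >= min_run:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(share_with_long_run())
```

Under these assumptions, well over half of the simulated sequences contain a streak of 6 or more, which is more streakiness than most hand-made "random" sequences show.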
Sports and election reporters are notorious for their
narrative over-reach. xkcd did this wonderful critique:
To dilute streak-guessing, randomize on time over the same data,
and compare random streaks with the observed data.
Below, the top sparkline shows the season's win-loss sequence
(the little horizontal line = home games, no line = road games).
Weighting by overall record of wins/losses and home/road effects
yields ten random sparklines. Hard to see the difference between
real and random.
The 10 random sparkline sequences can be regenerated again and
again by, oddly enough, clicking on "Regenerate random seasons."
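One way to "randomize on time over the same data" can be sketched as follows. This is an assumption about how such random seasons might be produced, not a description of the generator used on this page: outcomes are shuffled separately among home games and among road games, so every random season keeps the observed overall record and the observed home/road records. The example season is made up.

```python
import random

def shuffle_within_venue(results, home, rng):
    """Shuffle outcomes among home games and among road games separately."""
    home_outcomes = [r for r, h in zip(results, home) if h]
    road_outcomes = [r for r, h in zip(results, home) if not h]
    rng.shuffle(home_outcomes)
    rng.shuffle(road_outcomes)
    home_iter, road_iter = iter(home_outcomes), iter(road_outcomes)
    # Keep the real home/road schedule; only the order of outcomes changes.
    return [next(home_iter) if h else next(road_iter) for h in home]

def random_seasons(results, home, n=10, seed=None):
    """Return n randomized seasons with the same records as the real one."""
    rng = random.Random(seed)
    return [shuffle_within_venue(results, home, rng) for _ in range(n)]

# Hypothetical 12-game season: True = win, True = home game.
results = [True, True, False, True, False, False,
           True, True, True, False, True, False]
home = [True, True, True, False, False, False,
        True, True, True, False, False, False]

# First line is the observed season, followed by 10 randomized ones.
for season in [results] + random_seasons(results, home, seed=1):
    print("".join("W" if r else "L" for r in season))
```

Calling random_seasons again with a different seed plays the role of the regenerate button: a fresh set of comparison sequences for the same observed data.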
This is looking a bit like a bootstrap calculation. For the real and amazing
bootstrap, applied to data graphics and contour lines, see Persi Diaconis
and Bradley Efron, "Computer Intensive Methods in Statistics."
The test of the 10 randomized sparklines vs. the actual data is an
"Interocular Trauma Test" because the comparison hits the analyst right
between the eyes. This little randomization check-up, which can be repeated
again and again, is seen by the analyst at the very moment of making
inferences based on a statistical graphic of observed data.
(Thanks to Adam Schwartz for his excellent work on randomized sparklines. ET)
-- Edward Tufte
Sparklines: Intense, Simple, Word-Sized Graphics
The most common data display is a noun accompanied by a number.
For example, a medical patient's current level of glucose is reported
in a clinical record as a word and number:
Placed in the relevant context, a single number gains meaning. Thus
the most recent measurement of glucose should be compared with
earlier measurements for the patient. This data-line shows the path
of the last 80 readings of glucose:
Lacking a scale of measurement, this free-floating line is dequantified.
At least we do know the value of the line's right-most data point,
which corresponds to the most recent value of glucose, the number
recorded at far right. Both representations of the most recent reading
are tied together with a color accent:
Some useful context is provided by showing the normal range of
glucose, here as a gray band. Compared to normal limits, readings
above the band horizon are elevated, those below reduced:
For clinical analysis, the task is to detect quickly and assess wayward
deviations from normal limits, shown here by visual deviations outside
the gray band. Multiplying this format brings in additional data from
the medical record; a stack, which can show hundreds of variables and
thousands of measurements, allows fast effective parallel comparisons:
These little data lines, because of their active quality over time, are
named sparklines—small, high-resolution graphics usually embedded
in a full context of words, numbers, images. Sparklines are datawords:
data-intense, design-simple, word-sized graphics.
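The glucose sparkline described above (a data-line, a gray band for the normal range, and a color accent on the most recent reading) can be sketched as a small SVG string. Everything specific here is an assumption made for illustration: the function name, the pixel sizes, the colors, the readings, and the band limits of 80 and 120, which are placeholders rather than clinical values.

```python
def sparkline_svg(values, normal_low, normal_high, width=100, height=20, pad=2):
    """Return a word-sized SVG sparkline with a normal-range band."""
    lo = min(min(values), normal_low)
    hi = max(max(values), normal_high)
    span = (hi - lo) or 1.0

    def x(i):
        return pad + i * (width - 2 * pad) / max(len(values) - 1, 1)

    def y(v):
        # SVG y grows downward, so high values map to small y.
        return pad + (hi - v) * (height - 2 * pad) / span

    points = " ".join(f"{x(i):.1f},{y(v):.1f}" for i, v in enumerate(values))
    band_top, band_bottom = y(normal_high), y(normal_low)
    last_x, last_y = x(len(values) - 1), y(values[-1])
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<rect x="0" y="{band_top:.1f}" width="{width}" '
        f'height="{band_bottom - band_top:.1f}" fill="#ddd"/>'
        f'<polyline points="{points}" fill="none" stroke="#555" stroke-width="0.7"/>'
        f'<circle cx="{last_x:.1f}" cy="{last_y:.1f}" r="1.5" fill="red"/>'
        '</svg>'
    )

# Hypothetical readings; the last value is the one printed beside the sparkline.
readings = [110, 128, 95, 140, 133, 121, 128]
svg = sparkline_svg(readings, normal_low=80, normal_high=120)
print(svg)
```

Screen output like this is coarser than print, so the sketch shows the structure of the design rather than its intended resolution.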
Sparklines and sparkline-like graphs can also move within complex
multivariate spaces, as in these 9-step sequential results (reading down
the columns) in merge-sorting 5 different types of input files. Four
variables and 18,000 numbers are depicted in these small multiples.
Below, Robert Sedgewick, Algorithms in C (Reading, Massachusetts, 1998), 353.
Sparklines have obvious applications for financial and economic data—
by tracking and comparing changes over time, by showing overall trend
along with local detail. Embedded in a data table, this sparkline depicts
an exchange rate (dollar cost of one euro) for every day for one year:
Colors help link the sparkline with the numbers: red = the oldest and
newest rates in the series; blue = yearly low and high for daily exchange
rates. Extending this graphic table is straightforward; here, the price of
the euro versus 3 other currencies for 65 months and for 12 months:
Daily sparkline data can be standardized and scaled in all sorts of ways
depending on the content: by the range of the price, inflation-adjusted
price, percent change, percent change off of a market baseline. Thus
multiple sparklines can describe the same noun, just as multiple columns
of numbers report various measures of performance. These sparklines
reveal the details of the most recent 12 months in the context of a
65-month daily sequence (shown in the fractal-like structure below).
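Three of the scalings mentioned above can be written down directly. This is a minimal sketch with assumed function names, and it takes one plausible reading of "percent change off of a market baseline": the series' daily percent change minus the baseline's daily percent change. The sample numbers are invented.

```python
def scale_to_range(prices):
    """Rescale prices to 0..1 using the series' own low and high."""
    lo, hi = min(prices), max(prices)
    if hi == lo:
        return [0.5 for _ in prices]  # flat series: draw mid-height
    return [(p - lo) / (hi - lo) for p in prices]

def percent_change(prices):
    """Day-over-day percent change; one value shorter than the input."""
    return [100.0 * (b - a) / a for a, b in zip(prices, prices[1:])]

def percent_change_vs_baseline(prices, baseline):
    """Daily percent change minus the baseline's daily percent change."""
    return [p - m for p, m in
            zip(percent_change(prices), percent_change(baseline))]

# Hypothetical daily values, not real exchange rates.
rates = [1.10, 1.12, 1.08, 1.15]
market = [1.00, 1.01, 1.00, 1.02]
print(scale_to_range(rates))
print(percent_change(rates))
print(percent_change_vs_baseline(rates, market))
```

Each function yields a different sparkline for the same noun, which is the point of the paragraph above.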
Consuming a horizontal length of only 14 letterspaces, each sparkline
in the big table above provides a look at the price and the changes in
price for every day for years, and the overall time pattern. This financial
table reports 24 numbers accurate to 5 significant digits; the accompanying
sparklines show about 14,000 numbers readable from 1 to 2 significant digits.
The idea is to be approximately right rather than exactly wrong. 1
By showing recent change in relation to many past changes, sparklines
provide a context for nuanced analysis—and, one hopes, better decisions.
Moreover, the year-long daily history reduces recency bias, the persistent
and widespread over-weighting of recent events in making decisions.
Tables sometimes reinforce recency bias by showing only current levels
or recent changes; sparklines improve the attention span of tables.
Tables of numbers attain maximum densities of only 300 characters per
square inch or 50 characters per square centimeter. In contrast, graphical
displays have far greater resolutions; a cartographer notes "the resolving
power of the eye enables it to differentiate to 0.1 mm where provoked to
do so." 2 Distinctions at 0.1 mm mean 250 per linear inch, which implies
60,000 per square inch or 10,000 per square centimeter, which is plenty.
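The arithmetic behind these densities is short enough to check. The sketch below uses exact unit conversions, so it gives about 254 distinctions per linear inch and about 64,500 per square inch, which the text rounds to 250 and 60,000; the per-centimeter figure of 10,000 is exact. The function name is an assumption.

```python
MM_PER_INCH = 25.4
MM_PER_CM = 10.0

def distinctions(resolvable_mm):
    """Linear and areal counts of distinctions at a given resolving power."""
    per_inch = MM_PER_INCH / resolvable_mm
    per_cm = MM_PER_CM / resolvable_mm
    return {
        "per_inch": per_inch,
        "per_sq_inch": per_inch ** 2,
        "per_cm": per_cm,
        "per_sq_cm": per_cm ** 2,
    }

d = distinctions(0.1)
print(round(d["per_inch"]), round(d["per_sq_inch"]), round(d["per_sq_cm"]))

# Ratio of graphical resolution to the text's table density
# of 300 characters per square inch.
print(round(d["per_sq_inch"] / 300))
```

On these numbers a graphic can carry roughly 200 times as many distinctions per unit area as a table of numbers.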
1 On being "approximately right rather than exactly wrong,"
see John W. Tukey, "The Technical Tools of Statistics,"
American Statistician, 19 (1965), 23-28.
2 D. P. Bickmore, "The Relevance of Cartography," in J. C. Davis
and M. J. McCullagh, eds., Display and Analysis of Spatial Data
(London, 1975), 331.
Here is a conventional financial table comparing various return rates of
10 popular mutual funds: 3
This is a common display in data analysis: a list of nouns (mutual funds,
for example) along with some numbers (assets, changes) that accompany
the nouns. The analyst's job is to look over the data matrix and then decide
whether or not to go crazy—or at least to make a decision (buy, sell, hold)
about the noun based on the data. But along with the summary clumps
of tabular data, let us also look at the day-to-day path of prices and their
changes for the entire last year. Here is the sparkline table: 4
Astonishing and disconcerting, the finely detailed similarities of these
daily sparkline histories are not all that surprising, after the fact anyway.
Several funds use market index-tracking or other copycat strategies, and
all the funds are driven daily by the same amalgam of external forces
(news, fads, economic policies, panics, bubbles). Of the 10 funds, only
the unfortunately named PIMCO, the sole bond fund in the table, diverges
from the common pattern of the 9 stock funds, as seen by comparing
PIMCO's sparkline with the stacked pile of 9 other sparklines below.
In newspaper financial tables, down the deep columns of numbers,
sparklines can be added to tables set at 8 lines per inch (as in our example
above). This yields about 160 sparklines per column, or 400,000 additional
daily graphical prices and their changes per 5-column financial page. Readers
can scan the sparkline tables, making simultaneous multiple comparisons,
searching for nonrandom patterns in the random walks of prices.
3 "Favorite Funds," The New York Times, August 10, 2003, p. 3-1.
4 In our redesigned table, the typeface Gill Sans does quite well
compared to the Helvetica in the original Times table. Smaller than
the Helvetica, the Gill Sans appears sturdier and more readable, in
part because of the increased white space that results from its
smaller x-height and reduced size. The data area (without column
labels) for our sparkline table is only 21% larger than the original's
data area, and yet the sparklines provide an approximate look at
5,000 more numbers.
Finally, the practical construction of sparklines requires thinking
about their design and production:
Aspect ratio. A graphic's width/height ratio makes a big difference in
displaying data. For all types of statistical graphics, the data-shape varies
as the aspect ratio varies. Below, for 6 sparklines all showing the same
data, note the substantial changes in shape as the y-scale increases by 25%
for each line while the x-scale is held constant.
How should a sparkline aspect ratio be chosen? Like a narrow ribbon, sparklines
have one long dimension and one short, as their wordlike shapes constrain their
aspect ratios. This financial sparkline is 5 to 1:
the full baseball season is 20 to 1:
the DNA chromosome sparklines run about 300 to 1.
In general, statistical graphics should be moderately greater in length
than in height. And, as William Cleveland discovered, for judging slopes
and velocities up and down the hills in time-series, best is an aspect ratio
that yields hill-slopes averaging 45°, over every cycle in the time-series.
Variations in slopes are best detected when the slopes are around 45°,
uphill or downhill. 5 To put this idea informally, aspect ratios should
be such that time-series graphics tend toward a lumpy profile (below left)
rather than a spiky profile (below right) or a flat profile. Both graphs
here show the same data. The aspect ratio for this lumpy graphic is
chosen in accord with the 45° rule.
The lumpy graphic reveals that sunspot cycles tend to rise rapidly and
decline slowly, a behavior strongest for cycles with high sharp peaks, less
strong for medium peaks, and absent for cycles with small low peaks.
None of this is visible in the graph of spikes! Cleveland's idea is essential
for sparkline displays of high-resolution time-series, such as in acoustics,
medicine, science, engineering, finance. For multiple sparklines, as in the
mutual fund data below, a global aspect ratio is obtained by averaging
over the relevant data-lines to yield an overall lumpy quality.
These considerations yield practical advice for choosing aspect ratios for
sparklines: use the maximum reasonable vertical space available under the
word-like constraint, then adjust the horizontal stretch of the time-scale
to meet the lumpy criterion. Occasionally the analytical task or character
of the data may suggest a better alternative.
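The 45° idea can be turned into a rough calculation. The sketch below uses a simple median-absolute-slope variant, chosen here for brevity and not necessarily the procedure in Cleveland's books: it returns the width/height ratio at which the median segment of the data-line is drawn at 45°. The function name and the sample data are assumptions.

```python
from statistics import median

def banked_aspect_ratio(xs, ys):
    """Width/height ratio that puts the median absolute slope at 45 degrees."""
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("data must vary in both x and y")
    slopes = []
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        dx = (x1 - x0) / x_range  # fraction of plot width
        dy = (y1 - y0) / y_range  # fraction of plot height
        if dx > 0 and dy != 0:
            slopes.append(abs(dy / dx))
    if not slopes:
        raise ValueError("no sloped segments found")
    # On a plot of width w and height h a segment's drawn slope is
    # (dy * h) / (dx * w); setting the median to 1 gives w / h = median.
    return median(slopes)

# Hypothetical zigzag series: 4 equal up-down cycles.
xs = list(range(9))
ys = [0, 1, 0, 1, 0, 1, 0, 1, 0]
print(banked_aspect_ratio(xs, ys))
```

For the zigzag above the result is 8, meaning a graphic 8 times as wide as it is high draws every segment at 45°. In practice the word-like height limit comes first, and the horizontal stretch is then adjusted toward this ratio.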
5 William S. Cleveland, Visualizing Data (Summit, New Jersey, 1993), 87-91, 218-227;
William S. Cleveland, The Elements of Graphing Data (Summit, New Jersey, revised
edition, 1994), 66-79.
Unintentional optical clutter. Above left, these binary-outcome sparklines
mainly show accidental arrangements of white space rather than binary
outcomes. Then, above right, a less cluttered version of the same data.
Closely spaced lines produce moiré vibration, usually at its worst when
data-lines (the figure) and spaces (the ground) between data-lines are
approximately equal in size, and also when figure and ground contrast
strongly in color value. The result is hyperactive optical clutter—for
example, below. In contrast, note the serene and cleanly differentiated
lines on this spring's new bamboo culm to the right.
Changing the relative weight of the data-lines and also muting the
contrast between data and background reduces optical noise, as these
before/after designs of sparklines suggest:
The standard method for printing color (4-color process) sometimes
produces unintentional noise when printing finely detailed material,
such as type and sparklines. In 4-color printing (cyan, magenta, yellow,
black), tiny dots of color mix together to make the desired color (for
example, cyan dots + yellow dots = apparent green). These color dots do
not align perfectly, and both type and thin lines can become gritty when
printed by conventional 4-color process, shown below.
High-quality maps avoid color dot combinations, as a close look at the
Swiss mountain map will indicate. Sparklines should be printed in a single
color, or by a judicious mix of 2 colors (magenta + yellow = red), or in flat
color (the ink itself is the desired color), or by stochastic color methods.
Areas surrounding data-lines may generate unintentional optical
clutter. Strong frames produce melodramatic but content-diminishing
visual effects. At left, the dominant visual elements are, of all things,
the strong stripes of the negative spaces between the heavy frames:
A good way to assess a display for unintentional optical clutter is to ask
"Do the prominent visual effects convey relevant content?" In the
exhibits above earning the unfortunate X, the most prominent visual
effect is usually the clutter produced by activated negative space.
Resolution of sparklines. Sparklines work at intense resolutions, at the level
of good typography and cartography. Currently such intensities can be
found only on paper, film, and metal—where resolutions >1,200 dpi
are easily and inexpensively achieved. Today's computer monitors operate
at about 10% of paper's resolution, producing coarse typography in the
smaller point sizes as well as sparklines lacking in fine detail. Of course
sparklines can be displayed on computer screens but for serious work,
sparklines should be printed on paper. Combining paper's resolution
with the computer screen's interactivity is often effective.
Resolution of layouts of multiple sparklines. For monitoring processes that
produce lots of data (financial trading, sporting events, control rooms,
scientific and medical analysis, system administration), sparklines should
be printed and viewed at a density of 500 sparklines on A3 size paper
(about 30 x 42 cm, or 11 x 17 in). This is the data-equivalent of about 15
large computer screens or 300 PowerPoint slides. Unlike relentlessly
sequential screens and slides, 500 sparklines on a large piece of paper are
adjacent in space rather than stacked in time. By showing vast amounts
of data within the eyespan, spatial adjacency assists comparison, search,
pattern-finding, exploration, replication, review.
Just as sparklines are like words, so then distributions of sparklines on
a page are like sentences and paragraphs. The graphical idea here is to make
it wordlike and typographic—an idea that leads to reasonable answers for
most questions about sparkline arrangements.
Imagine new software or a new computer display that enormously
improved the resolution of data graphics. How wonderful and valuable
that would be. Sparklines provide such improvements by design, by
direct, public, open-source methods.
Sparklines vastly increase the amount of data within our eyespan and
intensify statistical graphics up to the everyday routine capabilities of
the human eye-brain system for reasoning about visual evidence, seeing
distinctions, and making comparisons. And data graphics are no longer
a special occasion in a separate place with a frame on some slide with a
label "Fig. 17-B". Sparklines are everywhere. With resolutions 5 to 100
times conventional graphics and tables, sparklines can help us learn from
the flood of numbers produced by modern measurement, monitoring,
and surveillance technologies. Providing a straightforward and contextual
look at intense evidence, sparkline graphics give us some chance to be
approximately right rather than exactly wrong.
-- Edward Tufte
Multiple Sparklines and Bars; Implementation Example from Europe
Below you will see our implementation of your sparkline idea in a standard software application for data analysis. We have always been very impressed by your rich examples and ideas for visualization. The example shows data from a pharmaceutical manufacturer. The chart helps in comparing the performance of their sales districts.
Having inhaled your books, I set up a list of several dozen improvements we are going to implement. We would appreciate and look forward to any comment or feedback from you.