Huntsville Alabama Times scoops New York Times: sports-data sparklines
April 25, 2005 | Edward Tufte
Lyn McDaniel, design editor of the Huntsville Times, reports the use of sparklines in their sports section on March 21, 2005:
“Here is a modest first use of sparklines that appeared in The Huntsville Times March 21.
The graph lines seem fatter because of the process of making a pdf. In reality they are about half-point, and could have been far thinner. Even 85-line presses can reproduce a hairline easily; we have not tried to find the minimum printable thickness on our 110-line press.
They were created to the same scale with Adobe Illustrator’s graphing function and imported onto the page individually to give me some flexibility in placement. Actually making the charts took only a few minutes. Freehand has a similar graphing function.”
This is a great comparison the Huntsville Times has created, and it really pulled me in to examine the comparable trends. For example, look at how Hank Aaron was so consistent over his long career, while the active players (Sosa, Bonds) started slow and improved over time: also admirable, but in this day and age also suspicious! And we can wonder how many Ruth would have hit if he hadn’t pitched for 4-5 years.
However, did anyone else stumble over the “year 0” the paper added to baseline all these sparklines to the base of each respective text line? If the sparklines show the year-to-year totals of HRs at the end of each playing year, then this looks like all the players had a late season callup or something their first year, where they might have hit a homer or two. But actually Aaron hit 13 home runs his first playing year, Maris hit 14 and McGwire hit 3 (McGwire hit 49 as an “official” rookie in his second calendar year of play).
I guess it’s not as big a difference as I first thought; I thought McGwire’s 49 was in his first playing season.
But what’s the preferred way to baseline a set of data like this: tacking zeros onto the front of the set, or marking the 0 axis in some other way, perhaps with a small dot or short line? Or is this way fine and I’m making a lot out of nothing?
I’m sorry but I don’t think these sparklines are that informative or easy to read. I know that a good graphic takes time to understand but most of Dr Tufte’s advice has always been to minimize the amount of time spent deciphering the presentation and more time analysing what the data says (i.e. minimize the obstacles between the reader and your data). In this case the sparklines lack reference points that are easily deciphered.
The principal aim of the graphic is to show differences in the patterns of play between players: some players spike and some trend. However, in the absence of a y scale, it is not clear what constitutes a spike. For example, the graph for Yastrzemski shows a twin spike; the highest point is presumably 44, but is this 44 on a baseline of 10 or a baseline of 41? I cannot tell. It could be worked out if one assumes that the graphs have a common scale, but that is a difficult task of deciphering.
Spikes and trends are determined in part by scale and when that is missing it becomes impossible to assess the data. Huff and Cleveland differ on whether a zero should be included in a y-axis but both agree that knowing the scale is important. What would make this graphic useful would be lowest and highest point labels to give an idea of the range occupied by the data in each case.
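A minimal sketch of the suggestion above, assuming a text-only rendering with Unicode block characters: every series is drawn on one shared scale, and its lowest and highest values are printed beside it. The season totals are hypothetical, chosen only so that one series spikes to 44 from around 10 while the other hovers near 41; they are not any player's record. The function names are illustrative as well.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight glyph heights, lowest to highest


def sparkline(values, lo, hi):
    """Map each value to a block glyph on a fixed [lo, hi] scale."""
    span = hi - lo
    if span <= 0:
        raise ValueError("hi must be greater than lo")
    chars = []
    for v in values:
        # Clamp so out-of-range values cannot index past the glyph list.
        frac = min(max((v - lo) / span, 0.0), 1.0)
        # Round half up to the nearest of the eight glyph levels.
        chars.append(BLOCKS[int(frac * (len(BLOCKS) - 1) + 0.5)])
    return "".join(chars)


def labelled_sparkline(name, values, lo, hi):
    """Sparkline plus the low and high labels that let a reader recover the range."""
    return f"{name:<8} {sparkline(values, lo, hi)}  low {min(values)}, high {max(values)}"


# Hypothetical season totals, for illustration only.
spiky = [10, 12, 44, 15, 40]
steady = [41, 42, 44, 41, 43]

# One shared scale for both series, starting at zero.
shared_hi = max(max(spiky), max(steady))
print(labelled_sparkline("spiky", spiky, 0, shared_hi))
print(labelled_sparkline("steady", steady, 0, shared_hi))
```

With the labels present, a peak of 44 reads differently in the two rows: in one it is a jump from 10, in the other a small rise from 41.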
There are other problems:
The x axis is not clearly marked: it took me several steps to understand that the x-axis was career years and not calendar years (keep in mind that I am not American and these names are unfamiliar to me; I briefly thought they were contemporaries). A label would help.
I like the idea of sparklines but in using them we must be careful not to be so entranced by simplifying the design that we dequantify the data. If making a duck of good data is the extreme of too much decoration then the other extreme is surely turning data into a meaningless graphic by removing too much.
John Walker
The Y scale axis holds a bit more mystery. Assuming the baseline is zero on each, Bonds’s spike of 73 is the same height as Ruth’s peak at 60. Unless, of course, Ruth’s baseline is zero and Bonds’s baseline is 13. But by then I’m going cross-eyed.
Assuming a baseline of ten, here is a graph without normalized output.
You can see Bonds’s spike is far more pronounced in relation to his career. Specifically, being asked to compare spikes on graphs with dissimilar Y scales is more like answering a trick question.
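The scaling puzzle in this comment can be put in numbers. The sketch below uses only the two peak values quoted in the thread (73 and 60); the helper name is an assumption for illustration. It compares three readings of the axis: each graph scaled to its own maximum, the guessed baseline of 13 for Bonds, and a single axis shared by both players.

```python
def relative_height(value, lo, hi):
    """Fraction of the plot height at which `value` is drawn on a [lo, hi] axis."""
    return (value - lo) / (hi - lo)


BONDS_PEAK = 73  # peak season quoted in the thread
RUTH_PEAK = 60   # peak season quoted in the thread

# Reading 1: each sparkline is scaled from zero to its own maximum,
# so both peaks touch the top and look identical.
own_scale = (
    relative_height(BONDS_PEAK, 0, BONDS_PEAK),
    relative_height(RUTH_PEAK, 0, RUTH_PEAK),
)

# Reading 2: Bonds is drawn from 13 to 73 and Ruth from 0 to 60.
# Both axes then span 60 home runs, and again the peaks are the same height.
shifted_baseline = (
    relative_height(BONDS_PEAK, 13, BONDS_PEAK),
    relative_height(RUTH_PEAK, 0, RUTH_PEAK),
)

# Reading 3: one axis from zero to the larger peak for both players.
shared_scale = (
    relative_height(BONDS_PEAK, 0, BONDS_PEAK),
    relative_height(RUTH_PEAK, 0, BONDS_PEAK),
)

print(own_scale, shifted_baseline, shared_scale)
```

Only the shared axis draws Ruth's 60 lower than Bonds's 73 (at about 82 percent of the height), which is why equal-height peaks leave the reader guessing which scaling was used.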
While the sparklines are nice, I’d suggest that bar charts might be a better presentation of the data for several reasons:
Matthew, the bar graphs are fantastic.
pax,
Estes
I agree. The bar graphs outperform the [spark]line graphs at the same scale. Two small changes, I think, could distill them even more.
1. Use red instead of gold as the highlight color.
2. Print the highest single season total just above the red bar and print the key “highest season total” in red – omitting the connecting lines.
Another interesting byproduct of the bar chart is that total careers are also easily compared (area under the curve).
To me, as a non-baseball fan, it appears that Hank Aaron and Babe Ruth have the most career home runs, but it looks like Barry Bonds cannot be far behind.
This was not at all visible in the Sparkline version.
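The "area under the curve" remark holds exactly for bars of equal width: the total bar area is the sum of the season totals times the bar width, so area is proportional to career total. A small sketch with hypothetical season totals (not any player's record, and the function name is illustrative):

```python
def career_total(season_totals, bar_width=1):
    """Area of a bar chart with equal-width bars: sum of height times width."""
    return sum(height * bar_width for height in season_totals)


# Hypothetical careers, for illustration only.
steady = [30, 32, 31, 29, 33, 30]
spiky = [12, 15, 50, 14, 11, 13]

print(career_total(steady), career_total(spiky))
```

The spiky career has the taller single bar, but the steady career covers more area, which is the comparison the bar chart makes visible.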
I just noticed that Yahoo! Finance uses a Sparkline display on some of its pages.
(Page Link)
It’s very cool that this concept is taking hold among the “big players” of communication industries.
The New York Times sports section on June 23, 2006 displayed a sparkline-like graph showing the outcomes of the last 218 games of the New York Knicks, a basketball team. As our scale of measurement shows, the graph is 12 in (30 cm); a real sparkline would be a little over 2 in (5 cm).
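The size comparison can be restated as horizontal space per game. This is a rough sketch: the 218 games and the 12 in width come from the post, while the 2 in width is a rounded-down assumption (the post says "a little over 2 in"), and the conversion uses 72 points per inch.

```python
POINTS_PER_INCH = 72  # PostScript points


def width_per_game_pt(graph_width_in, games):
    """Horizontal space available to each game, in points."""
    return graph_width_in * POINTS_PER_INCH / games


GAMES = 218          # from the post
AS_PRINTED_IN = 12   # from the post
SPARKLINE_IN = 2     # assumption: the post says "a little over 2 in"

as_printed = width_per_game_pt(AS_PRINTED_IN, GAMES)
as_sparkline = width_per_game_pt(SPARKLINE_IN, GAMES)

print(f"as printed:   {as_printed:.2f} pt per game")
print(f"as sparkline: {as_sparkline:.2f} pt per game")
```

At sparkline size each game gets roughly two-thirds of a point, so each whisker and the gap beside it would have to share a space only slightly wider than the half-point lines mentioned earlier in the thread.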
The graph is nicely integrated with surrounding text, numbers, and images. The whiskers showing wins are too light compared to the whiskers showing losses. The win-whisker blue-tint is also used as a background tint immediately above, an unnecessary congruence. Wins might be shown in red, not pale blue. The instruction for reading the sparkline, “Below, each tick mark represents a victory or loss…”, is not needed, especially since the two rows of ticks are labelled “victories” and “losses”. (Why not “wins” rather than “victories”? Perhaps because “wins” is both a verb and a noun.) Those labels are not needed; every reader of the sports page will know what the whiskers mean. After all, the reader is expected to understand these big words in the reports accompanying the graph:
sycophantic perpetuity passive-aggressive tenure enigma pariah dysfunctional
Both the column and the news story psychologize about the personalities of those in the team bureaucracy rather than the performances of basketball players and the competition, both of which might have something to do with the team performance. Of course not every column or news story need fully account for all the sources of variance, but these accounts seem a bit one-sided in their analysis.
The graphic reporter is not credited for her or his good work, although the columnist, reporter, and photographer are. Publicly acknowledged creatorship signals responsibility for work and also often improves the quality of work.
Hello Professor Tufte, I can take credit (as well as any blame) for the Times graphic.
I’m not sure that the whisker lines without labels would have been legible to all readers (an early draft without labels at far left was not immediately recognized as showing wins and losses by two in-house readers, prompting the more prominent labels in the final version). But I do agree that the “Below, each tick mark …” instructions are redundant. They were only included because they were unobtrusive, and in case any skeptical readers went looking for clarification.
The word “victories” instead of “wins” is Times style, for the reason you mention, and I didn’t add my name to the graphic in part because the data it is based on was readily available, and in part because the overall presentation seemed simple enough to not merit a credit.
best,
Google now has sparklines on their analytics dashboard, see:
image – http://bp3.blogger.com/_CkizHsl86-c/RkDDFjVXmPI/AAAAAAAAAAM/iPz8euj5qYs/s1600-h/dashboard1.jpg
blog – http://analytics.blogspot.com/2007/05/new-version-of-google-analytics.html