Huntsville Alabama Times scoops New York Times: sports-data sparklines

Lyn McDaniel, design editor of the Huntsville Times, reports the use of sparklines in their sports section on March 21, 2005:

"Here is a modest first use of sparklines that appeared in The Huntsville Times March 21.

The graph lines seem fatter because of the process of making a pdf. In reality they are about half-point, and could have been far thinner. Even 85-line presses can reproduce a hairline easily; we have not tried to find the minimum printable thickness on our 110-line press.

They were created to the same scale with Adobe Illustrator's graphing function and imported onto the page individually to give me some flexibility in placement. Actually making the charts took only a few minutes. Freehand has a similar graphing function."

-- Edward Tufte

Response to Huntsville Times (Alabama) scoops New York Times (New York) in reporting sports data with sparklines

I'm sorry, but I don't think these sparklines are that informative or easy to read. I know that a good graphic takes time to understand, but most of Dr Tufte's advice has always been to minimize the time spent deciphering the presentation and maximize the time spent analysing what the data says (i.e. minimize the obstacles between the reader and your data). In this case the sparklines lack reference points that are easily deciphered.

The principal aim of the graphic is to show differences in the patterns of play between players: some players spike and some trend. However, in the absence of a y scale, it is not clear what constitutes a spike. For example, the graph for Yastrzemski shows a twin spike; the highest point is presumably 44, but is this 44 on a baseline of 10 or a baseline of 41? I cannot tell. It could be worked out if one assumes that the graphs have a common scale, but that is a difficult task of deciphering.

Spikes and trends are determined in part by scale and when that is missing it becomes impossible to assess the data. Huff and Cleveland differ on whether a zero should be included in a y-axis but both agree that knowing the scale is important. What would make this graphic useful would be lowest and highest point labels to give an idea of the range occupied by the data in each case.
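The suggestion of lowest and highest point labels can be sketched in a few lines. This is only an illustration, not how the Huntsville Times charts were made: the function name labeled_sparkline, the rendering with Unicode block characters, and the season totals are all hypothetical.

```python
BLOCKS = "▁▂▃▄▅▆▇█"  # eight heights, lowest to highest


def labeled_sparkline(values):
    """Draw a text sparkline and print its lowest and highest values beside it."""
    low, high = min(values), max(values)
    span = (high - low) or 1  # avoid dividing by zero for a flat series
    top = len(BLOCKS) - 1
    marks = "".join(BLOCKS[round((v - low) / span * top)] for v in values)
    return f"low {low} {marks} high {high}"


# Hypothetical season totals, for illustration only.
print(labeled_sparkline([10, 20, 44, 30, 44, 12]))
```

The line here is stretched to fill its own range, so the two labels are the only thing that tells a reader whether the low point is 10 or 41.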

There are other problems:

The x axis is not clearly marked. It took me several steps to understand that the x-axis was career years and not calendar years (keep in mind that I am not American and these names are unfamiliar to me; I briefly thought they were contemporaries). A label would help.

I like the idea of sparklines but in using them we must be careful not to be so entranced by simplifying the design that we dequantify the data. If making a duck of good data is the extreme of too much decoration then the other extreme is surely turning data into a meaningless graphic by removing too much.

John Walker

-- John Walker (email)

Response to Huntsville Times (Alabama) scoops New York Times (New York) in reporting sports data with sparklines

The Y scale axis holds a bit more mystery. Assuming the baseline is zero on each, Bonds's spike of 73 is the same height as Ruth's peak at 60. Unless, of course, Ruth's baseline is zero and Bonds's baseline is 13. But by then I'm going cross-eyed.

Assuming a baseline of ten, here is a graph without normalized output.

You can see Bonds's spike is far more pronounced in relation to his career. Being asked to compare spikes across graphs with dissimilar Y scales is more like answering a trick question.
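The scale ambiguity can be made concrete with a little arithmetic. The sketch below is an assumption-laden illustration, not a reconstruction of the published charts: the helper scaled_height and the drawn height of 40 units are hypothetical, and a zero baseline is assumed throughout. Only the two peaks, 60 and 73, come from the discussion above.

```python
def scaled_height(value, baseline, top, full_height=40):
    """Map a value to a drawn height, given the chart's baseline and top value."""
    return round((value - baseline) / (top - baseline) * full_height)


ruth_peak, bonds_peak = 60, 73  # the two peaks quoted above

# Each chart scaled to its own peak: both spikes fill the full height.
own_scale = (
    scaled_height(ruth_peak, 0, ruth_peak),
    scaled_height(bonds_peak, 0, bonds_peak),
)

# Both charts on one shared scale that tops out at the larger peak.
shared_scale = (
    scaled_height(ruth_peak, 0, bonds_peak),
    scaled_height(bonds_peak, 0, bonds_peak),
)

print("own scale:   ", own_scale)
print("shared scale:", shared_scale)
```

On separate scales the two peaks are drawn at identical heights; on a shared scale the 60 is visibly shorter than the 73, which is the comparison a reader would expect to be able to make.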

-- Jeffrey Berg (email)

Response to Huntsville Times (Alabama) scoops New York Times (New York) in reporting sports data with sparklines

While the sparklines are nice, I'd suggest that bar charts might be a better presentation of the data for several reasons:

  • A bar chart creates a common baseline of 0 for each player and a more defined height for each season, allowing the reader to better gauge the difference in number of home runs per season between players, rather than just seeing the overall trend over the career.
  • The number of home runs hit each year is a discrete quantity that starts again at zero the next season. Line charts are better for continuous data. A line between two points tends to imply that at some time between the two points, the data being charted had that value. However, home runs are not continuous -- Roger Maris hitting 61 home runs in one season and 33 in the next does not mean he somehow hit 40-some home runs halfway between the two points being plotted on the chart.
  • Each bar clearly denotes a season. The reader does not have to guess whether a straight line between two points covers only two seasons, or if additional seasons happen to fall on the line. Plus, you can indicate the player's top season by changing the bar's color.
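The three points above can be sketched as a plain-text bar chart: every bar starts from zero, each season gets exactly one bar, and the top season is drawn with a different mark. This is a hypothetical sketch, not the method used for the charts in this post; the function name bar_rows and the first season total are made up for illustration, and only the 61 and 33 come from the Maris example.

```python
def bar_rows(season_totals, mark="#", peak_mark="@"):
    """Return one text bar per season, all starting from a zero baseline.

    Each mark stands for one home run. Any season that ties the career
    high is drawn with peak_mark.
    """
    peak = max(season_totals)
    rows = []
    for season, total in enumerate(season_totals, start=1):
        symbol = peak_mark if total == peak else mark
        rows.append(f"{season:>2} {symbol * total} {total}")
    return rows


# Illustrative totals: the first value is invented; 61 and 33 are from the text.
for row in bar_rows([28, 61, 33]):
    print(row)
```

Because each bar is a separate row, there is no line segment between 61 and 33 to suggest an intermediate value.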

-- Matthew Ericson (email)

Response to Huntsville Times (Alabama) scoops New York Times (New York) in reporting sports data with sparklines

Matthew, the bar graphs are fantastic.



-- Estes

Response to Huntsville Times (Alabama) scoops New York Times (New York) in reporting sports data with sparklines

I agree. The bar graphs outperform the [spark]line graphs at the same scale. Two small changes, I think, could distill them even more.

1. Use red instead of gold as the highlight color.
2. Print the highest single season total just above the red bar and print the key "highest season total" in red - omitting the connecting lines.

-- John Morse (email)

Response to Huntsville Times (Alabama) scoops New York Times (New York) in reporting sports data with sparklines

Another interesting byproduct of the bar chart is that the total careers are also easily compared (area under the curve).

As a non-baseball fan, it would appear that Hank Aaron and Babe Ruth have the most career home runs, but it looks like Barry Bonds cannot be far behind.

This was not at all visible in the Sparkline version.

-- Tchad (email)

Yahoo! Finance using Sparklines

I just noticed that Yahoo! Finance uses a Sparkline display on some of its pages.


It's very cool that this concept is taking hold among the "big players" of communication industries.

-- Zack Steinkamp (email)

The New York Times sports section on June 23, 2006 displayed a sparkline-like graph showing the outcomes of the last 218 games of the New York Knicks, a basketball team. As our scale of measurement shows, the graph is 12 in (30 cm); a real sparkline would be a little over 2 in (5 cm).

The graph is nicely integrated with surrounding text, numbers, and images. The whiskers showing wins are too light compared to the whiskers showing losses. The win-whisker blue-tint is also used as a background tint immediately above, an unnecessary congruence. Wins might be shown in red, not pale blue. The instruction for reading the sparkline, "Below, each tick mark represents a victory or loss...", is not needed, especially since the two rows of ticks are labelled "victories" and "losses". (Why not "wins" rather than "victories"? Perhaps because "wins" is both a verb and a noun.) Those labels are not needed; every reader of the sports page will know what the whiskers mean. After all, the reader is expected to understand these big words in the reports accompanying the graph: sycophantic, perpetuity, passive-aggressive, tenure, enigma, pariah, dysfunctional.

Both the column and the news story psychologize about the personalities of those in the team bureaucracy rather than the performances of basketball players and the competition, both of which might have something to do with the team performance. Of course not every column or news story need fully account for all the sources of variance, but these accounts seem a bit one-sided in their analysis.

The graphic reporter is not credited for her or his good work, although the columnist, reporter, and photographer are. Publicly acknowledged creatorship signals responsibility for work and also often improves the quality of work.

-- Edward Tufte

Response to New York Knicks graphic

Hello Professor Tufte, I can take credit (as well as any blame) for the Times graphic.

I'm not sure that the whisker lines without labels would have been legible to all readers (an early draft without labels at far left was not immediately recognized as showing wins and losses by two in-house readers, prompting the more prominent labels in the final version). But I do agree that the "Below, each tick mark ..." instructions are redundant. They were only included because they were unobtrusive, and in case any skeptical readers went looking for clarification.

The word "victories" instead of "wins" is Times style, for the reason you mention, and I didn't add my name to the graphic in part because the data it is based on was readily available, and in part because the overall presentation seemed simple enough to not merit a credit.


-- Jonathan Corum (email)

