Edward Tufte forum: Cancer survival rates: tables, slopegraphs, barcharts


Cancer survival rates: tables, slopegraphs, barcharts

-- Edward Tufte

Showing data about cancer survival rates

Many newspapers reported the recent findings on long-term survival rates of cancer patients, which indicated that an alternative method of looking at the data yielded more favorable rates.

Some of the news stories compared the old and new estimates for a few types of cancers. But the important issue is: What are the new estimates? The original article provided this table of relative survival rate (and standard error) for various types of cancer.

The survival cohorts consist of different people and so there are wobbles in the data, as indicated by the standard errors in the table. Thus thyroid cancer, for example, is not the road to eternal life. The journalistic allergy to tables of data in the news section (not in the sports or financial section however) denied their readers some interesting information.

Here is that table:

Source: Hermann Brenner, "Long-term survival rates of cancer patients achieved by the end of the 20th century: a period analysis," The Lancet, 360 (October 12, 2002), 1131-1135.

-- Edward Tufte

Some redesigns by ET

The original table can be redesigned to express particular aspects of the data.
Here the types of cancer are ordered by 5-year survival rates:

For most presentations, this table with its structure and reporting of standard errors will be the best way to see the cancer data. The table-graphic below, however, gives an idea of survival time gradients for each cancer. In the table-graphic and in the original table, every visual element contributes directly to understanding.

To begin to think about variability in mortality, see the essential article by Stephen Jay Gould here at our site.

On to some other data display methods.

Applying the widely-used default designs for statistical graphics in PowerPoint to this nice straightforward table yields these analytical disasters below. "Sweet songs never last too long on broken radios," as John Prine wrote. The data explode into 6 separate chaotic slides, consuming several times the area of the table. Everything is wrong with these smarmy, nearly unreadable graphs: incoherent, uncomparative, low data-density, encoded legends, color without content, logotype branding, chartjunk, indifference to content and evidence. Chartjunk is a clear sign of statistical stupidity; use these designs in a presentation and your audience will rightly conclude that you don't know all that much about statistical data. Poking a finger into the eye of thought, these graphics would turn into a particularly nasty prank if ever used for a serious purpose, such as cancer patients seeking to assess their survival chances. To deal with a product that clutters and corrupts data with such systematic intensity must require an enormous insulation from statistical reasoning by Microsoft PP executives and programmers, PP textbook writers, and presenters of such chartjunk.

PP-style graphical chartjunk shows up in evidence presentations in scientific journals. Below, the clutter half-conceals thin data with some vibrating pyramids framed by an unintentional Necker illusion, as the 2 back planes optically flip to the front:

For such small data sets, usually a simple table shows the data more effectively than a graph, let alone a chartjunk graph.

Source of graph: N. T. Kouchoukos, et al., "Replacement of the Aortic Root with a Pulmonary Autograft in Children and Young Adults with Aortic-Valve Disease," New England Journal of Medicine, 330 (January 6, 1994), p. 4. On chartjunk, see Edward R. Tufte, The Visual Display of Quantitative Information (Cheshire, CT, 1983; second edition, 2001), chapter 5.

-- Edward Tufte

Response to Graphic of the Day: Cancer Survival Rates and Redesigns, including PowerPoint

Alas the Keynote examples are as data-thin as PowerPoint. Only a few data points, no multivariate examples. Both Keynote and PP are tinker toyish.

By the way, real scientists don't show the zero point; they show the data. In general, the zero point should only be shown if it occurs reasonably near the range of the actual data. Instead of empty space vertically reaching down to a number which never occurs empirically, the way to show context is more data horizontally. Note that The Visual Display of Quantitative Information never recommends showing zero-points. See pp. 74-75 for a sequence of displays that provide increasing context by showing more data horizontally rather than reaching down to a zero point.

-- Edward Tufte

Response to Graphic of the Day: Cancer Survival Rates and Redesigns, including PowerPoint

Interesting reply, as I expected. In biology serum pH would never sensibly include 0 in a graph. One would be dead shortly after dipping below 7.0. Perhaps we should be appreciative of graphs with thin data. That way we can see when conclusions are being drawn with insufficient data! One of the most expensive parts of a research enterprise is the gathering of the data. Ironic, isn't it, that when the analysis is well done the presentation becomes smaller and simultaneously more lucid. Perhaps there is a psychological need to display lots of graphs just to make it look like a lot has been done.

-- Steve (email)

Response to Graphic of the Day: Cancer Survival Rates and Redesigns, including PowerPoint

I concur with the comment of David Nash. The spacing between values in the five-year survival rate column is not proportional, resulting in a visual element inconsistent with the actual values. In some cases a difference of just one percent results in a larger gap than one of as much as six percent. Perhaps an additional column with the initial survival rates plotted proportionally could be added while preserving the elegance of the original graphic.
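The difference between the two spacings can be sketched in a few lines of Python. This is only an illustration: the function names, the row height, and the scale factor are assumptions, and the rates are hypothetical values chosen to reproduce the one-point and six-point gaps mentioned above, not figures from the Brenner table.

```python
def rank_positions(values, row_height=12.0):
    """Even spacing: one row per item, ordered from the highest value down."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ys = [0.0] * len(values)
    for row, i in enumerate(order):
        ys[i] = row * row_height
    return ys


def proportional_positions(values, units_per_point=3.0):
    """Proportional spacing: vertical distance tracks the difference in value."""
    top = max(values)
    return [(top - v) * units_per_point for v in values]


# Hypothetical 5-year rates in percent (illustrative only).
rates = [90.0, 89.0, 83.0, 60.0]
even = rank_positions(rates)              # every gap is the same height
scaled = proportional_positions(rates)    # gaps follow the 1, 6, and 23 point differences
print(even)
print(scaled)
```

With even spacing a one-point gap and a six-point gap look identical. With proportional spacing the gaps are honest, but values that sit close together land only a few units apart, so their labels may collide and need nudging by hand.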

-- Jack Raglin (email)

Response to Graphic of the Day: Cancer Survival Rates and Redesigns, including PowerPoint

I disagree.

Conveying proportional relationships between the types of Cancer is extraneous to the intent of the graphic. The graphic needs to merely show survival time gradients for each Cancer type no matter how gravely low their numbers may be.

The data stacking in order of starting percentage variables at 5 years seems incidental and an echo from the first redraw of the original table. In the first redraw the white space does not hint anywhere that it should convey information, so the stacking order manifests itself as convenience rather than significant data. It is difficult to guess how the original by Hermann Brenner determined its stacking order. I wondered if it was by number of patients. Such a stacking order might be more important to the trend information in terms of lives lost.

Using the percentage variable at 5 years as a stacking order could cause confusion (and it seems it has?) that the graphic should convey more than it does. The graphic could list the Cancer types alphabetically for a more impartial trend analysis where the starting percentage variable is moot.

-- Jeffrey berg (email)

Response to Cancer survival rates: tables, graphics, PP

A consistent scale does create a cumbersome chart, full of overlapping points and crossing lines:

The original table's order suggests groupings for multiples that invite further comparisons:

Many overlapping points and crossing lines disappear, and the remaining ones are clearer. Still, the cancers of the colon and rectum require additional labels.

-- Dave Nash (email)

Response to Graphic of the Day: Cancer Survival Rates and Redesigns, including PowerPoint

The severity-drop graphs are really interesting. Which program would you use to draw one?

-- nixon (email)

Response to Graphic of the Day: Cancer Survival Rates and Redesigns, including PowerPoint

Adobe Illustrator

-- Edward Tufte

Response to Cancer survival rates: tables, graphics, PP

Wow. Or more accurately, taking a cue from the Princeton acceptance letter dialogue:

Wow!

An elegant balance between complex content -- still readily available for the looking -- and an economy of form that actually adds content. Genuinely empathetic ... it comes across as a trustworthy invitation to learn a lot very quickly about a truly forbidding subject.

It's hard to imagine that Dave Nash's most recent treatment could be much improved.

-- Doug Cleveland (email)

Response to Cancer survival rates: tables, graphics, PP

This is excellent, particularly the upper panel.

In the upper panel, could Kindly Contributor Dave Nash put the names of each cancer type at the righthand end (20 year relative mortality rate) of the lines at the right? This will make it easier to track the overlapping lines and, more importantly, show the ordering by the 20-year survival. This new column of names should be flush left. The repetitive labels, left and right, are not a problem since the ordering changes.

I prefer the flush left type of the left column, as in the original. Also the Gill Sans (tracked out) of the original reads more clearly and the Gill numbers are better. There's an investigation of Gill vs. other fonts for tabular material in the sparklines essay, posted at this board.

The gray lines should be thinner. Note the balance of weights for the numbers and the lines in the original. The underlines for the column years should be thinner and lighter.

Excellent.

-- Edward Tufte

Response to Cancer survival rates: tables, graphics, PP

I see now that the flush-left labels in the table-graphic make it easier to find the cancers, just as it would in an ordinary table. The recommendations all improve the legibility.

-- Dave Nash (email)

Response to Cancer survival rates: tables, graphics, and PP

I like your graphical presentation of the survival curves -- how did you create it?

-- Elizabeth Tracey (email)

Response to Cancer survival rates: tables, graphics, and PP

Done in Adobe Illustrator by my design assistant based on a sketch modeled after my design on page 158 in The Visual Display of Quantitative Information, which I did 25 years ago on the typewriter with pencil rulings and later paste-down rules.

These days, it could probably be done in a low-end word-processing program or a low-end drawing program with a little bit of hacking.

Adobe Illustrator is a big serious program that can do almost anything on the visual field (other than Photoshop an image). Most of my sparkline work was done in Illustrator. Fortunately all graphic designers and graphic design students have the program and know how to use it, so find a colleague who knows about graphic design.

-- ET

Response to Cancer survival rates: tables, graphics, and PP

I work closely with English Cancer Treatment Networks and am a health information analyst, and this thread is like fresh air. The missing dimension in these cancer survival graphs, however, is the frequency/incidence of these cancers, and known causes.

Some of those with very high survival rates are also very common (breast, cervix, and prostate), whilst some of the most deadly are also thankfully very rare (pancreas and liver). Others are both common and deadly (lung). The survival rates graphic presents all tumour sites as equally likely, which they are not.

From a communication perspective we need dimensions that articulate the commonality/survival axis by the lifestyle component that creates the risk factor.

The viewer of the graphical information has an unarticulated question: "How likely am I to develop these cancers?" and then, and only then, "How likely am I to survive them?"

Not only are a good half dozen commonly related to smoking behaviour, others are related to alcohol consumption. Others are lifestyle "blameless". To enable the graphic to go beyond graphical integrity and become useful to ordinary people to facilitate the process of change, we need some ideological dimensions too.

-- Andrew Wilk (email)

Frequency of occurrence, more specificity as to cancer type, differential rates by subgroups, and possible causes belong in follow-up text, tables, and graphics--for viewers to go to in accord with their interests. The redesigned table and graphic (the second and third displays at the top of this thread) are already carrying a substantial message. After that, readers might want to see mortality distributions over time, as in Gould's essential essay on facing a diagnosis of cancer, or information about treatments, causes, specialized sources, and so on.

In talking about cancer, it is particularly important to have a clean, crisp, epidemiological, policy-relevant language concerning causality--and to blame the cause, not the particular cancer patient.

When I hear an overly definitive analysis of medical causality applied to a single individual, I ask the analyst "Where, then, is your Nobel Prize in Medicine?" Even cancers blamed on "lifestyle" (an awful euphemism) now sometimes appear to be a product of the blamed behavior interacting with certain inherent genetic factors specific to the patient. Also, as a result of triage by blame, a blamed patient may receive lower quality medical care than an unblamed patient.

In having had all too many dogs treated for cancer, I have noticed among veterinarians and in most of the writings on the subject a wonderful absence of blame, punitive metaphors, accusation, and guilt-provocation. Instead, cancer at the level of treating the dog patient is simply a very nasty problem to be thought through deeply and rationally--and, if possible, solved.

-- Edward Tufte

Gould's long tail of the distribution is exactly that, a long tail that contains only a few percent of those with cancer. And only those few are candidates for writing about the promise of the long tail of the death distribution.

For an account of some mortality over-estimates, see David Brown, Washington Post, April 1, 2007.

-- Edward Tufte

Response to Cancer survival rates: tables, graphics, and PP

The graphical representation emphasizes something strange in the data. Shouldn't the survival rates be monotonically decreasing? Yet for "Liver, bile duct" the data seem to imply that survival rates are higher at 20 years than at either 10 or 15 years.

Is this a statistical anomaly, an error, or am I misunderstanding the data?

-- Zuil Serip (email)

Response to Cancer survival rates: tables, graphics, and PP

Kindly Contributor "Zuil Serip" asks shouldn't the survival rates be monotonically decreasing? Yet for "Liver, bile duct" the data seem to imply that survival rates are higher at 20 years than at either 10 or 15 years.

All other things being equal, we would not expect a 2% resurrection rate. However, these are presumably not data for the same cohort, but %age of patients alive from those treated 5, 10, 15, and 20 years ago. Time of treatment is a covariate!

I am not a doctor or medical historian; the following is purely speculative.

The anomaly could mean that treatment outcomes for these cancers were better for those treated 20 years ago compared to 15 years ago, after insurance cost containment / HMOs kicked in. If this is the ONLY disease with better 20yr survival than 10 or 15 year, the cost containment has been less malign than widely suspected.

Conversely, *if* there was a particularly aggressive chemo/radio therapy in vogue for Liver/bile cancers 10-15 years ago, it might have done more harm than good, compared to prior practice (and hopefully current?).

The anomaly could also be statistical contamination. If some non-life-threatening disease X was 20 years ago bundled under "Liver Cancer", the blended cohort of cancer survivors and X "survivors" would have a combined mortality better than a pure cohort. Perhaps 20 years ago Cirrhosis was billed to insurers as Liver Cancer due to ignominy of Cirrhosis. Perhaps differential diagnosis is better today: some patients / cases formerly diagnosed with and treated under "Cancer [Liver, bile duct]" are now coded under a non-malignant DX code. This would result in a surge in cases (with 90%+ survival) in the new code and a reduction in the number of cases in that cancer code -- all of which are removed from the survival %age. And the drop in the number of cases might be masked by a rise in detection/treatment.

Many other anomalies due to improved diagnostic practice and insurance incentives to vary coding or use preferred treatments could drive a change between cohorts.

-- Bill Ricker (email)

Response to Cancer survival rates: tables, graphics, and PP

This is a belated response to Zuil's and Bill's questions about the survival rates for some cancers going up. Although my sole exposure to health statistics was a summer job in the National Center for Health Services Research many years ago, I believe the important thing here is that these are relative survival rates. I don't know the methodology, but I presume a relative survival rate is obtained by dividing the survival rate for those diagnosed with the cancer by that of a control group. I assume that the control group would be demographically the same as those diagnosed with the cancer. That would explain why the relative survival rate for prostate cancer is so high; most of those diagnosed with it are older men. When a relative survival rate goes up, it doesn't mean that cancer victims are coming back to life; it just means that some of those in the control group have died.
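The arithmetic behind that last point can be sketched as follows. The sketch assumes the ratio definition presumed above (observed survival among patients divided by expected survival in a matched comparison group); the function name and all figures are hypothetical, chosen only to show how the ratio can rise while observed survival falls. It does not reproduce Brenner's period-analysis method.

```python
def relative_survival(observed, expected):
    """Observed survival among patients divided by expected survival
    in a demographically matched comparison group (both as fractions)."""
    if not 0 < expected <= 1:
        raise ValueError("expected survival must be in (0, 1]")
    return observed / expected


# Hypothetical figures: observed survival drops between 15 and 20 years,
# but expected survival in the comparison group drops faster.
at_15 = relative_survival(observed=0.060, expected=0.80)
at_20 = relative_survival(observed=0.055, expected=0.70)
print(f"15-year: {at_15:.3f}   20-year: {at_20:.3f}")
```

Here fewer patients are alive at 20 years than at 15, yet the relative rate is higher at 20 years, because the denominator has shrunk by a larger proportion than the numerator.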

I found the table of survival rates to be just fine. I took a parochial interest in it two years ago this month, when I was diagnosed with non-Hodgkin's lymphoma. I looked at the survival rate and wished it was higher: it was lower than the survival rate for colon cancer, which had just killed Tony Snow. I envied a high school classmate of mine who had come down with Hodgkin's lymphoma in college and had survived it. I underwent chemotherapy (R+CHOP) from August 2008 to January 2009 and have had several negative scans since then.

-- Robert O'Rourke (email)

Is there a way to create a graphic like the cancer survival ones above using a program other than Adobe Illustrator?

-- Robert Biggert (email)

There have been some open-source projects to implement ET's slopegraphs in R, Python, and other languages. See, for example, Lukasz Piwek's excellent site documenting Tufte in R with several different libraries like ggplot2.
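For a sense of how little machinery a basic slopegraph needs, here is a minimal sketch in Python that writes SVG directly, using only the standard library. It is not any of the projects mentioned above: the function name `slopegraph_svg`, the layout constants, and the sample values are all assumptions made for illustration, and the numbers are hypothetical rather than taken from the Brenner table.

```python
from xml.sax.saxutils import escape


def slopegraph_svg(series, columns, width=420, height=240, margin=30, label_width=110):
    """Build a bare slopegraph as an SVG string.

    series maps a row label to one value per column; columns names the columns.
    """
    if len(columns) < 2:
        raise ValueError("a slopegraph needs at least two columns")
    values = [v for vals in series.values() for v in vals]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values match
    x0 = margin + label_width
    step = (width - margin - x0) / (len(columns) - 1)

    def x(j):
        return x0 + j * step

    def y(v):
        # Higher values sit nearer the top; distances are proportional to value.
        return margin + (hi - v) / span * (height - 2 * margin)

    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        'font-family="Gill Sans, sans-serif" font-size="10">'
    ]
    for j, name in enumerate(columns):
        parts.append(
            f'<text x="{x(j):.1f}" y="{margin - 14}" text-anchor="middle">{escape(name)}</text>'
        )
    for label, vals in series.items():
        # Row label, flush left, aligned with the first value.
        parts.append(f'<text x="{margin}" y="{y(vals[0]) + 3:.1f}">{escape(label)}</text>')
        # Thin gray segments between neighboring columns, leaving room for the numbers.
        for j in range(len(vals) - 1):
            parts.append(
                f'<line x1="{x(j) + 12:.1f}" y1="{y(vals[j]):.1f}" '
                f'x2="{x(j + 1) - 12:.1f}" y2="{y(vals[j + 1]):.1f}" '
                'stroke="#999" stroke-width="0.5"/>'
            )
        for j, v in enumerate(vals):
            parts.append(
                f'<text x="{x(j):.1f}" y="{y(v) + 3:.1f}" text-anchor="middle">{v:g}</text>'
            )
    parts.append("</svg>")
    return "\n".join(parts)


# Hypothetical data for illustration only.
columns = ["5 year", "10 year", "15 year", "20 year"]
series = {
    "Type A": [90, 85, 80, 78],
    "Type B": [60, 45, 40, 38],
    "Type C": [10, 6, 5, 5],
}
svg = slopegraph_svg(series, columns)
```

The returned string can be saved to a file with an `.svg` extension and opened in a browser or a drawing program. The sketch does nothing about labels that collide when values are close together, which is one reason hand editing is still needed.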

Even these, however, will need some editing in a program like Illustrator or InDesign for optimal design. For example, ET's recent graphs exploring Glenn Gould's Goldberg Variations:

-- Emily
