Corrupt Techniques in Evidence Presentations
Here are a few pages from a draft of a Beautiful Evidence chapter (which is 16 pages long in the published book) on evidence corruption, along with some of the comments on this draft material.
The emphasis is on consuming presentations, on what alert members of an audience or readers of a report should look for in assessing the credibility of the presenter.
-- Edward Tufte
Posner on Blink
An interesting and powerful critique of an evidence presentation is Richard A. Posner's
review of Malcolm Gladwell's book Blink: The Power of Thinking without Thinking:
Posner makes me seem bland!
-- Edward Tufte
Advocacy vs. evidence
Edward, you certainly make me take pause with this piece. Over the holiday break I completed the last two chapters of an argumentation book where I espouse many of the ideas that you work through.
There must be some middle ground. The allure of bullet points is not intrinsic to the uninformed or the unintelligent. We all like the simplicity of the ability to condense large points of view into small aesthetically pleasing chunks.
Where I agree with you is that as a society we are moving toward this pole without any consideration of the long-term implications of this synthesis. I caution my readers about the harms of enthymematic thought and communication, but I still believe that we must find a way to simplify information to a level that our audience can process.
Where we also seem to part ways is on the notion of advocacy. Why is it not appropriate for us to "cherry pick"? Suggesting that it is somehow less than ethical for a person to present evidence that supports their position is not the real issue. I agree that mere sophistry, where one tries to influence another to do something that the evidence contraindicates, is unethical, but providing support for a valid ethical position does not appear to be prima facie unethical, and in fact, it seems pretty reasonable. Aren't we all bound to the same marketplace of ideas where someone with an equally reasonable opposing position can attempt to influence an audience in the other direction? To suggest that one person must do both seems to be a burden that few could adequately shoulder. Can you (as a respected expert/author) live with the implications of that point of view?
Occasionally it helps to look to our ancestral roots to determine whether something is intrinsically "human" and I just cannot imagine such cognitively complex dialogue before picking one hunting ground over another. It seems that perhaps brute force was the compelling factor...the analogy today is for us to speak louder, and loudness is often a metaphor for the use of media.
I'll ask my grad students to read and contemplate the chapter. Perhaps we'll find a dialectic position that allows us both a way out of this dilemma.
Thanks for considering this. What you are doing here is nothing short of revolutionary, and just plain smart. Hats off. Chris.
-- Chris Crawford (email)
Chris Crawford raises an important point made well by Darwin: we obtain data to answer
questions, and in consequence the data collected are biased - for worse or better - by the
questions asked.
About 30 years ago there was much talk that geologists ought to
observe and not to theorize; and I well remember someone saying that at this rate a man
might as well go into a gravel pit and count the pebbles and describe the colors. How odd
it is that anyone should not see that all observation must be for or against some view if it
is to be of any service!
-- Alex Merz (email)
Passive voice useful at times
I also think that, at least in scientific writing, the passive voice has [;-)] unfairly acquired a
bad rap. Is it really better to say that "we injected 0.625 ml of saline into CL6
mice," than to say that "CL6 mice were injected with 0.625 ml of sterile saline?" After all,
we are not the subject of the experiment, the mice are. What's more, scientists often say
that "we" performed experiments when in fact the experiments were performed by people
(technicians, students) who are not named authors on the paper. Through use of the active
voice, an account of experimentation becomes a tale of experimenters. I am not convinced
that the "hyperactive voice" facilitates either the clear description of experiments or the
accurate analysis of results. What the hyperactive voice may promote is the culture of
scientific celebrity, in which personality and reputation outpace experimentation and
-- Alex Merz (email)
Spotting problems by spotting passive verbs
The "hyperactive voice" would be just as wrong if those scientists didn't do the work themselves. The correct wording would be "students injected the mice with ..." That way you find likely sources of error.
Years ago when I worked on rockets, an easy way to spot a problem was when the speaker switched from active voice "we tested the circuit" to passive voice "a circuit was tested." Nine times out of ten, there was a problem on the latter test. The subcontractor was always amazed at how we zeroed in on problem areas.
-- Chris (email)
Who did what in scientific work
A good trend in scientific reports is the addition of a note like the one at the end of this article:
Author contributions: MG and GE designed the study. AW, MG, DKG,
SFS, PN, and JEC performed the experiments. AW, MG, DKG, PS, GKM, TV, KK, JEC, and GE
analysed the data. GKM, TV, HC, KW, IA, WG, and YJKE contributed reagents/materials/
analysis tools. AW, GKM, JEC, and GE wrote the paper.
Only a few journals are actively promoting this practice, which should be standard for
[link updated February 2005]
-- Alex Merz (email)
What did Darwin mean?
Alex Merz has raised several interesting points in the past few days.
He used a quotation from Darwin that also appears in a book that I was editing 15 or so years ago, and although I agree with the interpretation that I think Alex puts on it, it is worth noting that another view is possible: the person to whom the remark was directed in my book replied as follows:
They misconstrue [Darwin's] meaning when they add that if anyone believes that there is a significant role in science for pure observation uncontaminated by any theory or hypothesis he is disagreeing with Darwin. A more inappropriate opponent of the importance of pure observation could hardly be imagined. On the basis of observations uncontaminated by any theory or hypothesis, Darwin was led to the most important unifying generalization in biology, and perhaps the most important contribution to scientific understanding of all time.
What I think he meant by this was that the relevance of Darwin's observations to the theory of natural selection was only evident in hindsight, and they weren't gathered for that purpose. I think this view was probably mistaken, at least in part, but given that it came from one of the most distinguished microbiologists of the century I'm reluctant to dismiss it out of hand.
Alex continues by saying that he thinks that the passive voice has acquired a bad rap in scientific writing. Again, I agree basically with what he says, but I think that a lot of the problems arise from a false perception that an objective scientific style is somehow different from ordinary style, and in general I don't think it is. There are many contexts in everyday writing where the passive sounds better than the active, primarily when the target of an action is more important or interesting or familiar than the agent. Which would a journalist write: "In November George W. Bush was reelected President of the United States", or "In November the people of the United States reelected George W. Bush as President"? In most contexts the person reelected is more significant than the people who did the reelecting, and to my mind it makes a stronger sentence to put him first. As far as I can see the same considerations apply to scientific writing. The main difference (and possibly the main reason for greater use of the passive) is that the target of the action is more often the point of interest in scientific writing than in everyday writing.
I was taught at school in chemistry and physics classes to write reports exclusively in the passive, but I abandoned this injunction many years ago, initially for a half-way stage in which I wrote "we" even if I was the only author. I think this use of "we" is still quite common among authors who don't restrict themselves to the passive, but it has also become much commoner than it used to be for single authors to refer to themselves as "I". Nowadays I find it quite unnatural to use "we" instead of "I" unless the sentence is clearly including the reader as subject as well as the author. Thus "we [meaning I] used reagents from Merck" sounds unnatural to me, whereas "we [meaning you and me together] can readily deduce from this equation that ..." sounds OK.
Nowadays, of course, single-author papers have themselves become a rarity, but that's another story, bringing us to Alex's last point (last for the moment, anyway; I mean the one of 27th January). Long before it became the practice for journals to encourage statements of who did what I was thinking that if I were editor-in-chief of a journal like Nature or Science (something very unlikely to come about), I would not only encourage but would require such information to appear in all papers with more than three or four authors. Incidentally, the link in Alex's post doesn't work, but no matter, we know that such statements exist. I've only once (as far as I remember) been an author of an article with many authors, and on that occasion the first six authors appeared out of alphabetical order, and the last 36 (including me) followed them in alphabetical order. Although in that instance there was no actual statement about who did what it wouldn't take a genius to work out who were the six people who actually wrote it and who were the 36 hangers-on.
-- Athel Cornish-Bowden (email)
Latour big think
Bruno Latour explains the use of passive voice in scientific communications in both
"Laboratory Life" and "Science in Action." He calls the phenomenon "modality:" the idea is
to make nature the significant player in science, not the scientist; hence, the elimination of
all pronouns. Not "I observed a temperature" but "a temperature of x was observed."
Modality is a spectrum that connects the earliest activities in a laboratory to encyclopedia-
solid facts. Changing it is not as simple as just insisting upon active voice; it would mean
changing some of the deepest assumptions underlying scientific activity.
-- Mark Hineline (email)
It's more complicated than that
Anent Mark Hineline's statement - Not "I observed a temperature" but "a temperature of x was observed." - this actually raises a clutch of issues. In many scientific experiments, no direct observation takes place -- instead, some instrument is read or assessed. So the question of "who is the actor" can get a bit confused.
If I say "the temperature was X", then it seems to me I am saying one thing, whereas if I say "the thermometer registered the temperature as X" I am saying something else, and if I say "I observed a temperature of X", something else again, and finally "a temperature of X was observed" says yet again something else.
In the first case, I am making a direct statement about the phenomenon, in the second, I am making a statement about the instrument used to measure/display the phenomenon, in the third I am making a statement about the observer of the phenomenon [for example, I could have been wrong, was in a hurry, had too much to drink, etc., so that the report of the phenomenon may reflect my observation accurately, but not in fact report the phenomenon accurately], while in the last I am almost smuggling in the notion that the observation is more "universal" than the third case -- nearly closing the circle to have the connotation of the first case.
All of this is further confused by assumptions that are perhaps implicit -- so that the 'scientific' passive really means 'the people who wrote this paper did x, y, z', whereas the normal passive really does connote something to which we can reasonably object.
Of course sometimes the shoe fits: "It was a dark and stormy night..." -- the night didn't *do* anything to be dark and stormy.
-- John Howard Oxley (email)
The play on words (i.e., pun) does merit review; overall, I found this piece structured, analytical, articulate and reader friendly. This writing is noteworthy (and dare I add "genius"?).
-- Colleen Salgado (email)
New draft now posted at top of thread, ET replies
Another new draft posted. This is very close to the final draft. "Puns" and "economisting" will stay, the passive voice claims are focused, a new case study on evidence selection in published statistical tests appears in a long sidenote 5. And a beautiful thought from Nero Wolfe in the epigraphs: "Once the fabric is woven it may be embellished at will." I've also made many minor changes.
Students of the passive voice may wish to consult Artful Sentences: Syntax as Style by Virginia James Tufte (Distinguished Professor Emerita of English at the University of Southern California), a book soon to be published by Graphics Press. The book contains a long discussion on the uses of the passive voice.
Several contributors sought a deeper analysis of PP. (Such requests appear disingenuous when coming from a Microsoft employee.) At any rate, there are numerous threads on this board on PP, and of course there's my essay "The Cognitive Style of PowerPoint."
I appreciate your comments.
Thank you especially Jake Gibson for (1) casually dropping the phrase "speaking of German etymology" in his contribution, and (2) for pointing out (indirectly via a polite link) that "mist" is the German word for bullshit, as in "economisting," with accents on the "con" and "mist." What a fortunate coincidence. The English meaning of "mist" already does excellent service. After all, according to the authoritative Soul Future Dream Dictionary, the dream symbol "mist" means "Lack of clarity, having clouded vision, lack of foresight, a cover up, a homonym for missed." Speaking of homonyms, Alan Cooper, the great interface theorist and designer, has an excellent site "Alan Cooper's Homonyms" at
-- Edward Tufte
More on use of "pun"
Regarding the "Pun" problem:
I don't think anyone has offered a solution that already exists in the English language and therefore does not rely on any linguistic "Trix"... Historian David Hackett Fischer refers to "The Fallacy of Equivocation" wherein "a term is used in two or more senses within a single argument, so that a conclusion appears to follow when in fact it does not" (Fischer, Historians' Fallacies, New York: Harper & Row, 1970, p. 274).
Equivocation seems to be precisely the fallacy involved in the use of "value" cited in your sample chapter. As Fischer notes, sometimes equivocation is intentional, sometimes unintentional (being merely the result of muddy thinking) -- but in either case, its presence serves to cast doubt on the reliability of an author and call his or her argument into question.
-- Steve Tillis (email)
Excellent idea, exactly on point; I'll take a look at Historians' Fallacies.
-- Edward Tufte
Reasoning by sloppy analogy
One corrupt technique that I see or hear in business presentations is "reasoning via analogy". This is where the presenter introduces an analogy -- masquerading as an aid to comprehension -- but then goes on to reason about the subject using the analogy rather than any evidence or analysis.
A (paraphrased and bowdlerised) example: "New product revenue at company X is like a rollercoaster ride. After the last quarter's sharp ramp we expect flat to down new product revenue this quarter". While historical analysis of new product revenue may suggest that it will continue to oscillate, no such evidence is presented -- nor is it explained why the past is a good predictor of the future.
-- Mathew Lodge (email)
Dequantification to smooth results
I enjoyed the discussion of Galenson's methods for dequantifying his evidence. I
thought I would share a graphic I made to demonstrate what I think is a double standard
in the convention for visually "quantifying" categorical versus continuous data sets:
All of these panels show the relationship between one independent (predictor)
variable and one dependent (response) variable, with varying levels of quantification. When
showing the relationship between a continuous predictor and a continuous response
(upper panels), I most often see scatterplots that have both the fitted equation and the
underlying data points (top right). But for categorical predictors (lower
panels), I most often see bar graphs with standard errors (bottom center). Bar graphs show
only the fitted means (or other measures of central tendency) without showing the
underlying data. Furthermore, standard errors are only effective if there is a large,
normally distributed sample. (On page 223 of The Elements of Graphing Data,
William Cleveland discusses other reasons why standard errors are graphically
suboptimal.) These problems can influence interpretation. For example, in
the more quantified bottom right panel, we can see that the distribution of values for
'small sagebrush' may be skewed toward higher values.
Galenson's dequantification is more flagrant, but dequantification of categorical data
is more common. Because I believe that the dequantification of categorical data impedes
interpretation--just as E.T. showed for Galenson's continuous data--I would be excited to
see this convention overturned.
-- Anthony Darrouzet-Nardi (email)
An excellent demonstration by Kindly Contributor Anthony Darrouzet-Nardi. Showing only the path of the category means is often the sign of a serious problem in the analysis; at a minimum, it is a failure to report variability and the sensitivity of the analysis to outliers. Every editor of every scientific journal should read this contribution and act on it, by insisting that averages must be accompanied by detailed statements of variability. And the reporting of smooth curves only (without any actual data points) has always been suspicious. I believe that Professor David Freedman (statistics, Berkeley) has discussed these matters in his devastating critiques of statistical analysis.
Tukey's box-and-whiskers plot is an effort to provide a responsible summary; the
bottom figures could also be represented by 3 box-and-whiskers-plots in parallel. See
The Visual Display of Quantitative Information, pp. 123-125, for such an example (and for
my redesign of box-and-whiskers plots).
-- Edward Tufte
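The contrast between a bare mean with its standard error and a fuller summary of the data can be checked with a few lines of code. What follows is only a minimal sketch: the sample is invented for illustration (it is not the sagebrush data in the graphic), and the function name `summarize` is hypothetical. It reports the numbers a bar graph would show next to the five-number summary that a box-and-whiskers plot would show.

```python
import math
import statistics

def summarize(values):
    """Return the mean, its standard error, and a five-number summary."""
    n = len(values)
    # quantiles(n=4) returns the three quartile cut points; we keep the
    # outer two and take the median from statistics.median.
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "n": n,
        "mean": statistics.mean(values),
        "se": statistics.stdev(values) / math.sqrt(n),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }

# Invented, right-skewed sample: one large value pulls the mean upward.
skewed_sample = [5, 5, 5, 5, 6, 6, 17]
summary = summarize(skewed_sample)

for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

In this made-up sample the mean sits above the median and far below the maximum, which a bar with a symmetric error bar would not reveal; plotting the points themselves, as in the right-hand panels, shows it at once.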
Re: puns and other terms for this concept. In federal criminal law (where I toil daily) there is a term of art used to describe the phenomenon of a single charge (count) being used to describe separate offenses (counts). The term is "duplicity."
A duplicitous indictment has lumped more than one offense into a single charge, making it difficult for the defense to raise a proper response. (The opposite, by the way, is "multiplicity," where there are multiple counts, but only one offense.)
Punning, in this context, is employing a single word to cover many meanings: a duplicity.
-- Jeffry Finer (email)
Duplicity vs. pun
Duplicity is far and away the best word to describe this dubious technique. Especially as it
is the root of duplicitous practice. Or is that punning? While it is a technical legal term, a
quick check of the OED gives (IMHO) Prof. Tufte's meaning as
1. The quality of being 'double' in action or conduct (see DOUBLE a. 5); the character or
practice of acting in two ways at different times, or openly and secretly; deceitfulness,
double-dealing. (The earliest and still the most usual sense.)
-- Jim Reid (email)
Kindly Contributors Jim Reid and Brent Riggs:
"Duplicity" is an interesting word, catching a sense of punning as in double-dealing,
although the cumulative pun in the art history piece engages in multiplicity. The word
"duplicity" probably should show up at least once in a revised draft.
No one, until Brent Riggs, caught the typo on page 6. Thanks.
Evidence selection in global warming studies (and the non-warming critiques) is an
interesting topic but requires a big piece of research first. Until that work is done, most
postings will probably reflect prior views rather than fresh analysis.
Another topic where an analysis of evidence selection would be helpful is in the evidence
presented for new drugs. For the cox-2 painkillers, the evidence system appears to have
been able to detect small differences (improvements of treatment vs controls of, say, 10 or
20 percentage points) favoring the new drug yet was unable, until years later, to detect large differences (2-fold to 5-fold) in some serious and common side effects.
I hope readers will take a look at the new long sidenote on page 6, on the distribution of
published statistical tests. This illustrates again the effectiveness of looking at a series of
studies to detect cherry-picking.
-- Edward Tufte
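One simple way to look at a series of published tests for signs of selection is to count how many reported statistics fall just above the conventional significance threshold compared with just below it. The sketch below is an illustration only and is not necessarily the analysis in the sidenote: the z values are invented, and the threshold of 1.96 and the window width of 0.2 are assumptions chosen for the example.

```python
def caliper_counts(z_values, critical=1.96, width=0.2):
    """Count |z| values just below and just above the critical value.

    Without selection, the two narrow windows on either side of the
    threshold should hold roughly similar numbers of results.
    """
    below = sum(1 for z in z_values if critical - width <= abs(z) < critical)
    above = sum(1 for z in z_values if critical <= abs(z) < critical + width)
    return below, above

# Invented z statistics, standing in for results gathered from many papers.
reported_z = [2.01, 1.97, 2.10, 2.05, 1.99, 2.12, 1.80, 2.30, 2.03, 1.98, 0.90, 2.60]

below, above = caliper_counts(reported_z)
print(f"just below threshold: {below}, just above threshold: {above}")
```

A large imbalance in favor of results just over the line, as in this made-up list, is the kind of pattern that suggests the published record has been filtered; no single study would reveal it.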
From today's New York Times, on the FDA advisory group hearings on cox-2 pain pills:
"The panel's decisions were not good news for Pfizer, despite the rise in its stock. The
company had come into the advisory committee with a strategy to deny almost entirely the
notion that Celebrex increased the risks of heart attacks and strokes and to suggest that
any evidence linking Bextra to such risks was shaky and irrelevant.
This strategy seemed to backfire. Panelists were incredulous that Pfizer's presentation did
not include any information about a large federally sponsored trial in which patients taking
Celebrex had more than three times as many heart attacks as those given a placebo.
"You're telling us that you don't have data that you published two days ago in The New
England Journal?" Dr. Wood asked.
Dr. LaMattina said the panel's skepticism was unfair. In one case, Pfizer excluded data
from one study because the F.D.A. said that that study would be discussed by someone
else, he said."
-- Edward Tufte
I remember Gerry Spence's graphical illustration of prosecutorial cherry picking (presented on CNN during the OJ prelim trial). Spence set up a table with about 70 white Dixie cups and around 10 randomly placed red Dixie cups. He said they represented all the evidence in a case, then he ham-handedly shoved all the white cups off the table and said "See, the cups are all red!"
-- Mark Bradford (email)
Grading oneself not good evidence
In this article from yesterday's New York Times on evaluating medical computer systems, note especially:
"Another article in the journal looked at 100 trials of computer systems intended to assist physicians in diagnosing and treating patients. It found that most of the glowing assessments of those clinical decision support systems came from technologists who often had a hand in designing the systems. In fact, 'grading oneself' was the only factor that was consistently associated with good evaluations," observed the journal's editorial on computer technology in clinical settings, titled "Still Waiting for Godot."
Here is the full article:
March 9, 2005
Doctors' Journal Says Computing Is No Panacea
By STEVE LOHR
The Bush administration and many health experts have declared that the nation's health care system needs to move quickly from paper records and prescriptions into the computer age. Modern information technology, they insist, can deliver a huge payoff: fewer medical errors, lower costs and better care.
But research papers and an editorial published today in The Journal of the American Medical Association cast doubt on the wisdom of betting heavily that information technology can transform health care anytime soon.
One paper, based on a lengthy study at a large teaching hospital, found 22 ways that a computer system for physicians could increase the risk of medication errors. Most of these problems, the authors said, were created by poorly designed software that too often ignored how doctors and nurses actually work in a hospital setting.
The likelihood of errors was increased, the paper stated, because information on patients' medications was scattered in different places in the computer system. To find a single patient's medications, the researchers found, a doctor might have to browse through up to 20 screens of information.
Among the potential causes of errors they listed were patient names' being grouped together confusingly in tiny print, drug dosages that seem arbitrary and computer crashes.
"These systems force people to wrap themselves around the technology like a pretzel instead of making sure the technology is responsive to the people doing the work," said Ross J. Koppel, the principal author of the medical journal's article on the weaknesses of computerized systems for ordering drugs and tests. Dr. Koppel is a sociologist and researcher at the Center for Clinical Epidemiology and Biostatistics at the University of Pennsylvania School of Medicine.
The research focused on ways that computer systems can unintentionally increase the risk of medical errors. The study did not try to assess whether the risks of computer systems outweigh the benefits, like the elimination of errors that had been caused by paper records and prescriptions.
Yet Dr. Koppel said he was skeptical of the belief that broad adoption of information technology could deliver big improvements in health care. "These computer systems hold great promise, but they also introduce a stunning number of faults," he said. "The emperor isn't naked, but pretty darn threadbare."
Another article in the journal looked at 100 trials of computer systems intended to assist physicians in diagnosing and treating patients. It found that most of the glowing assessments of those clinical decision support systems came from technologists who often had a hand in designing the systems.
"In fact, 'grading oneself' was the only factor that was consistently associated with good evaluations," observed the journal's editorial on computer technology in clinical settings, titled "Still Waiting for Godot."
The principal author of the editorial, Dr. Robert L. Wears, a professor in the department of emergency medicine at the University of Florida College of Medicine in Jacksonville, said the message from the research studies was that computer systems for patient records, the ordering of treatments and clinical decision support have not yet shown themselves to be mature enough to be useful in most hospitals and doctors' offices.
"These systems are as much experiments as they are solutions," said Dr. Wears, who also holds a master's degree in computer science.
The medical journal's articles, according to some physicians and technology experts, tend to be too broad in their criticisms because the technology is still developing rapidly and some of the computer systems reviewed were old.
Still, even those experts conceded that the articles raised some good points.
"They are absolutely right that the people who design these systems need to be in tune with the work," said Dr. Andrew M. Wiesenthal, a physician who oversees information technology projects at Kaiser Permanente, the nation's largest nonprofit managed care company. "But the newer systems are designed more that way."
Dr. David J. Brailer, the administration's national coordinator for health information technology, termed the articles a "useful wake-up call," though he said the findings were not surprising. In health care, as in other industries, he said, technology alone is never a lasting solution.
"The way health information technology is developed, the way it is implemented and the way it is used are what matter," Dr. Brailer said.
But Dr. Brailer did take issue with the suggestion that the Bush administration is encouraging a headlong rush to invest in health information technology.
For the next year, he said, his policy efforts will be to try to encourage the health industry to agree on common computer standards, product certification and other measures that could become the foundation for digital patient records and health computer systems.
"We're not ready yet to really accelerate investment and adoption," Dr. Brailer said. "We have about a year's worth of work."
Dr. David W. Bates, medical director for clinical and quality analysis in information systems at Partners HealthCare, a nonprofit medical group that includes Massachusetts General Hospital and Brigham and Women's Hospital, said careful planning and realistic expectations were essential for technology in health care.
"But the danger is if people take the view that computerized physician order entry and other systems are a bad idea," said Dr. Bates, who is a professor at the Harvard Medical School. "That would be throwing out the baby with the bath water."
-- Edward Tufte
More on overreaching. Here is an interesting supplement to Steven Weinberg's argument
concerning metaphor vs. implication: "Does Gödel Matter? The romantic's favorite
mathematician didn't prove what you think he did," by Jordan Ellenberg at
-- Edward Tufte
Why most published research findings are false
For support for the chapter above, see "Why Most Published Research Findings Are False,"
by John P. A. Ioannidis from the Public Library of Science: http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10%2E1371%2Fjournal%2Epmed%2E0020124
The title of the article overreaches.
Note how problems such as selection effects and other biases are revealed over a series of
The initial argument is based on a little model; the intriguing but speculative corollaries
have some support based on a selected
history of research literature in various medical fields. I think most of the corollaries are
correct at least in the particular cases cited and are probably correct overall for many
fields of study. And the paper has a lot more systematic evidence, via the citations, than
my chapter. Lurking in the background is the powerful work of Tom Chalmers, whose
remarkable table appears in the chapter.
The powerful defense of the scientific process is that it all eventually works out over a
series of studies, at least for important matters. What emerges ultimately from the process
is the wonderful capacity to identify what is true or false. So at least the truth may come
out eventually. The article exemplifies this: to be able to say what is false requires
knowledge of what is true.
For medical research, with enormous and demonstrable problems of false reports, there
are some big costs along the way to ultimate truth. Induced to take a harmful drug based
on biased and false evidence vigorously marketed, medical patients may not be consoled
by the fact that the truth about the drug will be discovered in the long run. Medical patients
may feel they have better things to do than being part of medicine's learning and
Note the great virtues of the Public Library of Science here: the study is out
quickly, publicly and freely available, with the citations linked to online. See http://medicine.plosjournals.org/perlserv/?request=get-document&doi=10%2E1371%2Fjournal%2Epmed%2E0010031
-- Edward Tufte
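The "little model" behind the Ioannidis paper can be written down in a few lines. This is a minimal sketch of its simplest form, with no bias term: R is the prior odds that a tested relationship is true, alpha the false-positive rate, and power the chance of detecting a true relationship. The function name and the example values are illustrative assumptions, not figures taken from the paper.

```python
def ppv(prior_odds, alpha=0.05, power=0.8):
    """Share of 'significant' findings that are actually true (no bias term).

    True positives occur in proportion to power * R, false positives in
    proportion to alpha, so PPV = power * R / (power * R + alpha).
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)

# Illustrative scenarios (assumed values).
well_powered_even_odds = ppv(prior_odds=1.0, power=0.8)
underpowered_long_shot = ppv(prior_odds=0.01, power=0.2)

print(f"even prior odds, 80% power: {well_powered_even_odds:.3f}")
print(f"1-in-100 prior odds, 20% power: {underpowered_long_shot:.3f}")
```

Under these assumptions a finding is more likely true than false only when power times R exceeds alpha, which is why long-shot hypotheses tested in small studies fare so badly in the model.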
Detecting scientific fabrication
Matching image fingerprints helped detect fraud in this account of the Woo Suk Hwang
And the incipient retraction of 2 papers by the editor of Science:
The fabrications by Jan Hendrik Schon, the Lucent Bell Labs researcher, were in part given away by duplicative graphs:
And the full internal Lucent report which is very interesting:
As someone who's refereed a lot of papers in the social sciences and statistics, I think it is
difficult to detect wholesale fabrication on the basis of evidence that comes with the
manuscript itself. Fabrication is not the obvious competing hypothesis (or was not, until recently)
to the hypothesis advanced by the submitted article. Perhaps one principle that would
apply to both these cases above is that extraordinary findings require extraordinary
reviews, possibly on-site reviews of the research, before publishing. Of course
extraordinary findings will doubtless receive extraordinary reviews after publication,
thereby eventually detecting fabrication. Big-time fabrication of extraordinary results will
consequently be caught fairly quickly; fabrication in trivial studies maybe not ever because
nobody cares. Recall that the median citation rate for published scientific papers is 1.
Pre-publication reviews within the authors' research laboratories would be useful in detecting problems; there are certainly greater incentives for internal pre-publication reviews nowadays.
And what must be the elaborate self-deceptions of the fabricators as they slide down the
slippery slope of fabrication, since in both these cases above the fudging appears to have lasted over several years in a series of papers?
-- Edward Tufte
In "Surely You're Joking, Mr. Feynman", Feynman talks about revealing that a new Mayan codex was indeed a fake ...
"This new codex was a fake. In my lecture I pointed out that the numbers were in the style of the Madrid codex, but were 236, 90, 250, 8 - rather a coincidence! Out of the hundred thousand books originally made we get another fragment, and it has the same thing on it as the other fragments! It was obviously, again, one of these put-together things which had nothing original in it."
Feynman then goes into the psychology of how one could go about faking a discovery.
"These people who copy things never have the courage to make up something really different. If you find something that is really new, it's got to have something different. A real hoax would be to take something like the period of Mars, invent a mythology to go with it, and then draw pictures associated with this mythology with numbers appropriate to Mars - not in an obvious fashion; rather, have tables of multiples of the period with some mysterious "errors," and so on. The numbers should have to be worked out a little bit. Then people would say, "Geez! This has to do with Mars!" In addition, there should be a number of things in it that are not understandable, and are not exactly like what has been seen before. That would make a good fake."
-- Michael Round (email)
Example of cherry picking
The Telegraph published Bob Carter's "There IS a problem with global warming... it stopped in 1998" in April of 2006. Without going too deeply into the article or the politics of global climate change, Carter states in the first paragraph: "Consider the simple fact, drawn from the official temperature records of the [Climatic] Research Unit at the University of East Anglia, that for the years 1998-2005 global average temperature did not increase". Contrast this statement with the cited data as published by the Climatic Research Unit (the dark curve is a moving average).
-- Patrick Mahoney (email)
Funding our conclusions
From PLoS Medicine,
Here's the summary of the article, which replicates similar results in drug research:
"Relationship between Funding Source and Conclusion among Nutrition-Related Scientific
Lenard I. Lesser1, Cara B. Ebbeling1, Merrill Goozner2, David Wypij3,4, David S. Ludwig1*
1 Department of Medicine, Children's Hospital, Boston, Massachusetts, United States of
America, 2 The Center for Science in the Public Interest, Washington, D. C., United States
of America, 3 Department of Cardiology, Children's Hospital, Boston, Massachusetts,
United States of America, 4 Clinical Research Program, Children's Hospital, Boston,
Massachusetts, United States of America
Industrial support of biomedical research may bias scientific conclusions, as demonstrated
by recent analyses of pharmaceutical studies. However, this issue has not been
systematically examined in the area of nutrition research. The purpose of this study is to
characterize financial sponsorship of scientific articles addressing the health effects of
three commonly consumed beverages, and to determine how sponsorship affects
Methods and Findings
Medline searches of worldwide literature were used to identify three article types
(interventional studies, observational studies, and scientific reviews) about soft drinks,
juice, and milk published between 1 January, 1999 and 31 December, 2003. Financial
sponsorship and article conclusions were classified by independent groups of
coinvestigators. The relationship between sponsorship and conclusions was explored by
exact tests and regression analyses, controlling for covariates. 206 articles were included
in the study, of which 111 declared financial sponsorship. Of these, 22% had all industry
funding, 47% had no industry funding, and 32% had mixed funding. Funding source was
significantly related to conclusions when considering all article types (p = 0.037). For
interventional studies, the proportion with unfavorable conclusions was 0% for all industry
funding versus 37% for no industry funding (p = 0.009). The odds ratio of a favorable
versus unfavorable conclusion was 7.61 (95% confidence interval 1.27 to 45.73),
comparing articles with all industry funding to no industry funding.
Industry funding of nutrition-related scientific articles may bias conclusions in favor of
sponsors' products, with potentially significant implications for public health.
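A quick consistency check on the reported interval, as a minimal sketch. It assumes the 95% interval was computed symmetrically on the log-odds scale (the abstract does not say how it was computed). Under that assumption the odds ratio should sit near the geometric mean of the two limits, and the width of the interval gives the standard error of the log odds ratio. Only the three figures quoted above are used; the function name is illustrative.

```python
import math

def implied_log_or_stats(lower, upper, z=1.96):
    """Return (point estimate, standard error of log OR) implied by a
    confidence interval ASSUMED to be symmetric on the log scale."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    point = math.exp((log_lo + log_hi) / 2)   # midpoint on the log scale
    se = (log_hi - log_lo) / (2 * z)          # half-width divided by z
    return point, se

# Figures quoted in the abstract above.
reported_or, ci_low, ci_high = 7.61, 1.27, 45.73

point, se = implied_log_or_stats(ci_low, ci_high)
print(f"reported odds ratio:    {reported_or:.2f}")
print(f"implied point estimate: {point:.2f}")
print(f"implied SE of log OR:   {se:.2f}")
```

The implied estimate lands close to the reported 7.61, so the three numbers hang together. The interval spans a factor of roughly 36 from bottom to top, which signals an imprecise estimate.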
-- Edward Tufte
"A Devout Wish"
Talking, rational, telepathic animals one more time. See the article on the African Grey Parrot by Robert Todd Carroll, The Skeptic's Dictionary:
Richard Feynman's principle "The first principle is that you must not fool yourself--and you are the easiest person to fool" might be revised to cover the propagation of foolishness:
"The first principle is that you must not fool yourself (and you are the easiest person to fool), but also that you must not attempt to fool others with your foolishness."
[Also posted to the analytical reasoning thread.]
-- Edward Tufte
See David Sanger's article
on how reporters are handling leaks and intelligence briefings in light of getting fooled in
From the story: "Nicholas Lemann, the dean of Columbia Journalism School,
teaches a course called "Evidence and Inference," and says he is now "hammering into the
head of everyone around here that when someone tells you something, you have to say,
`Walk me through how you came to your conclusion.' "
-- Edward Tufte
I have been teaching bioscientists for the past 15 years how to conduct QUANTITATIVE
experiments using microscopes. The technical solution is easy (stereology) but the
persistent problem we have is pareidolia (Wikipedia describes this as "involving a vague and
random stimulus, often as image or sound, being perceived as significant"). People read
things into 2D images of 3D structure that either cannot be or are not there. How does
pareidolia affect the practice of information visualisation and display?
-- MR (email)
MR, speaking as a medical student at the end of my second year . . .
Let us say you want them to recognize hof (a cytoplasmic clearing near the nucleus of a plasma cell where ribosomes are transcribing RNA to make antibody proteins) and the amount of hof as an indication of a plasma cell's activity and maturation. There are other clear things in the cytoplasm and there are both basophilic and acidophilic things that may obscure the clearing. If I'd had my druthers for the last two years, I think I would have liked an introduction by way of particular-general-particular, and then a problem to be solved at the next level, in which analysis of hof is but one of the prerequisites to the solution; that approach is fairly effective. In this case, particular-general-particular takes the form of introducing hof by way of a single, exemplary specimen followed by a plethora of also-rans with their own perturbations in rapid succession, and returning to the exemplar. Google images is excellent for this sort of thing, and slideware makes it easy to stack 100 slides with 100 full-frame pictures.
[Image: 6 plasma cells in a peripheral blood smear exhibiting varying degrees of hof]
The problem could be a case study from presentation to treatment of a plasma cell dyscrasia, which might require grading the dyscrasia. One's diagnosis would be aided by recognizing hof, and less average hof is likely not such a good thing.
A couple of ready statements to have at hand: "Only a few cells in any given prep are really good examples, so when you're looking at a slide of cells, find what you think is a really good example of something you might be looking for and then see if there are others." Also, for students headed for the clinic: "Your initial differential should add up to no more than 98%, because there's at least a 2% chance you left the real diagnosis out." Such admissions of ambiguity are, perhaps paradoxically, reassuring to most students.
Finally, while I have rarely encountered a teacher who was not patient, let me take this opportunity to publicly thank all teachers for their patience.
-- Niels Olson (email)
If, at the margin, more medicine fails to produce better health, what does that imply about the medical research literature?
See Robin Hanson, "Medicine as Scandal:"
The issues here are very much the same as the issues raised by the Coleman Report on education years ago. See Mosteller and Moynihan on the Coleman Report and equality of educational opportunity.
-- Edward Tufte
From Paul Starr's Social Transformation of American Medicine (Basic Books, 1983 pp 122-123, here himself quoting Fleming, William H. Welch pp 177-78)
Why, Simon Flexner and others asked, should academic positions in clinical medicine require less commitment than positions in the laboratory sciences? In 1907 Dean Welch of Johns Hopkins gave his support to full-time clinical professorships; [William] Osler, now at Oxford, dissented, warning that teacher and student might become wholly absorbed in research and neglect "those wider interests to which a great hospital must minister." It would be "a very good thing for science, but a bad thing for the profession."
Paul Starr's book has been an influential, and as yet unsurpassed, standard in the health care policy debate. Simon Flexner's review of medical education 100 years ago continues to define the form of American medical education. William Osler more or less founded the Johns Hopkins School of Medicine in Baltimore, generally considered the first modern medical school in the New World.
Here in New Orleans we have a number of hospitals that remain unopened after the storm. Charity, with its 1000 beds. De Paul's psych hospital. The most acute need is for psych beds. There is little doubt that some who were kept alive by modern medicine died after the storm because they lost access to services, ranging from prescription refills (antihypertension and psych meds come to mind) to dialysis. There has been another wave of people who succumbed to disease sooner because of the general stress of post-storm life. Without a good census, numbers become hard to find or place in context, but the anecdotal stories don't fit with experiences in other cities in other times.
-- Niels Olson (email)
To further explain my previous post. The microscope is used in many different ways by bio-medics and scientists. The example given by Niels Olson above is a great example of how to use the microscope in a clinical setting for diagnostic purposes. Under these circumstances it would be ridiculous to ignore a particular indication because the field of view had not been randomly chosen. The key outcome is capturing a potentially adverse diagnosis and we can accept that an error in one direction (i.e. a false positive diagnosis) is preferable to a false negative.
What I had in mind was the use of a microscope as the transduction device (i.e. converting a stimulus in->data out) for a designed experimental study, e.g. comparison of n control animals vs n treatment animals. For example, using microscopy as a means to visualise and then count (for an estimate) the number of a particular type of neuron within a defined anatomical region of the brain. In this setting the subjective selection of images is injurious to the scientific validity of the experiment. Here we require unbiased selection based on randomised fields of view.
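One common way to take the observer's eye out of field selection is systematic uniform random sampling: fix a grid spacing in advance, pick a single random offset, and image every grid position regardless of what it shows. The following is only a minimal sketch of that idea. The function name, the section dimensions, and the spacing are hypothetical, and a real protocol would also need a rule for fields that overlap the edge of the region.

```python
import random

def systematic_random_fields(width, height, step, rng=None):
    """Return (x, y) field-of-view origins on a regular grid with spacing
    `step`, shifted by one random offset drawn before looking at the tissue."""
    rng = rng or random.Random()
    x0 = rng.random() * step   # random start in [0, step)
    y0 = rng.random() * step
    fields = []
    y = y0
    while y < height:
        x = x0
        while x < width:
            fields.append((x, y))
            x += step
        y += step
    return fields

# Hypothetical 10 mm x 6 mm region sampled every 2 mm; seeded so the
# selection can be reproduced and audited later.
fields = systematic_random_fields(10.0, 6.0, 2.0, random.Random(42))
for x, y in fields:
    print(f"field at x={x:.2f} mm, y={y:.2f} mm")
```

Because the offset is random and the spacing is fixed, every location in the region has the same chance of being imaged, and the person at the microscope never chooses which fields look "interesting".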
-- MR (email)
Words extend findings beyond the data:
Alan Schwarz, "N.F.L. Study Authors Dispute Concussion Finding", in The New York Times
-- Edward Tufte
Wikipedia List of Cognitive Biases (Please read this before continuing below.)
After reading through the list, one wonders how people ever get anything right. That's called the "cognitive biases bias," or maybe the "skepticism bias" or "paralysis by analysis."
There's also the "bias bias," where lists of cognitive biases are used as rhetorical weapons to attack any analysis, regardless of the quality of the analysis. The previous sentence then could be countered by describing it as an example of the "bias bias bias,"
and so on in a boring infinite regress of tu quoque disputation, or "slashdot."
The way out is to demand evidence for a claim of bias, and not just to rely on an assertion of bias. Thus the critic is responsible for providing good evidence for the claim of bias and for demonstrating that the claimed bias is relevant to the findings of the original work. Of course that evidence may be biased. . . . And, at some point, we may have to act on what evidence we have in hand, although such evidence may have methodological imperfections.
The effects of cognitive biases are diluted by peer review in scholarship, by the extent of opportunity for advancing alternative explanations, by public review, by the presence of good lists of cognitive biases, and, most of all, by additional evidence.
The points above might well be included in the Wikipedia entry, in order to dilute the bias ("deformation professionnelle") of the bias analysis profession.
In Wikipedia, I particularly appreciated:
"Deformation professionnelle is a French phrase, meaning a tendency to look at things from the point of view of one's own profession and forget a broader perspective. It is a pun on the expression, "formation professionnelle," meaning "professional training." The implication is that all (or most) professional training results to some extent in a distortion of the way the professional views the world."
Thus the essay "The Economisting of Art" is about what I view as the early limits in the microeconomic approach to understanding the prices of art at auction.
In my wanderings through various fields over the years, I have become particularly aware of deformation professionnelle and, indeed, have tried to do fresh things that break through local professional customs, parochialisms, and deformations.
-- Edward Tufte
If Sufficiently Tortured, Data Will Confess to Anything
In his consistently excellent Economist's View
blog, Professor Mark Thoma points out an excellent - and amazingly transparent - example of corrupting visual evidence to advance a preconceived point of view. That this innumerate chart comes from the Wall Street Journal (a presumably respected member of the mainstream media) makes it all the more remarkable.
Determined to see a Laffer curve in there somewhere, the author blissfully ignores almost all of the data points and confidently draws the one curve that justifies his preconceived agenda that corporate taxes are too high! Note how the curve starts at the artificial (0,0) data point for the UAE, goes to the clear Norway outlier and then drops in an arbitrarily precipitous manner. Arbitrary, that is, other than the fact that the US finds itself to the right of the curve, thereby confirming the author's opinion that US corporate taxes are too high and that the government could increase revenue by decreasing them. Q.E.D.!
Mark demonstrates how an alternative linear function - also somewhat arbitrary, though arguably more defensible - would prove the (not exactly radical) notion that government revenue tends to increase with increased tax rates. But that is exactly the opposite of what the article's author wanted to prove...
Further evidence that data will confess to anything if you torture it enough!
-- Zuil Serip (email)
Why not fit a quadratic model as well, and do a nested models comparison (or, probably equivalently, a test on the quadratic term) to determine whether there is curvature in the data? Better yet, you can probably bin the data and do a straight-out lack-of-fit test.
While significance would not "prove" the quadratic model, it would be strong evidence that the linear model is not sufficient. Then you can start looking for the nature of the curvature.
I wonder how the WSJ curve and the linear fit compare in terms of sum of squares of errors or mean absolute deviation? (Ok, I don't wonder, but it would be nice to see it presented.)
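For concreteness, here is a minimal sketch of the nested comparison I have in mind: fit a line, fit a quadratic, and form the F statistic for the extra term from the two error sums of squares. The numbers are made up for illustration and are NOT the data in the chart; the helper names are mine, and the fit uses plain normal equations, which is adequate only for a small, well-scaled data set like this one.

```python
def polyfit_sse(xs, ys, degree):
    """Least-squares polynomial fit; returns (coefficients, sum of squared errors)."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum x^(i+j), b[i] = sum y * x^i.
    A = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - s) / A[i][i]
    sse = sum((y - sum(c * x ** k for k, c in enumerate(coef))) ** 2
              for x, y in zip(xs, ys))
    return coef, sse

def f_for_quadratic_term(xs, ys):
    """F statistic (1 and n-3 degrees of freedom) for adding x^2 to a line."""
    _, sse_lin = polyfit_sse(xs, ys, 1)
    _, sse_quad = polyfit_sse(xs, ys, 2)
    df_resid = len(xs) - 3
    return (sse_lin - sse_quad) / (sse_quad / df_resid)

# Hypothetical (tax rate %, revenue as % of GDP) pairs, invented for this sketch.
tax_rate = [10, 15, 20, 25, 30, 35, 40]
revenue = [1.8, 2.6, 2.9, 3.4, 3.1, 3.9, 4.2]

print("SSE, linear:   ", polyfit_sse(tax_rate, revenue, 1)[1])
print("SSE, quadratic:", polyfit_sse(tax_rate, revenue, 2)[1])
print("F for x^2 term:", f_for_quadratic_term(tax_rate, revenue))
```

The F statistic would then be referred to an F distribution with 1 and n-3 degrees of freedom. A hand-drawn curve such as the WSJ's can be scored the same way by computing its sum of squared errors against the points.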
-- John Johnson (email)
As a statistician working at a medical center, I am becoming increasingly confused by modeling of data, and its relationship to data analysis. The presentation of the two different curves fit to the Corporate Taxes and Revenue data by country, and the response by Mr Johnson, increased this tension.
Zuil Serip showed two curves that Prof. Thoma fit to the same data, one a Laffer Curve, one a linear regression. The demonstration noted that the two curves lead to differing interpretations of the data. Mr Johnson replied that some diagnostics on the fitted models would shed more light on their appropriateness in modeling the data. I share his interest in seeing measures of fit but disagree with his ideas about fitting a quadratic model or other to determine if there is curvature in the data. My concerns, however, deal less with the issues of the strategies of models and fitting [although I tend to be opposed to binning due to information loss issues] and more with the question of "What exactly are we doing here?" and "Why?".
When the discussion turns from the data to the model, we seem to be moving away from an issue of data analysis and toward something different altogether. The model is just a collection of summary statistics, essentially functions of the data. Data analysis requires us to examine the component parts, the data atoms themselves, and determine the impact of corporate tax rates on tax revenue in those countries. What do the data say? What is special about Norway? Why is U.A.E. so low on both scales? Does its government really have essentially no tax revenue, or is the GDP just gigantic? What are the data points that are not labeled? Is it even rational to try to fit a model to these data? Are they all apples? Or do we also have oranges and grapes and perhaps even window screens? Can we even think of the relationship between tax rates and tax revenue (as a function of GDP) in UAE, Iceland, and Luxembourg as being the same thing? So suppose we fit a quadratic and determine that there *is* curvature in the data, then what? We have 30 data points; why are we bothering to reduce these data to a model? Can we not see what is going on by looking at the scatter?
The data analysis question asks: "What are the sources of variation in the tax revenues?" These data and the scatter plot answer by telling us that corporate tax rates may or may not influence this variation.
But the real learning comes from the labeling of the points where we learn, as we often do, that "it's more complicated than that." Analysis here requires that we drop the idea that tax revenue is a worthwhile predictor; it's more complicated than that. I'm certainly no economist, but I would pose that the whole premise of the question here (that it is sensible to compare this relationship across (un-named) countries) is not justified. Perhaps it is possible to examine this relationship within a particular country over time, adjusting for appropriate covariates. But the scatter plot tells me it is just silly to try to fit a simple model to these data; it's more complicated than that.
-- rafe donahue (email)
Here is an essay by Dr. John Ioannidis which was published in PLoS Medicine,
August 2005, Volume 2, Issue 8.
Another essay by Ramal Moonesinghe and Muin Khoury which was published in PLoS Medicine, February 2007,
Volume 4, Issue 2.
-- Edward Tufte
Data sharing denied
Andrew Vickers, a biostatistician at Sloan-Kettering, reports on failures to share data sets
-- Edward Tufte
Perhaps this lies a bit outside the goal of this particular thread,
but the web site http://economicindicators.gov , whose stated mission is
"to provide timely access to the daily releases of key economic
indicators from the Bureau of Economic Analysis and the U.S. Census Bureau,"
will be shutting down March 1 because of budget problems. Irony? Burying data?
Tanking economy in an election year? One certainly can't have a reasoned
conversation about the meaning of economic data if one can't get the data.
Perhaps a more interesting general question
is why the data they do release is presented in such opaque form, long tables
of numbers, comparisons to the same month the previous year, etc.
-- Michael Kaufman (email)
As documents refuse to die, so too their lessons. Though this item is now three years old, it bears reiteration. Nobel laureate Paul Krugman identified the pernicious infiltration of "slideware-speak" into the highest levels of thinking within _our_ Executive Branch: "The National Security Council document released this week under the grandiose title 'National Strategy for Victory in Iraq' is neither an analytical report nor a policy statement. It's simply the same old talking points, 'victory in Iraq is a vital U.S. interest,' 'failure is not an option,' repackaged in the form of a slide presentation for a business meeting. It's an embarrassing piece of work." Paul Krugman, Bullet Points Over Baghdad, The New York Times, Dec. 2, 2005.
See for yourself, the document is still online:
-- David Johnson (email)
Corrupt results in physics textbooks
See "Introductory physics: The new scholasticism" by Sanjoy Mahajan,
Physics Department, University of Cambridge, and David W. Hogg,
Physics Department, New York University at
-- Edward Tufte
I found the article about air resistance very interesting, and, like the authors of one of the texts quoted, I was
surprised at how big the effect is.
A similar (more elementary) case in biology where the textbooks say something that contradicts the everyday
experience of some readers is the description of human eye colour in elementary accounts of Mendelian genetics. The
general idea is that brown eyes are dominant over blue, so if both parents have blue eyes then so do their children; if
either parent is homozygotic for brown eyes then the children have brown eyes, but if either or both parents are
heterozygotic the child's eyes may be either brown or blue. All that is true enough if we confine attention to people
whose eyes are bright blue or dark brown, but that leaves out a lot of people who will read the textbook account and
wonder how it applies to them. My eyes are neither brown nor blue, and there are plenty of other people who can say
the same. Of course, it's fine for a textbook to describe the simplest case, but it's important to say that it is
the simplest case and that real life is often more complicated.
-- Athel Cornish-Bowden (email)
The New York Times, March 11, 2009:
"In what may be among the longest-running and widest-ranging cases of academic fraud, one of the most prolific
researchers in anesthesiology has admitted that he fabricated much of the data underlying his research."
"The researcher...never conducted the clinical trials that he wrote about in 21 journal articles... ."
Big pharma is mentioned...
NYT home page search: "neuropathic pain medicines"
-- David (email)
Cherry-picking and more
Shankar Vedantam, "A Silenced Drug Study Creates An Uproar," Washington Post, March 18, 2009, here.
-- Edward Tufte
BP oil spill recovery presentation
A BP vice president confusing integral and derivative here.
As the link explains,
"In a new video explaining the 'top kill' strategy, BP senior vice president Kent Wells shows this graphic for the amount
of oil being captured from the Deepwater Horizon by the suction tube. Wells says that BP has been tweaking the tube
to 'maximize' the collection of oil from the gushing well.
"'There's been a lot of questions around how much oil is being collected,' Wells says at around 4:11, pointing to the
graph. But if you look closely at the chart ... those green bars go up because the tube has been in place since May 16.
The longer it stays, the more gallons it collects. It's not necessarily collecting more oil on successive days, let alone
"most" of the oil as Wells says they're trying to do.
"Wells mentions some of the technical adjustments to the siphon, then says, 'Here you can see how we've continued to
ramp up.' If only that were so.
"From commenter Brandon Green: 'Wow, if you look at the tapering off in the last few bars, it would seem the graph
proves the exact opposite point they are trying to use it to make - that they are somehow managing to become LESS
efficient at collecting the oil.'"
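The point about the green bars can be made concrete by differencing a running total to recover the per-day amounts. This is a minimal sketch with invented numbers, NOT BP's figures; the function name is hypothetical.

```python
def daily_from_cumulative(cumulative):
    """Difference a running total to recover the amount collected each day."""
    daily = [cumulative[0]]
    for prev, curr in zip(cumulative, cumulative[1:]):
        daily.append(curr - prev)
    return daily

# Hypothetical running totals since the tube was installed (arbitrary units).
cumulative = [1000, 3000, 5500, 7500, 9000, 10000]
daily = daily_from_cumulative(cumulative)

print("cumulative:", cumulative)  # every bar is taller than the last
print("daily:     ", daily)       # [1000, 2000, 2500, 2000, 1500, 1000]
```

A cumulative chart rises as long as anything at all is collected, so it cannot show a "ramp up". In this invented series the daily amount peaks on day three and then falls, while the cumulative bars keep climbing with a tapering slope, the pattern the commenter describes.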
-- Anonymous (email)
Here is a group at UCL who are asking some very interesting questions of inference and evidence:
Is there a concept of evidence that applies universally?
Are there specific or generic techniques for manipulating evidence that can be applied across disciplinary boundaries?
They also provide quite a bit of material on evidence in complex legal and/or forensic cases and use of Wigmore charts.
-- Matt R (email)
Deformation professionnelle is well enough captured in the popular adage, "If all you have is a hammer, ..."
Currently relevant, "If all you have is CO2 sensitivity, ..."
-- Brian Hall (email)
How about this smack in the face of honest
In Fox News' on-screen chart of unemployment rates over the last year, not only do the
positions of the data points bear no relation to the actual y-axis scaling, but they also don't
even make sense relative to each other (8.6% unemployment is represented as approximately
the same as 9%, and charted above 8.8%). Even the Autocontent Wizard could have
done better than that!
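For what it's worth, the check a chart has to pass is simple enough to sketch. The three rates below are the ones mentioned above; the axis range, the plot height, and the "as drawn" heights are assumptions made up to mimic the description, not measurements taken from the broadcast.

```python
def y_position(value, axis_min, axis_max, plot_height):
    """Linear map from a data value to height above the baseline, in pixels."""
    return (value - axis_min) / (axis_max - axis_min) * plot_height

def order_preserved(values, heights):
    """True if every pair of distinct values is ranked the same way by height."""
    pairs = list(zip(values, heights))
    return all((v1 - v2) * (h1 - h2) > 0
               for i, (v1, h1) in enumerate(pairs)
               for v2, h2 in pairs[i + 1:]
               if v1 != v2)

rates = [8.6, 8.8, 9.0]

# Assumed axis from 8% to 10% on a 200-pixel-tall plot.
honest = [y_position(r, 8.0, 10.0, 200) for r in rates]

# Invented heights mimicking the description: 8.6 drawn level with 9.0
# and above 8.8.
as_drawn = [100, 80, 100]

print("honest heights:", [round(h) for h in honest])
print("honest chart preserves order:", order_preserved(rates, honest))
print("drawn chart preserves order: ", order_preserved(rates, as_drawn))
```

Any linear scale puts 8.6 below 8.8 and 8.8 below 9.0. A chart that fails even the ordering test has stopped encoding the data at all.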
-- Alex (email)