Classic articles on statistical thinking

April 2, 2011 | Edward Tufte

The Median Isn’t the Message by Stephen Jay Gould

Prefatory Note by Steve Dunn

Stephen Jay Gould was an influential evolutionary biologist who taught at Harvard University. He was the author of at least ten popular books on evolution and science, including, among others, The Flamingo’s Smile, The Mismeasure of Man, Wonderful Life, and Full House.

As far as I’m concerned, Gould’s The Median Isn’t the Message is the wisest, most humane thing ever written about cancer and statistics. It is the antidote both to those who say that, “the statistics don’t matter,” and those who have the unfortunate habit of pronouncing death sentences on patients who face a difficult prognosis. Anyone who researches the medical literature will confront the statistics for their disease. Anyone who reads this will be armed with reason and with hope. The Median Isn’t the Message is reproduced here by permission of the author.

My life has recently intersected, in a most personal way, two of Mark Twain’s famous quips. One I shall defer to the end of this essay. The other (sometimes attributed to Disraeli), identifies three species of mendacity, each worse than the one before – lies, damned lies, and statistics.

Consider the standard example of stretching the truth with numbers – a case quite relevant to my story. Statistics recognizes different measures of an “average,” or central tendency. The mean is our usual concept of an overall average – add up the items and divide them by the number of sharers (100 candy bars collected for five kids next Halloween will yield 20 for each in a just world). The median, a different measure of central tendency, is the half-way point. If I line up five kids by height, the median child is shorter than two and taller than the other two (who might have trouble getting their mean share of the candy). A politician in power might say with pride, “The mean income of our citizens is $15,000 per year.” The leader of the opposition might retort, “But half our citizens make less than $10,000 per year.” Both are right, but neither cites a statistic with impassive objectivity. The first invokes a mean, the second a median. (Means are higher than medians in such cases because one millionaire may outweigh hundreds of poor people in setting a mean; but he can balance only one mendicant in calculating a median).
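The gap between the two averages is easy to reproduce. The sketch below uses made-up incomes (they are not figures from the essay) to show how a single millionaire pulls the mean far above the median:

```python
from statistics import mean, median

# Hypothetical incomes, invented for illustration only:
# nine modest earners and one millionaire.
incomes = [8_000] * 5 + [10_000] * 4 + [1_000_000]

mean_income = mean(incomes)      # dragged upward by the single millionaire
median_income = median(incomes)  # the half-way point, unmoved by the outlier

print(f"mean income:   ${mean_income:,.0f}")
print(f"median income: ${median_income:,.0f}")
```

Both numbers are honest summaries of the same list; they simply answer different questions.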

The larger issue that creates a common distrust or contempt for statistics is more troubling. Many people make an unfortunate and invalid separation between heart and mind, or feeling and intellect. In some contemporary traditions, abetted by attitudes stereotypically centered on Southern California, feelings are exalted as more “real” and the only proper basis for action – if it feels good, do it – while intellect gets short shrift as a hang-up of outmoded elitism. Statistics, in this absurd dichotomy, often become the symbol of the enemy. As Hilaire Belloc wrote, “Statistics are the triumph of the quantitative method, and the quantitative method is the victory of sterility and death.”

This is a personal story of statistics, properly interpreted, as profoundly nurturant and life-giving. It declares holy war on the downgrading of intellect by telling a small story about the utility of dry, academic knowledge about science. Heart and head are focal points of one body, one personality.

In July 1982, I learned that I was suffering from abdominal mesothelioma, a rare and serious cancer usually associated with exposure to asbestos. When I revived after surgery, I asked my first question of my doctor and chemotherapist: “What is the best technical literature about mesothelioma?” She replied, with a touch of diplomacy (the only departure she has ever made from direct frankness), that the medical literature contained nothing really worth reading.

Of course, trying to keep an intellectual away from literature works about as well as recommending chastity to Homo sapiens, the sexiest primate of all. As soon as I could walk, I made a beeline for Harvard’s Countway medical library and punched mesothelioma into the computer’s bibliographic search program. An hour later, surrounded by the latest literature on abdominal mesothelioma, I realized with a gulp why my doctor had offered that humane advice. The literature couldn’t have been more brutally clear: mesothelioma is incurable, with a median mortality of only eight months after discovery. I sat stunned for about fifteen minutes, then smiled and said to myself: so that’s why they didn’t give me anything to read. Then my mind started to work again, thank goodness.

If a little learning could ever be a dangerous thing, I had encountered a classic example. Attitude clearly matters in fighting cancer. We don’t know why (from my old-style materialistic perspective, I suspect that mental states feed back upon the immune system). But match people with the same cancer for age, class, health, socioeconomic status, and, in general, those with positive attitudes, with a strong will and purpose for living, with commitment to struggle, with an active response to aiding their own treatment and not just a passive acceptance of anything doctors say, tend to live longer. A few months later I asked Sir Peter Medawar, my personal scientific guru and a Nobelist in immunology, what the best prescription for success against cancer might be. “A sanguine personality,” he replied. Fortunately (since one can’t reconstruct oneself at short notice and for a definite purpose), I am, if anything, even-tempered and confident in just this manner.

Hence the dilemma for humane doctors: since attitude matters so critically, should such a sombre conclusion be advertised, especially since few people have sufficient understanding of statistics to evaluate what the statements really mean? From years of experience with the small-scale evolution of Bahamian land snails treated quantitatively, I have developed this technical knowledge – and I am convinced that it played a major role in saving my life. Knowledge is indeed power, in Bacon’s proverb.

The problem may be briefly stated: What does “median mortality of eight months” signify in our vernacular? I suspect that most people, without training in statistics, would read such a statement as “I will probably be dead in eight months” – the very conclusion that must be avoided, since it isn’t so, and since attitude matters so much.

I was not, of course, overjoyed, but I didn’t read the statement in this vernacular way either. My technical training enjoined a different perspective on “eight months median mortality.” The point is a subtle one, but profound – for it embodies the distinctive way of thinking in my own field of evolutionary biology and natural history.

We still carry the historical baggage of a Platonic heritage that seeks sharp essences and definite boundaries. (Thus we hope to find an unambiguous “beginning of life” or “definition of death,” although nature often comes to us as irreducible continua.) This Platonic heritage, with its emphasis on clear distinctions and separated immutable entities, leads us to view statistical measures of central tendency wrongly, indeed opposite to the appropriate interpretation in our actual world of variation, shadings, and continua. In short, we view means and medians as the hard “realities,” and the variation that permits their calculation as a set of transient and imperfect measurements of this hidden essence. If the median is the reality and variation around the median just a device for its calculation, then “I will probably be dead in eight months” may pass as a reasonable interpretation.

But all evolutionary biologists know that variation itself is nature’s only irreducible essence. Variation is the hard reality, not a set of imperfect measures for a central tendency. Means and medians are the abstractions. Therefore, I looked at the mesothelioma statistics quite differently – and not only because I am an optimist who tends to see the doughnut instead of the hole, but primarily because I know that variation itself is the reality. I had to place myself amidst the variation.

When I learned about the eight-month median, my first intellectual reaction was: fine, half the people will live longer; now what are my chances of being in that half. I read for a furious and nervous hour and concluded, with relief: damned good. I possessed every one of the characteristics conferring a probability of longer life: I was young; my disease had been recognized in a relatively early stage; I would receive the nation’s best medical treatment; I had the world to live for; I knew how to read the data properly and not despair.

Another technical point then added even more solace. I immediately recognized that the distribution of variation about the eight-month median would almost surely be what statisticians call “right skewed.” (In a symmetrical distribution, the profile of variation to the left of the central tendency is a mirror image of variation to the right. In skewed distributions, variation to one side of the central tendency is more stretched out – left skewed if extended to the left, right skewed if stretched out to the right.) The distribution of variation had to be right skewed, I reasoned. After all, the left of the distribution contains an irrevocable lower boundary of zero (since mesothelioma can only be identified at death or before). Thus, there isn’t much room for the distribution’s lower (or left) half – it must be scrunched up between zero and eight months. But the upper (or right) half can extend out for years and years, even if nobody ultimately survives. The distribution must be right skewed, and I needed to know how long the extended tail ran – for I had already concluded that my favorable profile made me a good candidate for that part of the curve.
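A small simulation makes the shape of this argument concrete. The sketch below is not Gould's data: it assumes, purely for illustration, that survival times follow a log-normal distribution with a median of eight months and an arbitrary spread (`SIGMA`). Any distribution bounded at zero with a long right tail would make the same point.

```python
import math
import random
from statistics import mean, median

random.seed(0)  # fixed seed so the sketch is repeatable

MEDIAN_MONTHS = 8.0  # the median quoted in the essay
SIGMA = 1.0          # assumed spread on the log scale (hypothetical)

# Log-normal survival times: bounded below by zero, stretched out to the right.
# The median of a log-normal is exp(mu), so mu = ln(8) gives an 8-month median.
survival = [
    random.lognormvariate(math.log(MEDIAN_MONTHS), SIGMA)
    for _ in range(100_000)
]

sample_median = median(survival)
sample_mean = mean(survival)
share_beyond_2_years = sum(t > 24 for t in survival) / len(survival)
share_beyond_5_years = sum(t > 60 for t in survival) / len(survival)

print(f"median survival: {sample_median:.1f} months")
print(f"mean survival:   {sample_mean:.1f} months")
print(f"share living beyond 2 years: {share_beyond_2_years:.1%}")
print(f"share living beyond 5 years: {share_beyond_5_years:.1%}")
```

Under these assumptions half the simulated patients still die within eight months, yet the mean sits well above the median and a visible minority lives for years. The exact shares depend entirely on the assumed spread.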

The distribution was, indeed, strongly right skewed, with a long tail (however small) that extended for several years above the eight-month median. I saw no reason why I shouldn’t be in that small tail, and I breathed a very long sigh of relief. My technical knowledge had helped. I had read the graph correctly. I had asked the right question and found the answers. I had obtained, in all probability, the most precious of all possible gifts in the circumstances – substantial time. I didn’t have to stop and immediately follow Isaiah’s injunction to Hezekiah – set thine house in order for thou shalt die, and not live. I would have time to think, to plan, and to fight.

One final point about statistical distributions. They apply only to a prescribed set of circumstances – in this case to survival with mesothelioma under conventional modes of treatment. If circumstances change, the distribution may alter. I was placed on an experimental protocol of treatment and, if fortune holds, will be in the first cohort of a new distribution with high median and a right tail extending to death by natural causes at advanced old age.

It has become, in my view, a bit too trendy to regard the acceptance of death as something tantamount to intrinsic dignity. Of course I agree with the preacher of Ecclesiastes that there is a time to love and a time to die – and when my skein runs out I hope to face the end calmly and in my own way. For most situations, however, I prefer the more martial view that death is the ultimate enemy – and I find nothing reproachable in those who rage mightily against the dying of the light.

The swords of battle are numerous, and none more effective than humor. My death was announced at a meeting of my colleagues in Scotland, and I almost experienced the delicious pleasure of reading my obituary penned by one of my best friends (the so-and-so got suspicious and checked; he too is a statistician, and didn’t expect to find me so far out on the right tail). Still, the incident provided my first good laugh after the diagnosis. Just think, I almost got to repeat Mark Twain’s most famous line of all: the reports of my death are greatly exaggerated.

Postscript By Steve Dunn

Many people have written me to ask what became of Stephen Jay Gould. Sadly, Dr. Gould died in May of 2002 at the age of 60. Dr. Gould lived for 20 very productive years after his diagnosis, thus exceeding his 8 month median survival by a factor of thirty! Although he did die of cancer, it apparently wasn’t mesothelioma, but a second and unrelated cancer.

In March 2002, Dr. Gould published his 1342-page “Magnum Opus”, The Structure of Evolutionary Theory. It is fitting that Gould, one of the world’s most prolific scientists and writers, was able to complete the definitive statement of his scientific work and philosophy just in time. That text is far too long and dense for almost any layman – but the works of Stephen Jay Gould will live on. Especially, I hope, The Median Isn’t The Message.

Comments

Edward Tufte says:

April 2, 2011 at 10:40 pm

One of the fundamental cognitive tasks in analytical thinking is to reason about causality. Thus one of the fundamental principles of analytical design is to show causality. Austin Bradford Hill’s classic essay on thinking about causal evidence is reproduced here.

E.T.

Austin Bradford Hill, “The Environment and Disease: Association or Causation?,” Proceedings of the Royal Society of Medicine, 58 (1965), 295-300.

The Environment and Disease: Association or Causation?

By Sir Austin Bradford Hill CBE DSC FRCP (hon) FRS

(Professor Emeritus of Medical Statistics, University of London)

Amongst the objects of this newly-founded Section of Occupational Medicine are, firstly, to provide a means, not readily afforded elsewhere, whereby physicians and surgeons with a special knowledge of the relationship between sickness and injury and conditions of work may discuss their problems, not only with each other, but also with colleagues in other fields, by holding joint meetings with other Sections of the Society; and secondly, to make available information about the physical, chemical and psychological hazards of occupation, and in particular about those that are rare or not easily recognized.

At this first meeting of the Section and before, with however laudable intentions, we set about instructing our colleagues in other fields, it will be proper to consider a problem fundamental to our own. How in the first place do we detect these relationships between sickness, injury and conditions of work? How do we determine what are physical, chemical and psychological hazards of occupation, and in particular those that are rare and not easily recognized?

There are, of course, instances in which we can reasonably answer these questions from the general body of medical knowledge. A particular, and perhaps extreme, physical environment cannot fail to be harmful; a particular chemical is known to be toxic to man and therefore suspect on the factory floor. Sometimes, alternatively, we may be able to consider what a particular environment might do to man, and then see whether such consequences are indeed to be found. But more often than not we have no such guidance, no such means of proceeding; more often than not we are dependent upon our observation and enumeration of defined events for which we then seek antecedents. In other words we see that the event B is associated with the environmental feature A, that, to take a specific example, some form of respiratory illness is associated with a dust in the environment. In what circumstances can we pass from this observed association to a verdict of causation? Upon what basis should we proceed to do so?

I have no wish, nor the skill, to embark upon philosophical discussion of the meaning of ‘causation’. The cause of illness may be immediate and direct; it may be remote and indirect underlying the observed association. But with the aims of occupational, and almost synonymously preventive, medicine in mind the decisive question is whether the frequency of the undesirable event B will be influenced by a change in the environmental feature A. How such a change exerts that influence may call for a great deal of research. However, before deducing and taking action we shall not invariably have to sit around awaiting the results of the research. The whole chain may have to be unraveled or a few links may suffice. It will depend upon circumstances.

Disregarding then any such problem in semantics we have this situation. Our observations reveal an association between two variables, perfectly clear-cut and beyond what we would care to attribute to the play of chance. What aspects of that association should we especially consider before deciding that the most likely interpretation of it is causation?

(1) Strength: First upon my list I would put the strength of the association. To take a very old example, by comparing the occupations of patients with scrotal cancer with the occupations of patients presenting with other diseases, Percival Pott could reach a correct conclusion because of the enormous increase of scrotal cancer in the chimney sweeps. Even as late as the second decade of the twentieth century, writes Richard Doll (1964), the mortality of chimney sweeps from scrotal cancer was some 200 times that of workers who were not specially exposed to tar or mineral oils, and in the eighteenth century the relative difference is likely to have been much greater.

To take a more modern and more general example upon which I have now reflected for over fifteen years, prospective inquiries into smoking have shown that the death rate from cancer of the lung in cigarette smokers is nine to ten times the rate in non-smokers and the rate in heavy cigarette smokers is twenty to thirty times as great. On the other hand the death rate from coronary thrombosis in smokers is no more than twice, possibly less, the death rate in non-smokers. Though there is good evidence to support causation it is surely much easier in this case to think of some feature of life that may go hand-in-hand with smoking – features that might conceivably be the real underlying cause or, at the least, an important contributor, whether it be lack of exercise, nature of diet or other factors. But to explain the pronounced excess of cancer of the lung in any other environmental terms requires some feature of life so intimately linked with cigarette smoking and with the amount of smoking that such a feature should be easily detectable. If we cannot detect it or reasonably infer a specific one, then in such circumstances I think we are reasonably entitled to reject the vague contention of the armchair critic: ‘you can’t prove it, there may be such a feature’.

Certainly in this situation I would reject the argument sometimes advanced that what matters is the absolute difference between the death rates of our various groups and not the ratio of one to the other. That depends upon what we want to know. If we want to know how many extra deaths from cancer of the lung will take place through smoking (i.e. presuming causation), then obviously we must use the absolute differences between the death rates – 0.07 per 1,000 per year in non-smoking doctors, 0.57 in those smoking 1-14 cigarettes daily, 1.39 for 15-24 cigarettes daily and 2.27 for 25 or more daily. But it does not follow here, or in more specifically occupational problems, that this best measure of the effect upon mortality is also the best measure in relation to etiology. In this respect the ratios of 8, 20 and 32 to 1 are far more informative. It does not, of course, follow that the differences revealed by ratios are of any practical importance. Maybe they are, maybe they are not; but that is another point altogether.
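The two measures can be set side by side with a few lines of arithmetic. The sketch below uses the death rates quoted in the passage. The middle rate, 1.39 per 1,000 for 15-24 cigarettes daily, is the figure that corresponds to the 20-to-1 ratio and is treated here as an assumption of the sketch.

```python
# Lung-cancer death rates per 1,000 doctors per year.
# The 15-24 rate (1.39) is an assumption: it is the value consistent
# with the 20-to-1 ratio quoted in the text.
rates = {
    "non-smokers": 0.07,
    "1-14 daily": 0.57,
    "15-24 daily": 1.39,
    "25+ daily": 2.27,
}

baseline = rates["non-smokers"]
comparison = {}
for group, rate in rates.items():
    if group == "non-smokers":
        continue
    excess = round(rate - baseline, 2)  # absolute difference: extra deaths per 1,000
    ratio = round(rate / baseline)      # relative difference: the etiological signal
    comparison[group] = (excess, ratio)
    print(f"{group:12s} excess = {excess:.2f} per 1,000   ratio = {ratio} to 1")
```

The excess column answers "how many extra deaths?"; the ratio column is the one that bears on the strength of the association.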

We may recall John Snow’s classic analysis of the opening weeks of the cholera epidemic of 1854 (Snow 1855). The death rate that he recorded in the customers supplied with the grossly polluted water of the Southwark and Vauxhall Company was in truth quite low – 71 deaths in each 10,000 houses. What stands out vividly is the fact that the small rate is 14 times the figure of 5 deaths per 10,000 houses supplied with the sewage-free water of the Lambeth Company.

In thus putting emphasis upon the strength of an association we must, nevertheless, look at the obverse of the coin. We must not be too ready to dismiss a cause and effect hypothesis merely on the grounds that the observed association appears to be slight. There are many occasions in medicine when this is in truth so. Relatively few persons harboring the meningococcus fall sick of meningococcal meningitis. Relatively few persons occupationally exposed to rats’ urine contract Weil’s disease.

(2) Consistency: Next on my list of features to be specially considered I would place the consistency of the observed association. Has it been repeatedly observed by different persons, in different places, circumstances and times?

This requirement may be of special importance for those rare hazards singled out in the Section’s terms of reference. With many alert minds at work in the industry today many an environmental association may be thrown up. Some of them on the customary tests of statistical significance will appear to be unlikely to be due to chance. Nevertheless whether chance is the explanation or whether a true hazard has been revealed may sometimes be answered only by a repetition of the circumstances and the observations.

Returning to my more general example, the Advisory Committee to the Surgeon-General of the United States Public Health Service found the association of smoking with cancer of the lung in 29 retrospective and 7 prospective inquiries (US Department of Health, Education and Welfare 1964). The lesson here is that broadly the same answer has been reached in quite a wide variety of situations and techniques. In other words, we can justifiably infer that the association is not due to some constant error or fallacy that permeates every inquiry. And we have indeed to be on our guard against that.

Take, for instance, an example given by Heady (1958). Patients admitted to hospital for operation for peptic ulcer are questioned about recent domestic anxieties or crises that may have precipitated the acute illness. As controls, patients admitted for operation for a simple hernia are similarly quizzed. But, as Heady points out, the two groups may not be in pari materia. If your wife ran off with the lodger last week you still have to take your perforated ulcer to hospital without delay. But with a hernia you might prefer to stay at home for a while – to mourn (or celebrate) the event. No number of exact repetitions would remove or necessarily reveal that fallacy.

We have, therefore, the somewhat paradoxical position that the different results of a different inquiry certainly cannot be held to refute the original evidence; yet the same results from precisely the same form of inquiry will not invariably greatly strengthen the original evidence. I would myself put a good deal of weight upon similar results reached in quite different ways, e.g. prospectively and retrospectively.

Once again looking at the obverse of the coin there will be occasions when repetition is absent or impossible and yet we should not hesitate to draw conclusions. The experience of the nickel refiners of South Wales is an outstanding example. I quote from the Alfred Watson Memorial Lecture that I gave in 1962 to the Institute of Actuaries:

The population at risk, workers and pensioners, numbered about one thousand. During the ten years 1929 to 1938, sixteen of them had died from cancer of the lung and eleven from cancer of the nasal sinuses. At the age specific death rates of England and Wales at that time, one might have anticipated one death from cancer of the lung (to compare with the 16), and a fraction of a death from cancer of the nose (to compare with the 11). In all other bodily sites cancer had appeared on the death certificate 11 times and one would have expected it to do so 10-11 times. There had been 67 deaths from all other causes of mortality and over the ten years’ period 72 would have been expected at the national death rates. Finally, division of the population at risk in relation to their jobs showed that the excess of cancer of the lung and nose had fallen wholly upon the workers employed in the chemical processes.

More recently my colleague, Dr. Richard Doll, has brought this story a stage further. In the nine years 1948 to 1956 there had been, he found, 48 deaths from cancer of the lung and 13 deaths from cancer of the nose. He assessed the numbers expected at normal rates of mortality as, respectively, 10 and 0.1.

In 1923, long before any special hazard had been recognized, certain changes in the refinery took place. No case of cancer of the nose has been observed in any man who first entered the works after that year, and in these men there has been no excess of cancer of the lung. In other words, the excess in both sites is uniquely a feature in men who entered the refinery in, roughly, the first 23 years of the present century.

No causal agent of these neoplasms has been identified. Until recently no animal experimentation had given any clue or any support to this wholly statistical evidence. Yet I wonder if any of us would hesitate to accept it as proof of a grave industrial hazard? (Hill 1962).

In relation to my present discussion I know of no parallel investigation. We have (or certainly had) to make up our minds on a unique event; and there is no difficulty in doing so.
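The nickel-refinery figures can be laid out as observed-to-expected comparisons. The sketch below is only an arithmetic illustration, not part of the lecture. It assumes that deaths follow a Poisson distribution around the expected count, and it takes 10.5 as the midpoint of the "10-11" expected deaths from cancer at other sites. The helper `poisson_upper_tail` is a hypothetical name introduced here.

```python
import math

def poisson_upper_tail(observed: int, expected: float, terms: int = 200) -> float:
    """P(X >= observed) for X ~ Poisson(expected), summing the tail directly."""
    total = 0.0
    for k in range(observed, observed + terms):
        # exp(-lambda) * lambda**k / k!, computed on the log scale for stability
        total += math.exp(-expected + k * math.log(expected) - math.lgamma(k + 1))
    return total

# (label, observed deaths, expected deaths at national rates)
cases = [
    ("lung cancer 1929-38", 16, 1.0),
    ("other cancer sites 1929-38", 11, 10.5),  # 10.5 is an assumed midpoint of "10-11"
    ("all other causes 1929-38", 67, 72.0),
    ("lung cancer 1948-56", 48, 10.0),
    ("nose cancer 1948-56", 13, 0.1),
]

results = {}
for label, observed, expected in cases:
    ratio = observed / expected
    tail = poisson_upper_tail(observed, expected)
    results[label] = (ratio, tail)
    print(f"{label:28s} O/E = {ratio:6.1f}   P(X >= observed) = {tail:.3g}")
```

Under these assumptions the lung and nose figures are far beyond anything chance would produce, while the other sites and other causes sit close to expectation, which is the contrast the passage draws.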

(3) Specificity: One reason, needless to say, is the specificity of the association, the third characteristic which invariably we must consider. If, as here, the association is limited to specific workers and to particular sites and types of disease and there is no association between the work and other modes of dying, then clearly that is a strong argument in favor of causation.

We must not, however, over-emphasize the importance of the characteristic. Even in my present example there is a cause and effect relationship with two different sites of cancer – the lung and the nose. Milk as a carrier of infection and, in that sense, the cause of disease can produce such a disparate galaxy as scarlet fever, diphtheria, tuberculosis, undulant fever, sore throat, dysentery and typhoid fever. Before the discovery of the underlying factor, the bacterial origin of disease, harm would have been done by pushing too firmly the need for specificity as a necessary feature before convicting the dairy.

Coming to modern times the prospective investigations of smoking and cancer of the lung have been criticized for not showing specificity – in other words the death rate of smokers is higher than the death rate of non-smokers from many causes of death (though in fact the results of Doll and Hill, 1964, do not show that). But here surely one must return to my first characteristic, the strength of the association. If other causes of death are raised 10, 20 or even 50% in smokers whereas cancer of the lung is raised 900-1000% we have specificity – a specificity in the magnitude of the association.

We must also keep in mind that diseases may have more than one cause. It has always been possible to acquire a cancer of the scrotum without sweeping chimneys or taking to mule-spinning in Lancashire. One-to-one relationships are not frequent. Indeed I believe that multi-causation is generally more likely than single causation though possibly if we knew all the answers we might get back to a single factor.

In short, if specificity exists we may be able to draw conclusions without hesitation; if it is not apparent, we are not thereby necessarily left sitting irresolutely on the fence.

(4) Temporality: My fourth characteristic is the temporal relationship of the association – which is the cart and which is the horse? This is a question which might be particularly relevant with diseases of slow development. Does a particular diet lead to disease or do the early stages of the disease lead to those particular dietetic habits? Does a particular occupation or occupational environment promote infection by the tubercle bacillus or are the men and women who select that kind of work more liable to contract tuberculosis whatever the environment – or, indeed, have they already contracted it? This temporal problem may not arise often, but it certainly needs to be remembered, particularly with selective factors at work in the industry.

(5) Biological gradient: Fifthly, if the association is one which can reveal a biological gradient, or dose-response curve, then we should look most carefully for such evidence. For instance, the fact that the death rate from cancer of the lung rises linearly with the number of cigarettes smoked daily, adds a very great deal to the simpler evidence that cigarette smokers have a higher death rate than non-smokers. The comparison would be weakened, though not necessarily destroyed, if it depended upon, say, a much heavier death rate in light smokers and a lower rate in heavier smokers. We should then need to envisage some much more complex relationship to satisfy the cause and effect hypothesis. The clear dose-response curve admits of a simple explanation and obviously puts the case in a clearer light.

The same would clearly be true of an alleged dust hazard in industry. The dustier the environment the greater the incidence of disease we would expect to see. Often the difficulty is to secure some satisfactory quantitative measures of the environment which will permit us to explore this dose-response. But we should invariably seek it.
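A biological gradient of this kind can be checked with a simple straight-line fit. The sketch below is an illustration under stated assumptions, not Hill's own calculation. It uses the lung-cancer death rates per 1,000 per year for doctors by smoking category, with 1.39 for the 15-24 group taken as an assumption. It also assigns each category a representative dose (0, 7.5, 19.5 and 30 cigarettes daily), which is a hypothetical choice made only so that a line can be fitted.

```python
# Representative daily doses per category (assumed midpoints; 30 stands in for "25 or more").
doses = [0.0, 7.5, 19.5, 30.0]
# Death rates per 1,000 per year; the 1.39 figure for 15-24 daily is an assumption.
rates = [0.07, 0.57, 1.39, 2.27]

# A gradient requires, at minimum, that the rate rises with every step in dose.
is_monotonic = all(lower < higher for lower, higher in zip(rates, rates[1:]))

# Ordinary least-squares line through the four points.
n = len(doses)
mean_dose = sum(doses) / n
mean_rate = sum(rates) / n
slope = sum((x - mean_dose) * (y - mean_rate) for x, y in zip(doses, rates)) / sum(
    (x - mean_dose) ** 2 for x in doses
)
intercept = mean_rate - slope * mean_dose

print(f"rate rises at every step: {is_monotonic}")
print(f"fitted slope: {slope:.3f} extra deaths per 1,000 per cigarette smoked daily")
print(f"fitted intercept: {intercept:.3f} deaths per 1,000 at zero cigarettes")
```

With only four grouped points the fit is crude, but a steadily rising rate with an intercept near the non-smokers' rate is the pattern a simple dose-response explanation predicts.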

(6) Plausibility: It will be helpful if the causation we suspect is biologically plausible. But this is a feature I am convinced we cannot demand. What is biologically plausible depends upon the biological knowledge of the day.

To quote again from my Alfred Watson Memorial Lecture (Hill 1962), there was

‘no biological knowledge to support (or to refute) Pott’s observation in the 18th century of the excess of cancer in chimney sweeps. It was lack of biological knowledge in the 19th that led to a prize essayist writing on the value and the fallacy of statistics to conclude, amongst other “absurd” associations, that “it could be no more ridiculous for the stranger who passed the night in the steerage of an emigrant ship to ascribe the typhus, which he there contracted, to the vermin with which bodies of the sick might be infested.” And coming to nearer times, in the 20th century there was no biological knowledge to support the evidence against rubella.’

In short, the association we observe may be one new to science or medicine and we must not dismiss it too light-heartedly as just too odd. As Sherlock Holmes advised Dr. Watson, ‘when you have eliminated the impossible, whatever remains, however improbable, must be the truth.’

(7) Coherence: On the other hand the cause-and-effect interpretation of our data should not seriously conflict with the generally known facts of the natural history and biology of the disease – in the expression of the Advisory Committee to the Surgeon-General it should have coherence.

Thus in the discussion of lung cancer the Committee finds its association with cigarette smoking coherent with the temporal rise that has taken place in the two variables over the last generation and with the sex difference in mortality – features that might well apply in an occupational problem. The known urban/rural ratio of lung cancer mortality does not detract from coherence, nor the restriction of the effect to the lung.

Personally, I regard as greatly contributing to coherence the histopathological evidence from the bronchial epithelium of smokers and the isolation from cigarette smoke of factors carcinogenic for the skin of laboratory animals. Nevertheless, while such laboratory evidence can enormously strengthen the hypothesis and, indeed, may determine the actual causative agents, the lack of such evidence cannot nullify the epidemiological associations in man. Arsenic can undoubtedly cause cancer of the skin in man but it has never been possible to demonstrate such an effect on any other animal. In a wider field John Snow’s epidemiological observations on the conveyance of cholera by water from the Broad Street Pump would have been put almost beyond dispute if Robert Koch had been then around to isolate the vibrio from the baby’s nappies, the well itself and the gentleman in delicate health from Brighton. Yet the fact that Koch’s work was to be awaited another thirty years did not really weaken the epidemiological case though it made it more difficult to establish against the criticisms of the day – both just and unjust.

(8) Experiment: Occasionally it is possible to appeal to experimental, or semi-experimental, evidence. For example, because of an observed association some preventive action is taken. Does it in fact prevent? The dust in the workshop is reduced, lubricating oils are changed, persons stop smoking cigarettes. Is the frequency of the associated events affected? Here the strongest support for the causation hypothesis may be revealed.

(9) Analogy: In some circumstances it would be fair to judge by analogy. With the effects of thalidomide and rubella before us we would surely be ready to accept slighter but similar evidence with another drug or another viral disease in pregnancy.

Here then are nine different viewpoints from all of which we should study association before we cry causation. What I do not believe – and this has been suggested – is that we can usefully lay down some hard-and-fast rules of evidence that must be obeyed before we can accept cause and effect. None of my nine viewpoints can bring indisputable evidence for or against the cause-and-effect hypothesis and none can be required as a sine qua non. What they can do, with greater or less strength, is to help us to make up our minds on the fundamental question – is there any other way of explaining the set of facts before us, is there any other answer equally, or more, likely than cause and effect?

Tests of Significance

No formal tests of significance can answer those questions. Such tests can, and should, remind us of the effects that the play of chance can create, and they will instruct us in the likely magnitude of those effects. Beyond that they contribute nothing to the ‘proof’ of our hypothesis.

Nearly forty years ago, amongst the studies of occupational health that I made for the Industrial Health Research Board of the Medical Research Council was one that concerned the workers in the cotton-spinning mills of Lancashire (Hill 1930). The question that I had to answer, by the use of the National Health Insurance records of that time, was this: Do the workers in the cardroom of the spinning mill, who tend the machines that clean the raw cotton, have a sickness experience in any way different from that of the other operatives in the same mills who are relatively unexposed to the dust and fibre that were features of the card room? The answer was an unqualified ‘Yes’. From age 30 to age 60 the cardroom workers suffered over three times as much from respiratory causes of illness whereas from non-respiratory causes their experience was not different from that of the other workers. This pronounced difference with the respiratory causes was derived not from abnormally long periods of sickness but rather from an excessive number of repeated absences from work of the cardroom workers.

All this has rightly passed into the limbo of forgotten things. What interests me today is this: My results were set out for men and women separately and for half a dozen age groups in 36 tables. So there were plenty of sums. Yet I cannot find that anywhere I thought it necessary to use a test of significance. The evidence was so clear cut, the differences between the groups were mainly so large, the contrast between respiratory and non-respiratory causes of illness so specific, that no formal tests could really contribute anything of value to the argument. So why use them?

Would we think or act that way today? I rather doubt it. Between the two world wars there was a strong case for emphasizing to the clinician and other research workers the importance of not overlooking the effects of the play of chance upon their data. Perhaps too often generalities were based upon two men and a laboratory dog while the treatment of choice was deduced from a difference between two bedfuls of patients and might easily have no true meaning. It was therefore a useful corrective for statisticians to stress, and to teach the need for, tests of significance merely to serve as guides to caution before drawing a conclusion, before inflating the particular to the general.

I wonder whether the pendulum has not swung too far – not only with the attentive pupils but even with the statisticians themselves. To decline to draw conclusions without standard errors can surely be just as silly? Fortunately I believe we have not yet gone so far as our friends in the USA where, I am told, some editors of journals will return an article because tests of significance have not been applied. Yet there are innumerable situations in which they are totally unnecessary – because the difference is grotesquely obvious, because it is negligible, or because, whether it be formally significant or not, it is too small to be of any practical importance. What is worse, the glitter of the t table diverts attention from the inadequacies of the fare. Only a tithe, and an unknown tithe, of the factory personnel volunteer for some procedure or interview, 20% of patients treated in some particular way are lost to sight, 30% of a randomly-drawn sample are never contacted. The sample may, indeed, be akin to that of the man who, according to Swift, ‘had a mind to sell his house and carried a piece of brick in his pocket, which he showed as a pattern to encourage purchasers.’ The writer, the editor and the reader are unmoved. The magic formulae are there.

Of course I exaggerate. Yet too often I suspect we waste a deal of time, we grasp the shadow and lose the substance, we weaken our capacity to interpret the data and to take reasonable decisions whatever the value of P. And far too often we deduce ‘no difference’ from ‘no significant difference.’ Like fire, the chi-squared test is an excellent servant and a bad master.
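The gap between ‘no significant difference’ and ‘no difference’, and between formal significance and practical importance, can be illustrated with a small sketch. It is not part of the original essay. The counts below are invented for illustration, the function name `two_proportion_z_test` is hypothetical, and the pooled normal approximation is used as a stand-in for the chi-squared test (for a 2×2 table without continuity correction the two are equivalent).

```python
import math

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (difference in proportions, two-sided p-value).

    Uses the pooled normal approximation; this is an illustrative
    helper, not a method taken from the essay.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p1 - p2, p_value

# Hypothetical small study: 6/20 (30%) sick versus 3/20 (15%) sick.
# The observed difference is large, but the groups are tiny.
small_diff, small_p = two_proportion_z_test(6, 20, 3, 20)

# Hypothetical huge study: 10.2% versus 10.0% sick, 500,000 per group.
# The observed difference is trivial, but the groups are enormous.
large_diff, large_p = two_proportion_z_test(51_000, 500_000, 50_000, 500_000)

print(f"small study: difference = {small_diff:.3f}, P = {small_p:.3f}")
print(f"large study: difference = {large_diff:.3f}, P = {large_p:.5f}")
```

Under these assumed counts the small study shows a fifteen-point difference that is not formally significant at the 5% level, while the large study shows a fifth of a percentage point that is. Neither value of P says whether the difference matters.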

The Case for Action

Finally, in passing from association to causation I believe in ‘real life’ we shall have to consider what flows from that decision. On scientific grounds we should do no such thing. The evidence is there to be judged on its merits and the judgment (in that sense) should be utterly independent of what hangs upon it – or who hangs because of it. But in another and more practical sense we may surely ask what is involved in our decision. In occupational medicine our object is usually to take action. If this be operative cause and that be deleterious effect, then we shall wish to intervene to abolish or reduce death or disease.

While that is a commendable ambition, it almost inevitably leads us to introduce differential standards before we convict. Thus on relatively slight evidence we might decide to restrict the use of a drug for early-morning sickness in pregnant women. If we are wrong in deducing causation from association no great harm will be done. The good lady and the pharmaceutical industry will doubtless survive.

On fair evidence we might take action on what appears to be an occupational hazard, e.g. we might change from a probably carcinogenic oil to a non-carcinogenic oil in a limited environment and without too much injustice if we are wrong. But we should need very strong evidence before we made people burn a fuel in their homes that they do not like or stop smoking the cigarettes and eating the fats and sugar that they do like. In asking for very strong evidence I would, however, repeat emphatically that this does not imply crossing every ‘t’, and swords with every critic, before we act.

All scientific work is incomplete – whether it be observational or experimental. All scientific work is liable to be upset or modified by advancing knowledge. That does not confer upon us a freedom to ignore the knowledge we already have, or to postpone the action that it appears to demand at a given time.

Who knows, asked Robert Browning, but the world may end tonight? True, but on available evidence most of us make ready to commute on the 8:30 next day.
Edward Tufte says:

April 2, 2011 at 10:57 pm

A good metaphor for making presentations is good teaching. Teachers seek to explain content and thinking; and to present that material with credibility, with authority, and without patronizing authoritarianism. Here is an excellent paper by the great statistician Frederick Mosteller on teaching.

E.T.

Frederick Mosteller, “Classroom and Platform Performance”, The American Statistician, February 1980, Vol. 34, No. 1, pp. 11-17.

Reprinted with permission from The American Statistician. Copyright 1980 by the American Statistical Association. All rights reserved.
Edward Tufte says:

April 2, 2011 at 11:00 pm

T.A. Bancroft, ed., Statistical Papers in Honor of George W. Snedecor, (Ames, Iowa 1972), pp. 293-316, copyright © 1972 by The Iowa State University Press
Pati says:

April 23, 2011 at 7:44 pm

Thank you, ET, for posting this and also the piece on graphical summaries for medical patients. I had seen them before, and recently told someone about the basic graph in the medical patient article. As we’re inundated with more and more “stuff,” some of the good things like your postings get pushed back, memorable as they are. So, I’m glad to be reminded of them. Always helpful and interesting. Thanks, again. Pati