Max Singer published a classic article on policy-making numbers that over-reach and exaggerate the scope and urgency of the problems: "The Vitality of Mythical Numbers," Public Interest (Spring, 1971), 3-9.
For years in teaching evidence for policy-making at Princeton and Yale, I used this article as the first item on my reading list. The conclusion that a particular policy number over-reaches has of course to be earned by some relevant evidence (and not by the generality that policy numbers over-reach). Such evidence can sometimes come from simple calculations, approximations, and cross-checks, as Singer illustrates in his paper.
Journalists pick up, quote, and repeat scary numbers in faux-trend stories.
Here is Singer's classic article:
THE VITALITY OF MYTHICAL NUMBERS
by Max Singer
It is generally assumed that heroin addicts in New York City steal some two to five billion dollars worth of property a year, and commit approximately half of all the property crimes. Such estimates of addict crime are used by an organization like RAND, by a political figure like Howard Samuels, and even by the Attorney General of the United States. The estimate that half the property crimes are committed by addicts was originally attributed to a police official and has been used so often that it is now part of the common wisdom.
The amount of property stolen by addicts is usually estimated in something like the following manner:
There are 100,000 addicts with an average habit of $30.00 per day. This means addicts must have some $1.1 billion a year to pay for their heroin (100,000 x 365 x $30.00). Because the addict must sell the property he steals to a fence for only about a quarter of its value, or less, addicts must steal some $4 to $5 billion a year to pay for their heroin.
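The arithmetic behind this conventional estimate can be written out as a short sketch. The figures are the ones quoted in the paragraph above; the names are illustrative only, and the fence's "quarter of its value" is modeled here as a simple multiplier of 4.

```python
# Demand-side estimate: how much addicts "must" steal, using the quoted figures.
ADDICTS = 100_000        # assumed number of addicts
HABIT_PER_DAY = 30       # dollars per addict per day
DAYS_PER_YEAR = 365
FENCE_MULTIPLIER = 4     # a fence pays about a quarter of value, so steal 4x the cash needed


def annual_heroin_cost(addicts: int, habit_per_day: int) -> int:
    """Dollars per year that all addicts together need for heroin."""
    return addicts * habit_per_day * DAYS_PER_YEAR


def property_to_steal(cash_needed: int, fence_multiplier: int) -> int:
    """Value of property that must be stolen to raise cash_needed through a fence."""
    return cash_needed * fence_multiplier


cash = annual_heroin_cost(ADDICTS, HABIT_PER_DAY)
stolen = property_to_steal(cash, FENCE_MULTIPLIER)
print(f"Heroin cost:     ${cash:,} per year")
print(f"Property stolen: ${stolen:,} per year")
```

Every input enters as a straight multiplication, so an error in any one of them passes directly into the final figure.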
These calculations can be made with more or less sophistication. One can allow for the fact that the kind of addicts who make their living illegally typically spend upwards of a quarter of their time in jail, which would reduce the amount of crime by a quarter. (_The New York Times_ recently reported on the death of William "Donkey" Reilly. A 74-year-old ex-addict who had been addicted for 54 years, he had spent 30 of those years in prison.) Some of what the addict steals is cash, none of which has to go to a fence. A large part of the cost of heroin is paid for by dealing in the heroin business, rather than stealing from society, and another large part by prostitution, including male addicts living off prostitutes. But no matter how carefully you slice it, if one tries to estimate the value of property stolen by addicts by assuming that there are 100,000 addicts and estimating what is the minimum amount they would have to steal to support themselves and their habits (after making generous estimates for legal income), one comes up with a number in the neighborhood of $1 billion a year for New York City.
But what happens if you approach the question from the other side? Suppose we ask, "How much property is stolen--by addicts or anyone else?" Addict theft must be less than total theft. What is the value of property stolen in New York City in any year? Somewhat surprisingly to me when I first asked, this turned out to be a difficult question to answer, even approximately. No one had any estimates that they had even the faintest confidence in, and the question doesn't seem to have been much asked. The amount of officially reported theft in New York City is approximately $300 million a year, of which about $100 million is the value of automobile theft (a crime that is rarely committed by addicts). But it is clear that there is a very large volume of crime that is not reported; for example, shoplifting is not normally reported to the police. (Much property loss to thieves is not reported to insurance companies either, and the insurance industry had no good estimate for total theft.)
It turns out, however, that if one is only asking a question like, "Is it possible that addicts stole $1 billion worth of property in New York City last year?" it is relatively simple to estimate the amount of property stolen. It is clear that the two biggest components of addict theft are shoplifting and burglary. What _could_ the value of property shoplifted by addicts be? All retail sales in New York City are on the order of $15 billion a year. This includes automobiles, carpets, diamond rings, and other items not usually available to shoplifters. A reasonable number for inventory loss to retail establishments is 2%. This number includes management embezzlers, stealing by clerks, shipping departments, truckers, etc. (Department stores, particularly, have reported a large increase in shoplifting in recent years, but they are among the most vulnerable of retail establishments and not important enough to bring the overall rate much above 2%.) It is generally agreed that substantially more than half of the property missing from retail establishments is taken by employees, the remainder being lost to outside shoplifters. But let us credit shoplifters with stealing 1% of all the property sold at retail in New York City--this would be about $150 million a year.
What about burglary? There are something like two and one-half million households in New York City. Suppose that on the average one out of five of them is robbed or burglarized every year. This takes into account that in some areas burglary is even more commonplace, and that some households are burglarized more than once a year. This would mean 500,000 burglaries a year. The average value of property taken in a burglary might be on the order of $200. In some burglaries, of course, much larger amounts of property are taken, but these higher value burglaries are much rarer, and often are committed by non-addict professional thieves. If we use the number of $200 x 500,000 burglaries, we get $100 million of property stolen from people's homes in a year in New York City.
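The two ceilings just described, shoplifting and burglary, can be reproduced in a few lines. This is only a sketch of the article's arithmetic: the inputs are Singer's, while the function and constant names are made up for illustration.

```python
# Supply-side ceilings on shoplifting and burglary, from the article's own inputs.
RETAIL_SALES = 15_000_000_000   # dollars per year, all New York City retail
SHOPLIFTING_PERCENT = 1         # percent of retail sales credited to shoplifters
HOUSEHOLDS = 2_500_000
BURGLED_ONE_IN = 5              # one household in five per year
AVG_BURGLARY_LOSS = 200         # dollars per burglary


def shoplifting_ceiling(retail_sales: int, percent: int) -> int:
    """Upper bound on property lost to outside shoplifters, in dollars per year."""
    return retail_sales * percent // 100


def burglary_ceiling(households: int, one_in: int, avg_loss: int) -> int:
    """Upper bound on property taken from homes, in dollars per year."""
    burglaries = households // one_in
    return burglaries * avg_loss


shoplifting = shoplifting_ceiling(RETAIL_SALES, SHOPLIFTING_PERCENT)
burglary = burglary_ceiling(HOUSEHOLDS, BURGLED_ONE_IN, AVG_BURGLARY_LOSS)
print(f"Shoplifting ceiling: ${shoplifting:,}")
print(f"Burglary ceiling:    ${burglary:,}")
```

Changing any single input by a factor of two moves its ceiling by the same factor, which makes it easy to try one's own estimates.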
Obviously, none of these estimated values is either sacred or substantiated. You can make your own estimate. The estimates here have the character that it would be very surprising if they were wrong by a factor of 10, and not very important for the conclusion if they were wrong by a factor of two. (This is a good position for an estimator to be in.)
Obviously not all addict theft is property taken from stores or from people's homes. One of the most feared types of addict crime is property taken from the persons of New Yorkers in muggings and other forms of robbery. We can estimate this, too. Suppose that on the average, one person in 10 has property taken from his person by muggers or robbers each year. That would be 800,000 such robberies, and if the average one produced $100 (which it is very unlikely to do), $8 million a year would be taken in this form of theft.
So we can see that if we credit addicts with _all_ of the shoplifting, _all_ of the theft from homes, and _all_ of the theft from persons, total property stolen by addicts in a year in New York City amounts to some $300 million. You can throw in all the "fudge factors" you want, add all the other miscellaneous crimes that addicts commit, but no matter what you do, it is difficult to find a basis for estimating that addicts steal over half a billion dollars per year, and a quarter billion looks like a better estimate, although perhaps on the high side. After all, there must be some thieves who are not addicts. Thus, I believe we have shown that whereas it is widely assumed that addicts steal from $2 billion to $5 billion a year in New York City, the actual number is _ten_ times smaller, and that this can be demonstrated by five minutes of thought. So what? A quarter billion dollars' worth of property is still a lot of property. It exceeds the amount of money spent annually on addict rehabilitation and other programs to prevent and control addiction. Furthermore, the value of the property stolen by addicts is a small part of the total cost to society of addict theft. A much larger cost is paid in fear, changed neighborhood atmosphere, the cost of precautions, and other echoing and re-echoing reactions to theft and its danger.
One point in this exercise in estimating the value of property stolen by addicts is to shed some light on people's attitudes toward numbers. People feel that there is a lot of addict crime, and that $2 billion is a large number, so they are inclined to believe that there is $2 billion worth of addict theft. But $250 million is a large number, too, and if our sense of perspective were not distorted by daily consciousness of federal expenditures, most people would be quite content to accept $250 million a year as a lot of theft.
Along the same lines, this exercise is another reminder that even responsible officials, responsible newspapers, and responsible research groups pick up and pass on as gospel numbers that have no real basis in fact. We are reminded by this experience that because an estimate has been used widely by a variety of people who should know what they are talking about, one cannot assume that the estimate is even approximately correct.
But there is a much more important implication of the fact that there cannot be nearly so much addict theft as people believe. This implication is that there probably cannot be as many addicts as people believe. Most of the money paid for heroin bought at retail comes from stealing, and most addicts buy at retail. Therefore, the number of addicts is basically--although imprecisely--limited by the amount of theft. (The estimate developed in a Hudson Institute study was that close to half of the volume of heroin consumed is used by people in the heroin distribution system who do not buy at retail, and do not pay with stolen property but with their "services" in the distribution system.) But while the people in the business (at lower levels) consume close to half the heroin, they are only some one-sixth or one-seventh of the total number of addicts. They are the ones who can afford big habits.
The most popular, informal estimate of addicts in New York City is 100,000-plus (usually with an emphasis on the "plus"). The federal register in Washington lists some 30,000 addicts in New York City, and the New York City Department of Health's register of addicts' names lists some 70,000. While all the people on those lists are not still active addicts--many of them are dead or in prison--most people believe that there are many addicts who are not on any list. It is common to regard the estimate of 100,000 addicts in New York City as a very conservative one. Dr. Judianne Densen-Gerber was widely quoted in 1970 for her estimate that there would be over 100,000 teenage addicts by the end of the summer. And there are obviously many addicts of 20 years of age and more.
In discussing the number of addicts in this article, we will be talking about the kind of person one thinks of when the term "addict" is used. A better term might be "street addict." This is a person who normally uses heroin every day. He is the kind of person who looks and acts like the normal picture of an addict. We exclude here the people in the medical profession who are frequent users of heroin or other opiates, or are addicted to them, students who use heroin occasionally, wealthy people who are addicted but do not need to steal and do not frequent the normal addict hangouts, etc. When we are addressing the "addict problem," it is much less important that we include these cases; while they are undoubtedly problems in varying degrees, they are a very different type of problem than that posed by the typical street addict.
The amount of property stolen by addicts suggests that the number of New York City street addicts may be more like 70,000 than 100,000, and almost certainly cannot be anything like the 200,000 number that is sometimes used. Several other simple ways of estimating the number of street addicts lead to a similar conclusion.
Experience with the addict population has led observers to estimate that the average street addict spends a quarter to a third of his time in prison. (Some students of the subject, such as Edward Preble and John J. Casey, Jr., believe the average to be over 40%.) This would imply that at any one time, one-quarter to one-third of the addict population is in prison, and that the total addict population can be estimated by multiplying the number of addicts who are in prison by three or four. Of course the number of addicts who are in prison is not a known quantity (and, in fact, as we have indicated above, not even a very precise concept). However, one can make reasonable estimates of the number of addicts in prison (and for this purpose we can include the addicts in various involuntary treatment centers). This number is approximately 14,000-17,000, which is quite compatible with an estimate of 70,000 total New York City street addicts.
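The prison-based cross-check is a simple scaling: if a known share of addicts is in prison at any time, the total is the prison count divided by that share. A minimal sketch, using the ranges quoted above and hypothetical names:

```python
from fractions import Fraction

# Ranges quoted in the paragraph above.
IN_PRISON_LOW, IN_PRISON_HIGH = 14_000, 17_000
SHARE_HIGH, SHARE_LOW = Fraction(1, 3), Fraction(1, 4)   # share of time spent in prison


def total_from_prison_count(in_prison: int, share_in_prison: Fraction) -> int:
    """Total addict population implied by a prison count and a prison share."""
    return int(in_prison / share_in_prison)


# Smallest count with the largest share gives the low end; the reverse gives the high end.
low = total_from_prison_count(IN_PRISON_LOW, SHARE_HIGH)
high = total_from_prison_count(IN_PRISON_HIGH, SHARE_LOW)
print(f"Implied street-addict population: {low:,} to {high:,}")
```

Using exact fractions avoids rounding noise from dividing by one-third.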
Another way of estimating the total number of street addicts in New York City is to use the demographic information that is available about the addict population. For example, we can be reasonably certain that some 25% of the street addict population in New York City is Puerto Rican, and some 50% are blacks. We know that approximately five out of six street addicts are male, and that 50% of the street addicts are between the ages of 16 and 25. This would mean that 20% of the total number of addicts are black males between the ages of 16 and 25. If there were 70,000 addicts, this would mean that 14,000 blacks between the ages of 16 and 25 are addicts. But altogether there are only about 140,000 blacks between the ages of 16 and 25 in the city--perhaps half of them living in poverty areas. This means that if there are 70,000 addicts in the city, one in 10 black youths are addicts, and if there are 100,000 addicts, nearly one in six are, and if there are 200,000 addicts, one in three. You can decide for yourself which of these degrees of penetration of the young black male group is most believable, but it is rather clear that the number of 200,000 addicts is implausible. Similarly, the total of 70,000 street addicts would imply 7,000 young Puerto Rican males are addicted, and the total number of Puerto Rican boys between the ages of 17 and 25 in New York City is about 70,000.
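The demographic cross-check can be sketched the same way. The shares and the population figure are taken from the paragraph above; the names are illustrative, and the 20% figure is the article's own rounding of one-half times five-sixths times one-half.

```python
from fractions import Fraction

# Product of the stated shares: black, male, aged 16 to 25.
EXACT_SHARE = Fraction(1, 2) * Fraction(5, 6) * Fraction(1, 2)   # 5/24, about 20.8%
ROUNDED_SHARE = Fraction(1, 5)                                   # the article's 20%
BLACK_YOUTH_POPULATION = 140_000                                 # blacks aged 16 to 25 in the city


def penetration(total_addicts: int, share: Fraction = ROUNDED_SHARE) -> Fraction:
    """Fraction of the youth group that would have to be addicted."""
    addicted = share * total_addicts
    return addicted / BLACK_YOUTH_POPULATION


for total in (70_000, 100_000, 200_000):
    rate = penetration(total)
    print(f"{total:>7,} addicts -> about one in {float(1 / rate):.1f}")
```

With the rounded share, the three totals work out to one in 10, one in 7, and one in 3.5, close to the article's "one in 10," "nearly one in six," and "one in three."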
None of the above calculations is meant in any way to downplay the importance of the problem of heroin addiction. Heroin is a terrible curse. When you think of the individual tragedy involved, 70,000 is an awfully large number of addicts. And if you have to work for a living, $250 million is an awful lot of money to have stolen from the citizens of the city to be transferred through the hands of addicts and fences into the pockets of those who import and distribute heroin, and those who take bribes or perform other services for the heroin industry.
The main point of this article may well be to illustrate how far one can go in bounding a problem by taking numbers seriously, seeing what they imply, checking various implications against each other and against general knowledge (such as the number of persons or households in the city). Small efforts in this direction can go a long way to help ordinary people and responsible officials to cope with experts of various kinds.
 Mythical numbers may be more mythical and have more vitality in the area of crime than in most areas. In the early 1950s the Kefauver Committee published a $20 billion estimate for the annual "take" of gambling in the United States. The figure actually was "picked from a hat." One staff member said: "We had no real idea of the money spent. The California Crime Commission said $12 billion. Virgil Petersen of Chicago said $30 billion. We picked $20 billion as the balance of the two."
An unusual example of a mythical number that had a vigorous life--the assertion that 28 Black Panthers had been murdered by police--is given a careful biography by Edward Jay Epstein in the February 13, 1971, _New Yorker_. (It turned out that there were 19 Panthers killed, ten of them by the police, and eight of these in situations where it seems likely that the Panthers took the initiative.)
 A parallel datum was developed in a later study by St. Luke's Hospital of 81 addicts--average age 34. More than one-half of the heroin consumed by these addicts, over a year, had been paid for by the sale of heroin. Incidentally, these 81 addicts had stolen an average of $9,000 worth of property in the previous year.
 Among other recent estimators we may note a Marxist, Sol Yurick, who gives us "500,000 junkies" (_Monthly Review_, December 1970), and William R. Corson, who contends, in the December 1970 _Penthouse_, that "today at least 2,500,000 black Americans are hooked on heroin."
 There is an interesting anomaly about the word "addict." Most people, if pressed for a definition of an "addict," would say he is a person who regularly takes heroin (or some such drug) and who, if he fails to get his regular dose of heroin, will have unpleasant or painful withdrawal symptoms. But this definition would not apply to a large part of what is generally recognized as the "addict population." In fact, it would not apply to most certified addicts. An addict who has been detoxified or who has been imprisoned and kept away from drugs for a week or so would not fit the normal definition of "addict." He no longer has any physical symptoms resulting from not taking heroin. "Donkey" Reilly would certainly fulfill most people's ideas of an addict, but for 30 of the 54 years he was an "addict" he was in prison, and he was certainly not actively addicted to heroin during most of the time he spent in prison, which was more than half of his "addict" career (although a certain amount of drugs are available in prison).
Reprinted with permission from The Public Interest, no. 23, Spring 1971, pp. 3-9. Copyright (c) 1971 by National Affairs, Inc.
-- Edward Tufte
A recent article in Slate by Jack Shafer reports the GAO's Singer-style analysis: http://www.slate.com/id/2147876
Because my daughter is an addict, I realized when I read the article that a major source of the money with which addicts buy drugs was not commented on. Many of the addicts I have met get most of their money for drugs by panhandling.
One of the reasons so many mythical numbers involve crime statistics is that the decision to report a crime and the decision of how to classify it are highly discretionary. For example, child abuse was much less frequently reported fifty years ago. Have we had an avalanche of abuse because more is reported now, or are people more sensitized and aware of their rights, or are social workers (and health care professionals) trained to look for it more readily, or are some kids, after being made aware of the consequences of making the accusations, using the charge to accomplish their own agendas?
Probably a little of several of the above.
-- Rod (email)
Good timing for the Singer piece; a new study does something similar with Traumatic Stress Disorder in Vietnam Vets -
Oh yes, and ET got himself on NPR, as well, for Beautiful Evidence; http://www.npr.org/templates/story/story.php?storyId=5673332&ft=1&f=1007
-- Karl Hartkopf (email)
It would be nice to have a similarly penetrating analysis of the huge estimates of the losses that software and music publishers claim they suffer as a consequence of illegal copying. Illegal copying is not a good thing, of course, but I suspect that the main people who suffer from it are the people who pay the prices demanded by publishers, and not the publishers themselves. I cannot be sure of it (and that is why I would like to see an analysis done by someone who knows), but I think it is quite plausible to suppose that publishers don't suffer at all, because the circulation of illegal copies generates more than enough publicity to produce at least as many extra sales as there would be with no illegal copying.
I've been suspicious of software prices ever since I made the move from mainframes to small computers about twenty years ago. (Before that it never occurred to me to ask how the software like operating systems and compilers that were built into university mainframe computers was paid for. If I thought about it at all I probably thought the price of the software was included in the price of the machine.) At that time I thought that given time and energy I could probably write a word-processor as satisfactory as those then on the market, but that I wouldn't know where to begin if I wanted to write a compiler. Having learnt a bit more since then about Polish notation I realize that writing a compiler might be marginally less difficult than I thought, but still more difficult than writing a word-processor. Yet 20 years ago one could buy a perfectly serviceable compiler for a PC (Turbo Pascal, for example) for a far lower price than one needed to pay for a word-processor. As far as I can see the only explanation is that the prices have nothing to do with profit margins or with the amount of investment that went into the development, but everything to do with what the market will stand.
Re-reading this, I fear that in the second paragraph I have drifted away from the topic, but I hope it will stand up as an illustration of why the analysis asked for in the first paragraph would be worth having.
-- Athel Cornish-Bowden (email)
Another case is to be found in vol. I (p. 210 in 3rd edn., 1969) of Kendall and Stuart's classic work The Advanced Theory of Statistics, where they list 24 forecasts of potato yields in England and Wales in 1929-1936. Of the 24 forecasts, one was spot on, but all the others underestimated the actual results. As they commented, the "table exhibits very clearly ... the chronic pessimism of crop forecasts", and they go on to say that "one of the commoner misunderstandings ... is based on the supposition that, though individuals may make mistakes, their errors will cancel out in the aggregate". My impression is that "the chronic pessimism of crop forecasts" is alive and well in 2006, at least in France. Every year we have dire predictions of the catastrophic yields that farmers are going to have, and every year they survive to complain again the following year.
Incidentally, although Kendall and Stuart's book calls itself "advanced", it is a lot more readable than many books that claim to be elementary. At least, volume I is. I never bought volumes II and III, but my recollection from library copies is that they were much heavier going.
-- Athel Cornish-Bowden (email)
On the new Johns Hopkins University study estimating 650,000 Iraqi deaths, far more than the official estimate of 30,000: does the new study over-reach? Or does the official estimate under-reach?
Mortality after the 2003 invasion of Iraq: a cross-sectional cluster sample survey
by Prof. Gilbert Burnham MD
Iraqi Death Count: 650,000? How to read the latest study on mortality in Iraq.
by Daniel Engber
-- John Galada (email)
Looks like "identity theft" via lost hard drives doesn't happen. See the excellent debunking story by Fred H. Cate in the Washington Post.
-- Edward Tufte
Two interesting estimates, each probably 2 to 3 orders of magnitude on the high side:
The stoned story in the Los Angeles Times contains the inevitable comment that the prankish estimate is "conservative." Time to read Singer's article (above) again.
-- Edward Tufte
Here's the general principle of "conservative estimates:" If an estimate is described as conservative, it's not. "Conservative" is often a cheerleading, self-congratulatory, and mushy word that replaces quantitative estimates of error.
Recall the Columbia shuttle slide: "Review of Test Data Indicates Conservatism for Tile Penetration" in our thread PowerPoint does rocket science--and better techniques for technical reports.
-- Edward Tufte
Reasoning about evidence
For good, practical advice on effective analytical thinking, see here
-- Edward Tufte
Error in "The Vitality of Mythical Numbers"
I'm surprised that no-one has noted that 800,000 robberies at an average of $100 per robbery yields $80 million, not $8 million. Not that it invalidates the conclusions.
-- George V. Reilly (email)
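The correction above is easy to verify, along with its effect on the article's total. In this sketch the robbery inputs and the shoplifting and burglary figures come from Singer's text; the names and the half-billion threshold check are illustrative.

```python
# Robbery figure recomputed from the article's own inputs.
ROBBERIES = 800_000
AVG_TAKE = 100                # dollars per robbery
SHOPLIFTING = 150_000_000     # article's shoplifting ceiling
BURGLARY = 100_000_000        # article's burglary ceiling
HALF_BILLION = 500_000_000    # the bound the article says is hard to exceed

robbery_total = ROBBERIES * AVG_TAKE
revised_total = SHOPLIFTING + BURGLARY + robbery_total

print(f"Robbery total: ${robbery_total:,}")
print(f"Revised total: ${revised_total:,}")
print(f"Still under half a billion: {revised_total < HALF_BILLION}")
```

Even with the larger robbery figure, the sum stays well below the $2 billion to $5 billion range the article disputes.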
Great stuff. I will be using the Singer paper in a class I teach on policy analysis. I also use writings on how to measure hunger and debates over how many "defensive gun uses" there are per year. Both are useful for discussions about definitions (e.g., who says your use of your gun truly was "defensive" as opposed to completely unnecessary).
The kind of reasoning Singer does is what many people call Fermi Problems. They can be fun to go over in class, too. E.g., how many piano tuners work in Chicago.
-- Doug (email)
Enrico Fermi not only estimated the number of piano tuners in Chicago; one of his most famous estimates was the one he made during the first atom bomb test on 16 July 1945. There was an important question in the minds of the bomb makers about the yield of this new class of weapon. During the test Fermi estimated that it was about 10 kilotons.
Fermi didn't guess - as the shockwave from the explosion hit Fermi he threw a handful of paper scraps into the air and watched how far they moved. Using this data and some assumptions he made his estimate. It was surprisingly accurate. Not only to the correct order of magnitude, but within a very respectable factor of 2. The actual yield was 19 kilotons.
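The "within a factor of 2" claim can be checked directly from the two yields quoted here. The helper below is a hypothetical illustration, not a standard library function.

```python
def within_factor(estimate: float, actual: float, factor: float) -> bool:
    """True if estimate and actual differ by no more than the given factor."""
    ratio = max(estimate, actual) / min(estimate, actual)
    return ratio <= factor


FERMI_ESTIMATE_KT = 10   # Fermi's estimate, in kilotons
ACTUAL_YIELD_KT = 19     # actual yield as quoted above, in kilotons

print(within_factor(FERMI_ESTIMATE_KT, ACTUAL_YIELD_KT, 2))
```

Taking the larger value over the smaller makes the check symmetric, so it treats over- and under-estimates alike.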
Fermi used the "piano tuner" approach to train his students to be able to conceptualise and evaluate "order of magnitude" estimates.
For a very recent Fermi estimate of the energy released (and the volume and mass of sand ejected) during the eruption of the Puyehue-Cordon Caulle volcano in Chile on 4 June 2011, see http://arxiv.org/abs/1109.1165.
Fermi Problem: Power developed at the eruption of the Puyehue-Cordon Caulle volcanic system in June 2011
By Hernan Asorey & Arturo Lopez Davalos
The abstract of the paper reads:
On June 4 2011 the Puyehue-Cordon Caulle volcanic system produced a pyroclastic subplinian eruption reaching level 3 in the volcanic explosivity index. The first stage of the eruption released sand and ashes that affected small towns and cities in the surrounding areas, including San Carlos de Bariloche, in Argentina, one of the largest cities in the North Patagonian Andean region. By treating the eruption as a Fermi problem, we estimated the volume and mass of sand ejected as well as the energy and power released during the eruptive phase. We then put the results in context by comparing the obtained values with everyday quantities, like the load of a cargo truck or the electric power produced in Argentina. These calculations have been done as a pedagogic exercise, and after evaluation of the hypothesis was done in the classroom, the calculations have been performed by the students. These are students of the first physics course at the Physics and Chemistry Teacher Programs of the Universidad Nacional de Rio Negro
-- Matt R (email)