I wonder if anyone can suggest good sources of basic information on how to calculate meaningful statistics. Let me explain:
I work in human-resources data analysis. I don't have a particularly strong statistics background, and I generally don't need one -- we mostly report on counts, sums, averages, and percentages. However, I occasionally run into trouble selecting meaningful numerators and denominators. For example, in calculating the percentage of employees who attended classes last year, should the denominator be the average number of employees over the year, the total number of employees at any point in the year, or some other figure?
I see lots of books explaining the math behind statistics, and of course this forum provides copious sources on information design, but I'm not sure where to look for this sort of practical information. I'd appreciate any suggestions.
-- Joe Levy (email)
The classic book is Hans Zeisel, Say It With Figures, which went through many editions. It may not be in print now, but used copies abound.
A substantial part of "meaningful" comes from the substance of the problem itself, so watch what skilled analysts do. Build up a collection of reports in your field from a diversity of sources and see what good practices look like.
Also textbooks in your field may contain material relevant to your question.
-- Edward Tufte
I concur with Dr. Tufte: the answer lies in the question. To take your example regarding the percentage of employees who attended classes last year, the answer depends on the context of the question. If the broader question is "How successful were classes in attracting employees last year?", then taking the ratio of attendees to employees at the time each class was held, and averaging that ratio over the classes, would be appropriate.
If the aim is to determine whether the percentage of employees receiving training met some target level, then an appropriate measure would be the total number trained in the year divided by the average number of employees for the year.
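As a rough illustration of how far those two measures can diverge, here is a minimal Python sketch. All of the figures are invented, and the function names and the choice of month-end headcounts as the basis for "average number of employees" are assumptions made for the example, not anything prescribed in this thread.

```python
from statistics import mean

# Hypothetical data: attendees and headcount on the day each class was held.
classes = [
    {"attendees": 30, "headcount": 200},
    {"attendees": 45, "headcount": 220},
    {"attendees": 24, "headcount": 240},
]

# Hypothetical month-end headcounts for the same year.
monthly_headcount = [200, 205, 210, 220, 220, 225, 230, 235, 240, 240, 245, 250]

# Hypothetical count of distinct employees who attended at least one class.
employees_trained = 80


def average_class_attendance_rate(classes):
    """Question 1: how well did each class draw from the staff present at the time?"""
    return mean(c["attendees"] / c["headcount"] for c in classes)


def trained_share_of_average_headcount(trained, monthly_headcount):
    """Question 2: what share of the (average) workforce received training?"""
    return trained / mean(monthly_headcount)


if __name__ == "__main__":
    print(f"Average per-class attendance rate: "
          f"{average_class_attendance_rate(classes):.1%}")
    print(f"Trained / average headcount:       "
          f"{trained_share_of_average_headcount(employees_trained, monthly_headcount):.1%}")
```

With these invented figures the first measure comes out near 15% and the second near 35%. Both are legitimate "percentage who attended" numbers; they simply answer different questions.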
Useful figures come from careful questions.
For some excellent thinking about statistics, try getting a copy of R.A. Fisher's "Statistical Methods for Research Workers" from your local library. While the book is dated and the writing style somewhat stiff, the introductory chapter is excellent in describing "the qualifications of satisfactory statistics."
-- John Walker (email)
Instead of just a single number that answers a single question, it will often be useful to show several standardizations (denominators in this case) answering several questions. Make it clear what each number means. Zeisel is particularly helpful on this matter.
Also take a look at A. Bradford Hill's classic on causality, posted on this board on the NEW page at https://www.edwardtufte.com/tufte/hill
So your reading assignment is Zeisel, Hill, and Fisher! And, of course, other reports using these types of data.
-- Edward Tufte
Mr. Levy mentions several indicators of interest to his organization. If these are the right indicators, and "right" changes with circumstances, what is important is how they change over time. I suspect he also encounters managers who like to compare this month to last month, or this month this year to this month last year, rather than examining all the data points. I have learned much about statistical calculations useful to business and other organizations from the works of Dr. Donald J. Wheeler. I am not a statistician nor even mathematically inclined, but I do need to understand how process data change and whether the change is due to special causes or to variation inherent in the process. The fundamental question is "How are we doing?"
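One common way to separate special causes from routine variation is the XmR (individuals and moving range) chart, also called a process behavior chart. The sketch below is only an illustration of that idea: the monthly figures are invented, the function names are hypothetical, and it applies just the simplest rule (a point outside the natural process limits, computed as the mean plus or minus 2.66 times the average moving range).

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two data points")
    # Moving range: absolute difference between each pair of consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    spread = 2.66 * mean(moving_ranges)  # standard XmR scaling constant
    return center, center - spread, center + spread


def points_outside_limits(values):
    """Return (index, value) pairs that fall outside the natural process limits."""
    _, lower, upper = xmr_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical monthly class attendance counts.
    monthly_attendance = [42, 45, 39, 44, 41, 43, 40, 46, 42, 44, 71, 43]
    center, lower, upper = xmr_limits(monthly_attendance)
    print(f"center={center:.1f}, limits=({lower:.1f}, {upper:.1f})")
    print("possible special causes:", points_outside_limits(monthly_attendance))
```

Here only the eleventh month (71) falls outside the limits. The other month-to-month movements, including ones a "this month versus last month" comparison might highlight, sit within routine variation.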
For Mr. Levy, I would suggest Wheeler's "Understanding Variation" first, followed by "Making Sense of Data". I wish I could walk down the hall and hand him my copies! But he will have to go to www.spcpress.com and check out the goods. Lots of free articles are available so you can get the feel of Dr. Wheeler's style before buying a book. I also recommend a paper by Davis Balestracci called "Data Sanity - Statistical Thinking Applied to Everyday Data", initially published and sold by the Statistics Division of the American Society for Quality but downloadable now at the Deming Electronic Network web page.
I would like to recommend a number of other books related to process measurement and quality improvement that Mr. Levy might find illuminating if he has not encountered them already, but will do so directly via email if he (or anyone else) is interested (firstname.lastname@example.org).
-- Steven Byers (email)
These seven principles of a well-constructed rate might be helpful. They represent an ideal, and trade-offs need to be made. Some of the text reflects my work at a state education agency.
1. Includes in the denominator only those items that can show up in the numerator.
2. Includes in the numerator only those items that are also in the denominator.
3. It is simple. It can be easily explained to the public, legislators, and board members (has face validity). You can explain why this rate differs from another one.
4. It is technically sound. It has the support of researchers and statisticians. It accounts for sources of bias (unusual conditions that skew the rate if not accounted for).
5. It is valid in the eyes of those for whom the rate is produced. It is accepted by them as reflecting what they do (the items included in the numerator and denominator represent what the rate is said to measure).
6. It can be aggregated to higher levels of organization in a way that makes sense.
7. It is neutral in its effect. It measures an event fairly and does not have a subjective value judgment built in.
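Principles 2 and 6 lend themselves to a mechanical check. The following is a minimal Python sketch under stated assumptions: the function names and the student and school figures are hypothetical, and pooling counts is only one reasonable reading of "aggregated in a way that makes sense."

```python
def rate(numerator_ids, denominator_ids):
    """Compute a rate, refusing numerator items that are not in the denominator."""
    numerator, denominator = set(numerator_ids), set(denominator_ids)
    if not denominator:
        raise ValueError("denominator is empty")
    stray = numerator - denominator
    if stray:
        raise ValueError(f"numerator items missing from denominator: {sorted(stray)}")
    return len(numerator) / len(denominator)


def aggregate_rate(unit_counts):
    """Aggregate (numerator, denominator) counts by pooling them,
    rather than averaging the unit-level rates."""
    total_numerator = sum(n for n, _ in unit_counts)
    total_denominator = sum(d for _, d in unit_counts)
    return total_numerator / total_denominator


if __name__ == "__main__":
    # Hypothetical: students enrolled and students who graduated.
    enrolled = ["s1", "s2", "s3", "s4"]
    graduated = ["s1", "s3"]
    print(rate(graduated, enrolled))

    # Hypothetical schools as (graduates, enrolled): one small, one large.
    schools = [(9, 10), (50, 100)]
    print(aggregate_rate(schools))             # pooled district rate
    print(sum(n / d for n, d in schools) / 2)  # unweighted average of school rates
```

Pooling the counts weights each school by its size, whereas the unweighted average lets the small school count as much as the large one. Which is appropriate depends, again, on the question being asked.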
-- Bob Jones (email)
Another way to think about "framing the right question" is to consider Dr. Deming's admonition (in Out of the Crisis and elsewhere) about operational definitions. "An operational definition is one that reasonable men can agree on. An operational definition is one that people can do business with. An operational definition of safe, round, reliable, or any other quality must be communicable, with the same meaning to vendor as to purchaser, same meaning yesterday and today to the production worker." Further, there is "No exact value; no true value." That is, every measurement process is subject to variation.
-- Steven Byers (email)
I am not sure about meaningful statistics, but I did see a good meaningless statistic the other night (ironically on the BBC's Test the Nation), and I quote, "1 in 50 people have an IQ in the top 2%..."
-- Adam Poole (email)