Information Design in Surveys

In the February 2002 issue of the Harvard Business Review there is an excellent article on the design of workplace surveys:

"Getting the Truth into Workplace Surveys," by Palmer Morrel-Samuels.

The article is not available on-line, but you can look at the abstract and order a copy here.

I heartily recommend it to anyone involved in the creation of surveys. Here is a brief excerpt to show you what it is like:

Guideline 4: Keep sections of the survey unlabeled and uninterrupted by page breaks.

Boxes, topic labels, and other innocuous-looking details on surveys can skew responses subtly and even substantially. The reason is relatively straightforward: As extensive research shows, respondents tend to respond similarly to questions they think relate to each other. Several years ago, we were asked to revise an employee questionnaire for a large parcel-delivery service based in Europe. The survey contained approximately 120 questions divided into 25 sections, with each section having its own label ("benefits," "communication," and so on) and set off in its own box. When we looked at the results, we spotted some unlikely correlations between average scores for certain sections and corresponding performance measures. For example, teamwork seemed to be negatively correlated with on-time delivery.

A statistical test revealed the source of the problem. Questions in some sections spanned two pages and therefore appeared in two separate boxes. Consequently, respondents treated the material in each box as if it addressed a separate topic. We solved the problem by simply removing the boxes, labels, and page breaks that interrupted some sections. The changes in formatting encouraged respondents to consider each question on its own merits; although the changes were subtle, they had a profound impact on the survey results.

-- David Person (email)

When I consulted for the New York Times/CBS News Poll back in 1980, we had a problem with the exit poll questionnaire. Voters completed the frontside of the sheet, but quite a number did not turn the sheet over and answer the questions on the backside. Indeed "backside" became a variable for analysis, with possibilities of extrapolating backside answers from knowledge of a respondent's frontside answers. But my consulting colleague, Mike Kagay, had a better idea, which was to put a clear strong arrow on the bottom of the frontside indicating that there were more questions on the backside of the sheet!

There are some more ideas on this topic in two other threads here: see especially "graphic design in data input" and, of course, "butterfly ballot." The following is from the data input thread:

Some of the best work on data input has been done by Patricia Wright of the Applied Psychology Unit, Medical Research Council, Cambridge, UK, and by Jeremy Wyatt.

Here are some of my opinions:

Think hard about minimizing the information you elicit; users are more likely to abandon a long, snoopy, intrusive set of questions. For example, don't turn your questionnaire into a gratuitous set of probes for market research on your innocent respondents.

Think hard about protecting the integrity and privacy of the information you elicit; why should the user trust you at all?

How are you going to minimize entry errors? Discover the types of entry errors that are made, and then redesign to fix them. Regard all entry errors as your fault (even if they aren't) and design to fix them.

For a good model of transactions-based questions, order a book from and watch how they nicely navigate you through a quite long series of steps.

On design, find something that works and is already successful--and see what they do. No need to get it original, just get it right. Surely, in practice at least, this is a solved problem. Find a good proven solution and use it.

The instructions, questions, and user responses are the important matters here; minimize everything else, especially gratuitous heavy-handed design structure (frames, boxes, highlighting). For example, use very light but clear boxes or fields for data entry.

Avoid an over-produced, designed, slick look; the questionnaire form is a workaday, straightforward document. Your design model should be exactly that: a workaday, straightforward document.

Allow for review, checking, and confirmation of answers by users before they commit (although this is probably a more complicated matter than a simple rule can handle).

Error messages to the user should not be rude or abrupt, and should not perpetuate the confusion. That is, they should probably not be written by computer programmers.

See Envisioning Information, chapter 3 on layering and separation, particularly the material on de-gridding. Also in the Ask E.T. section here, see, of course, the discussion of the butterfly ballot in Florida in November 2000.

-- Edward Tufte

For the analysis of survey results I can recommend Stream Analysis. Searching Google for "stream analysis porras" (Jerry I. Porras) will pull up the references.

-- Martin Ternouth (email)

Not quite sure that this is the right thread to put this on.

I am working on a project that has required reference to data from the UK census of 2001. The data collection process apparently missed up to a million people, and - in addition - blanks were left in mandatory fields on many of the census forms that were returned.

The missing million has been recreated by generating fictional records and the missing fields have been completed by reference to patterns in the surrounding areas.

Having myself in the past witnessed the large-scale "cleansing" of national health statistics, I have a concern that the availability of modern technology is making it easier and easier for large volumes of raw data to be manipulated before publication in order to eliminate anomalies that might discredit the validity of the data or the competence of those collecting it.

My own view is that anomalies in raw data are a vital indicator of underlying issues that the design of the collection process has not addressed. Is anyone aware of any work that has been done in this field?

-- Martin Ternouth (email)


This may or may not be what you are after. Try looking for information on Benford's law, Zipf's law or work by Mark Nigrini. A good starting link is There is a good set of references at the bottom.

Basically it states that data derived from social sources has a skewed first-digit distribution: low leading digits such as 1 turn up far more often than high ones such as 9. This can then be used to detect randomly generated data in things such as tax returns.
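As a rough illustration of the first-digit idea, here is a minimal Python sketch. It assumes the standard statement of Benford's law, in which leading digit d is expected with probability log10(1 + 1/d). The function names and the two test datasets are hypothetical and chosen only for illustration; they come from no real survey, census, or tax return. The chi-square figure is a simple distance measure, not a full significance test.

```python
import math
import random
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d under Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x):
    """Leading nonzero digit of a nonzero number, read off its scientific notation."""
    return int(f"{float(abs(x)):.15e}"[0])


def chi_square_vs_benford(values):
    """Chi-square distance between observed leading digits and Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in benford_expected().items()
    )


# Illustrative data only (assumption: not taken from any real source).
powers_of_two = [2 ** k for k in range(1, 201)]  # a sequence known to track Benford closely
rng = random.Random(0)
uniform_digits = [rng.randint(100, 999) for _ in range(200)]  # roughly flat leading digits

print("powers of two:", chi_square_vs_benford(powers_of_two))
print("uniform draws:", chi_square_vs_benford(uniform_digits))
```

A large distance is only a flag for closer inspection. Plenty of legitimate data (heights, prices clustered around a threshold, anything confined to a narrow range) does not follow Benford's law either, so a poor fit is not proof of fabrication.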

-- Andrew Nicholls (email)

While studying market research, one very good resource I came upon was the set of papers written by Don Dillman at Washington State University. Many of them look at online surveys and how the design and structure of the questions influence the responses and the number of responses.

Some of the titles include "The Influence of Visual Layout on Scalar Questions in Web Surveys. Unpublished Master's Thesis" and "Connections Between Optical Features and Respondent Friendly Design: Cognitive Interview Comparisons of the Census 2000 Form and New Possibilities".

All of his papers can be found here.

-- Ken Beegle (email)

normalizing survey results: user-centered design or lies/damn lies/statistics?

I've been working on visualizing the results of a nationwide survey for client presentations.

In the survey itself, respondents were asked to answer questions on a scale of 1 to 5 or 1 to 7 in order to avoid "survey fatigue." So, in presenting results, we must decide whether to normalize all scales to 10 (at the sacrifice of complete statistical accuracy), or force clients to figure out what a 2.7 of 7 means. My gut says normalize, but the statisticians are quite convincing.
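To make the trade-off concrete, here is a minimal Python sketch of one common way to normalize: a linear (min-max) rescaling. It assumes each scale starts at 1, as described above, and that the target range is 0 to 10; the target range is my assumption, since "normalize to 10" could also mean 1 to 10. The function name `rescale` is hypothetical.

```python
def rescale(score, old_min, old_max, new_min=0.0, new_max=10.0):
    """Linearly map a score from [old_min, old_max] onto [new_min, new_max]."""
    if old_max <= old_min:
        raise ValueError("old_max must be greater than old_min")
    fraction = (score - old_min) / (old_max - old_min)  # position within the old scale, 0..1
    return new_min + fraction * (new_max - new_min)


# A 2.7 on a 1-to-7 scale, mapped onto 0-to-10 (assumed target range).
print(rescale(2.7, 1, 7))

# The midpoints of the 1-to-5 and 1-to-7 scales land on the same value.
print(rescale(3, 1, 5), rescale(4, 1, 7))

# Dividing by the scale maximum instead ignores that the scale starts at 1,
# so it gives a different number for the same response.
print(2.7 / 7 * 10)
```

The mapping keeps the endpoints and midpoints aligned, but it cannot make a 5-point response as fine-grained as a 7-point one, which may be part of what the statisticians are objecting to. Stating the original scale alongside any rescaled figure lets the client see both.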

Any good resources on when or when not to normalize?

-- Jessica Gatto (email)
