Information Design in Surveys
In the February 2002 issue of the Harvard Business Review there is an excellent article on the design of workplace surveys:
“Getting the Truth into Workplace Surveys,” by Palmer Morrel-Samuels.
The article is not available on-line, but you can look at the abstract and order a copy here.
I heartily recommend it to anyone involved in the creation of surveys. Here is a brief excerpt to show you what it is like:
Guideline 4: Keep sections of the survey unlabeled and uninterrupted by page breaks.
Boxes, topic labels, and other innocuous-looking details on surveys can skew responses subtly and even substantially. The reason is relatively straightforward: As extensive research shows, respondents tend to respond similarly to questions they think relate to each other. Several years ago, we were asked to revise an employee questionnaire for a large parcel-delivery service based in Europe. The survey contained approximately 120 questions divided into 25 sections, with each section having its own label (“benefits,” “communication,” and so on) and set off in its own box. When we looked at the results, we spotted some unlikely correlations between average scores for certain sections and corresponding performance measures. For example, teamwork seemed to be negatively correlated with on-time delivery.
A statistical test revealed the source of the problem. Questions in some sections spanned two pages and therefore appeared in two separate boxes. Consequently, respondents treated the material in each box as if it addressed a separate topic. We solved the problem by simply removing the boxes, labels, and page breaks that interrupted some sections. The changes in formatting encouraged respondents to consider each question on its own merits; although the changes were subtle, they had a profound impact on the survey results.
When I consulted for the New York Times/CBS News Poll back in 1980, we had a problem with the exit poll questionnaire. Voters completed the frontside of the sheet, but quite a number did not turn the sheet over and answer the questions on the backside. Indeed “backside” became a variable for analysis, with possibilities of extrapolating backside answers from knowledge of a respondent’s frontside answers. But my consulting colleague, Mike Kagay, had a better idea, which was to put a clear strong arrow on the bottom of the frontside indicating that there were more questions on the backside of the sheet!
There are some more ideas on this topic in two other threads here: see especially “graphic design in data input” and, of course, “butterfly ballot.” Here is the relevant material from the data input thread:
Some of the best work on data input has been done by Patricia Wright of the Applied Psychology Unit, Medical Research Council, Cambridge, UK, and by Jeremy Wyatt (http://www.ihs.ox.ac.uk/csm/jwpub.html).
Here are some of my opinions:
Think hard about minimizing the information you elicit; users are more likely to abandon a long, snoopy, intrusive set of questions. For example, don’t turn your questionnaire into a gratuitous set of probes for market research on your innocent respondents.
Think hard about protecting the integrity and privacy of the information you elicit; why should the user trust you at all?
How are you going to minimize entry errors? Discover the types of entry errors that are made, and then redesign to fix them. Regard all entry errors as your fault (even if they aren’t) and design to fix them; see the sketch after this list.
For a good model of transactions-based questions, order a book from amazon.com and watch how they nicely navigate you through a quite long series of steps.
On design, find something that works and is already successful–and see what they do. No need to get it original, just get it right. Surely, in practice at least, this is a solved problem. Find a good proven solution and use it.
The instructions, questions, and user responses are the important matters here; minimize everything else, especially gratuitous heavy-handed design structure (frames, boxes, highlighting). For example, use very light but clear boxes or fields for data entry.
Avoid an over-produced, designed, slick look; the questionnaire form is a workaday, straightforward document. Your design model should be exactly that: a workaday, straightforward document.
Allow for review, checking, and confirmation of answers by users before they commit (although this is probably a more complicated matter than a simple rule can handle).
Error messages to the user should not be rude or abrupt, and they should not perpetuate the confusion. That is, they should probably not be written by computer programmers.
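To make the two points about entry errors and error messages a bit more concrete, here is a minimal sketch of the kind of check and wording I have in mind; the field, the regular expression, and the message text are all my own illustration rather than anything from a particular survey package:

```python
import re

def check_email(raw):
    """Validate an email entry; return (cleaned_value, message).
    The message tries to help the respondent fix the entry rather than
    simply announcing a failure."""
    cleaned = raw.strip()
    if not cleaned:
        return None, "Please enter your email address so we can send your confirmation."
    if not re.match(r"[^@\s]+@[^@\s]+\.[^@\s]+$", cleaned):
        return None, ("That doesn't look quite like an email address "
                      "(for example: name@example.com). Could you check it?")
    return cleaned, None

# The stray comma is caught, and the message suggests a fix instead of
# shouting "ERROR: INVALID INPUT".
value, message = check_email("  jane.doe@example,com ")
print(message)
```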
See Envisioning Information, chapter 3 on layering and separation, particularly the material on de-gridding. Also in the Ask E.T. section here, see, of course, the discussion of the butterfly ballot in Florida in November 2000.
For the analysis of survey results I can recommend Stream Analysis; “stream analysis porras” (Jerry I. Porras) on Google will pull up the references.
Not quite sure that this is the right thread to put this on.
I am working on a project that has required reference to data from the UK census of 2001. The data collection process apparently missed up to a million people, and, in addition, blanks were left in mandatory fields on many of the census forms that were returned. The missing million has been recreated by generating fictional records, and the missing fields have been completed by reference to patterns in the surrounding areas.

Having myself witnessed the large-scale “cleansing” of national health statistics in the past, I am concerned that modern technology is making it easier and easier for large volumes of raw data to be manipulated before publication in order to eliminate anomalies that might discredit the validity of the data or the competence of those collecting it.

My own view is that anomalies in raw data are a vital indicator of underlying issues that the design of the collection process has not addressed. Is anyone aware of any work that has been done in this field?
Martin,
This may or may not be what you are after. Try looking for information on Benford’s law, Zipf’s law, or the work of Mark Nigrini. A good starting link is http://mathworld.wolfram.com/BenfordsLaw.html; there is a good set of references at the bottom.
Basically, Benford’s law says that in many naturally occurring collections of numbers the leading digit is not uniformly distributed: the probability that the first digit is d is log10(1 + 1/d), so a leading 1 shows up about 30% of the time. Departures from that pattern can be used to flag fabricated or randomly generated figures in things such as tax returns.
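To make that concrete, here is a toy sketch (Python, with made-up numbers; nothing here comes from Nigrini’s own software) of comparing the first digits of a set of figures against the Benford proportions log10(1 + 1/d):

```python
import math
from collections import Counter

def first_digit(x):
    """Leading non-zero digit of a positive number."""
    s = str(abs(x)).lstrip("0.")
    return int(s[0])

# Expected Benford proportions for digits 1-9: P(d) = log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Toy data standing in for, say, a column of invoice or tax-return totals.
amounts = [123.40, 87.20, 1045, 19.99, 230, 5.60, 112, 98, 310, 41]

counts = Counter(first_digit(a) for a in amounts if a)
n = sum(counts.values())
for d in range(1, 10):
    print(f"digit {d}: observed {counts.get(d, 0) / n:.2f}, Benford {benford[d]:.2f}")
```

On a real dataset, large and persistent gaps between the two columns are treated as a flag worth investigating, not as proof that the figures were invented.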
While studying market research, one very good resource I came upon was the set of papers written by Don Dillman at Washington State University. Many of them look at online surveys and at how the design and structure of the questions influence the responses and the number of responses.

Some of the titles include “The Influence of Visual Layout on Scalar Questions in Web Surveys” (an unpublished master’s thesis) and “Connections Between Optical Features and Respondent Friendly Design: Cognitive Interview Comparisons of the Census 2000 Form and New Possibilities.”
All of his papers can be found here.
I’ve been working on visualizing the results of a nationwide survey for client presentations.
In the survey itself, respondents were asked to answer questions on a scale of 1 to 5 or 1 to 7 in order to avoid “survey fatigue.” So, in presenting results, we must decide whether to normalize all scales to 10 (at the sacrifice of complete statistical accuracy), or force clients to figure out what a 2.7 of 7 means. My gut says normalize, but the statisticians are quite convincing.
Any good resources on when or when not to normalize?
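For what it is worth, the rescaling itself is only a linear map; here is a minimal sketch (Python, with hypothetical scores; the function name is mine) showing what a 2.7 on a 7-point scale and a 2.0 on a 5-point scale become on a common 0-to-10 scale:

```python
def rescale(score, low, high, new_low=0.0, new_high=10.0):
    """Linearly map a score from [low, high] onto [new_low, new_high]."""
    return new_low + (score - low) * (new_high - new_low) / (high - low)

print(rescale(2.7, 1, 7))  # about 2.83 on the 0-10 scale
print(rescale(2.0, 1, 5))  # 2.5 on the 0-10 scale
```

Note that the map lines up the endpoints but adds no precision: a 5-point item still has only five possible values after rescaling, which is presumably part of the statisticians’ objection.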