ET textbook, Data Analysis for Politics and Policy: PDF files now available

I'm interested in your other book, Data Analysis for Politics and Policy (or is it Policy and Politics?) Is there a way to order a copy of it? I can't find it for sale on the site, though it is listed on the home page. I already have the other books.

-- Florence R. Webb (email)

Note added August 2013:

Data Analysis for Politics and Policy is now available as a PDF ebook here.

Here is a review of the book:

My book Data Analysis for Politics and Policy was published by Prentice-Hall in 1974; after 15 printings, it is still in print, barely. It is available at amazon.com. There was a fairly detailed review of the book around 1975 or 1976 in the Journal of the American Statistical Association; the book was widely used as a college text.

The 179-page book is for courses in applied statistics, particularly for policy making and the social sciences. It deals with making causal inferences from statistical evidence, research designs, predictions and projections, linear and multiple regression. All the examples are real, involving serious questions (no regressions of height on weight that are found in some statistics texts). The technical material is at the level of one or two classes in college math. There is one somewhat more technical part, on logarithmic scale transformations and their interpretation in regression and in graphics. The book was written very much under the influence of Frederick Mosteller, John Tukey, and my professor of statistics at Stanford, Lincoln Moses.

I used the book for years at Princeton and Yale in teaching both undergraduate and graduate classes in Evidence for Policy, Basic Statistics, Introduction to Social Science Methods, and the like.

"Data Analysis" means that the book is about how to examine data to reach sensible conclusions, figure out causality, and make decisions; there is virtually nothing in the book about probability models and significance testing, issues that are studied in "Statistics" courses.

The style of the book is somewhat like the chapter on the cholera epidemic and the space shuttle Challenger in my Visual Explanations, except there are equations and lots of regression analysis in Data Analysis for Politics and Policy. I have not revised the book since it was published in 1974; some examples are antique but the ideas in the book are surely timeless.

E. T.

-- Edward Tufte

I got here looking to see if I could order Data Analysis for Politics and Policy for a 200-seat course in Pol Sc research methods. I'm hoping there may be enough others to stimulate some supply.

-- Ken Hart (email)

Response to ET book: Data Analysis for Politics and Policy

The idea of posting Data Analysis for Politics and Policy as a download is a great one (even though I own my own copy).

I'd recommend PDF as the download format for several reasons: the author retains typographic control over the document; it is easily accessed on many systems, because Windows, OS X, and Linux all have PDF readers (e.g. xpdf, Adobe Reader); it is reasonably compact, especially if you generate the PDF from something like Scribus, pdfTeX, or InDesign rather than using images of the original pages; and finally, most browsers know how to handle PDF (download if no plugin is present).

Other options include HTML, which is truly a universal format, though control over the document's appearance is difficult, and DjVu, a little-known format that offers the smallest download size along with freely available viewers but has a more complex document creation process.

-- John Walker (email)

So perhaps a pdf version is a good short-run solution until there's a physical book.

My own little problem with pdf files is that I have a hard time closely reading long pdf documents. They're easy to scan but it is difficult to sustain a close reading over many pages compared to a physical book or a long journal article. As a result, I habitually print out long technical papers (or long emails for that matter) that require sustained careful readings.

Many auction catalogs are now sent by pdf files, which work well since the main analytical task is scanning over many items to find a few to read closely. But a textbook?

This effect is a bit odd, for I look closely at many different pages while surfing the internet, but there's something about the need for sustained attention to one long piece that differs from looking at many short different pieces. Maybe I'll get used to it.

-- Edward Tufte

Response to ET book: Data Analysis for Politics and Policy

It would be valuable to have the document formatted and available online in DocBook format, which allows publication in HTML and PDF from a single source, and to use MathML for the various equations I assume are intrinsic to the subject; the two can be used together.

-- Billy Harvey (email)

Response to ET book: Data Analysis for Politics and Policy

I think that the idea of putting the book into an output-independent format is good in theory, but the workflow from SGML or DocBook to a print format (e.g. PDF) or a web format (HTML) is fairly complicated and probably overkill for a single book. If you are a technical publishing company or have a lot of very similar documents to produce, then the effort may be justified, but for a single project I would argue that, even with free tools, the time required to type in the source and the style sheets would be too much. I've always felt that many of the newer markup languages require far too much typing (e.g. <para> and </para> compared to a blank line (1 keystroke) above and below paragraphs in LaTeX).
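
To put rough numbers on the typing argument, here is a small Python sketch. It counts only the paragraph delimiters mentioned above and ignores every other tag; the function names and the paragraph count are hypothetical, chosen just for this illustration.

```python
# Rough count of the extra keystrokes spent on paragraph delimiters alone.
# Function names are hypothetical; only paragraph markup is counted.

def docbook_paragraph_overhead(n_paragraphs: int) -> int:
    """Characters typed for <para> and </para> around each paragraph."""
    return n_paragraphs * (len("<para>") + len("</para>"))


def latex_paragraph_overhead(n_paragraphs: int) -> int:
    """One extra newline (a blank line) between consecutive paragraphs."""
    return max(n_paragraphs - 1, 0)


if __name__ == "__main__":
    n = 500  # assumed paragraph count, purely for illustration
    print("DocBook delimiter characters:", docbook_paragraph_overhead(n))
    print("LaTeX blank-line keystrokes:", latex_paragraph_overhead(n))
```

An editor that auto-completes tags would narrow the gap, so treat the DocBook figure as a rough upper bound rather than a measurement.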

A similar end result can be obtained by using TeX/LaTeX/ConTeXt to produce a source file that can generate a print-ready version (PDF) and a web-ready version (via LaTeX2HTML). The effort required to mark up the document would be far less and the equations would be easily generated. The price is flexibility: with SGML etc. you have a complete logical markup of the document that can then be post-processed with a style sheet to produce any output you desire. With TeX et al. you are using macros to superimpose <some> logical markup on a typesetting language, and consequently you are limited in how easily you can change the format for various outputs without changing the source file.

On the topic of problems with PDFs: as a scientist I read journal articles every day, and nearly all of these are provided as PDFs. While I read some online, it's usually only to determine whether I'm interested in the subject; most of them I print out for reading (to save paper I keep a file of reprints and search that first before I hit the print button). I expect that most people, when offered a PDF of Data Analysis for Politics and Policy, will click on the print button rather than try to read it online.

PDF is not a good online format and was not meant to be. It was designed to be a print-ready format to replace PostScript. The original idea was that the application would generate PostScript output that would then be converted to the simpler PDF. The PDF file could be sent to a print shop for printing. The recent extensions to PDF (e.g. FDF for forms) to make it a viable online format are based on the visual formatting of a document; that is, entries and changes are marked by page location and not by the type of information they contain. This is a bad idea for web-based applications because the format has no concept of what information is contained within the document. A better online format would use a markup language to indicate what information is in the document and then use a style sheet and browser to render it. This brings us back to the advantages of SGML and DocBook, especially for complex documents such as grants, forms etc., where information is extracted from the document to be placed in databases. They are often overkill for a single book that is meant to be read in a linear fashion.

-- John Walker (email)

Response to ET book: Data Analysis for Politics and Policy

I think no.1 is the best option. The scans can be put in a pdf for download. It also prevents the duplication of effort that would occur for resetting the first edition and then typesetting the second.

-- John Walker (email)

Response to ET book: Data Analysis for Politics and Policy

This pdf is part of the rather large publication on the magnetic properties of steels that I've been discussing off and on in the Ask E.T. forum for the past several years. I just recently completed it (it's on its way back from the CD-ROM replicator as I write); the pdf document contains fully 1721 pages of charts, tables, a lot of text and some scanned documents of product information from different steel producers.

The attachment highlights some of the issues I came up against that may be of concern to you if you decide to proceed with a scanned reissue of your Data Analysis for Politics and Policy.

Clarity of type and images was of great importance as I prepared this document. I scanned the Carpenter pages with a flatbed scanner at a moderately high level of resolution, 600 dpi, to tiff and then did some very minor corrections to straighten and sharpen the image and to brighten the page. As you know, anything you do to a scanned image will tend to degrade the clarity of the image, so I tried to get the scans straight to begin with and to keep any modifications to a bare minimum. I then distilled the tiff image to pdf using no compression at all and a 4000 dpi resolution before placing it into the larger document (I used PageMaker for all the page layout work). I found that this intermediate pdf step provided for a clearer document image than placing the tiff image directly; I don't know why but it must have something to do with the postscript translation that pdf uses.

This method proved to be the best I could come up with, and provided for rather clear type and charts in the scanned document even at relatively high zoom resolutions. I expect that readers of my publication will be zooming-in to read the text or view the charts and therefore tried to keep the pixelation at high zoom rates to an absolute minimum. This may be something to consider, especially as readers of Data Analysis... zoom in to read equations, check subscripts or other small type, or examine other images.

The type I set directly, the disclaimer and sidebar in this example, is always clear at any zoom resolution, as were the charts and tables I created myself in other portions of the project. Acrobat Distiller handles directly set type with precision. The pages I've placed here are most likely smaller than the final page size of Data Analysis...; you may have less difficulty with type clarity if you proceed with a scanned reissue at a larger size.

The main drawback to all this is the resultant file size. Even after the distillation to the final pdf document using a 1200 dpi resolution, these 5 pages take up about 2.5 megabytes. Any attempt to reduce the file size resulted in a degradation to the image that I found unacceptable. For my purposes, file size wasn't a problem as the final publication is on CD-ROM (the entire publication takes up about 220 megabytes). You, though, may find this to be an issue of real importance, especially if you are going to distribute Data Analysis... over the web. I imagine that a page of just type, directly set and distilled into pdf, will require maybe 10 or 15 kilobytes, maybe even less.
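
As a back-of-the-envelope check on what those figures might mean for the whole book, here is a minimal Python sketch. It assumes the 5-page sample is representative of a scanned page of Data Analysis..., takes the 179-page count given earlier in this thread, and treats a megabyte as 1000 kilobytes; the function names are made up for illustration, and the results are rough projections, not measurements.

```python
# Projected file size of the whole book, using only figures quoted in this thread.
# Assumptions: the 5-page scanned sample is typical, and 1 MB = 1000 KB.

SCANNED_SAMPLE_MB = 2.5          # size of the scanned sample
SCANNED_SAMPLE_PAGES = 5         # pages in that sample
BOOK_PAGES = 179                 # page count of the book
TYPESET_KB_PER_PAGE = (10, 15)   # guessed range for a page of directly set type
KB_PER_MB = 1000                 # assumption: decimal megabytes


def scanned_book_mb(pages: int = BOOK_PAGES) -> float:
    """Scale the per-page size of the scanned sample up to the whole book."""
    per_page_mb = SCANNED_SAMPLE_MB / SCANNED_SAMPLE_PAGES
    return per_page_mb * pages


def typeset_book_mb(pages: int = BOOK_PAGES) -> tuple[float, float]:
    """Low and high estimates if every page were reset as live type."""
    low_kb, high_kb = TYPESET_KB_PER_PAGE
    return (low_kb * pages / KB_PER_MB, high_kb * pages / KB_PER_MB)


if __name__ == "__main__":
    low, high = typeset_book_mb()
    print(f"scanned: about {scanned_book_mb():.1f} MB")
    print(f"typeset: about {low:.2f} to {high:.2f} MB")
```

Under these assumptions the scanned version comes out well over an order of magnitude larger than the reset one, which is the trade-off at issue for web distribution. Pages with charts or tables would push the typeset estimate up.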

I guess this comes down to the same old digital thing: resolution, or clarity, versus file size. I always lean toward clarity of image, which would require a complete resetting of the pages, and I guess I'd say that's how I'd go. If the demands on your time are the deciding factor, scanning the pages will, I think, require some compromises.

-- Steve Sprague (email)

Response to ET book: Data Analysis for Politics and Policy

I should have included in the last paragraph immediately above: given the superb type handling capabilities of Distiller, I'd go with a resetting of the document and then distilling into pdf as this method will provide for full type clarity and also substantially reduce the file size of the finished document.

-- Steve Sprague (email)

Scanning books for the web

For future projects, take a look at DjVu, which provides high-quality images of pages at smaller sizes than TIFF or PDF file formats.

Both an open-source version and a commercial version are available. The commercial site distributes free viewers (similar to Acrobat) as browser plug-ins.

-- Prem Thomas (email)

Good morning,

Two weeks ago I bought two of your books (The Visual Display of Quantitative Information, second edition, and Envisioning Information). In the first one there was a booklet announcing your other books, as well as the page for downloading the PDF files of Data Analysis for Politics and Policy. However, I tried to download it today and the link (http://www.edwardtufte.com/tufte/dapp/) does not work. A page titled NOT FOUND is displayed instead of the page with the PDF.

I am really interested in it so that I can go on with my work at http://visualinfo.es (I will completely understand if you remove this last paragraph from the published message if you do not want it on your website. I am not trying to advertise my project, but I would like to explain to you why your work is so important to me).

Thank you for leading and guiding other people like me with your work!

Response: the entire book is now available as an eBook for $2. See our order page.

-- Maria Pascual (email)

