Excel’s statistical graphics
March 21, 2009 | Edward Tufte
18 Comment(s)
From a Professor of Physics:
http://scienceblogs.com/principles/2009/03/why_does_excel_suck_so_much.php#more
Topics: E.T.
From a Professor of Physics:
http://scienceblogs.com/principles/2009/03/why_does_excel_suck_so_much.php#more
The comments in the linked post are spot on. I love Excel as a business tool, and have been using it for nearly 20 years. However, the chart defaults are dreadful (which means most business charts I see are equally dreadful), and the upgrade in Excel 2007 was a terrible disappointment, because Microsoft went for bling instead of clarity. Their design managers have obviously never read your books!
Excel isn’t the only Office application that is ‘sub-standard’. A lot of people take issue with Word’s typography:
http://robgoodlatte.com/2007/07/24/3-examples-of-bad-microsoft-word-typography/
http://oestrem.com/thingstwice/2007/05/latex-vs-word-vs-writer/
http://nitens.org/taraborelli/latex
Microsoft has a lot of resources, with many very smart and passionate people working for them, and yet for some
reason many of its products seem to be sub-par (to put it lightly).
I have “fond” memories of spending too much time on the formatting of graphs and reports for the various lab courses
I took in undergrad. Rather than studying the data and thinking about the significance of the experiment I had
conducted that day, my time was consumed with removing chartjunk from the Excel-generated graphs. All of my lab
manuals had an example of what was expected (something similar to the “improved” graph on the scienceblog site) but
no instructions on how to get there. Looking back over those lab reports I can see that I would frequently receive
deductions for incorrect format rather than wrongful conclusions. Instead of revealing and engaging the scientific
principles of the experiment, my lab reports taught me to conform to bureaucracy.
Maybe ET’s “don’t do lowest common denominator design” can be expanded to “don’t force your students to use lowest
common denominator software.”
Actually, there is at least one program manager for Excel that has read Edward Tufte’s books, as stated in this relevant post (second sentence):
http://blogs.msdn.com/excel/archive/2009/02/24/how-to-create-a-professional-chart-using-excel-2007.aspx
It is a pity that Excel has such poor charting capabilities. Because of its widespread use it could educate people by example on the principles of good chart design.
However, the solution in stated elsewhere in the Ask ET forum: use proper tools (i.e. statistical packages) for the job at hand.
OpenOffice is touted as the replacement for Microsoft Office. Unfortunately, when one goes looking around the internet
for reviews of OpenOffice, one tends to run across some naysayers. The backstory is that Sun has been a relatively poor
custodian of OpenOffice. So why do geeks still like OpenOffice? Are they just less informed wanna-be geeks? Turns out
many Linux distros and the NeoOffice
project for Mac derive their source code from the Go-OO.org fork of
OpenOffice, whose politics the hackers have found more acceptable.
That said, people still use the generic term OpenOffice, so I’ll stick with that. I’ve found OpenOffice is generally
better than Excel by default, and the depth of howto available on the Internet is much better.¿¿Want to construct
regression models and the functions aren’t built in? Somebody has a step-by-step tutorial for it. A classic is linear
regression, which can be rather frustrating in Excel (I found it easier to work out a real-world sink-or-save-the-ship
problem on my HP 48G calculator, circa 1992). I started using OpenOffice specifically because the community provides
more help for users trying to do more complex things.
Niels, how does OpenOffice compare with MSOffice as a graph application? I’m ready to believe that the spreadsheet, as a spreadsheet (i.e. an arrangement of cells containing formulae) is comparable.
Microsoft have now launched an official blog for the Excel 2010 product. While details are still sparse, there is a contact page available. While I am unfamiliar with the Microsoft development process, would it perhaps be worthwhile for contributors to this forum to lobby for improvements to the Excel graph displays so that the many workarounds and fixes that I and others have had to develop become part of the core charting capability instead?
Will, I feel a little bit burned by the experience of the development of Excel 2007’s rewritten graph application. The blog that was set up for the public discussion of ideas was hosted by someone who swore he’d read and absorbed Tufte’s work, but the final result was insultingly at odds with the boasts.
For example, Microsoft yet again failed to include the labelling of scatter points with the contents of a range that is neither the category, nor the value, nor the series. This is a function that Lotus 123 had in the 1980s, for goodness’ sake, and yet in the 2010s we will still be using third party VBA apps if we want to do it with a Microsoft product.
Excel 2007’s “new” graph application contained no new statistical features, such as box and whisker graphs, but many new art effects, such as metallic look surfaces. It was very clear that the call for feedback had been nothing but PR, with no intention of responding to it with action.
It was interesting in a different thread to read Edward Tufte’s endorsement for the Origin graphing/statistics package. So while this thread seems to be complaining about Excel’s graphing capabilities I thought I’d offer another alternative that I’ve been using.
The package is GNUPLOT and is available for free on a variety of platforms. It seems mainly to solve the Excel problem listed above because it’s a lot harder to use than Excel is, specifically you can’t accidentally choose an option that looks right when in fact it is wrong (the professor’s lament about his students graphing current and magnetic field strength).
But despite its complications there are a lot of demonstration files supplied and I find it much easier to do useful work fast by quickly scanning through the demos to find one that’s close to what I want to do and then to examine the plot commands used in that demo.
Here’s a collection of plots made using gnuplot.
For graphs of Excel data, an alternative I’m using more and more is MATPLOTLIB [
http://matplotlib.sourceforge.net/ ]
See [ http://matplotlib.sourceforge.net/gallery.html ] for their gallery. This Python
library is open source and easy to use with a basic understanding of Python. It is well
supported, and provides good documentation. Please note that MATPLOTLIB ships with an Excel
toolkit to easily read/write Excel files [
http://matplotlib.sourceforge.net/users/toolkits.html ]
I’m surprised that no-one has yet mentioned R (http://cran.r-project.org/). This is rapidly becoming the de facto statistics package for professional data analysts in academia and Industry and there are now some great books on how to use it for statistics and for professional graphic output e.g. (http://tinyurl.com/nrfvr3).
For those who have all their data already in Excel there is this (http://www.statconn.com/).
To follow on from my last item on the R language, The New York Times recently highlighted the spread of R as the package of choice for academics and commercial users. And a related NYT blog item here.
One quote is interesting – “Intel Capital has placed the number of R users at 1 million”.
For quick plotting, and curve fitting, I use gnuplot. For plots that go in papers I use Grace. Both are free.
I don’t like Origin, too much overhead and too expensive. But it does make nice plots without a lot of effort.
Sparklines will be available in Excel 10. http://blogs.msdn.com/excel/archive/tags/Sparklines/default.aspx
The last entry in the link references Dr. Tufte.
I’d like to add to the mention of the R language the package ggplot2, which by default follows from the advice of Tufte. It also includes color scales from the ColorBrewer packages, which make it easy to write an application to generate booklets of graphs. I’ve been using it all summer and the graphs are well appreciated and far more pleasing to the eye than those of Excel or the default R graphs.
Has anyone used Tableau to any great extent? I have started working with it and found it to
be a wonderful tool for exploration of data but it does have some peculiarities in plotting.
http://www.tableausoftware.com
For those with Apple Mac, DataGraph by David Adelstien (http://www.visualdatatools.com/DataGraph/) is well worth a look – and won’t break the bank either (50GBP at present).
David
Of course, instead of whisking around for a program that caters to your whims, there’s the complete-control end of the
continuum: JavaScript graphics”/dfdvisualizations” libraries like D3 and Protovis (on the site – look – a recreation of Minard’s graphic!)
give you complete control and flexibility, but obviously at the cost of having to learn an equally flexible control language
(although the API’s actually quite nice and functional and data-oriented, if you care about that stuff.) Given attention to
detail, it sure as hell can produce purdy results, though – although I figure because one has to start from scratch with
every graphic there’s a lot nudging you towards minimalism.