Graphing Software

Can you recommend any graphing software that will produce graphs of high visual quality? My application calls for rather simple line graphs, yet I find my current choice (Microsoft Excel) to be very limited in terms of the aesthetic subtleties I'm looking for (especially regarding color, line weight and curve accuracy).

I'm looking forward to attending your class in Palo Alto in June.

Thanks.

Steve Sprague

ssprague@protolam.com

-- Steve Sprague (email), April 27, 2001


Now and then some of my students have been able to hack Excel to do good graphics. But for serious data analysis and publishable graphics, use a high-end statistics package such as Origin 6.0 (good review in Science, July 16, 2000; Windows only, reads into Adobe Illustrator); SYSTAT; Datadesk; STATA; SAS; SPSS; SigmaPlot; or the like. All these programs will give you excellent data analysis and statistically competent displays more or less suitable for scientific publication or serious presentations.

For the highest level graphics (elegant, custom, expensive), enter the crunched data or the graphical output into Adobe Illustrator. Or have all your graphical templates designed and set up in Illustrator. This program gives complete control over typography, line weight, color, grids, layout--just what we need for doing graphical work. It is a serious, complex design program; you may want to work with real graphical designers who will surely know their way around Illustrator. The graphics for the medical interface in my book Visual Explanations (pages 110-111) were done first by scaling the medical data with custom software, and then those statistical results were brought into Adobe Illustrator to produce the complex data/image display on page 111. The output from Illustrator is directly publishable.

More generally, set up a few really good templates for tables and graphics. Use these really good architectures for everything you can. Also when you see excellent graphics, find out how they were done. Borrow strength from demonstrated excellence. The idea for information design is: Don't get it original, get it right.

E.T.

-- edward tufte, April 27, 2001


SPLUS seems conspicuously missing from the list of statistical software providing strong graphics capabilities. Was there a reason for this? SPLUS's Trellis graphics came to my rescue early on with multivariate data display.

-- Sue Bell (email), May 10, 2001


I've long admired SPLUS as well as the other programs I forgot to put on the list.

-- edward tufte, May 13, 2001


Thanks to all for your suggestions. I've looked into all of the software packages recommended and many others as well.

As my choice has been constricted somewhat by budget and as I really only need to produce two dimensional graphs and don't need heavyweight statistical analysis, two have been continually recommended: DeltaGraph by SPSS and Grapher by Golden Software. Of these two, Grapher is by far the easiest to use with a much broader feature set than DeltaGraph. DeltaGraph's main selling point is that it will export to .eps format; Grapher won't but that won't hinder my efforts. Both can produce graphs of high quality. Grapher's the one I'm going with.

Of the higher end programs, SigmaPlot, Origin, S-Plus and Prism all seem to have excellent properties. SigmaPlot would most likely be my choice due to its first rate integration with Excel (where all my data is), although S-Plus has some excellent analytical features and add-ons. All of these have much more analytical power than I need and are substantially more expensive to boot.

I did look into Illustrator and after some further discussions with the folks at Adobe, I pretty much decided against it. Its graphic sophistication notwithstanding, it's very limited in its ability to handle non-linear formats (I'm mostly working with semi-log and log-log charts), and Adobe actually suggested I look into DeltaGraph.

If there are any other thoughts, I'd love to hear them. My project will be ongoing for the next couple of years and I may well be looking to further enhance my graphs.

Steve Sprague

-- Steve Sprague (email), June 12, 2001
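
The semi-log and log-log requirement mentioned above comes down to a small piece of axis arithmetic, which may help when judging whether a drawing program can be coaxed into it. What follows is only a minimal sketch in Python; the function names `log_position` and `decade_ticks` are made up for illustration and do not come from any of the packages discussed.

```python
import math


def log_position(value, lo, hi, length):
    """Map a positive value to a position along a log-scaled axis.

    lo and hi are the axis limits (0 < lo < hi); length is the axis
    length in whatever unit the drawing uses (points, pixels, ...).
    """
    if value <= 0 or lo <= 0 or hi <= lo:
        raise ValueError("log axes need 0 < lo < hi and value > 0")
    span = math.log10(hi) - math.log10(lo)
    return length * (math.log10(value) - math.log10(lo)) / span


def decade_ticks(lo, hi):
    """Return the powers of ten that fall inside [lo, hi]."""
    first = math.ceil(math.log10(lo))
    last = math.floor(math.log10(hi))
    return [10.0 ** k for k in range(first, last + 1)]


if __name__ == "__main__":
    # A 300-unit axis running from 1 to 1000.
    for tick in decade_ticks(1, 1000):
        print(tick, log_position(tick, 1, 1000, 300))
```

For a semi-log chart only one axis goes through `log_position` and the other stays linear; for log-log both do.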


KaleidaGraph is another excellent choice. For its surprisingly low cost, it produces amazing graphs. It is on Windows and Macintosh. You can export the graphs directly into many file types. You can also link your data DIRECTLY TO EXCEL FILES.

I have used it since version 1.0. Compared to other software (Microsoft comes to mind), I am astounded at how great the improvements are from version to version. Never just "tweaks" but truly great leaps in visual creation.

-- Mike Wilson (email), August 23, 2001


I've used a wide variety of graphing software and have found most graphics "packages" to be too limiting. You might consider using a graphics language with a plotting extension or macro package. I used several but the ones I would recommend are GLE and Metapost.

I currently use the free graphics language Metapost to do most of my data plotting for presentation. It has many advantages: it has a macro package that allows for easy plotting of 2-d data on linear or log coordinates; as a graphics language it can do almost anything the more powerful packages such as Illustrator can do; it uses TeX/LaTeX to handle type and equations, and as a result the text output is excellent; it is extensible (I've added several macros and symbol libraries to create better looking graphs à la Tufte); it runs on any operating system (DOS, Windows, Macintosh, Mac OS X, Linux, Solaris, etc.); it has some of the best curve drawing algorithms around; it generates very clean PostScript output; it allows total control over line widths, profiles and color; and did I mention it's free?

It also has some disadvantages, although to be fair some of these are a conscious choice regarding interface and design (marked by a *): it is not WYSIWYG (*); it is a language and has a steep learning curve; it takes at least three steps to generate a preview (mpost file; mp2eps file; gview file.eps) (*); fonts other than what TeX provides can be a bear to use; it is not an analysis package and has no feature for curve fitting (*); and if it doesn't have what you want, you have to put it in yourself (e.g., I had to write an xy error bar routine to handle data with error in both axes, but it draws it the way I want it to be drawn).

In general I find that the advantages of using Metapost far outweigh the disadvantages and would encourage anyone looking for a new graphics package to give it a try. More information about Metapost/TeX/LaTeX can be found at www.tug.org

John Walker

-- John Walker (email), October 16, 2001


I use MatLab for my graphs. It can produce line charts, contour maps, color intensity images, small multiples, 3d plots, and gives you control of very many fine details: line width, colours, positioning, overlays, transparency, etc. It runs on PCs, Macs, etc., and produces PostScript output that is directly publishable. It also gives you sophisticated data analysis capabilities. For an example, see the MatLab manual that I wrote using exclusively MatLab graphics and LaTeX ("Basics of MatLab and Beyond", 2000, CRC Press). MatLab is a bit expensive (themathworks.com), but they do sell a discounted student version.

BTW Glad to see a Tufte website: I've admired his books for years.

-- Andrew Knight (email), October 26, 2001


Does anyone have any idea of how to draw the graph styles that are recommended in The Visual Display of Quantitative Information? I would especially be interested in tips on how to do the min-max range scales that are described in the book, in Illustrator. (I'm using Illustrator 10.0.)

-- Guan Yang (email), December 2, 2001


My (admittedly high-cost) graphics-producing solution is to make figures in Mathematica, then export them to Adobe Illustrator for fine-tuning. If you have the time and patience to learn it, Mathematica can be used to produce any figure you can imagine. It's straightforward to automate the production of graphs of a particular format. I still find that it's best to use Illustrator to fine-tune axis labels, shadings, etc.

-- Andy Peters (email), January 22, 2002


I've had success with gnuplot. It doesn't provide much in the way of analysis tools, but it is a great little app for scatter plots, custom functions, curve fitting, and parametrics. There is also z-axis support. It is completely customizable and scriptable, making it an ideal complement to regularly parsed data. It is available for a wide range of systems, and I'm partial to the Mac OS X port. And it's open source and released under the GPL.

-- Paul Smith (email), January 31, 2002
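
To make the "scriptable" point concrete, here is a hedged sketch of how a small Python program might write out a gnuplot script and its data file for regularly parsed data. The gnuplot commands used (`set terminal postscript eps color`, `set output`, `set logscale`, `plot ... using 1:2 with linespoints`) are standard ones, but the helper names `gnuplot_script` and `format_data`, the file names, and the sample numbers are all assumptions made up for this example.

```python
def format_data(rows):
    """Format (x, y) pairs as whitespace-separated columns, one pair per line."""
    return "".join(f"{x} {y}\n" for x, y in rows)


def gnuplot_script(data_file, output_file, title, log_axes="xy"):
    """Return the text of a gnuplot script that plots a two-column data file.

    log_axes may be "x", "y", "xy", or "" for plain linear axes.
    """
    lines = [
        "set terminal postscript eps color",
        f'set output "{output_file}"',
    ]
    if log_axes:
        lines.append(f"set logscale {log_axes}")
    lines.append(
        f'plot "{data_file}" using 1:2 with linespoints title "{title}"'
    )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names and made-up sample data.
    with open("run1.dat", "w") as handle:
        handle.write(format_data([(1, 2.0), (10, 19.5), (100, 210.0)]))
    with open("run1.gp", "w") as handle:
        handle.write(gnuplot_script("run1.dat", "run1.eps", "run 1"))
```

Running `gnuplot run1.gp` afterwards should produce the EPS file, assuming gnuplot is installed and on the path.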


My favorite by far is Igor Pro, now available for both Mac and PC. See http://www.wavemetrics.com/Products/IGORPro/IgorPro.html for more information.

-- Alyssa Goodman (email), April 17, 2002


S-Plus has a lot of analysis options and is pretty flexible wrt graphing - I noticed that AXUM software is based on S-Plus, if you use the command line log editor feature, you will find you are editing S-Plus code. I took a class that used S-Plus, which gave me the student software for free - the class cost a lot less than buying the software!! and the student version so far lacks nothing I need. That was before they had the GUI, and I'm not sure I would have figured out how to use the darn thing without the class!!

-- Barney (email), April 26, 2002


I get about 15 charts from several staff for a performance measures publication. I usually take their Excel chart and copy it and paste it into Macromedia FreeHand (a competitor of Adobe Illustrator).

For making charts that need to be close together for comparisons, I group the elements in FreeHand and gently "stretch" the charts to the same width or height (Excel can't seem to make two similar charts the same size). After ungrouping, I then remove all of the chart clutter, change the line weights and fonts so they are consistent throughout the publication. I also scale them to fit at the actual size to be used (this way all of the titles and graph text are the same size).

I attended the Tufte workshop in the Seattle area on 6/12/2002. I agree with his opinions on Excel and "WimpyPoint."

-- Gerry Rasmussen (email), June 13, 2002


I was using Adobe Illustrator 8.0, but the numbers we are using are constantly changing and the core information is in Excel. Rather than re-input information into Adobe every time I update numbers in Excel, I am looking for a graphics program that interacts with Excel. Any suggestions?

-- Eileen Kemble (email), June 19, 2002


Eileen,

After posting the first question in this thread, I looked into many graphing programs, a number of which are mentioned in prior postings above. There are quite a few that will produce first rate graphs. All will import data from Excel; with many you will be able to manipulate your data to a lesser or greater extent within the graphing program. Some programs can be completely integrated with Excel. SigmaPlot will operate inside Excel while Excel will run inside S-Plus; you'll have the full capabilities of Excel at hand yet will be able to create graphs straight away using the full functionality of the graphing program. All will export in a variety of highly accurate formats, including eps and wmf, so you can plug them into any first rate layout program; they are of sufficient quality to use in presentations, reports, print and pdf publications and the like.

You can download fully functional (or nearly so) demos from these sites:

Grapher from Golden Software at www.goldensoftware.com

SigmaPlot and DeltaGraph from SPSS at www.spss.com

Origin from OriginLab at www.originlab.com

S-Plus from Insightful at www.insightful.com

Kaleidagraph at www.synergy.com

Dr. Tufte and many others have made mention of other graphing programs and techniques in the posts above. One of these I'm sure will fit both your needs and budget. Good luck.

-- Steve Sprague (email), June 19, 2002


R-Language is VERY similar to S-plus in terms of language structure and graphics capability...and it's FREE at http://www.r-project.org/ (downloadable GNU software includes extensive help files containing great sample code)

-- Nathan Pellegrin (email), June 24, 2002


It's a funky old Unix utility, but jgraph produces excellent postscript graphs from a simple text input format. Note - I mean jgraph the old postscript tool, not JGraph the Java library.

It has superior axis formatting and doesn't have a lot of stupid extra bells and whistles that you should never use anyway. Great for programmer types who like to write code to manage their own data.

jgraph's home is http://www.cs.utk.edu/~plank/plank/jgraph/jgraph.html It hasn't been updated in several years and has some silly bugs. I recommend you start with the Debian sources http://packages.debian.org/stable/math/jgraph.html

-- Nelson Minar (email), September 1, 2002


Continuing Nelson's point about rusty old Unix-fostered tools being useful, there's the venerable Graphviz toolkit from AT&T, which is useful primarily for producing directed and un-directed graphs (as opposed to statistical bar, pie, and line graphs). Graphviz, like you might expect from the mention of Unix heritage, expects a file full of special syntax, which is then processed to produce an output file, but is actually quite the powerful package, once you get used to it.

More information, including downloads for Unix, Mac OS X, and Windows, is available at <http://www.research.att.com/sw/tools/graphviz/>

-- Dan Moniz (email), November 20, 2002
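
The "file full of special syntax" is Graphviz's DOT language, and such files are easy to generate from a program. Below is a minimal sketch in Python; the helper name `to_dot` and the example edge list are assumptions for illustration only, and node names containing double quotes would need escaping that this sketch does not attempt.

```python
def to_dot(edges, directed=True, name="G"):
    """Return DOT source for a list of (source, target) node-name pairs."""
    keyword = "digraph" if directed else "graph"
    arrow = "->" if directed else "--"
    lines = [f"{keyword} {name} {{"]
    for source, target in edges:
        # Quoting the names lets them contain spaces.
        lines.append(f'    "{source}" {arrow} "{target}";')
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    pipeline = [("raw data", "analysis"), ("analysis", "graphic")]
    with open("pipeline.dot", "w") as handle:
        handle.write(to_dot(pipeline))
```

The resulting file can then be processed with, for example, `dot -Tps pipeline.dot -o pipeline.ps`.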


Question about Graphing Software

Can anyone recommend a Mac OS X native software package that will let me map 6 or 7 variables on a single chart? Some of the variables will have differing scales, so I need to be able to incorporate multiple scales. I also need to be able to demonstrate a single line with varying thicknesses to delineate a range.

Oh, and did I mention easy to use. I've read the posts so far, but nothing seems to fit the bill (unless I'm missing something). Although it seems that everyone here hates Excel (and I have my moments as well), I'm looking for software where I can enter formulas, insert the numbers, and the chart magically appears.

Any suggestions would be greatly appreciated!

Anita Chambers

-- Anita Chambers (email), January 21, 2003


Anita,

Most sophisticated graphing packages will integrate directly with Excel, so you can set up your spreadsheet any way you want and have the results displayed automatically on a chart or graph. With these programs, you can plot numerous variables with multiple and/or overlaid axes; they also have graphic capabilities that are vastly superior to anything Excel will ever have.

Unfortunately, most of these are set up for use only with Windows. You might want to try DeltaGraph for Macintosh; it's from SPSS and is not too expensive (as these things go). You can get information about it at http://www.spss.com/spssbi/deltagraph/mac/ . If it works like their Windows version you should be able to use it fairly easily.

Hope that helps.

-- Steve Sprague (email), January 21, 2003


An amazing software package with about a 2-minute learning curve is http://www.dpgraph.com/

Michael Round

-- Michael Round (email), May 10, 2003


I've been showing data lately using Treemap, a novel graphing format written in Java:

http://www.cs.umd.edu/hcil/treemap/

It's a specialized graphing tool for visualization of hierarchical structures. WSJ.com displays stock market data using an extended version of this tool.

One can compare nodes and sub-trees, at varying depth in the tree, which helps spot patterns and exceptions. It's been adaptable to my uses, data can import via Excel.

-- Brian Wagstaff (email), May 17, 2003
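
For readers unfamiliar with the format, the basic treemap idea can be sketched in a few lines: each node gets a rectangle whose area is proportional to its size, and the split direction alternates at each level of the hierarchy ("slice and dice"). The Python below is a rough sketch of that one layout under stated assumptions, not the code of the Treemap tool linked above, which may well lay things out differently. The tree representation (a leaf is `(name, size)`, a branch is `(name, [children])`) is invented for this example, and sizes are assumed to be positive.

```python
def node_size(node):
    """Total size of a node: its own size for a leaf, the sum for a branch."""
    _name, payload = node
    if isinstance(payload, list):
        return sum(node_size(child) for child in payload)
    return payload


def slice_and_dice(node, x, y, width, height, depth=0):
    """Return (name, x, y, width, height) for every leaf under node."""
    name, payload = node
    if not isinstance(payload, list):
        return [(name, x, y, width, height)]
    total = node_size(node)
    rects = []
    offset = 0.0  # fraction of the parent rectangle already used
    for child in payload:
        share = node_size(child) / total
        if depth % 2 == 0:
            # Even depth: divide the width among the children.
            rects += slice_and_dice(child, x + offset * width, y,
                                    share * width, height, depth + 1)
        else:
            # Odd depth: divide the height among the children.
            rects += slice_and_dice(child, x, y + offset * height,
                                    width, share * height, depth + 1)
        offset += share
    return rects


if __name__ == "__main__":
    tree = ("root", [("a", 2), ("b", [("c", 1), ("d", 1)])])
    for rect in slice_and_dice(tree, 0, 0, 100, 100):
        print(rect)
```

Because area tracks size at every level, sub-trees can be compared at a glance, which is the property described above.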


JMP 5.0 from SAS is a less expensive alternative to SAS with plenty of data discovery tools as well as a wealth of graphing and display options that can produce quality results.

-- Ron Agresta (email), June 7, 2003


There is a chart/graph function in Illustrator which allows you to import Excel/spreadsheet data.

Illustrator compiles the data and draws out the particular graph you chose. Those graphs can be broken apart and edited with Illustrator's native tools.

In Illustrator hit F1. Then select Search from the navigation in the left hand column. In the search field insert the word excel. Then in the results select the term: Importing data from another application

Follow the instructions.

-- Jeffrey berg (email), August 14, 2003


We've recently been using Adobe Illustrator to construct many small intense financial graphics, visually importing the data from the web, stitching time-series together, and then adjusting and rescaling for publication.

-- Edward Tufte, August 14, 2003


This trick gives me the flexibility and power of both Excel and Illustrator:

I do the statistical calculations using Excel. I then print (the ugly looking) Excel graphs as acrobat PDF-files using Acrobat Distiller. The PDF-files can then be opened and edited in Adobe Illustrator.

-- Harald Groven (email), August 14, 2003


If Excel does for statistics what it does for graphics, you may want to do your statistical calculations in some other program (SAS, Systat, SPSS, JMP, etc.), and then do your graphics in Illustrator. Many statistical programs also have fine graphing capabilities, and you can import the graphs into Illustrator for fine-tuning (I'm thinking of Systat in particular).

-- Gregory C. Mayer (email), August 14, 2003


This is a fine suggestion and has been my strategy for a long time: to do good data graphics you need a program that can count (a serious statistics package) and a program that can see (Illustrator).

-- Edward Tufte, August 14, 2003


Re: Graphing in Illustrator. While my older version of Illustrator will import data from Excel and construct a number of graph types, it won't produce scatterplots with log scales (essential to my project). Does anyone know if the newer versions will?

Also, excellent advice from Jeffrey Berg. Smoothing is essential for viewing graphs in Acrobat and Reader and should be done immediately after installation; it's a feature available only in version 5 and later.

-- Steve Sprague (email), August 15, 2003


A sideline on Excel Statistics

See: McCullough & Wilson, Comput Stat Data An 40: 713-721

which says that the errors in Excel have not improved and may have worsened in recent editions.

Do NOT trust Excel stats.

-- Steve Heise (email), August 25, 2003
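
One practical way to act on this warning is to cross-check a spreadsheet result against an independent calculation. The Python sketch below is a generic illustration of why that matters, using sample variance as the example; it is not a claim about what Excel does internally, and the function names and test data are invented for this example. The one-pass "textbook shortcut" formula loses precision when the values are large relative to their spread, while the two-pass formula does not.

```python
def variance_one_pass(values):
    """Shortcut formula: (sum(x^2) - (sum x)^2 / n) / (n - 1).

    Algebraically correct, but prone to cancellation error in
    floating point when the mean is large compared to the spread.
    """
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for x in values:
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def variance_two_pass(values):
    """Compute the mean first, then sum the squared deviations from it."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


if __name__ == "__main__":
    # Deviations from the mean are -6, -3, 3, 6, so the sample
    # variance is (36 + 9 + 9 + 36) / 3 = 30.
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print("two-pass:", variance_two_pass(data))
    print("one-pass:", variance_one_pass(data))
```

On this data the two-pass result is exactly 30, while the one-pass result is visibly wrong in double precision. If a statistics package and a second, independent calculation disagree like this, the package's number deserves suspicion.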


For those running on Mac OS X, I've recently discovered the strangely named Aabel (http://www.gigawiz.com/Aabel.html), which seems particularly well suited to multivariate data analysis. And it has decent anti-aliased typography, which is uncommon in many otherwise commendable graphing packages.

-- Jin Choi (email), September 8, 2003


GRI is nice

I have been very pleased with Gri, a graphics language written by Dan Kelly and hosted at SourceForge:

http://gri.sourceforge.net/

It has been very responsive: I was able to get very good results at the very start knowing very little, and have been rewarded with increased control and power as I have learned more.

Available for many platforms, and essentially free (it's GPL).

-- Mark Woodworth (email), September 12, 2003


I am an astronomer... Myself and upwards of 50% of other astronomers/ astrophysicists use the programming language IDL (Interactive Data Language http://rsinc.com/idl/ ) which has both basic and highly advanced analysis capability as well as fully manipulable plotting output in a variety of formats (vector, raster, etc.). We use this in astrophysics quite a bit (we run telescopes and data analysis pipelines with it in addition to plotting!) as we can quickly plot, analyze, etc. a chunk of data and play with it at the same time... then, for demanding presentation (without powerpoint of course!) we can lock ourselves in our offices for a day and fully mess with the elements of the graphics objects.

However, for really good charts and graphs, I have to recommend the free plotting package Grace (formerly "xmgr" http://plasma-gate.weizmann.ac.il/Grace/ [You must have X Windows installed to use grace... Macs have Apple's X11]). It allows you highly sophisticated, manipulable output on data that you can tie to an input file (like the output of an Excel file). I did the following figure using a combination of IDL and Grace for a recent paper:

http://pobox.com/~joehall/temp/nasa_synthesis.gif

(Full disclosure: I actually cited Mr. Tufte's Challenger work in this paper. Hall, J.L. "Columbia & Challenger: Organizational Failure at NASA." Space Policy 19:4, 239-247 (November 2003). You can read it here:

http://pobox.com/~joehall/papers/nasa.pdf

(Mr. Tufte- Your copy is in the mail!)

-- Joseph Lorenzo Hall (email), December 16, 2003


GraphPad Prism is a great statistical and graphing application. I drop figures into Adobe Illustrator and edit further for fine control. www.graphpad.com for a demo of the program (Mac or Windows). The CEO of the company will return email questions usually the same day. He is also a very good pharmacologist and offers multiple statistical tutorials on his web site.

-- Christian Hunter (email), January 9, 2004


Macromedia Flash has excellent graphic tools and also allows importing Freehand and Illustrator files with fidelity.

Flash's javascript-like language, Actionscript, enables you to transform any graphic into a dynamic "object" that you can bind to data.

Depending on the methods and properties you assigned your graphic "objects," they will change as data changes.

Static info graphics become dynamic! Combine dynamic and static graphics for advanced visual data analysis.

Manually input data or bind to external data sources.

You can connect to data "via XML through HTTP (JSP, ASP, etc.), to SOAP services, or directly from an API using a more dynamic binary protocol."

-- Jean O'Sullivan (email), January 11, 2004


For info on Swiff Chart, Flash-based charting program with great graphics see

http://www.nwfusion.com/newsletters/web/2003/1117web1.html

(can import Excel and other database files)

-- Jean O'Sullivan (email), January 11, 2004


To Jean: I checked Swiff Chart and hated the examples given. Poor presentation of data, as well as garish and busy.

But my real question is this: Why do a graph in Flash? Under what conditions would this help the reader to understand the information?

-- melissa spore (email), January 12, 2004


Flash allows complete control over the appearance and functioning of any graphic you design. And you can import, and I think even paste, your Illustrator or Freehand files into Flash where you can "program," i.e. make dynamic, elements of your illustration.

You are not limited to graphs and charts. Create dynamic diagrams of transit systems, for example, or cell mechanisms, nerve systems, urban maps - anything that will help you show complex interactions of different factors.

Flash allows you to program the elements of your illustration and those elements will respond to data you can enter manually or extract from an external data source.

Simple example - Map of U.S. with each state programmed to change colors from cool to hot depending on any variable you want. Add another variable and show it with a simple circle that grows or shrinks depending upon data it receives.

A Flash subway transit map could have train "objects" programmed to correspond to actual conditions in real time to give dynamic visual representation of data. Add more variables to understand more.

You can also create visual models of solar system, for example, and do mathematical "what if's."

It's all about visualizing the interaction of changing variables.

A bit like motion picture vs. still photo.

-- Jean O'Sullivan (email), January 12, 2004
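
The "simple example" above (a state that shifts from cool to hot, plus a circle that grows or shrinks) is a data-to-appearance mapping that can be sketched independently of Flash. The Python below is only an illustration under assumptions: the function names, the blue-to-red endpoints, and the choice to scale the circle by area rather than radius are mine, not part of the post or of ActionScript.

```python
import math


def cool_to_hot(value, lo, hi, cool=(0, 0, 255), hot=(255, 0, 0)):
    """Return a '#rrggbb' color between cool and hot for value in [lo, hi].

    Values outside the range are clamped to the nearest endpoint.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    r, g, b = (round(c + (h - c) * t) for c, h in zip(cool, hot))
    return f"#{r:02x}{g:02x}{b:02x}"


def circle_radius(value, max_value, max_radius):
    """Radius for a circle whose AREA is proportional to value.

    Scaling the radius directly would exaggerate large values, so the
    square root is used here (an assumption of this sketch).
    """
    return max_radius * math.sqrt(max(0.0, value) / max_value)


if __name__ == "__main__":
    # Made-up readings for three states on a 0-100 scale.
    for state, reading in [("WA", 12), ("TX", 55), ("FL", 97)]:
        print(state, cool_to_hot(reading, 0, 100),
              circle_radius(reading, 100, 10))
```

Whenever the data source changes, re-running the mapping and redrawing is all that "binding" the graphic to the data amounts to.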


Some of you, especially if working with multi-file/multi database table clinical data, might be interested in Patient Profiles (formerly CrossGraphs). This software is being used by the FDA to graphically review submitted New Drug Applications (although other software is also available to FDA reviewers). Graphical cross-tabulation is built-in, as well as data and graphical drilldown (drilldown from aggregate graphs, those showing data for multiple subjects to individual patient profiles). Go to http://www.csscomp.com or info@belmont.ppdi.com.

[link updated January 2005]

-- Jeff Millstein (email), January 26, 2004


I am a graduate student in nuclear physics. Almost everyone I know/work with in the field creates their plots using PAW or ROOT. These are free data analysis/graphing software packages produced by CERN (a particle accelerator facility in Europe). PAW is older and is structured in Fortran, while ROOT is newer and is structured in C++. They produce output files in postscript or pdf format and they contain fairly sophisticated analysis routines.

PAW:
http://wwwasd.web.cern.ch/wwwasd/paw/

ROOT:
http://root.cern.ch/

-- Jaideep Singh (email), March 12, 2004


David - I haven't tried OpenDX yet, but I was looking at it a few minutes ago, and may end up trying it out over spring break -- it looks like it would be very useful for what I need. If I end up doing that, I'll report back.

On a different note, I recently came across a scripted graphing program called GRI, which looks like it is able to produce high-quality line and contour graphs. See http://gri.sourceforge.net/gri-cookbook/index.html for some examples, and http://gri.sourceforge.net for the homepage from which to download it.

-- Brooks Moses (email), March 14, 2004


Dell is offering a special on a package called Mekko Graphics that has an interesting assortment of graphics that can be used in powerpoint. Does anyone have any experience with this package?

Info is at http://www.mekkographics.com

-- John Wall (email), March 15, 2004


The Dell/Mekko graphics are not serious, and operate at the level of PowerPoint templates for statistical graphics.

-- Edward Tufte, March 15, 2004


I would like to add my 2 cents to this multi-year thread. Steve Sprague, you still reading?? I did a majority of my figures for publications in Physics journals and my dissertation using Gnuplot exporting in the .fig format. I then imported those plots into Xfig and edited as needed. Everything is editable at this point. Xfig can then export in virtually any format you wish, jpg, ps, eps, bitmap, tiff, etc. It worked well for me. I then used those figures in a TeX document yielding publishable quality results. All mentioned software is free and TeX has been used to typeset many books.

Cheers!

-- Matthew Lee (email), April 22, 2004


Reading every day. Thanks for the note.

-- Steve Sprague (email), April 22, 2004


SAS/Graph, particularly the annotate facility and proc ganno. For years I stayed away from this component of the SAS system because the documentation didn't seem to show me that annotate could do anything that I wanted to do. The examples were stunningly inane. But stuck once with no other way, I found that this tool, in conjunction with the statistical routines in SAS, is likely the most powerful computing tool in my arsenal. There is essentially nothing that can't be done.

I spent about ten years in R&D in the pharmaceutical industry working on clinical trials. About four years ago I moved to the commercial side of my company, in a data mining role, examining commercial data (sales, marketing, sales rep visit patterns, call center patterns, etc.). While in R&D, most studies employed a couple hundred patients. Studies were well designed, conduct was regulated by FDA, patients were randomized. As such, the data were quite dull. Does the treatment have any effect? Safety? Efficacy? Standard analysis tables and reports. A few simple graphics of treatment groups over time.

Now in commercial, ten- or one-hundred-thousand or a million observations is the norm. The business data are substantially MORE complex than the clinical trials data, not less. Yet, most people on the business side only report (read 'count and total') these data, collapsing them across the very sources of variability that are driving the process, using tools like Brio and MicroStrategy and Crystal Reports. The advanced users sometimes use SAS to manipulate the data but then the results are dumped into Excel and PP. Arrgh! And most of the eventual displays are then simple time series, showing sales or counts or totals or whatever for each of the last 12 or 24 months. Is the steady and unrelenting drumbeat of time the only factor driving changes in our numbers?

SAS/Graph's annotate facility allows me to create incredibly data-rich graphics programmatically in formats like pdf (high resolution), with pinpoint control over every element of the graphic. It gives me the ability to place (based on the data!) lines, points, curves, and text of any size or color at any place on the graphic. And clever programming allows me to let the program make decisions about how to show things *based on the data*. Although I don't do much TeX anymore, the power reminds me of what TeX does with typesetting. Showing upwards of 5, 6, 7, or more dimensions has become commonplace through the use of colors and shading, small multiples, varying element sizes, and sorting. Examples in ET's books (particularly EI) and course have, of course, proved invaluable as well.

The only thing lacking is an easy way to wrap prose text (like in a 'text box') for adding fulltext descriptions to the graphics. I had to write some fancy macros to build a poor-man's wrapper, but I think there may be an easy way to do this when we upgrade to version 9!

Some of my academic friends tell me that S-Plus has similar capabilities but I can neither confirm nor refute this claim.

Complex data require serious analysis. Excel is a great tool for keeping track of minutes played for the players on my kids' basketball team or for computing bake sale proceeds but it is not powerful enough for serious information envisioning of gigantic, brutally complex, highly multi-variate data.

Serious analysis requires serious tools.

(And, in the interest of full disclosure, no, SAS doesn't pay me, but the sales rep that visits our company once gave me a golf shirt and a coffee mug!)

-- Rafe Donahue (email), June 25, 2004


I've had good luck with a product called JMSL (a Java numerical analysis library from Visual Numerics [www.vni.com]) and it generates very nice, clean, clutter-free graphics. It gave me all the attribute control that our group required to tailor lines, fill, axes, etc. We needed to provide graphics in a live web environment and JMSL was very well suited to embedding into custom software applications to generate graphics on the fly.

-- Jim Moore (email), July 10, 2004


I know this is taboo on this site - but I LOVE Excel for high end graphics. I do my higher end statistical analysis primarily in SPSS - regressions and other multivariate analyses - but time series and other spreadsheetable data right from Excel. I have used SPSS's graphics, but find them hard to adjust and customize. Spend a bit of time with Excel graphs and you can remove all the chart junk fairly easily. Everything is completely customizable. And a beautiful part is that it can talk with everyone's computer - Mac and PC (one good thing about the Microsoft ubiquity). I spend some time to set up the templates and it's very easy to update.

I've even started using it for sparklines (then importing into Word, Pagemaker, or PP). The sparklines are great because they keep their dynamic link to the data. Once set up, I can update a report by just opening and hitting print.

And don't tell ET, but you could create your report solely in Excel by using it as a page layout tool too - then you don't have to worry about importing and exporting and opening various products.

-- Dave Krause (email), July 22, 2004


Several of my students have hacked Excel to produce elegant graphics--shrinking the data images, and cleaning out the grids, chartjunk, and poor typography. About 10 years ago, 2 students produced a fine set of sparklines entirely in Excel, which didn't know it was producing good graphics.

-- Edward Tufte, July 22, 2004


An aside about Excel and the 'dangers' of using a computer program without fully understanding what it may be doing in the background.

http://www.biomedcentral.com/1471-2105/5/80

-- Andrew Nicholls (email), July 22, 2004


I have recently produced very reasonable sparklines in StarOffice, or OpenOffice, the free equivalent. Nearly all of the graph attributes are editable and so you can remove axes, grids and the chart junk. They are not quite word-sized, I admit. It's not so much a hack as undoing the defaults. The fine tuning is hard and there are some weaknesses, but it improved our data density in our weekly product quality reviews. We cut a 33 page PP type presentation down to 3 slides, to the great embarrassment of the previous report owners! And now in our department, 'tuftefying' has become a verb. Anyone presenting now gets short shrift if their Tufte Data Density Ratios aren't up to scratch.

I recently attended the course in Portland Oregon, swiftly devoured the books and went on the rampage at work, to get this message across. I honestly believe this could save our company millions in reclaimed hours of meeting time. It's kind of like going green, and recycling valuable resources, except in this case it's time.

Good on yer Prof. for showing me and a colleague of mine some new and very useful tricks.

Rgds

Richard

-- Richard Kenyon (email), August 22, 2004


McKinsey exhibits

As a McKinsey veteran who survived 30 great/gruelling months with the Firm, I feel able to offer a quick response to Karl McDonnell's enquiry about McKinsey exhibits. As at summer 2002 (when I left), McKinsey staff used a relatively unsophisticated set of software for exhibits - Excel for the analysis and Powerpoint for the graphics, the latter reinforced by a bespoke charting module maintained by an in-house team to bypass the default MS settings and to enable some of the more specialised strategy chart formats.

But I have to say that the software was quite peripheral to the success or failure of exhibits. The Firm has the good sense to provide full-time communications specialists to advise teams on the creation and production of accurate, convincing documents, and I received fantastic coaching from the four communication specialists in the London office where I worked. I can't remember a single discussion with them where the software was an issue - our focus was always on the audience and on the message we wanted to communicate. Which I suppose shouldn't come as any kind of surprise to the readership of this particular discussion board. I should also point out that all four were extremely familiar with Professor Tufte's books (well-thumbed copies were to be found in their offices), and used the principles he expounds in the orientation courses provided for every new consultant.

-- Will Judge (email), August 31, 2004


This is very important to note. Excel's date handling is an absolute disaster. Because the program attempts to be "smart" about handling dates in any format imaginable, the default behavior ends up irreparably destroying data in all types of unexpected ways. Aside from the specific problem of transforming gene names like SEP1 to Sep-04 or SEP2 to 2-Sep, I've often run into unexpected data changes -- it will transform centrex phone numbers from 3-3058 to Mar-58 (as in March of 3058, apparently thinking I'm doing some very long-term planning). And be careful about using ranges like 2 - 4, that will turn into 4-Feb.

Excel stores dates internally as integers, which leads to a host of other problems. If you realize Excel has messed up your data and you change the columns to text format, it gives you its internal representation (the number of days since 1/1/1900 or 1/1/1904). So your 3-3058 is now gone forever, replaced by 421550 or 423012. I've even had situations where I've started entering a column of dates and typed only a year in one row, such as 1982, but instead of treating it as the year 1982 it treats it as the integer representation, so 1982 becomes 4-jun-1905. As a programmer, I can understand that there's a good chance that something like SEP1 was intended to represent Sept. 1st, but when common sense dictates that what the user typed was what the user intended, then for crying out loud leave it alone.

What's worse, the default behavior on PC and Mac is different -- worksheets created on Mac start counting days from 1904 while on PCs they start counting days from 1900. Copy a column of dates from a spreadsheet created on PC to one created on Mac and suddenly everyone's four years younger -- that's right, even though Excel knows that one sheet starts counting at 1904 and the other starts at 1900, it doesn't bother converting the data.
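
A minimal sketch of the serial-number arithmetic described above, in Python rather than Excel. It assumes the commonly documented conventions: the 1900 system behaves as if counting from 30 December 1899 (because Excel counts a nonexistent 29 February 1900), and the 1904 system treats 1 January 1904 as day 0. The function name and the cutoff at serial 61 are choices made for this illustration.

```python
from datetime import date, timedelta

# Epochs chosen so that epoch + serial gives the calendar date.
# The 1900 offset is only correct for serials of 61 and above, i.e. after
# the fictitious 29 February 1900.
EPOCH_1900 = date(1899, 12, 30)
EPOCH_1904 = date(1904, 1, 1)


def serial_to_date(serial, system=1900):
    """Convert a whole-number Excel date serial to a calendar date."""
    if system == 1900:
        if serial < 61:
            raise ValueError("serials below 61 need special handling")
        return EPOCH_1900 + timedelta(days=serial)
    if system == 1904:
        return EPOCH_1904 + timedelta(days=serial)
    raise ValueError("system must be 1900 or 1904")


# Gap between the two systems, in days.
SYSTEM_OFFSET_DAYS = (EPOCH_1904 - EPOCH_1900).days

print(serial_to_date(1982))          # 1905-06-04: a bare "1982" read as a serial
print(serial_to_date(423012))        # 3058-03-01: what "3-3058" turned into
print(serial_to_date(421550, 1904))  # 3058-03-01: same date, 1904 system
print(SYSTEM_OFFSET_DAYS)            # 1462 days, roughly four years
```

The 1462-day gap is why dates pasted between the two kinds of workbook shift by about four years unless someone adds or subtracts the offset by hand.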

cheers, jamie

> An aside about Excel and the 'dangers' of using a computer program without fully understanding what it may be doing in the background.

> http://www.biomedcentral.com/1471-2105/5/80

-- Jamie Ciocco (email), September 17, 2004


I just want to second some comments above in support of open-source packages such as R, gnuplot, octave, and ploticus:

http://cran.r-project.org/index.html

http://www.gnuplot.info/

http://www.octave.org/

http://ploticus.sourceforge.net/doc/welcome.html

SigmaPlot is fairly powerful, but it is fairly expensive and for a few years did not appear to be under active development (that seems to have changed; it looks like version 9 is out: http://www.systat.com/products/SigmaPlot/).

-- Brian Crounse (email), October 20, 2004


Vista is a good statistics package. It can do everything from ANOVAs to z-plots, including of course multivariate regression analysis and 3-D spin plots. It's free, and runs on Windows, Linux, etc... I had JMP by SAS, but something odd happened, b/c it quit working; even though I have the purchased version, it now says the JMP.PER file is corrupt, etc....

-- jon hastings (email), November 16, 2004


Minitab has recently issued V14 which is a real improvement over earlier versions. The statistics have always been good but the graphs were clunky. The new graphs are a lot more controllable and can be tuned to be more informative.

-- James N. Cawse (email), December 9, 2004


Hi -

I must add my voice to the Excel heretics here. I work for Germany's largest private financial and research company, running large-scale industrial modelling and ratings.

Every quarter we produce our industry dossiers. These are individually between 40 and 65 pages, depending on data coverage and the like; all told we produce something like 4500 charts, 400 tables and quite a bit of text.

We do it all, including page layout and design, in Excel. While going into the details would bore everyone here terribly, suffice to say that trying to get multiple programs to work together efficiently turned out to be a disaster.

We used Excel to massively increase our output and therefore our productivity. We were using 4-5 programs to do the numbers, to make charts, to create pretty tables and then to write our analyses; this took 3 people two weeks to do. Switching to doing everything (except the model work) in Excel let us massively simplify things and avoid all sorts of object problems, and we now do the product with 0 people and in around 12 hours (ok, someone has to click on a button to start the procedures...).

So, while Excel might not be the best of all worlds, it allows this economist to stop doing excessive amounts of computer work and lets me get back to doing economics instead.

And I have gone through four copies of Visual Display here at work: one fell apart from being borrowed so much, two others wandered off into other departments and I keep the final copy hidden so that it also doesn't disappear... :-)

John

PS: You can get Excel charts to do quite a bit, as long as you are willing not only to program in VBA, but also to think outside the box. Walkenbach is god here.

-- John F. Opie (email), January 19, 2005


John Walkenbach's Spreadsheet Page.

-- Niels Olson (email), January 19, 2005


In experimenting within Excel to solve some of the charts in VDQI, I was intrigued by some of the ways in which data density could be increased further by combining different ideas. One such end result is similar to the dot-dash-plot at the top of page 133. In addition to this though, I realised that this could be combined with a quartile plot for the frame but with those quartiles represented by the values at each quartile. This then generates the bivariate relationship, the full distribution of each variable and the quartile plot of each variable. I'm happy to be corrected but I think this generates a data/ink ratio of 1.0. If anyone is interested, I can send through the Excel example. Unfortunately, I don't have a website on which I can host a picture of this.

-- Will Oswald (email), January 20, 2005


Great job. The "Dot Chart with range" is pure genius. I laughed out loud when I saw it, because it made so much sense. Yup, I think it's a data/ink ratio of 1.0.

Ah simplicity,

-- Jeff (email), January 21, 2005


Thanks guys for the comments, and thanks especially to Jeff for hosting images of the files to which I've been referring. The "dot chart with range" plot is combining together a couple of ideas within Visual Display of Quantitative Information. Here's what it actually looks like:

The idea here is that the main body of the plot shows a normal x-y bivariate scatter plot. The x-axis is the actual display of all x values, i.e. the full distribution of x-values, as per page 133 of VDQI. What I've added here is that the labels of the x-axis are the minimum, 25th percentile, median, 75th percentile and maximum. The same approach applies for the y-axis. This means that the axes and axis labels are all providing additional information about the data distribution. Jeff has kindly posted images of the other charts that were all created very simply within Excel here.
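
For anyone wanting to reproduce the axis labels outside Excel, here is a rough sketch of the five values involved, using only the Python standard library. The quartile convention (the "inclusive" method) and the sample numbers are assumptions made for illustration; the spreadsheet behind the chart above may use a different convention.

```python
from statistics import quantiles


def five_number_labels(values):
    """Return (min, Q1, median, Q3, max) for use as axis tick positions."""
    data = sorted(values)
    # method="inclusive" treats the data as the whole population, so the
    # quartiles always fall inside the observed range.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return (data[0], q1, median, q3, data[-1])


# Hypothetical sample values, for illustration only.
x = [3, 7, 8, 5, 12, 14, 21, 13, 18]
print(five_number_labels(x))  # (3, 7.0, 12.0, 14.0, 21)
```

Calling the same function on the y values gives the labels for the other axis, so the frame carries the quartile plot for both variables.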

-- Will Oswald (email), January 24, 2005


A wonderful piece of work done by Will Oswald. Especially delightful is his cryptically named "Dot chart with distribution, t" which shows a bivariate distribution migrating across the x-y plane as a function of time.

I have been able to use a similar technique for bi-variate time series data where the comparison of interest is the interplay between the x and y values across time, not necessarily the relationship between x and time, and between y and time (as is often seen in the dreaded double time series plot). With monthly data, I have done something similar to what Will has done, coding the year (2002, 2003, ...) via color and the month with a single digit code (JFMA5678SOND). Having the year-color code increase in darkness as data become more recent is helpful, as Will has done, in that it brings the most current data to the front. Highlighting the most recent point is a crafty finial, bringing attention to the current state of things.
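
The month-and-year coding can be sketched in a few lines of Python. This is only an illustration of the idea: the function name, the grey-fraction scale and the 0.8 ceiling for the earliest year are all assumptions, not a description of the actual plots.

```python
MONTH_CODES = "JFMA5678SOND"  # one character per month, January first


def point_style(year, month, first_year, last_year):
    """Return (label, grey) for one monthly observation.

    grey is a fraction in [0, 1] where 0 is black, so later years come
    out darker. The 0.8 ceiling for the earliest year is arbitrary.
    """
    label = MONTH_CODES[month - 1]
    span = max(last_year - first_year, 1)
    grey = 0.8 * (last_year - year) / span
    return label, grey


print(point_style(2002, 1, 2002, 2004))  # ('J', 0.8): oldest year, lightest
print(point_style(2004, 5, 2002, 2004))  # ('5', 0.0): newest year, black
```

Digits stand in for May through August because the initials M, J, J and A would otherwise repeat.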

I tend to use this single-point highlighting technique often when producing reports for large numbers of groups. Instead of simply reporting, say, market share and volume once to each of 500 business units, I create 500 plots of the entire bi-variate distribution, with each plot containing a different highlighted point for that respective group. This 'you are here' method allows each group to see where it is relative to the distribution of its peers.

So, Will: are you going to tell us Excel novices how to make these wonderful plots?

-- Rafe Donahue (email), January 24, 2005


Probably easier to see the charts here (thanks again to Jeff). The first plot shows more detail than above, in that there is some sense of time progression, with the markers going from a grey circle with a white fill and getting progressively darker until the last data point is marked red, with call-outs from each axis.

The other version with which I've been experimenting is to add the y-axis transposed to lie underneath the x-axis. The chart here plots EUR/USD against USD/GBP over time (it is a coincidence here that the darker values are extending further out along the diagonal). By plotting the y-axis transposed to lie underneath the x-axis, what I'm trying to do is to show any commonality in clustering (a significant feature in financial markets) and in the quartile measures. Note, though, that this equal size scaling for both variables is only valid if a linear regression based on the plotted data is itself valid. Nevertheless, this can at least give an indication of whether co-clustering does exist.



As always, the Excel file is available by request, although this last plot is a very manual process at this stage.

-- Will Oswald (email), January 24, 2005


I didn't see that anybody mentioned GMT, the Generic Mapping Tools. Here's their home page

http://gmt.soest.hawaii.edu/

and part of the introduction from that page:

"GMT is an open source collection of ~60 tools for manipulating geographic and Cartesian data sets (including filtering, trend fitting, gridding, projecting, etc.) and producing Encapsulated PostScript File (EPS) illustrations ranging from simple x-y plots via contour maps to artificially illuminated surfaces and 3-D perspective views."

-- Pete Kelly (email), February 3, 2005


Having the "right" graphics software (or being able to do the "right" thing with whatever software) is fine -- but trouble arises again if one has to merge graphics with text, for instance, for a paper to be printed in a book. For want of alternatives, I have managed to produce graphics with Excel that were not too disgusting (even though certainly not really nice to look at), but when I copied these to my MS-Word document (not that I would love to do such a thing, but the editor of the book uses Word as a matter of course), I had to resize them - and at this point several of my more or less nicely set up elements (line weight etc.) went haywire. Any suggestions for help? Or, more specifically, does anybody know whether I can circumvent this problem using Adobe Illustrator? That is, could I produce graphs with Illustrator such that they don't have to be resized or changed in any other respect when inserted into Word (if this can be done at all)? In this case I will get myself a copy of Illustrator immediately ... Apologies to all readers and especially to ET for my ignorance.

-- Wolfgang Ludwig-Mayerhofer (email), March 28, 2005


I generally use R for plotting graphs from scientific data. It is by no means perfect, but overall the best I have used. R has the advantage of being open source, making extension a lot easier than several proprietary tools I have used. Its default output is not bad and it is very extensible.

Recently I have been implementing scatterplots based on the designs presented in "The Visual Display of Quantitative Information". My goal has been to make the process as automated as possible. Publication quality graphics still will need manual tweaking, but this is not suited to exploratory analysis. Here is an example chart generated by my R function, from sample data included with R:

It shows:

  • Minimum and maximum values of both variables, with the precision indicated
  • The quartiles shown by shifting segments of the axis
  • The mean (red dot) and median (gap in the axis)
  • A rug plot indicating density
  • The relative time of events, shown by shading

It doesn't look its best as a low-resolution bitmap, so there is also a PDF version which is suitable for zooming and printing. It is still a work in progress, and I am experimenting with different options so any comments and suggestions would be appreciated. The source code is available for those who would like to have a look or try it out.

-- Steven J. Murdoch (email), March 31, 2005


Mr Murdoch has shown us a fabulous piece of data displaying.

The bitmap, although low resolution, shows immediately the bimodal nature of the data. I find this fascinating in that we always hear that Old Faithful erupts every hour. Obviously this is not the case. The argument can then be adjusted to say that "on average, Old Faithful erupts approximately every 70 minutes." This is certainly true (truer?), but still a corrupting statement nonetheless. The issue is the bimodality, only revealed by examining the atomic level data. There are really two peaks! Very rarely are the eruptions an hour apart, in fact, if it has been exactly 60 minutes and you forgot to put new film in the camera, you are more likely to wait for more than 15 minutes than less than 15 minutes; you might have time to reload your film! This is great information. The mean? The mean? What does it mean?

Summaries, schmummaries! Give me the atoms! Thank you, Mr Murdoch!

Some points on the plots, however: I don't think I understand the shading. Is it day/night? Or summer/winter? A bigger design question: is the parameter that is represented by the shading a rational source of variability? If it truly is day/night or time of day, is there reason to believe that the underground geological natural forces feeding this phenomenon really care where the sun is? If so, what a great piece of information! If not, don't waste the design variable (shading) on a source of variability that is apparently not distinguishable from residual error. Use the design variables to show discernible sources of variability. Perhaps even better might be a pair (collection? ordered array?) of small multiples; I find it hard to compare the overlain scatterplots visually. Yes. Typically reserve shading for responses. Remember, your data display IS your model.

The bitmap version shows the horizontal extrema lining up with the respective values at the endpoints: 43.00 looks like 43.00 and 96.00 looks like 96.00. But the PDF version seems to miss. Having only been looking at R for a week, I cannot point to a source of this problem or a solution, I can only point out the issue. I'm sorry.

Further, the data: absent is the time-till-next-eruption time of 61 minutes. Hmmm. Every other integer from 44 through 94 seems to be represented. Perhaps we would like to know more about how the data were collected! When? Where? By whom? My conjecture is that if there is a way to figure out "who" collected it, that might make a decent factor for which to use the shading...

Oh, and the scatter plot seems to lie on a strict integer grid but the marginals seem to have a finer support. Which way does it go: are the margins jittered or are the (x,y) points collapsed? Either way, I humbly suggest no adjustments to the atoms.

But in general, a wonderful piece of work. Thank you!

-- rafe donahue (email), March 31, 2005


Let me add links for three of my favorite commercial packages, all of which are available for both OS X and Windows, and none of which, unfortunately, are available for Linux.

Wavemetrics publishes the ferociously good Igor Pro mentioned by Alyssa Goodman upthread.

GraphPad publishes the excellent graphing package Prism.

Finally, a direct link for JMP (produced by SAS) might be useful.

All three packages have powerful, but substantially nonoverlapping statistical analysis capabilities. Igor is arguably the best package available for dealing with dense timeseries (or other evenly-spaced) data, offers a powerful programming environment, and offers tremendous control over plot output. JMP has outstanding exploratory data analysis capabilities, but offers the user only limited control over the details of plots. GraphPad is one of the best-balanced commercial packages that I've seen.

I have used Igor, JMP, and the already-mentioned KaleidaGraph on a regular basis over the last 5 years or so. Many of my colleagues use (and develop) open source packages including Gnuplot. I've been using the free/open source Gnumeric spreadsheet rather than Excel lately. Excel is certainly my favorite Microsoft product. Gnumeric in its current form is nearly as usable, and in several functional respects, better.

-- Alex Merz (email), April 7, 2005


A few new notes:

Regarding S-Plus, this package is nearly unique in its support of Trellis graphs. It also powerfully renders graphics interactively. It is limited by a decided clunkiness in its data management and data shaping capabilities.

SAS has a reputation as being a mediocre graphics platform, which is true if one sticks with the default procedures and settings. However, SAS can be forced to yield high-quality graphics, especially if one masters the annotation process (annotate data sets and procedure GANNO).

A more subtle advantage in using SAS is the ability to employ macros and to use the new Output Delivery System (ODS). In particular, one can develop graphics macros that produce browsable pages of graphics. In time, one may develop a library of customized graphics tools that can be modified for new uses. Finally, the browsable page output allows convenient storage and acquisition of images for editing purposes.

-- CJ Alverson (email), April 12, 2005


merge graphics with text

Gnuplot can output MetaPost graphics, therefore it is possible to merge graphics with very complex TeX typeset text that, in turn, can be graphically modified. Furthermore, as the output of MetaPost is PostScript, one may use pstoedit to convert it to some other interactively editable format.

You have very extensive control over all the graphic details but, to keep it simple, you may have a controlled merging of text and graphics from a single gnuplot file.

-- L. Nobre G. (email), April 15, 2005


Omissions, Maple, and alphabetization added.
Aabel (Gigawiz, Mac OS X)
Corda
CSS Informatics
DPGraph
Generic Mapping Tools (Hawaii)
GlobFX (Swiff)
gnumeric
gnuplot
Grace (formerly xmgr for X windows)
Grapher (Golden Software)
GraphPad (Prism)
Graphviz (AT&T research)
Gri Cookbook
Illustrator (Adobe)
Interactive Data Language (Astrophysics)
jgraph (plotting graphs in PostScript)
JMP (SAS)
Kaleidagraph (Synergy.com)
Maple
Mathematica
Mekko Graphics
PUP for Excel (John Walkenbach)
Octave
Origin
PAW (CERN)
Ploticus
R (Auckland)
Redrock Software
ROOT (CERN)
SigmaPlot
S-Plus
SPSS
TeX Users Group
Treemap (Written in Java)
Visual Mining
Wavemetrics

-- Niels Olson (email), April 15, 2005


Matplotlib is an open-source library for building graphs with the Python programming language. It is currently being used by people working with the Hubble Space Telescope, geophysicists, neurobiologists, and others. Check out some of the nifty screenshots, complete with example code.

One nice feature is that it can output images in a number of formats, and can be run from both Python scripts and the interactive shell to visualize complex and multi-variate data. The graphs can be zoomed and scrolled on-screen to change scale and focus.

And it's free.

Discussion/user groups have a few thousand posts and answer many questions. The author of the library responds to many of the posts personally. More complex to use than some of the GUI tools, but gives power and customizability in return. Popular with the open-source scientific community.

-- Joshua Newman (email), April 22, 2005


It was mentioned earlier that S has the wonderful Trellis graphing features. As expected, the open-source R also has its version, known as "Lattice". A nice PDF describing the early version is here http://www.ci.tuwien.ac.at/Conferences/DSC-2001/Proceedings/Murrell.pdf and the package, including documentation, can be found here at CRAN. These other tools in this thread are also very nice, but there's something to be said for open source and free tools when you can't afford the powerful and costly tools.

-- Michael Wexler (email), May 1, 2005


Apple's Quartz Composer Imaging Software

Apple's latest version of its OS X operating system, 10.4 ("Tiger"), includes a new tool that may be of value to this community. It is an application called "Quartz Composer" included as part of the optional Developer's Tool installation package.

Apple's developer's documentation describes the application as, "a development tool provided with Mac OS X v10.4 for processing and rendering graphical data." While a programming background may be necessary to truly tap the power of this tool, Apple's documentation promises that people can "create compositions that process graphical content without writing a single line of code."

Quartz Composer is a descendant of an application called PixelShox, whose developer has stopped working on the project. Quartz Composer is being touted as a means to showcase the underlying power of the new graphics foundation within the OS. Early users are already creating websites and distributing sample files, and one enthusiastic blogging developer writes, "In my twenty-five years of hacking on computers Quartz Composer may be the most fun environment I've experienced yet." Another person on the same page comments, "Amazing tool...already using it to develop my psychophysics experiments and playing with it just for fun....from the engineering perspective, you can use this to develop dynamic simulation models of various sorts." (http://weblog.decentric.com/home/2005/02/quartz_composer.html).

Quartz Composer enables the developer to combine images and almost any kind of data: audio and MIDI, video, XML, live internet feeds (such as RSS), time values, etc. The resulting "compositions" can be played back through the operating system (such as screen savers), viewed by QuickTime, or act as stand-alone products.

The system requirements are pretty heavy: you need 10.4 and a modern Mac with a good video card. The newest Mac hardware should all be compatible, but machines a few years older may not be powerful enough to render the compositions.

Related links: http://www.pixelshox.com/

http://developer.apple.com/documentation/GraphicsImaging/Conceptual/QuartzComposer/index.html

http://quartzcomps.com/

http://www.vjcentral.com/news/view/id/quartz_composer_on_os_x_tiger

Cheers,

Jim

(Note: An existing Windows music application also called QuartZ Composer, is similar to Apple's Garage Band. Don't be surprised if Apple is forced to change the name of this new tool.)

-- Jim Williamson (email), May 8, 2005


Hi everybody, I'm struggling with the same issues that started this thread a few years ago ... I work with Mac OS X and am looking for relatively easy-to-use software to handle data and produce nice graphics. I've been trying Aabel for a couple of days; it's a nice piece of software, but so far I haven't been able to produce multi-Y plots (I mean, with more than 2 Y legends). I'd be glad to hear if any of you has been more successful with that; then I'll just buy the software, which seems to do the job otherwise!

Thanks a lot in advance

Samuel Morin

-- samuel morin (email), May 19, 2005


Samuel,

Have a look at KaleidaGraph, GraphPad, or Igor Pro (Wavemetrics). Scroll up for links to the vendors.

-- Alex Merz (email), May 20, 2005


Followup on use of GNU R

On March 31, 2005 rafe donahue wrote:

Some points on the plots, however:

Thanks for your comments; they were very helpful. I have made some changes to the graph based on your suggestions. The result is below and I have put some further examples, including PDF versions, on my website. As always, the source is available.

(PDF version)

Oh, and the scatter plot seems to lie on a strict integer grid but the marginals seem to have a finer support. Which way does it go: are the margins jittered or are the (x,y) points collapsed?

I jittered the marginal data since otherwise the density information would be hidden by the large number of ties. Since the scatterplot was in 2-D, and so had fewer ties, I did not jitter the data used for this. I agree this is not ideal, so in the new version I handle ties in the marginal data using a miniature histogram in the axis, implemented using a strip chart.

Further, the data: absent is the time-till-next-eruption time of 61 minutes. Hmmm. Every other integer from 44 through 94 seems to be represented. Perhaps we would like to know more about how the data were collected! When? Where? By whom?

I think patterns like this are due to the data being collected by hand, using an analogue timer, causing the collector to favour round numbers, for example, preferring 60 over 61. This is supported by the observation in the R manual that, in the eruption durations, multiples of 5 occur more frequently than would be expected by chance. It is unfortunate that the data has these problems, but I chose it because it is supplied with R and so is more convenient for demonstration.

The bitmap version shows the horizontal extrema lining up with the respective values at the endpoints: 43.00 looks like 43.00 and 96.00 looks like 96.00. But the PDF version seems to miss.

This is an artifact of the jittering. The axes are drawn based on the real data, whereas marginals are jittered, sometimes moving points outside the original range. The PDF was generated at a different time so the random jittering would be different too. Since I am no longer jittering data, I avoid this problem, but should I need this in the future I made a modified jittering function which does not affect the maximum and minimum. The resulting graph is better, but I prefer the axis strip chart.
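
For readers curious what a range-preserving jitter might look like, here is one possible sketch in Python. It is not the R function mentioned above; the name, the uniform noise and the clamping rule are assumptions chosen for illustration.

```python
import random


def jitter_keep_range(values, amount, seed=None):
    """Jitter values uniformly by +/- amount, leaving the extremes untouched.

    Points equal to the minimum or maximum are not moved, and every other
    point is clamped to the original range, so the axis ends still line up.
    """
    rng = random.Random(seed)
    lo, hi = min(values), max(values)
    out = []
    for v in values:
        if v == lo or v == hi:
            out.append(v)
        else:
            moved = v + rng.uniform(-amount, amount)
            out.append(min(max(moved, lo), hi))
    return out


# Hypothetical waiting times with ties, for illustration only.
waits = [43, 50, 50, 61, 96]
print(jitter_keep_range(waits, 0.5, seed=1))
```

Passing a fixed seed also makes the bitmap and PDF renderings use the same random offsets, which avoids the mismatch described above.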

I don't think I understand the shading. Is it day/night? Or summer/winter?

The shading is a continuous grey scale showing the order in which the observations were made. The first observation is #010101, the last observation is #BFBFBF. I included it just before making that post because I thought it might be interesting to see if any properties drifted over time. I can't see any patterns involving the shading, so I would conclude that there is no change in the correlation over time. I have now dropped that shading.
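
A small sketch of such an order-based grey scale, written in Python for illustration. The endpoints #010101 and #BFBFBF come from the description above; the linear interpolation between them and the function name are assumptions.

```python
def order_greys(n, first=0x01, last=0xBF):
    """Return n hex colours running linearly from first to last grey level."""
    if n == 1:
        return ["#{0:02X}{0:02X}{0:02X}".format(first)]
    colours = []
    for i in range(n):
        level = round(first + (last - first) * i / (n - 1))
        colours.append("#{0:02X}{0:02X}{0:02X}".format(level))
    return colours


print(order_greys(5))
# ['#010101', '#313131', '#606060', '#909090', '#BFBFBF']
```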

Perhaps even better might be a pair (collection? ordered array?) of small multiples;

I agree that this would be a better way to show any variability, so I tried it, but again there was no apparent correlation between the observation index and either the duration or waiting time.

As this dataset only includes the duration of eruptions and the time between them, I looked for another way to use the shading of dots. So I coloured the dots according to the duration of the previous eruption. This indeed did show some correlation, as shown in this graph. So I enhanced it by colouring dots blue if the previous duration was less than 180 seconds, and red otherwise. Because of the bimodal nature of the data, this showed the difference clearly. The resulting graph is the one shown at the top of this post. The autocorrelation can be clearly seen on this lag plot.

It appears that when one eruption is of the short type (shorter than 180 seconds), the next one will probably be of the long type (longer than 180 seconds). This is apparent from the graph, and is backed up by the numbers. 36% of eruptions are short. Where the previous eruption is short, only 6% of eruptions are short, but where the previous eruption is long, 52% of eruptions are short. This does appear to be significant, although I would like to know more about the origin of the data before I would commit to that.
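
The three percentages can be computed along the following lines. This is a sketch in Python with made-up durations, not the Old Faithful data and not the code used for the figures above; only the 180-second threshold is taken from the text.

```python
def short_fractions(durations, threshold=180):
    """Return the share of short eruptions overall, after a short one,
    and after a long one. Durations are in seconds."""
    is_short = [d < threshold for d in durations]
    pairs = list(zip(is_short, is_short[1:]))  # (previous, current)
    after_short = [cur for prev, cur in pairs if prev]
    after_long = [cur for prev, cur in pairs if not prev]

    def share(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    return share(is_short), share(after_short), share(after_long)


# Made-up durations in seconds, for illustration only.
sample = [240, 110, 250, 260, 120, 245, 100, 255, 270, 115]
overall, given_short, given_long = short_fractions(sample)
print(overall, given_short, given_long)
```

With this toy sequence no short eruption follows another short one, which mirrors the direction of the pattern described above, though the numbers themselves mean nothing.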

I hope this answers your questions.

On April 6, 2005 Robert Simmon wrote:

It would be interesting to see more dimensionality (eruption height) and a plot of mean eruption interval (or a whisker plot showing quartiles) since the park was created.

I agree, but unfortunately this data set does not include any more dimensionality.

-- Steven J. Murdoch (email), May 24, 2005


I wonder if everyone is familiar with SVG (Scalable Vector Graphics)?

In brief, it is an XML-based language for defining vector graphics. Development is far from complete yet, but in principle it should be a very powerful and flexible approach to producing graphs.

I particularly like it because it lends itself so well to being programmatically controlled, because it sits so close to web technology, and most of all because it is vector based so the results are scalable.
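As a rough illustration of what "programmatically controlled" can mean, here is a minimal Python sketch that writes a simple line graph as SVG text. The function name, canvas size, stroke weight and data are all assumptions made for the example; only the `svg` and `polyline` elements and their attributes come from the SVG format itself.

```python
def svg_line_graph(points, width=300, height=150, margin=10):
    """Return an SVG document (as a string) drawing `points` as one polyline."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_span = (max(xs) - min(xs)) or 1
    y_span = (max(ys) - min(ys)) or 1

    def scale(x, y):
        # Map data coordinates into the drawing area; SVG's y axis points down.
        sx = margin + (x - min(xs)) / x_span * (width - 2 * margin)
        sy = height - margin - (y - min(ys)) / y_span * (height - 2 * margin)
        return f"{sx:.1f},{sy:.1f}"

    path = " ".join(scale(x, y) for x, y in points)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'width="{width}" height="{height}" viewBox="0 0 {width} {height}">\n'
        f'  <polyline points="{path}" fill="none" stroke="black" stroke-width="0.75"/>\n'
        f"</svg>\n"
    )


# Hypothetical data, for illustration only.
data = [(0, 1.0), (1, 1.8), (2, 1.5), (3, 2.9), (4, 2.4)]
svg = svg_line_graph(data)
print(svg)
```

Saving the printed text to a file with an .svg extension should give something an SVG viewer can open, and because line weight and colour are plain attributes they are easy to adjust.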

I believe that Illustrator can edit/produce SVG files as well, although they may be slightly odd!

More here [Wikipedia entry]

They can be viewed with the free Adobe plugin, and they also support animation, like this tube map.

Hope that's of some interest.

-- Stephen Hampshire (email), May 25, 2005


Interesting stuff!

A while back I needed some software to show the locations of earthquakes on a map as a function of time. I couldn't find any software that showed the distribution as having any kind of order. Finally I wrote some graphics routines which showed each seismic event as a small white dot which then became a slightly larger gray dot, and then finally disappeared as a slightly larger black dot. This showed some patterning when a week's data was shown in succession, but this was the only way to present the data as a coherent phenomenon*. Everything else just made it look like a random jumble.
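The dot life cycle described above (small white, then larger gray, then larger black, then gone) can be sketched as a function of each event's age. This Python sketch is not the original routine; the radii, the stage length and the sample events are assumed values chosen only for illustration.

```python
# Stages in the order described: (radius in pixels, fill colour).
# The radii and the default stage length are assumptions, not the original values.
STAGES = [(2, "white"), (3, "gray"), (4, "black")]


def dot_style(event_time, now, stage_length=1.0):
    """Return (radius, colour) for an event at time `now`, or None if not drawn."""
    age = now - event_time
    if age < 0:
        return None  # the event has not happened yet
    stage = int(age // stage_length)
    if stage >= len(STAGES):
        return None  # the dot has disappeared
    return STAGES[stage]


def frame(events, now, stage_length=1.0):
    """List (x, y, radius, colour) for every dot visible at time `now`."""
    visible = []
    for x, y, t in events:
        style = dot_style(t, now, stage_length)
        if style is not None:
            visible.append((x, y, *style))
    return visible


# Hypothetical events: (longitude, latitude, time in days).
events = [(-122.1, 37.4, 0.0), (-121.8, 36.9, 1.5), (-122.4, 37.8, 2.2)]
for day in range(6):
    print(day, frame(events, now=float(day)))
```

Each call to `frame` yields the dots for one animation frame; drawing them is left to whatever graphics routine is at hand.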

Is there off the shelf software which would run this sort of animation?

- Paul Veltman

* = designers see the pattern, seismologists are less impressed

-- Paul Veltman (email), December 27, 2005


Spotfire

I'm quite fond of Spotfire because of its intense interactivity. It is really easy to select, subset, crop, and label data using the sliders and check boxes on the control panel. It also has a lot of multifunctioning data elements so you can visualize 4-5 factors simultaneously. Spotfire has been apparently doing a lot of work tailoring its product to specific application areas such as life sciences/biostatistics/genomics. (Someone from those areas should comment)

-- James Cawse (email), May 19, 2006


To expand on the previous post: Spotfire is one of the preferred data exploration tools among people working with high-dimensional data sets such as are frequently found in genomics, proteomics, and automated microscopy. Given that some of these folks really know their stuff, and unquestionably are familiar with the products available, I don't think that the software is so readily dismissed (the company's web site is clearly a rather heavy-breathing marketing site). Partek is another example of software with overlapping capabilities: http://www.partek.com/html/products/products.html

-- Alexey Merz (email), May 19, 2006


In cell graphing

The in-cell graphing is not just an Excel feature but also works as is in OpenOffice and Gnumeric, and I would imagine others. With a little bit of adaptation it points a way to implement simple but elegant text graphing with other programming languages as well.
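As a sketch of how the same idea might carry over to a programming language, the following Python repeats a character in proportion to each value, much as the spreadsheet version repeats a character within a cell. The function name, the bar width and the sample rows are assumptions for illustration.

```python
def text_bar(value, maximum, width=30, mark="|"):
    """Return a run of `mark` characters whose length is proportional to `value`."""
    if maximum <= 0:
        return ""
    return mark * round(width * value / maximum)


# Hypothetical rows: (label, value).
rows = [("North", 42), ("South", 17), ("East", 30), ("West", 8)]
largest = max(v for _, v in rows)
for label, value in rows:
    print(f"{label:<6}{value:>4} {text_bar(value, largest)}")
```

Printed in a fixed-width font, each row becomes a label, a number and a small bar on one line.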

-- kieren diment (email), August 26, 2006


This response is actually a further question on Graphing Software. I am interested in producing analyses similar to the lagging/leading/slipping/improving presentation of sectoral stock returns in the New York Times Business Section. In this analysis, a stock's return is measured in percentage (rate) terms for the week and the year, with the market average shown as a somewhat floating axis. Anyone familiar with the diagram would see that "leading" companies are those whose one-week price return AND their one-year price return are positive.

The interesting multi-dimensionality of this graphic is that the size of each data point (a circle) is proportional to the firm's market capitalization (size).

Does anyone know how to produce relatively simple graphics like this, where data is plotted on an x-y axis, with the size of the data point proportional to some third variable? Perhaps it would be best if anyone knows a reasonably accessible and off-the-shelf package?

I should note that, as a user of GIS software packages (ArcGIS to be exact) I know quite easily how to produce a map where the size of the identifying point is based on some variable (such as firm size). But, in this case, the x-y are locational points, not data points.

-- Kurt Paulsen (email), August 26, 2006


Oh silly me! As someone pointed out to me, such a graph is called a "bubble" plot or a "bubble graph" and can be produced in almost any package. Even unsophisticated Excel can make one!
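For anyone building a bubble plot by hand rather than in a package, the main decision is how the third variable maps to circle size. A minimal Python sketch, assuming that the circle's area (rather than its radius) should be proportional to the variable; that choice, the function name and the sample firms are assumptions and are not taken from the New York Times graphic.

```python
import math


def bubble_radius(value, largest_value, largest_radius=20.0):
    """Radius such that the circle's area is proportional to `value`."""
    if value <= 0 or largest_value <= 0:
        return 0.0
    return largest_radius * math.sqrt(value / largest_value)


# Hypothetical firms: (name, one-week return %, one-year return %, market cap in $bn).
firms = [("A", 1.2, 14.0, 400.0), ("B", -0.8, 6.5, 100.0), ("C", 0.3, -3.0, 25.0)]
largest = max(cap for *_, cap in firms)
for name, week, year, cap in firms:
    radius = bubble_radius(cap, largest)
    print(f"{name}: x={week:+.1f} y={year:+.1f} radius={radius:.1f}")
```

Scaling the radius directly would make a firm four times larger look sixteen times larger in area, which is why the square root is used here.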

-- Kurt Paulsen (email), August 29, 2006


One piece of software not yet mentioned is DataDesk on Mac platform. This tool produces acceptable statistical plots like most of the rest, but fills a somewhat different role: data exploration. I've found that the tool, while quirky, promotes visual exploration of data before diving right into hypothesis testing.

For multi-variate data, it allows the selection of three variables to make a 3D dot chart that can be easily scaled and rotated, and then with one selection the data points can be assigned shape based on any other variable and color based on yet another variable, easily enabling cluster analysis when you first start playing with the data.

As an example, I had a huge file with factory yield data for multiple products. This included environmental information for each day, volumes, staff, workcell, et cetera. The questions were of the "what are the key drivers of yield" type -- and the dataset was huge. In DataDesk it was simple to set up a 3D dot array showing selected variables and then look for clusters on time, operator, cell, and so on. I often find that I'm going to something like Kaleida for the graphs used to tell the story, but DataDesk made it easy to find the story. DataDesk.

They also have a good stats plugin for Excel that lets you do *real* statistical analysis. With Excel's broken stats maths, I used to hand-roll even simple things like ANOVA, but their plugin makes life better.

As others have mentioned, don't be surprised to use one tool to analyze and something entirely different to get good graphical communication output.

-- Scott Hampton (email), September 2, 2006


Two programs not mentioned so far are grapl http://www.grapl.com/ and Coplot http://www.cohort.com/coplot.html

I have only tried the demos of both programs, but they seem good value for money and very flexible.

Has anyone used either of these for real?

Graham

-- Graham Smith (email), September 2, 2006


You may be interested to look at GGobi which is an "open source visualization program for exploring high-dimensional data. It provides highly dynamic and interactive graphics such as tours, as well as familiar graphics such as the scatterplot, barchart and parallel coordinates plots. Plots are interactive and linked with brushing and identification"

While GGobi clearly does not produce examples of beautiful graphics, it is very useful to be able to quickly experiment with many different views of your data. We have tried to illustrate this with some movies in the learn more section. The tools in GGobi give you the ability to investigate high-dimensional relationships that are difficult or impossible to see otherwise. Sometimes a static graphic is simply not adequate (although your explorations will often eventually lead you to a static graphic that is good for communicating your findings to others)

-- Hadley (email), September 4, 2006


I haven't seen it noticed here, but I've been very happy with SlideWrite Plus, particularly for total control over the output, and its curve-fitting capabilities.

-- Cathy Halter (email), September 7, 2006


The first post in the topic Graphing Software has Steve Sprague complaining that Microsoft Excel is "very limited in terms of the aesthetic subtleties [he's] looking for."

The blog at Juice Analytics mentions something for those using Excel graphs to complain about.

From the blog post: "Misrepresenting data by default is like shipping Excel with broken statistical functions-it's something that should never have been considered."

-- Jim Linnehan (email), September 27, 2006


Will Excel 2007 also feature a bonus slice of pie when a value is zero? Up till now, to plot zero in a bar chart meant not showing a bar (or not showing a segment of a segmented bar); to plot zero in a pie chart meant not showing a slice. On the other hand, to plot zero in a dot plot still means plotting a symbol aligned with zero. How convenient! If one must use a bar chart, why not use a slightly smaller data rectangle within the scale-line rectangle and then represent zero with a simple line segment (aligned with zero)?

See Dr. William Cleveland's "The Elements of Graphing Data" for a description of the difference between the scale-line rectangle and the data rectangle.

Joe McCaughey

-- Joe McCaughey (email), September 29, 2006


This thread shouldn't be about bashing Excel, but there's just another feature with potential for trouble: as the base unit of Excel's date/time data type is one day, hours, minutes, and seconds are internally represented as the floating-point equivalents of 1/24, 1/1440, and 1/86400, respectively. Floating-point arithmetic isn't very precise; on Excel 2004 for Mac, OS 10.4.8, a difference of one hour is about 5 microseconds too short, one minute is about 280 microseconds too long, and one second is about 20 milliseconds (or 20 000 microseconds, or 2 %) too long.

These deviations will vary according to the actual interval, one second being the worst case, but may affect serial or iterative calculations. So be aware of this when using the native date/time data type.
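The general point about serial calculations can be illustrated outside Excel. The Python sketch below stores one second as the double-precision fraction 1/86400 of a day, adds it repeatedly, and reports how far the running total drifts from the exact value. It shows ordinary double-precision behaviour only; it does not attempt to reproduce the Excel figures quoted above, which depend on Excel's own handling of date/time values. The function name is hypothetical.

```python
from fractions import Fraction

SECOND = 1 / 86400  # one second as a fraction of a day, stored as a double


def accumulated_error_seconds(steps):
    """Add one second `steps` times; return the drift from the exact total, in seconds."""
    total = 0.0
    for _ in range(steps):
        total += SECOND
    exact = Fraction(steps, 86400)
    # Fraction(total) converts the double exactly, so the difference is the true drift.
    return float((Fraction(total) - exact) * 86400)


for steps in (1, 60, 3600, 86400):
    print(steps, accumulated_error_seconds(steps))
```

Where exact intervals matter, one option is to keep times as whole seconds and convert to days only for display.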

-- Juhana Siren (email), October 5, 2006


Is there any plug-in for Adobe Illustrator that lets you draw graphs from equations? You type in y=x^2 and there you go. Illustrator makes the graph.

-- martin eriksson (email), October 10, 2006


Hello everyone.

My wife is a mathematics teacher and we've been having lots of fun getting the graphing calculator that comes free with Macintosh OSX to do all kinds of great stuff, including animations that show relationships between different functions (e.g. between a circle and sine). It has a very nice look to the graphs and they print well. It's fairly easy to use (once you discover what the "inspector" button is for). It can graph in cartesian and polar coordinates, in 2d and 3d. It's an equation grapher, not data.

It's called "Grapher" and may be in the "Utilities" applications folder.

Best, Reed

-- Reed Hedges (email), October 24, 2006


This is in part another plug for R, to which I am a minor contributor. One of the strengths of R is the ease with which new methods of illustration can be programmed. As an example of this, I include a link to an illustration that we (I am not the originator) think is novel. I would greatly appreciate comments from the contributors.

http://www.bitwrit.com.au/img/wsc.png

Thanks

Jim

-- Jim Lemon (email), October 25, 2006


I too have been impressed by R. Apart from being free, open sourced, and extensible, it gives enormous control over all the aspects of layout that I can think of.

But to really exploit it, you need to buy a book to explain how the graphics work. Early on, I did a lot of thrashing and missed 90% of what it can do. The excellent online help explains everything, but it was hard -- for me anyway -- to piece together the forest.

-- Daniel Von Ehren (email), December 24, 2006


Just to add another approach: Processing (available for all major platforms at http://processing.org) is "an open source programming language and environment for people who want to program images, animation, and sound. It is used by students, artists, designers, architects, researchers, and hobbyists for learning, prototyping, and production. It is created to teach fundamentals of computer programming within a visual context and to serve as a software sketchbook and professional production tool. Processing is developed by artists and designers as an alternative to proprietary software tools in the same domain."

Apart from being free, the beauty of processing is that it can be very flexible (it's a programming language), but you can also use the existing templates and applications, and the functions and commands are high-level, no need to get to the details of Java. It has a very good tutorial; it's a "teaching" language, in a way. Downside: some learning is needed before you can start to look at your own data.

I use processing to produce/design/specify very dense, very customized, non-standard visualizations.

-- christianhauck (email), December 29, 2006


Pstricks is a nice package that makes postscript available to LaTeX and TeX. This website (http://tug.org/PSTricks/main.cgi) has lots of good examples. When using pstricks, one has about as much control as possible over any sort of chart or diagram. There is no gui included, so getting started can be a bit of a pain. Luckily pstricks is not commercially viable, so the people who work on it are around and happy to answer questions. I have been on the mailing list for around 3 years and have never seen a question go unanswered for longer than a day (usually less).

Here is an example from the site which illustrates a rather complex diagram created using pstricks. It is a precise representation of a model, since it is created using mathematical notation. Furthermore it does not separate the act of creating an image from the process of reasoning about the model. However, you should not make any plans for the evening if you plan to make a diagram in this manner. Optical example
Source: http://tug.org/PSTricks/Examples/optic1.png

It is also possible to use pstricks to generate postscript which can be imported into many vector editing programs.

-- George Dowding (email), May 27, 2007


Origin 8

Origin 8, the distinguished data-display suite for scientists and engineers, will include sparklines. Origin produces publication-quality graphics, and non-scientists who publish data should consider Origin or other publication-quality graphics programs.

http://www.originlab.com/

-- Edward Tufte, June 24, 2007


I received a new copy of JMP yesterday and was surprised at some of the new data discovery features.

http://www.jmp.com/software/jmp7/tutorials/visualization.shtml

There are still problems with the ways that they deal with scales, etc. but we are starting to get enough power to play with large datasets visually.

-- Tchad (email), June 25, 2007


JMP cartoon graphics

The JMP visualization of survey data depicted very low-resolution graphics. Survey data are best analyzed by good tables; see The Visual Display of Quantitative Information, pages 120-121 (for a table v. graphic for survey data) and 178-179 (for supertables for surveys).

-- Edward Tufte, June 25, 2007


I've been recently searching up and down for the easiest way to include decent looking graphs and other line diagrams in some LaTeX documents. Having gone through picture, PiCTeX, gnuplot, mfpic, metapost, tikz, freehand and finally asymptote over the last few years, I have finally settled on a mixture of tikz, freehand and asymptote. Both asymptote and tikz are very flexible vector graphic languages for TeX/LaTeX, much easier to understand than metapost. Other than their flexibility, I also like the fact that one can include the graphics commands directly in the main text and not have to keep separate files outside the main document. A quarter of the time I still use freehand to import EPS files, especially for complicated line graphics. Check out the galleries for examples (http://www.fauskes.net/pgftikzexamples/all/ and http://asymptote.sourceforge.net/gallery/).

Herbert Sauro

-- Herbert M Sauro (email), July 7, 2007


Exporting from Excel to vector graphics software

I was recently given a recommendation for Inkscape as a low-cost alternative to Illustrator, but, disappointingly, it turned out to only accept SVG as a vector format for import, and I never could find a viable export path from Microsoft Excel, where I design almost all my graphs, to SVG. Can anyone outline an export path Excel->SVG (that is, as a vector format: raster is no good to me)? Or is there another low-cost alternative to Illustrator besides Inkscape?

-- Derek Cotter (email), September 19, 2007


For someone not looking for a standalone application or a byzantine programming language, but rather an excellent charting library with bindings for popular existing languages, I heartily recommend ChartDirector from Advanced Software Engineering.

It supports many basic chart types, TrueType fonts, antialiasing, built-in functions like curve fitting and confidence intervals, and supports transparency (alpha channel) on any element. Performance-wise it is excellent. Its core is written in C, but works out of the box with C/C++, Perl, PHP, Java, ASP, Python, etc. It's well documented and very easy to get started with.

The demo is free, but it adds a little byline to the bottom of each image. It is affordable to purchase though (less than USD$100 per developer).

-- Zack Steinkamp (email), September 19, 2007


Dot chart with distribution and statistics

Following the contribution from Will Oswald, I offer these verbose instructions for using Excel to make a bivariate scatterplot with tick marks that show data distribution and arbitrary axis labels that show statistics of the data.

1. Create a normal scatterplot from two columns of data; call them "X Data" and "Y Data". Remove all the things you don't want: grid lines, legend, etc.

2. Add a column of zeros next to the two columns of plotted data, call it "Axis". These zeros are used to make the axis ticks.

...The next three steps replace the normal axis line with axis ticks located on the axis at the data values...

3. Right-click on the horizontal axis, select `Format Axis'. Set the Lines colour to `Custom' or `Solid Line' and then the colour white. Change all the tick mark and axis label options to `None'. Repeat for the vertical axis.

4. Add a second series to the graph, choose the column containing X Data for the X Values and the column containing the zeros for the Y Values. Call this series Axis X.

5. Add a third series to the graph, choose the column containing the zeros for the X Values and the column containing Y Data for the Y Values. Call this series Axis Y.

...Now format these two series so that they look more like ticks and less like data points...

6. Right click on the data points for Axis X to format that data series. In the dialog box, change the Marker to `custom' or `built-in', and then select the hollow box or short line. Change the foreground colour to black and the background to `no colour' (or select `solid fill' in Excel07). Reduce the size of the Marker to 2 pts (the smallest possible). Change the Line settings to `None'. Check that everything else in the dialogue box is unchecked or set to `none'.

7. Repeat step six for the data points for Axis Y. Now you should have little squarish dots along the horizontal and vertical axes. The position of the dots should correspond to the values of the plotted data. Some experimentation with the Marker type, size and colour may be required to get the desired effect.

...If the built-in Markers are totally unsatisfactory, then you can use the drawing tools to draw a horizontal and vertical line to replace the markers. Make the line very short and black. Copy a line, then right-click on the axis where you want to replace the markers, then click paste. The built-in markers should be replaced by little lines. Some experimentation might be required to get the line length right (the graph doesn't auto-update when you change the lines).

...Now change the axis labels from arbitrary intervals to values that give meaningful information about the data...

8. In the same spreadsheet, make a small table to calculate the values for minimum, 25th percentile, median, 75th percentile and maximum for X Data and Y Data. These statistics can be anything you like to show the spread of the data, you're only limited to the functions available in Excel. Add a third column to this table and populate it with zeros.

9. Add two more data series to the graph. Call the first one `X labels', and choose the row containing the statistics for the X Data for the X Values, and the nearby row containing the zeros for the Y Values. Call the second series `Y labels' and choose the row containing the statistics for the Y Data for the Y Values (not the X Values as before), and the nearby row containing the zeros for the X Values.

10. Use the XY Chart Labeler (http://www.appspro.com/Utilities/ChartLabeler.htm) to label the `X labels' and `Y labels' series. In Excel07 the labeller seems to format the text in white, so they need to be changed to black to make them appear. Label position should be left. After clicking OK for each set of labels, then go back to the `XY Chart Labels' menu, click `move chart labels' then use the arrows to get the labels on the LHS of the axis ticks (for the vertical axis) and below the axis ticks (for the horizontal axis). You might need to move the plot area boundary in and up a bit to make space for the labels.

11. Make the data points invisible for these two new data series by setting the Marker to `None' and Line to `None'

...That should do it; add axis titles if desired. It's a bit involved, but once you've got it set up as a template the hard work is done. The main problem here is that data points with a zero value will inelegantly overlap with the tick marks, and I'm not sure how to fix that...
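For anyone who wants to check the small table from step 8 outside Excel, here is a minimal Python sketch of the same five statistics. It assumes the standard library's "inclusive" quantile method, which may or may not agree exactly with whichever Excel percentile function is used; the function name and sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, 25th percentile, median, 75th percentile, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


# Hypothetical X Data, for illustration only.
x_data = [2.0, 4.0, 4.5, 5.0, 7.0, 9.0, 12.0]
print(five_number_summary(x_data))
```

These five values are what steps 9 and 10 place as labels along each axis of the Excel chart.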

The result should look something like this:


-- Ben Marwick (email), October 4, 2007


Graphing Software to make pixel-precise images?

Does anybody know of a program that will let me directly address each pixel of a GIF or other bitmap image using integers in a spreadsheet range or text file? The ideal would be for me to be able to use an arbitrarily long number of rows, and five columns (X position, Y position, Red value, Green value, Blue value) to create a data graphic.

This would allow me to create something as simple as a sparkline, or as complicated as a wavefield (animated GIF would introduce the element of time).
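One possible approach, sketched in Python: turn rows of (X, Y, Red, Green, Blue) into a plain-text PPM image, a simple bitmap format that many image tools can open or convert. Note that this swaps PPM in for GIF, since producing a GIF would need an additional library or converter. The function name, canvas size and sample rows are assumptions for illustration.

```python
def rows_to_ppm(rows, width, height, background=(255, 255, 255)):
    """Build a plain-text PPM (P3) image from rows of (x, y, red, green, blue)."""
    pixels = [[background] * width for _ in range(height)]
    for x, y, r, g, b in rows:
        # Ignore any row that falls outside the canvas.
        if 0 <= x < width and 0 <= y < height:
            pixels[y][x] = (r, g, b)
    lines = ["P3", f"{width} {height}", "255"]
    for row in pixels:
        lines.append(" ".join(f"{r} {g} {b}" for r, g, b in row))
    return "\n".join(lines) + "\n"


# Hypothetical rows: a four-pixel sparkline in black on a 4 x 3 canvas.
rows = [(0, 2, 0, 0, 0), (1, 1, 0, 0, 0), (2, 1, 0, 0, 0), (3, 0, 0, 0, 0)]
ppm = rows_to_ppm(rows, width=4, height=3)
print(ppm)
```

Writing the returned text to a file with a .ppm extension gives an image in which every pixel comes straight from one row of integers; the rows could equally be read from a text file or exported from a spreadsheet range.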

-- Derek Cotter (email), December 21, 2007


Reed Hedges above mentioned Grapher, a utility in OS X. Grapher is based on Ron Avitzur's Graphing Calculator. The story of how Ron actually got Graphing Calculator written, tested, and included in OS 8, OS 9, and now OS X is almost unbelievable.

-- Niels Olson (email), February 28, 2008


A quick correction, Niels: The Mac OS X Grapher was originally Curvus Pro X acquired by Apple from Arizona Software. It is unrelated to Pacific Tech's Graphing Calculator which Apple had included with Mac OS 9 for many years. Aside from the author of Curvus sitting down with Graphing Calculator using it as inspiration for some features, the two products are unrelated.

- Ron Avitzur

-- Ron Avitzur (email), April 17, 2008


I have just attended the one day course in Seattle (July, 2008). ET's list of software for analysis and graphics presentation included S-plus - but not R. R was mentioned after one of the audience brought it up (and has obviously been mentioned here), but this is a suggestion that it should be added to the list of recommended software, both for analysis and graphics preparation.

R is extremely flexible - I use it for various forms of analysis in hydrogeology in a consultancy company, my wife uses it in analysis of genetic data at a university. A list of over 1200 R packages (add-ons that extend the core functionality) for a variety of purposes is here - http://cran.r-project.org/web/packages/ A list of task views describing the application of R to more generic topics or problems can be found here - http://cran.r-project.org/web/views/ R runs on Windows, Mac, and varieties of Unix/Linux.

R is fast and powerful - the following is a little outdated but shows that R is comparable in speed to many commercial products http://www.sciviews.org/benchmark/

R is open source, with an extremely helpful community willing to help those with particular problems. Members of the R core development team can frequently be found on the R-help mailing list as well as a wide selection of gurus with various specialities (including at least one of the previous contributors to this thread).

R can output high quality graphics in formats including PDF, Postscript and SVG. If the graphic you want doesn't exist in R or one of its many packages it can be designed and implemented.

The major problem with R is that it can be extremely daunting to learn - the lack of a GUI makes for a steep learning curve which can be very offputting.

R is free for both commercial and non-commercial use and can be downloaded here http://cran.r-project.org/

-- Michael Cheetham (email), July 21, 2008


R does indeed run in a GUI, albeit one much simpler than, say, the Matlab environment. The Windows version is based on a GUI. In Linux, I recently discovered a package called JGR, which creates a similar looking environment to the Windows version. Otherwise, running R off the command line interface in Linux can get a bit tedious, I think. Information on the JGR package can be found through the CRAN website. The difficulty I've had with it in Linux was installing it and getting it to run.

But I agree that R is incredibly versatile and the user community is very helpful.

-- Miklos Z. Kiss (email), July 21, 2008


I too am looking for graphing software options that will allow for some very non-standard formats. One current need that I have is to create line plots where the thickness of the line as well as its color vary over the length of the line. This allows me to overlay two additional dimensions over a standard line graph. The thickness idea I got from the famous chart of the advance and retreat of the French army. What I am doing right now is developing the line plot in Excel, exporting it, and importing it into Inkscape, where I then "draw" the line by hand, which is very tedious and difficult.

I am plotting labor statistics where the underlying plot is a form of survival plot: I graph the probability of employees staying on the job beyond a certain benchmark, say 120 days, and I plot this over time. There is a significant relationship between this function and a variety of underlying economic drivers. As the economy has become less robust over the last 6 months, the MRLS (Median Retained Length of Service) has increased significantly in a variety of retail positions that I am analyzing. As such, plotting the probability of survival to 120 days shows an upward trend over time. I have been showing this kind of data for years. I then typically go on to show a plot of the weekly new unemployment claims, which has been steadily increasing over time. Of real interest in this latest slump is the fact that the years of education for those first time jobless is increasing. I break this into two basic classes, College / Non-College.

I end up plotting all three concepts simultaneously. The underlying line plot shows the upward curve in survival to 120 days on the job, the thickness of the line is scaled to the number of first time jobless claims, and the color of the line is a graduated blend of green and red which changes from red to green as the percentage of those making first time jobless claims who have a college education increases. Any ideas?

Thanks,

Dr. Robert Yerex Chief Economist Kronos robert.yerex@kronos.com

-- Robert Yerex (email), November 25, 2008


Most recent poster: this is certainly do-able in R, if you posted sample data and output somewhere I could try to recreate it. Red-green contrast could lose 5% of your male viewers ... and I wonder whether parallel line-plots would be clearer (if less compact/clever?)
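The same idea can also be sketched without R. The Python below (a swapped-in language, not the R approach mentioned above) breaks the line into segments and gives each its own stroke width and red-to-green colour, written out as SVG that Inkscape can open. The function names, width range and sample numbers are assumptions for illustration, not real labor data.

```python
def blend_red_green(fraction):
    """Colour between red (fraction 0) and green (fraction 1) as an SVG rgb() string."""
    f = min(max(fraction, 0.0), 1.0)
    return f"rgb({round(255 * (1 - f))},{round(255 * f)},0)"


def styled_segments(points, widths, fractions, min_w=1.0, max_w=8.0):
    """Return SVG <line> elements, one per segment, styled by the two extra variables.

    points    -- (x, y) pixel coordinates of the underlying line plot
    widths    -- value mapped to thickness (e.g. jobless claims), one per point
    fractions -- value in 0..1 mapped to colour (e.g. college share), one per point
    """
    lo, hi = min(widths), max(widths)
    span = (hi - lo) or 1.0
    out = []
    for i in range(len(points) - 1):
        (x1, y1), (x2, y2) = points[i], points[i + 1]
        # Style each segment from the average of its two end points.
        w = (widths[i] + widths[i + 1]) / 2
        f = (fractions[i] + fractions[i + 1]) / 2
        stroke_w = min_w + (w - lo) / span * (max_w - min_w)
        out.append(
            f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
            f'stroke="{blend_red_green(f)}" stroke-width="{stroke_w:.2f}" '
            f'stroke-linecap="round"/>'
        )
    return out


# Hypothetical values, for illustration only.
points = [(10, 90), (60, 80), (110, 65), (160, 40)]
claims = [300, 340, 420, 500]
college = [0.20, 0.25, 0.35, 0.50]
body = "\n".join(styled_segments(points, claims, college))
svg = f'<svg xmlns="http://www.w3.org/2000/svg" width="170" height="100">\n{body}\n</svg>'
print(svg)
```

Given the concern about red-green contrast, `blend_red_green` could be replaced by any other two-colour blend without touching the rest of the sketch.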

-- Ben Bolker (email), November 30, 2008


Ashlee Vance has a short article on R in the New York Times. Here is an excerpt:

While it is difficult to calculate exactly how many people use R, those most familiar with the software estimate that close to 250,000 people work with it regularly. The popularity of R at universities could threaten SAS Institute, the privately held business software company that specializes in data analysis software. SAS, with more than $2 billion in annual revenue, has been the preferred tool of scholars and corporate managers.

"R has really become the second language for people coming out of grad school now, and there's an amazing amount of code being written for it," said Max Kuhn, associate director of nonclinical statistics at Pfizer. "You can look on the SAS message boards and see there is a proportional downturn in traffic."

SAS says it has noticed R's rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks.

"I think it addresses a niche market for high-end data analysts that want free, readily available code," said Anne H. Milley, director of technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet."

Historical note: the gas turbine engine and the DC-3 preceded ENIAC.

-- Niels Olson (email), January 6, 2009


Response to Graphing Software, in particular, R

As a member of an academic biostatistics department that uses R almost exclusively, I have come to understand the pros and cons of using R. The NYT article is mostly right on but, like we have seen with some grand truths, "it's more complicated than that."

I have used R and SAS extensively, and have made limited use of other software packages for statistics and, as such, graphics.

The advantages of R that are touted in the article are very real. Its zero cost is wonderful, certainly if you hate paying for things. But there _is_ a price: support consists of message boards and email and your colleagues. Manuals and documentation are available but are limited and tend to be written with a scent of intellectual elitism: "feel free to join our community if you can figure it out." Yes, there are many, many, many helpful people. But one must make learning R a dedicated task. One cannot just learn it a little bit. At Vanderbilt, there is a vast support community. At other institutions, that support is more limited. Unless you are one of the really bright ones, plan to _spend_ lots of time on learning. Of course, this can be a great blessing in the long run, but it is a cost nonetheless.

The R community has developed oodles of packages that do oodles of things. But saying that R is good because of that is like saying that the internet is good for news because anyone and everyone has a blog. The internet _is_ good for news but not because everyone has a blog. R is good because you can do what you like. You may or may not find a package to do what you want. My experience has been that the available packages typically give me 80% of what I want; if I really want to do what _I_ want to do, I have to go back to Square One and start from scratch and develop what turns out to be my own package...

And who takes responsibility for the validation of the packages? Well, sometimes it is nobody. This doesn't mean that the packages are wrong; all it means is that the user needs to verify everything if he or she wants it right. And if functions are housed inside other functions inside other functions one might just need to be trusting. One can mock Ms Milley at SAS for statements regarding SAS but the amount of R&D and validation and verification that stands behind their work provides some comfort. That is, of course, no substitute for checking but having a plethora of documentation (written in complete sentences with worked-out examples of code and corresponding output) really is worth _something_. The help files in R can be quite cryptic and can be downright unreadable to the new user. And the ability to call Tech Support at SAS and speak to a human is nearly priceless.

And how many of the packages are maintained when new versions of the software are released? Or when packages upon which these packages are based are updated or modified?

I work from a personal dogma to avoid anything but base R, unless there is really no way around it. I generally don't use public packages anymore.

There are plenty of graphics packages for R but when I use R (and I just did some R-ing last weekend for someone who was not ready to pony up costs for software --- the R solution was great) I stick to the fundamental, or primitive, functions. Almost all I need to do can be done with simple functions to draw points, lines, polygons, and text; a simple "go here, do this" paradigm. And R is excellent here --- once the data are in order. Of course, I can do the same thing in SAS with the annotate facility. It's nice to have multiple tools.
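
A minimal sketch of that "go here, do this" style, written in Python rather than R so that it runs with the standard library alone. It builds an SVG by hand from four primitives. The helper names (`point`, `line`, `polygon`, `text`, `render`) and the data are hypothetical, chosen only for illustration; in base R the corresponding calls would be `points()`, `lines()`, `polygon()`, and `text()`.

```python
# Hypothetical helpers: each one places a single primitive at given coordinates.
# Coordinates are SVG pixels, so y grows downward.

def point(x, y, r=2):
    return f'<circle cx="{x}" cy="{y}" r="{r}" fill="black"/>'

def line(x1, y1, x2, y2, width=1):
    return (f'<line x1="{x1}" y1="{y1}" x2="{x2}" y2="{y2}" '
            f'stroke="black" stroke-width="{width}"/>')

def polygon(coords, fill="lightgray"):
    pts = " ".join(f"{x},{y}" for x, y in coords)
    return f'<polygon points="{pts}" fill="{fill}"/>'

def text(x, y, label, size=10):
    # No escaping is done here; labels are assumed to be plain text.
    return f'<text x="{x}" y="{y}" font-size="{size}">{label}</text>'

def render(elements, width=200, height=120):
    body = "\n".join(elements)
    return (f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{width}" height="{height}">\n{body}\n</svg>')

# Made-up data, already converted to pixel coordinates.
data = [(20, 100), (60, 70), (100, 80), (140, 40), (180, 30)]

elements = [polygon([(20, 100), (60, 70), (100, 80), (100, 100)])]  # shaded region
elements += [line(*a, *b) for a, b in zip(data, data[1:])]          # connect the points
elements += [point(x, y) for x, y in data]                          # mark each observation
elements.append(text(150, 22, "latest"))                            # direct label

svg = render(elements)
```

Writing `svg` to a file with an `.svg` extension gives a vector graphic that opens in a browser, Inkscape, or Illustrator. The point of the sketch is only that a plot is a list of placed primitives; all the real work is getting the data into the right coordinates first.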

One concern I have with R is that it is being used as the default grad school thesis option. Getting a degree these days seems to be a matter of (1) find a problem, (2) write an R package to solve that problem, (3) publish the R code. I worry that writing code is taking the place of doing analysis by thinking hard about the problem. There is a t-shirt that Springer produced that contains the slogan "First plot. Then model." I'm thinking that that ought to be "First think. Then plot. Then think again." Black-box code helps eliminate the thinking part.

There is a certain level of R street cred that comes with publishing packages. Just because we can, does that mean we should?

I believe that there is a need for multiple computing and statistical languages and that using them helps us learn to think about problems in different ways and that that is good. R is a wonderful tool. SAS is a wonderful tool. Both have pros and cons.

More real progress will be made for R when the community develops more real-world texts for use and when the packages that are wrong or severely limited are pulled from service. To paraphrase Keanu Reeves in "Parenthood": "You know, you need a license to buy a dog, to drive a car - hell, you even need a license to catch a fish. But they'll let any ****-******* a****** be an R package author."

R is a great tool but caveat emptor, even when you buy without paying money.

-- rafe donahue (email), January 9, 2009


Challenge for Graphing Software

A fascinating visual display challenge: From NYT 9 Feb 2009

"Michael Sanderson is worried. Dr. Sanderson, a biologist at the University of Arizona, is part of an effort to figure out how all the estimated 500,000 species of plants are related to one another. For years now the researchers have sequenced DNA from thousands of species from jungles, tundras and museum drawers. They have used supercomputers to crunch the genetic data and have gleaned clues to how today's diversity of baobabs, dandelions, mosses and other plants evolved over the past 450 million years. The pace of their progress gives Dr. Sanderson hope that they will draw the entire evolutionary tree of plants within the next few years. "It's within striking distance," Dr. Sanderson said.

There's just one problem. "We have no way to visualize such a tree at the moment," he said. If they tried, they would end up with a blurry, inscrutable thicket. "It would be ironic," Dr. Sanderson said. "We'd be saying, `We've built it, but we can't show it to you.' "

http://www.nytimes.com/2009/02/10/science/10tree.html?_r=1&8dpc

Seems beyond our favorite, Origin 8...

-- Will Semmes (email), February 9, 2009


We have a consultant recommending IDL and an (expensive) product called Tableau, evidently a commercialization of some work by Jim Gray (Polaris) as solutions to a large multivariate data analysis / visualization project.

I've searched the forums here and see IDL referenced positively for appropriate uses (so it has some credibility, thanks).

I don't see references to Polaris / Tableau, and wonder if anyone has impressions of it for any particular uses? I did read thru an article in Communications ACM (Nov 2008), part of a tribute to Jim Gray, and it was generally positive, but looked perhaps fairly special use (HUGE analysis problems?)

I understand if this is not an appropriate use of the forum. TIA for any responses.

-- Jim McGurrin (email), April 15, 2009


DataGraph

I have been working on a pro-level graphing solution for Mac OS X called DataGraph. I also created DataTank, which won an Apple Design award in 2005 (scientific visualization).

DataGraph is very capable and is geared towards clean publication quality graphs and bar charts. DataGraph has a growing community of users, and I really appreciate user feedback. It would be great to get this group of demanding users to stress test it further (already had over two years of it). Looking over the discussion board I see a lot of points that I've thought about, and implemented in DataGraph.

David

-- David Adalsteinsson (email), May 22, 2009


The Omni Group has a new software product for the Mac (in public beta) called OmniGraphSketcher that I haven't seen mentioned here, yet. It seems admirably focused on producing clean, elegant graphs that not only present data accurately but can be easily refined and annotated to be effective communication tools. I think the software presents some interesting user interface ideas. You can download the software, but there are also a couple brief videos ("Screencasts") that demonstrate the software.

http://www.omnigroup.com/applications/omnigraphsketcher

-- Martin Doudoroff (email), May 30, 2009


Hi folks, I am a graduate student from Georgia Tech. I am also a Linux user. I want some tool that can generate lovely plots for my thesis. I have the following needs:

1. I would like all numbers and markings on the axes, title, etc. to be in the standard LaTeX font. I find changes in font and lack of homogeneity very annoying.
2. Various dot and dash patterns should be available to represent the data in black and white. (If people print out my work, it will mostly be in black and white, and I still want clarity.)
3. It should preferably be free/open source. Hate to admit it, but for a grad student living on a stipend, cost is a major concern. Also, free software works better on Ubuntu.
4. It should not take too long to learn. In a few weeks I should be able to do fairly complex stuff.

What are your opinions on MATLAB, Mathematica etc? Could you suggest a better tool? Dr. Tufte mentions the use of Illustrator. Can the same be done using Inkscape?

-- vikram (email), August 24, 2009


In response to Derek Cotter’s question in 2007 about exporting graphs from Excel into Inkscape: One reasonably viable way to do it is to first export from Excel into OpenOffice. Either export an entire file, including a graph, or export only the data and create or recreate the graph in OpenOffice’s spreadsheet section. OO’s graphing capabilities and interface are similar to Excel’s, and certainly not any worse.

Once that’s done, you can copy and paste the graph from the spreadsheet program into the OO drawing program. Which can, in turn, export the graph as SVG, ready for import into Inkscape.

Not the fastest of transitions, but possibly worth the trouble if you don’t have a copy of Illustrator. Best of luck.

-- Christopher Boone (email), December 9, 2009


What amazes me about all of the graphing software packages mentioned here (and others I've found elsewhere) is that they all do way too much. They brag about how much they do, when I want to hear about how little they do.

I teach freshman and sophomore engineering and physics and try very hard to get students to understand just how bad chartjunk is, but I cannot find a graphing program that supports that without going over the top with what it can do - and making me pay for it.

I want one that will just do a simple line graph, a linear or polynomial curve fit, in a nice clean easy way without 6 million other things that I don't want my students to worry about and I certainly don't want to pay for. I can kind of get Excel close but it's still one of the stupidest things on god's green earth. And I have to redo graphs every time. I can save a template, which helps, but I still have to fight with the damn thing.

Lord, I pray for a simple spreadsheet that will do nice clean graphs without chartjunk and some simple curve fits and is free to all schools. Is that really too much to ask? I also want all of my hair back (that takes on a whole different meaning if you switch 'hair' and 'back' around...)

-- K.S. Manning, PhD (email), March 24, 2010


The trouble with graphing software is that there are just too many packages with too steep a learning curve with too few good tutorials. The trouble with graph features of spreadsheets is that if you make things so simple that any darned fool can use them then every darned fool will use them to turn clean data into chartjunk and fat finger contaminated crud. On OS X, I've gravitated to two packages:

1. R with the ggplot2 library, a thing of rare beauty.
2. OmniGraphSketcher: if you cut and paste your table into the app, it will use somewhat sensible defaults and give you a fair degree of control over presentation, or you can save as PDF for work in Illustrator.

-- Richard Careaga (email), March 24, 2010


I have been using STATISTICA (www.statsoft.com) for the analysis and graphing of IP network traffic for years. It has the ability to handle reasonably large data files (a file with 4 million records and 19 double precision variables is not unusual in my work). I find the graphing interface to be intuitive. One of the nice features is that the regression routines are embedded in the graphics programs, so you get the scatter plot, a (linear, exponential, etc) fit, and an equation all with a few mouse clicks.
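
For readers who want to see what the fit-plus-equation step amounts to, here is a minimal sketch of an ordinary least-squares line in plain Python. It is not STATISTICA's routine; the function name `linear_fit` and the data are made up for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations in x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Made-up data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = linear_fit(xs, ys)
equation = f"y = {a:.2f} + {b:.2f}x"
print(equation)  # y = 1.00 + 2.00x
```

The string in `equation` is what a graphing program would print beside the fitted line; the scatter plot itself is just the `(xs, ys)` pairs drawn as points.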

tds

-- Terry D. Shaw (email), June 11, 2010


Hey everyone.

I am writing to talk about how much I like using Aabel (Mac) for graphing. Although, if you are a Mac user, you definitely owe it to yourself to also check out OmniGraphSketcher for intuitive, simple and quick graphs you can save in vector graphics formats.

Being a geologist, I also have a demand for unique chart types (rose diagrams, spider charts, stereonets). I found that Aabel is an amazingly powerful (but not super cheap) data exploration application. It has a wealth of chart types, including all the ones I've wanted, a strong statistics base, and killer graphics output. You can also build layouts to export that pull data from separate sources to mash up or show in a variety of charts on a single page. There is also SHAPEFILE support (ESRI .shp), which means that you can also do some light GIS analysis with it... and place plots of structural data at places on the "map" in your layout. It often removes the need for me to dump figures into Adobe Illustrator for tweaking because I can tweak everything I want in the program and set templates for future plots. Also, the PDF export capabilities allow you to take your figures into Illustrator *if you need to*. Okay... so seriously... check out Aabel.

Now... I'd like to mention something that I noticed around the same time I discovered the program. I have long had to straddle the boundary between Mac-user and Windows-boxes... and there are many powerful stats and graphing programs for PCs. While working on figures for my Master's thesis, I discovered that when I used Pages and Keynote on the Mac, imported PDF (vector) graphics remained sharp and crisp; yet, when imported into Word or PowerPoint (whether on a Mac version or Windows), the images became rasterized and blurred. The difference was very noticeable on screen and even worse when printed. Because of this, I switched to writing my thesis on a Mac laptop to maintain ultra high-quality figures.

It was this transition that led me to seek out Mac software capable of geostatistical plots and a good interface for curve-fitting. There was not much out there, really, and yet, Aabel has consistently allowed me to make figures that turn heads. I love it. And the figures when exported and brought into Pages (Mac app) remain in vector format and look incredible.

Alright, sorry for the diatribe...

Cheers all, Zach

-- Zach Michels (email), July 7, 2010



