odds ratios, graphing positive and negative associations together

How do you graph, on one chart, the results of discrete choice logistic regression in which there are positive and negative associations (odds ratios above and below one) for different categories of different variables? Excel, SAS, SPSS and SUDAAN don't seem to offer anything.

I have been doing it by creating a bar chart, with the origin at 1, that shows the magnitude correctly i.e. 0.2, 0.33, 1.0, 3.0, 5.0 as symmetrical, by transforming the odds ratios, and relabeling the grid.
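The transformation described above amounts to a log scale: taking the logarithm of each odds ratio puts 1.0 at zero and places reciprocal ratios at equal distances on either side. A minimal Python sketch, assuming natural logs and using the example values from the post (the function name `bar_height` is only illustrative):

```python
import math

def bar_height(odds_ratio):
    """Signed bar length on a log scale; an odds ratio of 1.0 maps to 0."""
    return math.log(odds_ratio)

# Example values from the post; the axis labels keep the original odds ratios.
odds_ratios = [0.2, 0.33, 1.0, 3.0, 5.0]
for ratio in odds_ratios:
    print(f"label {ratio:>4}: bar height {bar_height(ratio):+.3f}")
```

Note that 0.33 and 3.0 are only approximately mirror images, since 0.33 is a rounded 1/3.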

Any suggestions?

-- Marina Counter (email)


Sounds like a job for any competent econometrics or biometrics package.

-- Edward Tufte


SAS can do this: Plot the 95% confidence bounds vertically, with the point estimate, as a HiLo plot. Use a log-scale vertical axis and include, at a minimum, a horizontal reference line at y=1. If you wish, add additional reference lines for clinically significant (as opposed to statistically significant) odds ratios.

-- CJ Alverson (email)


Let me fix the above post:

Compute the 95% confidence bounds for the natural log of your odds ratio.

Use a hilo plot in SAS.

Use a vertical log scaled axis (be sure you specify log base e).

Include a reference line (horizontal) for y=0, which corresponds to log(1), equal risk for both classes of exposure.
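As a rough illustration of the first step, the bounds can be computed on the log scale and then exponentiated. This Python sketch assumes a Wald-type interval and a hypothetical coefficient and standard error; it is not SAS output, and the function name is made up.

```python
import math

def odds_ratio_ci(coef, std_err, z=1.96):
    """Return (odds ratio, lower, upper) from a logit coefficient.

    Assumes a Wald interval: coef +/- z * std_err on the log-odds scale.
    """
    lower_log = coef - z * std_err
    upper_log = coef + z * std_err
    return math.exp(coef), math.exp(lower_log), math.exp(upper_log)

# Hypothetical coefficient and standard error, for illustration only.
estimate, lower, upper = odds_ratio_ci(coef=0.405, std_err=0.15)
print(f"OR {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Plotting the log odds on a linear axis with a reference line at 0 gives the same picture as plotting the odds ratios on a log axis with a reference line at 1; only the tick labels differ.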

-- CJ Alverson (email)


ideas for graphing results of multinomial (3 levels) logistic regression

Does anyone have ideas about how one might graphically show the relationship between a single continuous predictor variable and the probability of a categorical outcome variable with 3 levels? If it had only two levels, a "logical" approach would be the logistic function. However, adding that 3rd level (done with multinomial logistic regression) has me stumped. Any thoughts? Thanks. -Erick

-- Erick Turner (email)


Odds ratios...interesting summary numbers.

The odds ratio ranges from 0 to positive infinity, with 1.0 indicating equal odds. The problem I have with interpreting the odds ratio is that the magnitude of odds can give the perception of huge differences in likelihood of the outcome, given the predictor.

I prefer to transform the odds ratio into a probability statement. Remember, in logistic regression we model prob(Y|X) with the function e^x / (1 + e^x), which takes values between 0 and 1. To transform an odds ratio into a probability statement, first calculate p = the proportion in the positive category of the dependent variable and q = 1 - p. Then calculate pqb, where p and q are defined above and b is the logistic regression coefficient (not the odds ratio). Now, pqb is approximately the first derivative of the logistic function evaluated at the mean of the dependent variable.

SO, as an example... Using an odds ratio, you might state that males are 1.5 times more likely than females to be diagnosed with HIV (please, these are hypotheticals). Calculating pqb, you could transform the odds ratio to make the alternative statement, equivalent in meaning, that the difference in the probability of diagnosis with HIV is .05 higher for males than females. If you are a relative frequentist, you might report that, all things being equal, males have a 5% higher rate of diagnosis than females. Of course, confidence intervals help us to understand precision of all point estimates and are easily calculated.
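The pqb arithmetic can be checked with a few lines of Python. This is only a sketch: the baseline proportion p = 0.144 is a hypothetical value chosen so that the result lands near the .05 in the example, and the function name is made up for illustration.

```python
import math

def pqb(p, odds_ratio):
    """Approximate change in probability per unit change in the predictor.

    p is the proportion in the positive category; b = ln(odds ratio).
    """
    b = math.log(odds_ratio)
    return p * (1.0 - p) * b

# Hypothetical baseline proportion; the odds ratio of 1.5 is from the example.
print(round(pqb(p=0.144, odds_ratio=1.5), 3))
```

The same odds ratio yields a different probability difference at a different p (roughly twice as large at p = 0.5), so the baseline proportion is worth reporting alongside it.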

I think probability statements are more easily consumed than odds ratios. Just a personal preference.

-- David Passmore (email)


Thanks for that feedback.

Actually, I am working with probabilities. (It may have been misleading that I entered the question under the topic that included the term "odds ratios", but that topic was the only hit I got with the search term "logistic regression".)

Anyway, my predictor variable is continuous, and my outcome variables, call them y-1 and y-2, range from 0% to 100%, as you've suggested. Graphing x against y-1 gives one sigmoidal curve (typical of the logistic function), while graphing x against y-2 gives another sigmoid curve.

I suppose I could just have the two sigmoid curves in a single figure. So, at a given value of x, one could read off that there's, say, a 30% probability of outcome y-1 and a 45% probability of y-2 (maybe they're supposed to add to 100% -- I don't know).
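On whether the probabilities add to 100%: in a multinomial logit with three outcome levels, the three predicted probabilities at any x do sum to 1, with one level acting as the reference. A minimal Python sketch, assuming the standard baseline-category form and made-up intercepts and slopes:

```python
import math

def multinomial_probs(x, params):
    """Baseline-category logit: params holds (intercept, slope) per non-reference level."""
    scores = [math.exp(a + b * x) for a, b in params]
    denom = 1.0 + sum(scores)          # the 1.0 stands for the reference level
    probs = [s / denom for s in scores]
    probs.append(1.0 / denom)          # reference level comes last
    return probs

# Hypothetical coefficients for outcomes y-1 and y-2; the third level is the reference.
params = [(-1.0, 0.8), (0.5, -0.3)]
for x in (-2.0, 0.0, 2.0):
    p1, p2, p3 = multinomial_probs(x, params)
    print(f"x={x:+.1f}  y-1={p1:.2f}  y-2={p2:.2f}  y-3={p3:.2f}")
```

So two curves drawn against x leave the third implied as whatever remains, and with three levels the individual curves need not be sigmoidal.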

Maybe what bothers me about the above idea is that I'd be plotting each value of x twice, once against each of the two y's. Just brainstorming, I wonder about something analogous to the trilinear (aka triangular) plot, which allows you to plot x vs y vs z, all in a single point. But I have no idea what this would look like for a logistic function. Maybe it would be a mess.
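One way to explore the trilinear idea: if the three probabilities sum to 1, each value of x maps to a single point inside a triangle, and sweeping x traces a curve. The Python sketch below assumes one common convention for the corners (an equilateral triangle with vertices at (0, 0), (1, 0), and (0.5, sqrt(3)/2)); the function name is illustrative.

```python
import math

def ternary_xy(p1, p2, p3):
    """Map three proportions that sum to 1 onto 2-D triangle coordinates.

    Corners: p1 -> (0, 0), p2 -> (1, 0), p3 -> (0.5, sqrt(3)/2).
    """
    total = p1 + p2 + p3
    p2, p3 = p2 / total, p3 / total    # guard against rounding drift
    return p2 + 0.5 * p3, (math.sqrt(3) / 2.0) * p3

# Equal probabilities land at the centre of the triangle.
print(ternary_xy(1 / 3, 1 / 3, 1 / 3))
```

Marking a few values of x along the traced curve would keep the predictor readable, since x itself no longer has an axis.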

Any thoughts on this or other ideas on how else one might present such data graphically?

Thanks.

-- Erick Turner (email)


Would any of the kindly contributors be able to help me learn how to create this plot in Stata, as CJ Alverson has done for SAS above?

Gratefully,

Marlow Macht

-- Marlow Macht (email)


I've had luck drawing these sorts of graphs in Stata using a combination of overlaid "twoway" plots. First, create a dataset where each observation has a point estimate, upper bound, and lower bound. Assign an index to each observation (e.g., _n) to position it along the horizontal or vertical axis, whichever you choose. Then use "rspike" for the 95% confidence interval and "scatter" for the point estimate.

In Stata syntax, this looks something like: twoway (scatter n estimate) (rspike upper lower n, horizontal), xline(1)

-- josh (email)



