Edward Tufte forum: odds ratios, graphing positive and negative associations together

How do you graph, on one chart, the results of discrete choice logistic regression in which there are positive and negative associations (odds ratios above and below one) for different categories of different variables? Excel, SAS, SPSS and SUDAAN don't seem to offer anything.

I have been doing it by transforming the odds ratios and relabeling the grid, so that a bar chart with its origin at 1 shows the magnitudes correctly, i.e., 0.2 and 5.0 (and likewise 0.33 and 3.0) appear symmetrical about 1.0.
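A minimal sketch of this transform-and-relabel approach, assuming natural logs (any base gives the same symmetry). The function names are hypothetical. Taking the log maps OR = 1 to 0, so an odds ratio and its reciprocal get bars of equal length pointing in opposite directions; the gridlines are placed at log(OR) but labeled with the original odds ratios. Note that 0.33 is used here as exactly 1/3, since a rounded 0.33 would not sit exactly opposite 3.0.

```python
import math

def log_bar_length(odds_ratio: float) -> float:
    """Signed bar length for a bar chart whose origin sits at OR = 1.

    log(1) = 0, so OR = r and OR = 1/r get bars of equal length
    pointing in opposite directions.
    """
    if odds_ratio <= 0:
        raise ValueError("odds ratios must be positive")
    return math.log(odds_ratio)

def relabeled_ticks(labels):
    """Gridline positions at log(OR), labeled with the untransformed odds ratios."""
    return [(math.log(r), f"{r:g}") for r in labels]

# The odds ratios from the question; 1/3 stands in for the rounded 0.33.
for r in [0.2, 1 / 3, 1.0, 3.0, 5.0]:
    print(f"OR {r:5.2f} -> bar length {log_bar_length(r):+.3f}")
print(relabeled_ticks([0.2, 0.5, 1, 2, 5]))
```

This is the same scale as plotting the odds ratios directly on a log axis; the relabeling step just restores readable tick values.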

Any suggestions?

-- Marina Counter (email)

Sounds like a job for any competent econometrics or biometrics package.

-- Edward Tufte

Plotting the odds ratio on a log scale is a nice way to retain the symmetry of ratios above and below 1, and can be accomplished in any of those packages.

-- Mike E. (email)

SAS can do this: plot the 95% confidence bounds vertically, with the point estimate, as a HiLo plot. Use a log-scale vertical axis, and include at a minimum a horizontal reference line at y=1. If you wish, add additional reference lines for clinically significant (as opposed to statistically significant) odds ratios.

-- CJ Alverson (email)

Let me fix the above post:

Compute the 95% confidence bounds for the natural log of your odds ratio.

Use a hilo plot in SAS.

Use a vertical axis in log units (be sure you specify log base e).

Include a horizontal reference line at y=0, which corresponds to log(1), i.e., equal risk for both classes of exposure.
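A minimal sketch of the first step of this recipe, assuming the usual Wald interval: with b = ln(OR) and its standard error se from the logistic regression, the 95% bounds on the log scale are b ± 1.96·se. The helper names and the example coefficient and standard error are hypothetical. The interval is symmetric about b on the log scale, which is why y=0 works as the reference line; exponentiating back to the odds-ratio scale gives bounds that are not symmetric about the odds ratio.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054

def log_odds_ratio_ci(b: float, se: float, z: float = Z_95):
    """Return (lower, estimate, upper) for ln(OR), where b = ln(OR)."""
    return b - z * se, b, b + z * se

def odds_ratio_ci(b: float, se: float, z: float = Z_95):
    """Exponentiate the log-scale bounds back to the odds-ratio scale."""
    return tuple(math.exp(v) for v in log_odds_ratio_ci(b, se, z))

# Hypothetical coefficient (OR = 1.5) and standard error, for illustration only.
lo, est, hi = log_odds_ratio_ci(b=math.log(1.5), se=0.2)
print(f"ln(OR): {est:.3f} [{lo:.3f}, {hi:.3f}]")   # symmetric about est
print("OR: %.3f [%.3f, %.3f]" % tuple(odds_ratio_ci(math.log(1.5), 0.2)[i] for i in (1, 0, 2)))
```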

-- CJ Alverson (email)

ideas for graphing results of multinomial (3 levels) logistic regression

Does anyone have ideas about how one might graphically show the relationship between a single continuous predictor variable and the probability of a categorical outcome variable with 3 levels? If it had only two levels, a "logical" approach would be the logistic function. However, adding that 3rd level (done with multinomial logistic regression) has me stumped. Any thoughts? Thanks. -Erick

-- Erick Turner (email)

Odds ratios...interesting summary numbers.

The odds ratio ranges from 0 to positive infinity, with 1.0 indicating equal odds. The problem I have with interpreting the odds ratio is that the magnitude of odds can give the perception of huge differences in likelihood of the outcome, given the predictor.

I prefer to transform the odds ratio into a probability statement. Remember, in logistic regression we model prob(Y|X) with the logistic function, exp(z) / (1 + exp(z)), where z is the linear predictor (e.g., a + bX); this function takes values between 0 and 1. To translate an odds ratio into a change in probability, calculate p = the proportion of cases in the positive category of the dependent variable and q = 1 - p. Then calculate pqb, where b is the logistic regression coefficient (the natural log of the odds ratio, not the odds ratio itself). pqb is the first derivative (slope) of the logistic function at the point where the predicted probability equals p, the mean of the dependent variable, so it approximates the change in probability associated with a one-unit change in the predictor.

So, as an example... Using an odds ratio, you might state that males have 1.5 times the odds that females do of being diagnosed with HIV (please, these are hypotheticals). Calculating pqb, you could translate the odds ratio into the alternative, roughly equivalent statement that the probability of diagnosis with HIV is .05 higher for males than for females. If you are a relative frequentist, you might report that, all things being equal, males have a rate of diagnosis 5 percentage points higher than females. Of course, confidence intervals help us understand the precision of all point estimates and are easily calculated.
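A minimal sketch of the pqb calculation. The function name is hypothetical, and so is the base rate: the post does not say what proportion of cases were diagnosed, so p = 0.14 is an assumption chosen for illustration. With an odds ratio of 1.5, b = ln(1.5) ≈ 0.405.

```python
import math

def pqb_marginal_effect(p: float, b: float) -> float:
    """Approximate change in probability for a one-unit change in a predictor.

    p: proportion of cases in the positive category of the dependent variable
    b: logistic regression coefficient (log odds ratio), not the odds ratio
    Returns p*q*b with q = 1 - p, the slope of the logistic curve at the
    point where the predicted probability equals p.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = 1.0 - p
    return p * q * b

b = math.log(1.5)   # coefficient implied by an odds ratio of 1.5
p = 0.14            # hypothetical base rate, not taken from the post
print(round(pqb_marginal_effect(p, b), 3))  # prints 0.049
```

With this assumed base rate, pqb comes out near the .05 used in the example; a different p gives a different probability difference for the same odds ratio, which is one reason the two statements are only roughly equivalent.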

I think probability statements are more easily consumed than odds ratios. Just a personal preference.

-- David Passmore (email)

Thanks for that feedback.

Actually, I am working with probabilities. (It may have been misleading that I entered the question under the topic that included the term "odds ratios", but that topic was the only hit I got with the search term "logistic regression".)

Anyway, my predictor variable is continuous, and my outcome variables, call them y-1 and y-2, range from 0% to 100%, as you've suggested. Graphing x against y-1 gives one sigmoidal curve (typical of the logistic function), while graphing x against y-2 gives another sigmoid curve.

I suppose I could just have the two sigmoid curves in a single figure. So, at a given value of x, one could read off that there's, say, a 30% probability of outcome y-1 and a 45% probability of y-2 (maybe they're supposed to add to 100% -- I don't know).
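A minimal sketch, assuming the usual multinomial logit setup with one of the three outcomes as the reference category. The predicted probabilities at any x come from a softmax over the linear predictors, so the three of them always sum to 1, and the third curve is simply 1 minus the sum of the y-1 and y-2 curves. The function name and coefficients are hypothetical. Unlike the binary case, an individual curve need not be monotone: with these coefficients, y-1 rises and then falls as y-2 takes over.

```python
import math

def multinomial_probs(x, coefs):
    """Predicted probabilities of a 3-level outcome from a multinomial logit.

    coefs holds (intercept, slope) pairs on the log-odds scale for each
    non-reference outcome relative to the reference outcome, whose linear
    predictor is fixed at 0. Returns [P(reference), P(y-1), P(y-2)].
    """
    etas = [0.0] + [a + b * x for a, b in coefs]
    m = max(etas)                           # subtract the max to avoid overflow
    weights = [math.exp(e - m) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical coefficients for y-1 and y-2, for illustration only.
coefs = [(-1.0, 0.8), (-3.0, 1.5)]
for x in range(5):
    p_ref, p1, p2 = multinomial_probs(x, coefs)
    print(f"x={x}: reference={p_ref:.2f}  y-1={p1:.2f}  y-2={p2:.2f}  "
          f"sum={p_ref + p1 + p2:.2f}")
```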

Maybe what bothers me about the above idea is that I'd be plotting each value of x twice, once against each of the two y's. Just brainstorming, I wonder about something analogous to the trilinear (aka triangular) plot, which lets you plot x vs y vs z (three components that sum to a constant) as a single point. But I have no idea what this would look like for a logistic function. Maybe it would be a mess.
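A minimal sketch of the triangular-plot idea, assuming the three outcome probabilities sum to 1. Each triple maps to the probability-weighted average of the triangle's three vertices, so evaluating the fitted probabilities at a series of x values and connecting the points gives one curve inside the triangle, with each x appearing once; labeling a few points with their x values keeps the predictor readable. The function name, vertex placement, and probability triples are hypothetical.

```python
import math

def ternary_xy(p_a, p_b, p_c):
    """Map three probabilities that sum to 1 to a point in an equilateral triangle.

    Vertices: outcome A at (0, 0), outcome B at (1, 0), outcome C at
    (0.5, sqrt(3)/2). The point is the probability-weighted average of the
    vertices: it sits at a vertex when that outcome is certain and at the
    centre when all three are equally likely.
    """
    if not math.isclose(p_a + p_b + p_c, 1.0, abs_tol=1e-9):
        raise ValueError("the three probabilities must sum to 1")
    x = p_b + 0.5 * p_c
    y = (math.sqrt(3) / 2) * p_c
    return x, y

# Hypothetical probability triples at increasing values of the predictor.
path = [(0.70, 0.26, 0.04), (0.26, 0.48, 0.26), (0.03, 0.30, 0.67)]
for triple in path:
    print(ternary_xy(*triple))
```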

Any thoughts on this or other ideas on how else one might present such data graphically?

Thanks.

-- Erick Turner (email)

Would any of the kind contributors be able to help me learn how to create this plot in Stata, as CJ Alverson has done for SAS above?

Gratefully,

Marlow Macht

-- Marlow Macht (email)

I've had luck drawing these sorts of graphs in Stata using a combination of overlaid "twoway" plots. First, create a dataset where each observation has a point estimate, upper bound, and lower bound. Create an index variable (e.g., generate n = _n) that positions each estimate along whichever axis, horizontal or vertical, you choose. Then use "rspike" for the 95% confidence interval and "scatter" for the point estimate.

In Stata syntax, with the intervals drawn horizontally, this looks something like:

twoway (rspike lower upper n, horizontal) (scatter n estimate), xline(1)

Adding xscale(log) puts the odds ratios on a log scale.
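For readers without Stata or SAS, here is a minimal text-only sketch of the same layout: one row per estimate, a horizontal spike for the 95% interval, a marker for the point estimate, and a reference line at OR = 1, all on a log scale. It is not a translation of any Stata or SAS command; the function name, axis limits, and example rows are hypothetical.

```python
import math

def ascii_forest(rows, lo=0.1, hi=10.0, width=41):
    """Draw odds ratios and 95% intervals as horizontal spikes on a log axis.

    rows: (label, lower, estimate, upper) tuples on the odds-ratio scale.
    Columns are spaced evenly in log(OR) from lo to hi, so r and 1/r sit
    the same distance (up to rounding) from the '|' at OR = 1.
    '-' spans the interval and 'o' marks the point estimate.
    """
    span = math.log(hi) - math.log(lo)

    def col(v):
        v = min(max(v, lo), hi)  # clamp values that fall off the axis
        return round((math.log(v) - math.log(lo)) / span * (width - 1))

    lines = []
    for label, lower, est, upper in rows:
        cells = [" "] * width
        for c in range(col(lower), col(upper) + 1):
            cells[c] = "-"         # the interval (rspike / HiLo bar)
        cells[col(1.0)] = "|"      # reference line at OR = 1
        cells[col(est)] = "o"      # point estimate (scatter marker)
        lines.append(f"{label:>10} {''.join(cells)}")
    return "\n".join(lines)

# Hypothetical estimates, for illustration only.
print(ascii_forest([("male", 1.1, 1.5, 2.2), ("age 65+", 0.2, 0.33, 0.6)]))
```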

-- josh (email)
