Plotting Share (Stock) Volumes
January 29, 2003 | Andrew Nicholls
5 Comment(s)
I am trying to plot share trade volumes in a meaningful way, so that exceptions stand out. My problem is that the volume can vary in magnitude from day to day: for example, 800,000 one day and 91,000,000 the next. Also, due to special transactions, the highest volume may be, say, 330 million when it is normally around 1 million. Plot on a normal scale and the abnormal swamps the normal; plot on a log scale and the abnormal does not stand out.
Any suggestions or examples?
On average, logarithms are preferred. To borrow from an earlier answer:
As a statistician would say, “If in doubt, take logs.” The world in general is probably lognormally rather than normally distributed, and thus many variables are better measured on a logarithmic scale (which is multiplicative) rather than an additive scale–just as graphics sometimes use ratio scales. Camera settings (f-stops, ISO numbers) are also multiplicative rather than additive.
For more, indeed a lot more, see Edward Tufte, Data Analysis for Politics and Policy, pages 108-131. This material looks at various interpretations of slopes (regression coefficients) for mixes of logarithmic and non-logarithmic scales, and why we might want to use log scales for many kinds of variables.
Using a log scale for trading volume is unusual. More typical is this chart for IBM at Yahoo Finance which uses a log scale for price but not for volume.
Why avoid logs? Some stocks have days with zero volume–problematic for a log scale. Another reason is that for low-volume days people frequently don’t care about detail. In other words, for many investment strategies the difference between trading at 1% of normal volume versus 2% is not important, but the difference between 100% and 200% of normal volume can be significant.
Of course logs still might be the right answer. But it is a subtle problem since different investment strategies focus on different aspects of the data set. If you’re designing a software product your best bet might be to give the user several options (log vs. linear, truncating the y-axis range, etc.) so they can customize for what they find important.
I work in a field (aerosols in clean rooms) where orders of magnitude variations are accompanied by zero values. I devised a linearized log scale to handle the zero values, and it works quite well.
The idea is to replace the scale in the lowest decade between 1 and 1.5 with a linear scale from 0 to 1.5. The other lines in the decade (2, 3, 4, …) are unaffected by the change, and if you desire, you can add a line at the value 1 and label it to show the changed relationships. A 0 should definitely be put on top of the normal position of 1 for the decade.
The same technique also works for instrument readings. The noise around the zero value is easily accommodated, while the rest of the scale is essentially logarithmic.
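To make the hybrid scale concrete, here is a minimal sketch of that mapping in Python. The function name `hybrid_scale` and the 1.5 knee value are just illustrations of the description above, not anything standardized:

```python
import math

def hybrid_scale(v, knee=1.5):
    """Map a non-negative value to a plotting position:
    logarithmic at and above `knee`, linear from 0 up to `knee`.
    Zero lands at position 0, which is where 1 sits on a pure
    log scale, exactly as described above."""
    if v < 0:
        raise ValueError("values must be non-negative")
    if v >= knee:
        return math.log10(v)
    # Linear segment: 0 maps to 0, knee maps to log10(knee),
    # so the two pieces join continuously at the knee.
    return (v / knee) * math.log10(knee)
```

matplotlib offers a similar linear-near-zero, log-elsewhere idea as the built-in `symlog` axis scale, if you would rather not roll your own.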
The value swings indicated in the opening post are of a couple orders of magnitude or more and should be clearly apparent on a log scale.
That said …
You may want to try plotting both the actual data and a rolling average (with an averaging period suitable for your needs), all on a log scale. Averaging removes the zero-point issue in the trend; for zero points in the actual curve you could use the hybrid scale method mentioned above, or stop the scale at some fixed level (say, 100) and disregard the values below it for the purposes of the chart. This gives you a reading of the volume trend along with a clear view of the volatility in your data. And, of course, provide the dataset along with the chart for a detailed look at the numbers.
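A small sketch of the rolling-average idea, using a plain trailing window (the window length and sample volumes are made up for illustration):

```python
def rolling_average(volumes, window=5):
    """Trailing moving average. A window that contains a
    zero-volume day still averages to a positive number
    (as long as the window isn't all zeros), so the trend
    line plots cleanly on a log scale."""
    out = []
    for i in range(len(volumes)):
        lo = max(0, i - window + 1)
        chunk = volumes[lo:i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Example series including a zero-volume day:
volumes = [800_000, 0, 1_200_000, 91_000_000, 950_000]
trend = rolling_average(volumes, window=3)
```

With real data you would more likely reach for `pandas` and its `Series.rolling(...).mean()`, but the arithmetic is the same.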
You might also look into examining rate of change, probably best laid out on a linear scale. This could be plotted over the actual data at some reference level and would help illuminate the information in the “normal” range of data near the lower part of the chart. You could also average the rate of change over some rolling time period to smooth out the trendline.
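The rate-of-change series might be computed like this; returning `None` after a zero-volume day is one assumed convention, since the percentage change from zero is undefined:

```python
def pct_change(volumes):
    """Day-over-day rate of change, in percent. The first day has
    no predecessor, and a day following zero volume has no defined
    percentage change, so both are reported as None."""
    out = [None]
    for prev, cur in zip(volumes, volumes[1:]):
        out.append(None if prev == 0 else 100.0 * (cur - prev) / prev)
    return out
```

Plotted on a linear scale, this series makes the day-to-day swings in the “normal” range visible even when the raw volumes sit near the bottom of the chart.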
Why not use the square root or cube root of the volume? This gives more emphasis to large outliers than a log scale gives, while giving more emphasis to small outliers than a linear scale gives, and treats zero naturally. The disadvantages: sqrt() isn’t natural or canonical for your data, it doesn’t have as nice statistical properties, and the axis labelling won’t tell viewers how to interpolate.
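The root transform is a one-liner; the cube-root default here is just one choice of exponent:

```python
def root_transform(volumes, k=3):
    """k-th root transform: compresses large values less aggressively
    than a log does, and maps zero to zero with no special handling."""
    return [v ** (1.0 / k) for v in volumes]
```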