A common problem I have in a corporate environment is how to portray an estimate accurately. We are estimating time and cost for software projects. The key issue is that software development estimates are highly inaccurate in the early stages: errors of 200-400% are quite common, and a high degree of positive skew is evident. Overruns are more common than underruns, and extreme overruns are alarmingly common.
The main problem is to express the uncertainty as clearly as the expectation, without appearing too elaborate.
In the past, we might say "There is a 90% chance that it will take 1-2 years and cost 5-15 million dollars". This magically turns into "You said it would take 1 year and cost $5m!".
We are dealing here with people who have zero statistical sophistication. A confounding factor is that people want low numbers so that their project gets approved over others.
I have some ideas:
1. Plot time or cost by probability.
2. Plot time by cost as a probability density cloud.
3. Express estimates like this (wide range, round numbers):
   More than $3m
   Less than $30m
P.S. I feel someone will suggest "Do better estimates, then you won't have a problem". The fact is, until you have dug into the technical uncertainties and defined the requirements in detail, you are subject to major technical risk and scope creep risk. You also have a serious problem of framing: the number the customer first thought of is likely to be way under the mark.
-- Tim Josling (email)
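Ideas 1 and 3 from the question above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not a recommended model: it assumes cost follows a lognormal distribution (one common way to capture positive skew), and the median of $8m and log-scale spread of 0.7 are hypothetical values chosen only for illustration. It prints cost against the probability of coming in under it, restates one figure as a risk of exceeding it, then reduces the 5th to 95th percentile range to round numbers.

```python
import math
from statistics import NormalDist

# Hypothetical parameters, for illustration only (not from any real project).
MEDIAN_COST_M = 8.0  # median cost in $ millions
SIGMA = 0.7          # spread on the log scale; larger means wider and more skewed


def cost_quantile(p: float, median: float = MEDIAN_COST_M, sigma: float = SIGMA) -> float:
    """Cost ($m) with probability p of not being exceeded, assuming a lognormal model."""
    z = NormalDist().inv_cdf(p)  # standard normal quantile for probability p
    return median * math.exp(sigma * z)


def round_nice(x: float) -> float:
    """Round a positive number to one significant figure, e.g. 2.53 -> 3, 25.3 -> 30."""
    if x <= 0:
        raise ValueError("x must be positive")
    magnitude = 10 ** math.floor(math.log10(x))
    return round(x / magnitude) * magnitude


if __name__ == "__main__":
    # Idea 1: cost by probability.
    for p in (0.10, 0.25, 0.50, 0.75, 0.90):
        print(f"{p:.0%} chance the cost is under ${cost_quantile(p):.1f}m")
    # The same model framed as a risk of exceeding a figure.
    print(f"10% chance the cost is over ${cost_quantile(0.90):.1f}m")
    # Idea 3: a wide range in round numbers (5th to 95th percentile).
    print(f"More than ${round_nice(cost_quantile(0.05)):g}m")
    print(f"Less than ${round_nice(cost_quantile(0.95)):g}m")
```

With these made-up parameters the last two lines come out as "More than $3m" and "Less than $30m", a factor-of-ten range; the width of that range is driven entirely by the assumed `SIGMA`, which in practice would have to come from past estimates versus outcomes.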
The statement "there is a 90% chance that it will take 1-2 years and cost 5-15 million dollars" is inherently deceptive; it implies that you've performed the exact same project a number of times, observed the variation in time and cost, and can make some sort of statistical claim with some degree of certainty. In reality, it is much more likely that although you've performed similar work, you've never done the exact same project under the exact same conditions. After all, software development is hardly a controlled experiment.
Thus it's better to state your assumptions, along with how sensitive your project results will be to changes in those assumptions. That way you can address head-on those technical and scope risks that are so hard to quantify in the early phases of technical design when much remains unknown.
-- Scott Zetlan (email)
"The statement "there is a 90% chance that it will take 1-2 years and cost 5-15 million dollars" is inherently deceptive; it implies that you've performed the exact same project a number of times, observed the variation in time and cost, and can make some sort of statistical claim with some degree of certainty."
The philosophy of probability and risk is as deep as you want it to be... John Maynard Keynes even wrote a book about it (A Treatise on Probability).
In a general sense, I was trying to say that of projects 'like' this one, 90% would be between 1-2 years. The $64,000 question is what does 'like' mean?
The suggestion above was that 'like' means a large number of identical projects that have been observed.
I would not agree with that. However you do have a point in that projects that are superficially similar can actually be quite different, so it is a very slippery concept. Even a different personality as project sponsor could make a huge difference.
I would be interested to know how you would frame the response, given that I do believe I have some information about the likely cost, and some information about the accuracy of the estimate.
-- Tim Josling (email)
"Thus it's better to state your assumptions, along with how sensitive your project results will be to changes in those assumptions. That way you can address head-on those technical and scope risks that are so hard to quantify in the early phases of technical design when much remains unknown."
It is hard to itemize all the assumptions.
There is often a very big gap between the assumptions that customers make and the assumptions the builders make: for example, about the number of bells and whistles, or about 'obvious' related requirements. Often assumptions are not articulated. What one side regards as so obvious that it need not be stated, the other side regards as just as obviously the opposite. Examples: I can use this system without an online connection, right? This will run in the UK, right?
To be valid, an assumptions list would need to specify the business requirements and technical design in detail. In fact, this approach works, but it costs a lot of money to get to that point.
Even without that, I think you can validly say something about the size and the uncertainty of the size.
Let us assume that we have the business requirements fully agreed and documented, in a 100 page document. The technical design is finalized, in a 100 page document.
Even then there is still risk: team members get sick, the team may have to move, someone may resign or be required for something else, budgets may get cut, the technology may not function as planned. You still have a residual level of risk/uncertainty which needs to be expressed somehow.
What I am looking for is some way to show uncertainty that is intuitive and is comprehensible by people who are not rocket scientists. I want to get people thinking not about a single number but of a range of possibilities.
-- Tim Josling (email)
I know how you feel. I do offshore surveys, where weather plays a big role. My approach has also been to Monte Carlo the budget using uncertainties instead of hard numbers.
A technique I have found useful is to provide the high-end figure to management - "There is a 95% chance we can complete this work for a mere US$12.1 million". On the domestic front, I indicate the same thing in other words - "I will probably be home before Christmas".
This approach avoids my having to explain, justify or apologise for the huge cost spread to management, and avoids my having to explain, justify or apologise for being offshore for her mother's family reunion dinner.
Flippancy aside, I am often forced to present what the market will bear, because it works in practice. It involves distortion and misrepresentation of the data to some degree, but this will happen the moment you open your mouth to someone who doesn't understand probability, simply from their (mis)understanding of your words, no matter how precise your words may be.
-- Glenn Reynolds (email)
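The Monte Carlo approach described above can be sketched with the standard library alone. The task list, the (minimum, most likely, maximum) figures, and the choice of a triangular distribution for each task are all assumptions made for illustration; a real survey budget may need to treat weather downtime and correlated risks separately. Each trial draws a cost for every task and sums them; the 95th percentile of the simulated totals is the kind of "95% chance we can complete this work for..." figure quoted above.

```python
import random
from statistics import mean, quantiles

# Hypothetical task costs in US$ millions: (minimum, most likely, maximum).
TASKS = [
    (1.0, 1.5, 3.0),  # mobilisation
    (4.0, 5.0, 9.0),  # survey operations, with a long right tail
    (0.5, 0.8, 1.5),  # processing and reporting
]


def simulate_total(tasks, rng: random.Random) -> float:
    """One trial: draw each task cost from a triangular distribution and sum them."""
    # random.triangular takes (low, high, mode), so reorder the tuple fields.
    return sum(rng.triangular(low, high, mode) for low, mode, high in tasks)


def monte_carlo_budget(tasks, trials: int = 20_000, seed: int = 1):
    """Return (mean total, 95th-percentile total) over many simulated trials."""
    rng = random.Random(seed)  # seeded so the results are repeatable
    totals = [simulate_total(tasks, rng) for _ in range(trials)]
    p95 = quantiles(totals, n=20)[-1]  # 19 cut points; the last is the 95th percentile
    return mean(totals), p95


if __name__ == "__main__":
    avg, p95 = monte_carlo_budget(TASKS)
    print(f"Mean simulated cost: US${avg:.1f} million")
    print(f"There is a 95% chance we can complete this work for US${p95:.1f} million")
```

The sum of the "most likely" figures (US$7.3 million here) is lower than the simulated mean, because each task's distribution is skewed to the right; adding up single-point estimates tends to understate the total.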
The real problem is one of psychology. In medicine, when you are presenting percentages, people will hear what they want to hear. In fact, it may be better to present the data as: "There is a 10% chance that this project will take over 2 years..."; thereby not giving anyone anything to latch on to. Might not make you too popular with your supervisors though.
No plot is going to get people over hearing what they want to hear.
-- Tarun Nagpal (email)
Unfortunately, cost is only a proxy measure for three other variables: scope, quality, and time. Every project board should be asked to rank these as objectives in order of priority before the project starts: this certainly focuses the mind.
1. We want this nuclear power facility to be built to the highest quality possible.
2. This bridge must cross from bank to bank: we will find the money to complete it if it is fifty yards short.
3. The accounting system must be in at the beginning of the financial year.
4. Refurbish as many of the offices as you can for $50,000.
This gets the board used to thinking of all four of these unknowns as variables that inter-relate and are each subject to reassessment as new facts are discovered. Paradoxically, in my experience the projects that have the most rigid cost constraints are those most likely to overspend. Repeated insistence by a board that costs are non-negotiable leads the team to make covert compromises on quality, conceal scope reductions, and take short cuts to get everything done. My classic example, mentioned elsewhere on this board, is the UK Scottish Parliament Building, which had an initial estimate of £10 million, reached a fixed cost (final answer, no negotiation, no more money) of fifty million, and went on to come in at four hundred and thirty million.
-- Martin Ternouth (email)
Since probability is the language of uncertainty, if a recipient of information with a probability quantifier doesn't have the ability or the will to accept and acknowledge a range estimate, then it often pays to "stay with the opposite" of what they "want to hear." To manage those who latch on to a deterministic value (rather than a stochastic one) that they want to hear, as portrayed in your original question, say something like: "it could take up to 2 years, or more, and cost up to $15 million or more." They can then only "hook onto" your higher number, which should allow you to manage their expectations better. You have received good advice from previous contributors... they hear what they want to hear... so manage it, and manage which areas need to be optimized (max 2) and prioritized: scope, cost, time, quality.
-- Paul (email)
I believe that many estimates are developed under the duress of needing to win the contract. This pressure leads people to overlook or minimise their experience of what causes projects to overrun. The focus, understandably, is on the core skills and deliverables (i.e. the reasons you have been asked to quote) rather than on pricing in the risks and dependencies. In addition, estimates are frequently developed without the participation of those people who will actually do the work, who may have a more realistic grasp of the true scope and tasks. I believe it is better to price on the basis of actual work to be done, including a contingency for unplanned events or interruptions, rather than on notional project phases and outputs or on a perceived view of the client's budget. The gap between what the client wants (scope) and the true cost of doing the work (delivery) is called negotiation.
-- Steve (email)
Case Study on roads...
National Audit Office Value for Money Report
Department for Transport: Estimating and monitoring the costs of building roads in England
From the conclusion:
"Robust estimating is a key factor in delivering value for money from road schemes but represents a difficult and challenging task given the timescale of major road projects and the number of potential variables, some of which are outside of the Agency's and local authorities' direct control such as Public Inquiry outcomes."
From a footnote:
"Estimates for 35 of the 36 completed Targeted Programme of Improvement Schemes were prepared before 2003 and included a ten per cent contingency for risk but excluded non recoverable value added tax and inflation. From this base line, the actual costs were 40 per cent higher than these initial estimates. Since 2003 the Agency's estimates have included both value added tax and inflation and in accordance with Treasury guidance issued in 2003 in "The Green Book", have also been increased by between 3 and 45 per cent to compensate for the tendency to underestimate costs (known as 'optimism bias'). The original estimates have been adjusted retrospectively, giving the increase of six per cent rather than 40 per cent."
It is interesting to note the heavy emphasis on the "optimism bias" in this and many other reports in the UK.
For a more detailed treatment of the optimism bias have a look at The Green Book from HM Treasury.
From The Green Book
"Optimism bias is the demonstrated systematic tendency for appraisers to be over-optimistic about key project parameters."
Here is another take on the problem:
The Nichols Report
-- Tchad (email)
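The arithmetic behind an optimism-bias adjustment is simple: the base estimate is multiplied by one plus the uplift. The sketch below applies the 3 to 45 per cent range from the NAO footnote quoted above to a hypothetical base estimate, and shows how the same outturn looks as an overrun against the unadjusted and the adjusted figure. The function names and the £100m/£140m figures are invented for illustration; the actual uplift for a given project would come from the Green Book guidance, not from this sketch.

```python
def apply_uplift(base_estimate: float, uplift_pct: float) -> float:
    """Raise a base estimate by an optimism-bias uplift given in per cent."""
    if uplift_pct < 0:
        raise ValueError("uplift should not be negative")
    return base_estimate * (1 + uplift_pct / 100)


def overrun_pct(estimate: float, actual: float) -> float:
    """Percentage by which the actual cost exceeded the estimate (negative = underrun)."""
    return (actual / estimate - 1) * 100


if __name__ == "__main__":
    base, actual = 100.0, 140.0  # hypothetical figures in £ millions
    print(f"Overrun against base estimate: {overrun_pct(base, actual):.0f}%")
    for uplift in (3, 45):  # the range quoted in the NAO footnote
        adjusted = apply_uplift(base, uplift)
        print(f"Uplift {uplift}%: estimate £{adjusted:.0f}m, "
              f"overrun {overrun_pct(adjusted, actual):.1f}%")
```

With a 45 per cent uplift this hypothetical project shows a small underrun instead of a 40 per cent overrun. That only illustrates the arithmetic; it is not a model of the NAO's six per cent figure, which also reflects the VAT and inflation adjustments.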
I came across a discussion of "evidence based scheduling" in software development on Joel on Software.
The basic idea is:
- Break all work down into small (<20 hour) tasks. If you're working in days or weeks, you don't really understand the work involved and you will never have a good schedule.
- Track the expected duration for each task, based on the best guess from the programmer who will do the work, not a manager.
- Track the actual duration for each task. Actual is usually longer than expected.
- Track the distribution of actual/expected for the last 6 months. People get better at estimating over time.
- Based on these data, plot the confidence for different completion dates.
-- M Plumb (email)
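The evidence-based scheduling steps listed above can be sketched as a small Monte Carlo simulation. This is an illustrative reading of the idea, not the Joel on Software implementation: each remaining task estimate is multiplied by an actual/expected ratio drawn at random from the programmer's history, the results are summed, and the process is repeated many times. Percentiles of the simulated totals give the confidence for different completion dates. The task estimates, history ratios, start date, and the assumption of six productive hours per weekday are all hypothetical.

```python
import random
from datetime import date, timedelta
from statistics import quantiles


def simulate_total_hours(estimates, history_ratios, trials=10_000, seed=0):
    """Scale each task estimate by a randomly chosen past actual/expected ratio and sum.

    Returns one simulated total (in hours) per trial.
    """
    rng = random.Random(seed)
    return [
        sum(est * rng.choice(history_ratios) for est in estimates)
        for _ in range(trials)
    ]


def hours_to_date(start: date, hours: float, hours_per_day: float = 6.0) -> date:
    """Walk forward over weekdays (Mon-Fri) until the work hours are used up."""
    d, remaining = start, hours
    while remaining > 0:
        if d.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= hours_per_day
        if remaining > 0:
            d += timedelta(days=1)
    return d


def confidence_dates(totals, start: date, levels=(50, 75, 90, 95)):
    """Map each confidence level (per cent) to a completion date."""
    cuts = quantiles(totals, n=100)  # cuts[k - 1] is the k-th percentile
    return {k: hours_to_date(start, cuts[k - 1]) for k in levels}


if __name__ == "__main__":
    estimates = [4, 8, 12, 6, 16, 10, 3, 8]        # hypothetical task estimates (hours)
    history = [1.0, 1.1, 1.3, 0.9, 1.6, 2.0, 1.2]  # hypothetical past actual/expected ratios
    totals = simulate_total_hours(estimates, history)
    for level, when in confidence_dates(totals, date(2024, 1, 8)).items():
        print(f"{level}% confidence of finishing by {when}")
```

Drawing from the programmer's own recorded ratios, instead of fitting a distribution, keeps the sketch close to the "track actual/expected" step: if someone's history shows frequent large overruns, the later confidence dates spread out accordingly.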