Guide to Software Project Estimation

(scalablepath.com)

75 points | by todsacerdoti 989 days ago

8 comments

  • zoomablemind 989 days ago
    Too often, software project estimates are driven by external constraints rather than by an understanding of the effort or complexity involved.

    It's either an existing deadline or reporting/sales cycle, a budget cap (as in grant proposals), promises already made by or to 'important people', or the fear among 'small people' of underdelivering, etc.

    Estimation would be fine if everyone involved shared the same goal and responsibility.

    I find it practical to split the desired outcome which needs an estimate into two variants: 1) the most desired/promised/advertised one, and 2) an at-least-viable variant.

    If no one can see the second variant, its viability, and the effort it needs, then some details or skills are clearly missing.

    If the second variant can be estimated, then that estimate can be used as a basis for dealing with the external constraints.

    If the devs say that in a given timeframe they can at least get a prototype done, and you're fine with that, then no one should be damned if that is exactly what pans out. So it has to be clear from the beginning whether it's at all acceptable to put such a variant/prototype into production.

  • gilbetron 989 days ago
    Nothing interesting in the article at all; I have no idea why it is on the front page of HN.

    As usual with software estimation, unless you've read this paper: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.19.3...

    and have a thoughtful response to it (of which I've read some solid ones), I've very little interest in hearing what you have to say about software estimation.

    • diiq 989 days ago
      1) The big if is written out in the abstract -- "if it is accepted that algorithmic complexity is an appropriate definition of the complexity of a programming project." Relating algorithmic complexity to "how long software takes to write" seems, to me, to ignore that the vast majority of my time as a developer is spent discovering and communicating requirements and handling human questions, not writing novel code. The conclusion touches on this, but ignores it.

      2) Even if you accept that "if", this is like a halting-problem proof. Fine; it is impossible to estimate the complexity of ALL software. That does not mean that it's useless to quantify the complexity of software in limited but well-behaved problem spaces. How much of any commercial project is actually spent working on the cutting edge of computer science, facing complete unknowns? An estimate being wrong occasionally is worth most estimates being mostly right.

      3) Why do you consider a 20 year old paper that's only been cited 16 times to be critical reading about estimation, when a vast body of research in forecasting exists, written by people who have measurements of the accuracy of estimates to base their theoretical models on?

      • gilbetron 989 days ago
        > 3) Why do you consider a 20 year old paper that's only been cited 16 times to be critical reading about estimation, when a vast body of research in forecasting exists, written by people who have measurements of the accuracy of estimates to base their theoretical models on?

        Have some links to this vast body?

        I've encountered very few over the years that actually qualify as "science", as much of it is fake or close to fake.

        http://shape-of-code.coding-guidelines.com/2021/01/17/softwa...

        • diiq 988 days ago
          Forty volumes of the "Journal of Forecasting" would be a place to start for peer-reviewed articles, I guess

          https://onlinelibrary.wiley.com/toc/1099131x/current

          • diiq 988 days ago
            I suppose if you consider software to be somehow magically different from all the other human activities that go over time or over budget, maybe you could claim that research doesn't apply. But I guess I'd still expect you to know about the fundamentals of the multiple researched approaches to getting experts to predict accurate numbers in the face of uncertainty and social pressure, before deciding to discount it entirely for our specific field.

            But it's certainly made a big difference for me in practice, and given the gushing about Steve McConnell in another comment thread, I'm not alone.

    • codemac 989 days ago
      That paper makes such a huge, and largely incorrect assumption in the abstract. I've never seen anyone make this argument.

      The old parable about bugs, that it's 10x harder to fix a bug than it is to write the bug, shows us that the time to implement something complex is not related to its complexity.

      For example, if I wanted to build something that randomly selects a function from GitHub and then runs that code on your laptop, I bet I could estimate and implement that code... but good luck ever defining its features, complexity, etc. in any mathematical way.

      But you don't need to define things that way to have reasonable estimates, for the same reason that when you build a shed in your yard, you don't need to understand all the physics of how it's held up.

  • xcambar 989 days ago
    I am and always will be skeptical about software estimation.

    I am even more skeptical about the promises of software evaluation methods.

    But what I am the most skeptical about is teams avoiding software estimation altogether because they share the two skepticisms above.

    • commandlinefan 989 days ago
      > avoiding software estimation altogether because they share the two skepticisms above

      Well, to put your mind at ease, I don't avoid software estimation because I share your skepticisms (although I do), I avoid it because I've observed that it's a complete waste of time. Nobody, at least never in my 30-year career, has ever asked for an honest estimate of how long it would take to produce a software product. What they have asked for is somebody to agree that the time that they have budgeted will be enough for the (vague, still being defined and still to be defined even beyond the timeline) software project and take the "blame" when it inevitably doesn't.

      • tupac_speedrap 989 days ago
        Yep, every story pointing session is basically just think of a Fibonacci number and round it up. Finishing early makes your scrum master leave you alone but finishing late makes you and your team look bad and "unagile" and then you get even less done next sprint because you are stuck in meetings. The scrum master always wins because nobody is ever doing enough Agile.
      • xcambar 989 days ago
        I agree.

        > somebody to agree that the time that they have budgeted will be enough for the [...] software project

        For me, that's still software estimation.

  • diego_moita 989 days ago
    Just one more consultant doing "branding".

    What causes estimates to fail are the unknowns: unavoidable surprises when implementing something new, unexpected change in requirements, etc.

    It would be easy to have accurate estimates if there were no unknowns. But every innovative project must always be a march to unknown territory.

  • seph-reed 989 days ago
    I got pretty decent at software estimation at my last job.

    I would spend an entire day "pre-programming" everything in my head, estimating the length of each little chunk, adding them up, then multiplying by ~2.

    It worked for me. But I still would never trust the estimates.
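
    As a rough sketch of the heuristic above, assuming some hypothetical chunk names and hour figures (the ~2x fudge factor is the commenter's own):

```python
# Sketch of the "pre-program in your head, sum the chunks, then double"
# heuristic. Chunk names and hour figures are hypothetical.
chunks = {
    "parse input": 3,       # gut-feel estimates, in hours
    "core logic": 8,
    "UI wiring": 5,
    "tests & cleanup": 4,
}
FUDGE = 2.0  # the ~2x multiplier from the comment

raw_total = sum(chunks.values())   # naive sum of chunk estimates
estimate = raw_total * FUDGE       # padded estimate
print(f"raw: {raw_total}h, padded: {estimate:.0f}h")
```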

    • p0nce 989 days ago
      Instead of 2: multiply by π for new kinds of projects; multiply by φ for known projects.
    • d6ba56c039d9 989 days ago
      Another angle.

      'How long did a project this size take last time?'.

      As an aside, years ago I worked at a company that did thorough (and inaccurate) bottom-up schedules. I got dinged for not using quarter hour accuracy in the various task estimates.

  • nicholasjarr 989 days ago
    Never been good at estimation. Software Estimation by Steve McConnell has been on my reading list for a long time now. From the little I have seen of it, it looks good (I already read his Code Complete and recommend it). Do you guys have any tips for estimation?
    • quietbritishjim 989 days ago
      I like the advice in Thinking Fast and Slow by Daniel Kahneman about estimating, which is not specific to software but still very applicable to it:

      Start with a known past project that is in some way similar in magnitude and adjust from there. For example, "this is twice as complex as some other project I did, and that took 2 months so this one might take 4 months". Most importantly, resist the temptation to say "although 1 of those 2 months was because of unexpected thing X so I shouldn't include that". Overall, it's highly flawed, but much less highly flawed than anything else. This is called "reference class forecasting".

      He gave a really compelling explanation of why estimates are almost always underestimates by a significant amount, and this technique is the best defence against it, but I won't try to resummarise because I'll surely misrepresent it. But I do recall he gave an example where he and some colleagues were trying to make a school syllabus about deductive biases, and underestimated the effort required for their own project.
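
      The reference-class step described above is mechanically trivial; the discipline is in using the comparable project's *actual* duration, overruns included. A minimal sketch with hypothetical figures:

```python
# Reference-class forecasting in miniature: scale a comparable past
# project's actual duration (including its delays) by a complexity ratio.
# Both numbers are hypothetical.
reference_actual_months = 2   # a similar past project, delays included
complexity_ratio = 2.0        # "this is twice as complex as that one"

forecast = reference_actual_months * complexity_ratio
print(f"forecast: {forecast:.0f} months")
```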

      • AlbertCory 989 days ago
        Thank you, quietbritishjim. I actually met Dr. Kahneman at Google, although I didn't introduce him. I got to ask him at lunch:

        "Dr. Kahneman, you've been at this for 40 years. Do you think you've changed anyone's ways of thinking?"

        He smiled and said "No, not even my own!" and then recounted how in his personal life he'd made a mistake which he'd written about extensively (not the one about planning, though). It's a human failing, not a methodological one.

        I'm also vague about his example, but I think it was a new textbook. He asked his committee to reflect on their own past experiences with similar books. "Two years" was the past experience. Then they decided that it really should be six months, and that's the estimate they went with.

        No one wants to accept that shit happens and it's going to happen again. That's why estimation is hard.

      • nicholasjarr 989 days ago
        Interesting. Will add it to my toolbelt. Thanks.
    • jbay808 989 days ago
      Whatever number you come up with, treat it as the median of a long-tailed distribution (if it matters, the lognormal).

      To get the mean (expectation), multiply your estimate by about 1.6. To get the 95% confidence bound, multiply by 5. To get the 99% confidence bound, multiply your estimate by 10.

      Understand why a distribution results in different numbers for different audiences, and why that's not the same as being inconsistent.

      Use the mean for calculating sprint workload and capacity planning, because the average is what matters for that, not the accuracy of any single job. If your manager understands probability, then give them all these numbers; otherwise give them the 95%-confidence value, which you should also give to others internally who depend on that specific job being done. Give marketing the 99%-confidence number even if they understand probability, because they're looking for a committed deadline they can use externally. They will push hard for an early date because they want the work done quickly, but they actually don't want to hear your optimistic estimate. It's easy to make that mistake.

      When requirements are understood, experienced developers are actually very, very good at estimating median completion times even just by gut feeling, but often fail to account for the distribution, especially when communicating with stakeholders, which makes them take heat when they're sometimes wrong by a factor of ten.
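
      The 1.6x/5x/10x multipliers above are internally consistent: if the gut-feel estimate is the median of a lognormal and the mean is ~1.6x the median, the distribution's sigma is pinned down and the tail multipliers follow. A quick check (a worked derivation, not new data):

```python
import math
from statistics import NormalDist

# For a lognormal, mean/median = exp(sigma^2 / 2). Setting that ratio
# to 1.6 fixes sigma; the 95%/99% quantile multipliers then follow as
# exp(z * sigma), matching the "about 5x" and "about 10x" figures above.
MEAN_OVER_MEDIAN = 1.6
sigma = math.sqrt(2 * math.log(MEAN_OVER_MEDIAN))

z95 = NormalDist().inv_cdf(0.95)   # ~1.645
z99 = NormalDist().inv_cdf(0.99)   # ~2.326

mult95 = math.exp(z95 * sigma)     # ~4.9x
mult99 = math.exp(z99 * sigma)     # ~9.5x
print(f"sigma={sigma:.2f}, 95%: {mult95:.1f}x, 99%: {mult99:.1f}x")
```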

      • quietbritishjim 989 days ago
        > Whatever number you come up with, treat it as the median of a long-tailed distribution (if it matters, the lognormal). To get the mean (expectation), multiply your estimate by about 1.6. To get the 95% confidence bound, multiply by 5. To get the 99% confidence bound, multiply your estimate by 10.

        I like the idea of multipliers but the maths here is just meaningless fluff to justify a particular number. If your initial estimate really was a median then it would be an overestimate (i.e. the project ends up taking less time) in about 50% of cases. In practice I find that initial estimates are overestimates in about 0% of cases!

        • jbay808 989 days ago
          > If your initial estimate really was a median then it would be an overestimate (i.e. the project ends up taking less time) in about 50% of cases

          It is, yes. And it frequently happens that you go to fix something, which seems really difficult, and then you realize that it's actually an easy fix or not a problem at all. But when you fix one thing in half the time you expect, and another in twice the time you expect, this doesn't average out, because the average of 0.5 and 2 is not 1.0.

          You might just be discarding the cases where estimates were found to be conservative, either because delays are more impactful and memorable, because the underestimates were close enough to be treated as on-time, or because the slack in the schedule was used to buy time for something else, originally out of scope, that was lumped in.

          Anyway, these numbers aren't just pulled out of a hat. It comes from studying vast amounts of high quality (but unfortunately, not publicly available) data collected comparing developer estimates and measured outcomes.
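
          The "average of 0.5 and 2 is not 1.0" point is easy to check by simulation: even when over- and underestimates are equally likely (median multiplier 1.0), totals still run over, because the mean multiplier is (0.5 + 2.0) / 2 = 1.25.

```python
import random

# Simulate tasks that are equally likely to take half or double the
# estimate. The median multiplier is 1.0, but the mean is 1.25, so a
# portfolio of such tasks overruns on average.
random.seed(0)
multipliers = [random.choice([0.5, 2.0]) for _ in range(100_000)]
avg = sum(multipliers) / len(multipliers)
print(f"average multiplier: {avg:.2f}")
```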

          • quietbritishjim 988 days ago
            I was responding to what you said in your original comment:

            > Whatever number you come up with, treat it as the median ...

            Especially the "whatever number you come up with" bit, which seems aimed at everyone quite generally. When people usually come up with a number, it is likely to be a substantial underestimate. This isn't just a cognitive bias where I've forgotten the times the task turned out to be simpler - evidence shows this to be the general trend (a la Thinking Fast and Slow, as I mentioned in another comment). So your rule that it should be treated as a median isn't correct in the majority of cases.

            > these numbers ... comes from studying vast amounts of high quality ... data

            Well that's a different matter. Of course, whether the result of analysing that data is a median or a mean (or something else) depends on how exactly you analysed it.

      • diiq 989 days ago
        Would upvote twice if I could -- these are 5-star rules of thumb.

        (I'm glad to see lognormal making more inroads in software estimation. McConnell is great, but assuming the normal distribution leads to some weird edge cases.)

      • matttrotter 989 days ago
        And prepare for them to give you strange faces! I had a manager look at my estimate and then tell me to multiply it by 3, which I thought was ludicrous. It turned out to be accurate.
      • ohthehugemanate 989 days ago
        That's just story points with extra steps.

        You're already using an abstracted measure of time, by working with a derivative value of "developer estimated hours". You're already doing timeline projections on the average throughput of your "adjusted developer hours" unit. That's most of the value right there.

        You can get even better results, with a little less cognitive load, by applying the research that people are much more consistent in estimating complexity than time (note that your method relies on consistency, not accuracy, to succeed). A quick imagination exercise validates this point for most of us: You bought a new IKEA sofa - how much time will it take to build? Honestly hard to do, and we're never accurate. But consider instead: how hard is it? Way easier to answer. And if you already know how long it takes you on average to finish other tasks of similar apparent difficulty...

        Try using your exact same system, but ask people to estimate the task in terms of complexity. Use any scale you like, as long as the units have consistent value in your developers' minds (I like "cups of coffee", personally). Make your Dev team agree on the difficulty score for each Feature, to ensure that consistency.

        Side benefit: Devs stop worrying about time and taking shortcuts (aka "technical debt") to meet their time estimate that you don't believe anyway. They're also a lot more likely to consider hidden risks and sources of extra complexity in the estimate.

        Then you just track the actual throughput with a confidence interval, and use that to make timeline projections with a confidence interval based on that tracking.

        TLDR: try asking Devs to estimate complexity rather than time, and use a moving average with confidence interval rather than the static 1.6 multiplier to make timeline projections. You'll find your projections more accurate and developers less stressed about it. You'll also have reinvented story points.
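
        The throughput-tracking step above can be sketched as follows. The sprint velocities, backlog size, and the normal-approximation confidence interval are all hypothetical/illustrative:

```python
from statistics import mean, stdev

# Project a complexity-point backlog into sprints using measured
# velocity plus a simple ~95% confidence interval on the mean velocity.
velocities = [21, 18, 25, 19, 23, 20]   # points completed per sprint (made up)
remaining_points = 120

v_mean = mean(velocities)
v_err = 1.96 * stdev(velocities) / len(velocities) ** 0.5

best = remaining_points / (v_mean + v_err)    # optimistic end of the range
worst = remaining_points / (v_mean - v_err)   # pessimistic end
print(f"projection: {best:.1f} to {worst:.1f} sprints")
```

A moving window (e.g. the last six sprints) keeps the projection responsive as the team's throughput drifts.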

        • jbay808 989 days ago
          Unfortunately, that just masks the difficulty by using ambiguous terms that nobody knows if they agree on, and makes communication hard with 3rd party stakeholders who don't share your conventions. When marketing wants to know when something will be done, we can argue about whether dev-weeks or calendar dates are more appropriate, but I think I'd get told right off if I tried to tell them it would take a hundred "story points".

          There's no shortcut to avoid the requirement to present different summary statistics to different stakeholders. It's a consequence of decision theory. Unless they're equipped to understand the whole distribution.

          It's also the wrong sort of rounding. I think an ikea sofa might take an hour, but if it took all day I'd be pretty shocked. But with software tasks, it's important to accept that the distribution is long-tailed. Sometimes it really will take 10x as long as you expected, and that's not your fault. Story points would have to abandon all meaning to capture that much variance.

          I don't recommend incentivizing estimates, though. A big benefit of recognizing a developer estimate as short-hand for the median of a distribution is that when the time doesn't match the estimate, it doesn't mean the estimate was "wrong" or "bad", and the developer shouldn't feel bad.

          • ohthehugemanate 989 days ago
            Sorry if I was unclear: when talking to management outside of the project, you express estimates in terms of time/calendar dates. The arbitrary units are just a more accurate and less pressured way of getting to time values than "developer estimated hours times a static multiplier."
            • jbay808 989 days ago
              It's not a static multiplier; I thought I was clear that it's very much a context-sensitive multiplier, which depends on risk tolerance (which you get, straightforwardly, from how far you integrate the tail of the distribution).
    • diiq 989 days ago
      McConnell's 50/90 approach makes a big difference in my opinion because it lets you encode your uncertainty. The extra math means you don't need to be good at estimation as long as you know roughly how bad you are at it.

      If that seems like too much effort, I also run quotes.vistimo.com, which takes a similar (if slightly more advanced) statistical approach but does all the math for you.

    • smallerfish 989 days ago
      Start by listing out the features in a spreadsheet. For each feature, think through it and list out the stories, one per row. Create a section per feature in the spreadsheet (i.e. put a line under each group of rows).

      For each story, add an initial estimate (in terms of developer-days). This is your "low" estimate. Now in a second column, add a "high" (potential-but-reasonable "worst case") estimate. If you're looking at more than 10-15 days for either column for a story you should probably break the story up some more.

      Now add a 3rd and 4th column: the low/high estimates multiplied by 1.3 ("fudged low" / "fudged high"). Total up all stories per feature in a row at the bottom of each feature's section. Divide by team size, divide by business days per week, round up to the nearest integer, and you have your calendar weeks for each feature.

      When sales/marketing ask you for estimates, you then respond "between X and Y calendar weeks from [completion of previous feature]". Just be aware that they will hear X, so make sure Y is very clearly included in every communication where the dates are being discussed. If "previous feature" slips, make sure to communicate clearly that "next feature" has also pushed back by however many weeks. You'll be tempted to, but don't be optimistic with progress reports or estimates of where you are in the range - undersell and over-deliver, and you'll keep more allies on the business side.

    • stronglikedan 989 days ago
      > Do you guys have any tips for estimation?

      Stick to your guns when Sales tries to get you to change your estimate (and they will). Tell them they can discount the project, or change any other variable they need to satisfy the customer, but don't ever let them touch the time estimate. Not really a tip for making the time estimate, but keep your ass covered once you do.

    • cm277 989 days ago
      Here goes nothing:

      - Have the people that will run the work (tech lead, senior dev, whatever) break the work down in meaningful chunks/modules.

      - The number of modules is important: it should be roughly equal to the man-months you're trying to budget for -- just a rule of thumb. Basically, not too few and not too many.

      - Have the same people give you two numbers for each chunk: the best case scenario based on what their gut tells them and the worst case scenario (where 'worst' here is a bit short of nuclear winter, but not optimistic).

      - Your project will take the total of the averages of the min/max estimates.

      You're welcome. Source: 20 years of delivering tech projects on time...
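
      The rule above reduces to one line of arithmetic; a sketch with hypothetical modules and developer-week figures:

```python
# Each module gets a best-case and worst-case figure (developer-weeks);
# the project estimate is the total of the per-module averages.
# Module names and numbers are made up.
modules = {"auth": (1, 3), "billing": (2, 6), "reporting": (1.5, 4), "ops": (1, 2)}

estimate = sum((best + worst) / 2 for best, worst in modules.values())
print(f"estimate: {estimate} developer-weeks")
```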

    • mmcdermott 989 days ago
      I loved McConnell's books, having read Code Complete, Software Estimation and Rapid Development.

      Besides what is covered in those books, I've found it extremely useful to document assumptions. Every single estimate has some mental model of the project to be done. Code to be reused, vendors to integrate and, most importantly, things that won't be done. The real project almost always breaks with some of those high-level assumptions, but that tends to be lost in the shuffle.

      Attaching assumptions to the estimate makes it much easier to do a post-mortem.

      A powerful memory cannot compare with pale ink.

    • commandlinefan 989 days ago
      > Never been good at estimation

      Me neither. When I was first starting out, that really stressed me out a lot until I realized that I didn't work with anybody else who was "good" at it - that is, I didn't work with or know anybody who could take a list of requirements written out in English and produce a timeline that had any relationship to how long the software would take to be ready to use.

      Been doing this professionally since 1992. I still haven't met anybody who was "good" at estimation.

      • handrous 989 days ago
        I've reached the point where I think that if you're not going to do the NASA Space Shuttle program thing of specifying the whole program to the smallest detail before you start writing the actual code, you may as well just start working, release often, and evaluate periodically whether the thing looks on-track to be worth the cost, cancelling if it's not. Just spend the estimation money on development instead.
    • okl 989 days ago
      Read that damn book :D It's a treasure trove of information, not only for estimating software projects! For example, learning to differentiate between estimate, target/goal, and commitment.

      Personally, I'm often dumbfounded that folks still use planning poker when there are so many more reliable methods as discussed in the book, e.g., wideband Delphi.

    • qznc 989 days ago
      Second McConnell. It is a great reference for all kinds of estimations around software development.

      That is also its downside. It is a reference. Not a textbook to learn things in a pedagogical structure.

      • nicholasjarr 989 days ago
        Yeah, I noticed this when I tried to read it last time. I was expecting something more like Code Complete. I don't know, maybe it's the subject: code is way more interesting than estimation :)
  • automatic6131 989 days ago
    I have no opinion on the content of the article, because light grey on white is barely readable, and it would require far too much energy to read.