Sloccount’s sloppy slant – or – how to manipulate programming projects the Wheeler Way

Sloccount is my newest Awesome Software Discovery. It’s a great idea, but is far too simple to do what it claims: estimate effort and expense of a product based on lines of code. And really, I wouldn’t expect it to be that great. The model used to estimate effort is certainly not the author’s fault, as it isn’t his model. But that idiot (David Wheeler) doesn’t just say it’s a neat idea – he actually uses this horrible parody of good software to “prove” that linux is worth a billion dollars. For the record, I prefer linux for doing any kind of development. I hate Windows for development that isn’t highly visual in nature (Flash, for instance kind or requires Win or Mac), and Macs are out of my price range for a computer that doesn’t do many games. So Linux and I are fairly good friends. I just happen to be sane about my liking of the OS. (Oh, and BSD is pretty fracking sweet, too, but Wheeler didn’t evaluate it, so neither will I)

The variables

To show the absurdity of sloccount, here’s a customized command line that is assuming pretty much the cheapest possible outcome for a realistic project. The project will be extremely easy for all factors that make sense in a small business environment. We assume an Organic model as it is low-effort and most likely situation for developing low-cost software.

Basically I’m assuming a very simple project with very capable developers. I’m not assuming the highest capabilities when it comes to the dev team because some of that stuff is just nuts – the whole team on a small project just isn’t likely to be having 12+ years experience, and at the top 10% of all developers. But the assumptions here are still extremely high – team is in the top 75% in all areas, and 6-12 years of experience, but pay is very low all the same. This should show a pretty much best-case scenario.

Also, I’m setting overhead to 1 to indicate that in our environment we have no additional costs – developers work from home on their own equipment, we market via a super-cheap internet site or something (or don’t market at all and let clients do our marketing for us), etc.

Other factors (from sloccount’s documentation ):

  • RELY: Very Low, 0.75
    • We are a small shop, we can correct bugs quickly, our customers are very forgiving. Reliability is just not a priority.
  • DATA: Low, 0.94
    • Little or no database to deal with. Not sure why 0.94 is the lowest value here, but it is so I’m using it.
  • CPLX: Very Low, 0.70
    • Very simple code to write for the project in question. We’re a small shop, man, and we just write whatever works, not whatever is most efficient or “cool”.
  • TIME: Nominal, 1.00
    • We don’t worry about execution time, so this isn’t a factor for us. Assume we’re writing a GUI app where most of the time, the app is idle.
  • STOR: Nominal, 1.00
    • Same as time – we don’t worry about storage space or RAM. We let our users deal with it. Small shop, niche market software, if users can’t handle our pretty minimal requirements that’s their problem.
  • VIRT: Low, 0.87
    • We don’t do much changing of our hardware or OS.
  • TURN: Low, 0.87
    • I don’t know what this means, so I’m assuming the best value on the grid.
  • ACAP: High, 0.86
    • Our analysts are good, so we save time here.
  • AEXP: High, 0.91
    • Our app experience is 6-12 years. Our team just kicks a lot of ass for being so underpaid.
  • PCAP: High, 0.86
    • Again, our team kicks ass. Programmers are very capable.
  • VEXP: High, 0.90
    • Everybody kicks ass, so virtual machine experience is again at max, saving us lots of time and money.
  • LEXP: High, 0.95
    • Again, great marks here – programmers have been using the language for 3+ years.
  • MODP: Very High, 0.82
    • What can I say? Our team is very well-versed in programming practices, and make routine use of the best practices for maintainable code.
  • TOOL: Very High, 0.83
    • I think this is kind of a BS category, as the “best” system includes requirements gathering and documentation tools. In a truly agile, organic environment, a lot of this can be skipped simply because the small team (like 2-3 people) is so close to the codebase that they don’t have any need for complexities like “proper” requirements gathering. Those things on a small team can really slow things down a lot. So I’m still giving a Very High rating here to reflect speedy development, not to reflect the grid’s specific toolset. For stupid people (who shouldn’t even be reading this article), this biases the results against my claim, not for it.
  • SCED: Nominal, 1.00
    • Not sure why nominal is best here, but it’s the lowest-effort value so it’s what I’m choosing. Dev schedules in small shops are often very flexible, so it makes sense to choose the cheapest option here.

So our total effort will be:

0.75 * 0.94 * 0.70 * 1.00 * 1.00 *                # RELY - STOR
0.87 * 0.87 * 0.86 * 0.91 * 0.86 *                # VIRT - PCAP
0.90 * 0.95 * 0.82 * 0.83 * 1.00 *                # VEXP - SCED
2.3                                               # Base organic effort

= 0.33647 effort

We’re also going to assume a cheap shop that pays only $40k a year to programmers, because it’s a small company starting out. Or the idiot boss only pays his kids fair salaries. Or something.

Command line:

sloccount --overhead 1 --personcost 40000 --effort 0.33647 1.05

Bloodsport Colosseum

For something simple like Bloodsport Colosseum, this is an overly-high, but acceptable estimate. With HTML counted, the estimate is 5.72 man-months. Without, it’s 4.18 man-months. We’ll go with the average since my HTML counter doesn’t worry about comments, and even with rhtml having embedded ruby, the HTML was usually easier than the other parts of the game. So this comes to 4.95 months. That’s just about 21 weeks (4.95 months @ 30 days a month, divided by 7 days a week = just over 21). At 40 hours a week that would work out to 840 hours. I spent around 750 hours from start (design) to finish. I was very unskilled with Ruby and Rails, so this estimate being above my actual time is certainly off (remember I estimated for people who were highly skilled), and a lot of the time I spent on the project was replacing code, not just writing new code. But overall it’s definitely an okay ballpark figure.

When you start adding more realistic data, though, things get worse.

If you simply assume the team’s capabilities are average instead of high (which is about right for BC), things get significantly worse, even though the rest of the factors stay the same:

0.75 * 0.94 * 0.70 * 1.00 * 1.00 *                # RELY - STOR
0.87 * 0.87 * 1.00 * 1.00 * 1.00 *                # VIRT - PCAP
0.90 * 0.95 * 0.82 * 0.83 * 1.00 *                # VEXP - SCED
2.3                                               # Base organic effort

= 0.4999 effort

This changes our average from 4.95 man-months to 7.3 months, or about 31 weeks. That’s 1240 hours of work, well more than I actually spent. From design to final release, including the 1000-2000 of lines of code that were removed and replaced (ie, big effort for no increase in LoC), I spent about 40% less time than the estimate here.

…And for the skeptics, no, I’m not counting the rails-generated code, such as scripts/*. I only included app/, db/ (migration code), and test/.

However, this still is “close enough” for me to be willing to accept that it’s an okay estimate. No program can truly guess the effort involved in any given project just based on lines of code, so being even remotely close is probably good enough. The problem is when you look at less maintainable code.

Just for fun, you can look at the dev cost, which is $21k to $28k, depending on whether you count the HTML. I wish I could have been paid that kind of money for this code….

Murder Manor

This app took me far less time than BC (no more than 150-200 hours). I was more adept at writing PHP when I started this than I was at writing Ruby or using Rails when I started BC. But the overall code is still far worse because of my lack of proper OO and such. So I tweak the numbers again, to reflect a slightly skilled user of the language, but worse practices, software tools, and slightly more complex product (code was more complex even though BC as a project had more complex rules. Ever wonder why I switched from PHP for anything over a few hundred lines of code?):

0.75 * 0.94 * 0.85 * 1.00 * 1.00 *                # RELY - STOR
0.87 * 0.87 * 1.00 * 1.00 * 1.00 *                # VIRT - PCAP
0.90 * 0.95 * 1.00 * 1.00 * 1.00 *                # VEXP - SCED
2.3                                               # Base organic effort

WHOA. Effort jumps to 0.8919! New command line:

sloccount --overhead 1 --personcost 40000 --effort 0.8919 1.05

This puppy ends up being 3.4 months of work. That’s 14.5 weeks, or 580 hours of work — around triple my actual time spent!

Looking at salary info is something I tend to avoid because as projects get big, the numbers just get absurd. In this case, even with a mere 3500-line project, the estimate says that in the environment of cheap labor and no overhead multiplier, you’d need to pay somebody over $10k to rewrite that game. Good luck to whatever business actually takes these numbers at face value!

But these really aren’t the bad cases. Really large codebases are where sloccount gets absurd.

Big bad code

Slash ’em is a great test case. It isn’t OO, is highly complex, and has enough areas of poor code that I feel comfortable using values for average- competency programmers. So here are my parameters, in depth:

  • RELY: Very Low, 0.75
    • Free game, so not really any need to be highly-reliable.
  • DATA: Nominal, 1.00
    • The amount of data, in the form of text-based maps, data files, oracle files, etc. is pretty big, so this is definitely 1.00 or higher.
  • CPLX: Very High, 1.30
    • Complex as hell – the codebase supports dozens of operating systems, and has to keep track of a hell of a lot of data in a non-OO way. It’s very painful to read through and track things down.
  • TIME: High, 1.11
    • Originally Nethack was built to be very speedy to run on extremely slow systems. There are tons of hacks in the code to allow for speeding up of execution even today, possibly to accomodate pocket PCs or something.
  • STOR: Nominal, 1.00
    • I really can’t say for sure if Slash ‘Em is worried about storage space. It certainly isn’t worried about disk, as a lot of data files are stored in a text format. But I don’t know how optimized it is for RAM use – so I choose the lowest value here.
  • VIRT: Nominal, 1.00
    • Since the app supports so many platforms, this is higher than before. I only chose Nominal because once a platform is supported it doesn’t appear its drivers change regularly if at all.
  • TURN: Low, 0.87
    • Again, I don’t know what this means, so I’m assuming the best value on the grid.
  • ACAP: Nominal, 1.00
    • Mediocre analysts
  • AEXP: Nominal, 1.00
    • Mediocre experience
  • PCAP: Nominal, 1.00
    • Mediocre programmers
  • VEXP: Nominal, 1.00
    • Okay experience with the virtual machine support
  • LEXP: Nominal, 1.00
    • Mediocre language experience
  • MODP: Nominal, 1.00
    • The code isn’t OO, which for a game like this is unfortunate, but overall the code is using functions and structures well enough that I can’t really complain about a lot other than lack of OO.
  • TOOL: Nominal, 1.00
    • Again, nominal here – the devs may have used tools for developing things, I really can’t be sure. I know there isn’t any testing going on, so I can be certain that 1.00 is the best they get.
  • SCED: Nominal, 1.00
    • The nethack and slash ’em projects are unfunded, and have never (as far as I can tell) worried about a release schedule. Gotta choose the cheapest value here.

Total:

0.75 * 1.00 * 1.30 * 1.11 * 1.00 *                # RELY - STOR
0.87 *                                            # TURN (the rest are 1.00)
2.3                                               # Base organic effort

Total is now 2.166 effort. New command line, still assuming cheap labor and no overhead:

sloccount --overhead 1 --personcost 40000 --effort 2.166 1.05

Slash ‘Em is a big project, no doubt about it. But the results here are laughable at best. The project has 250k lines of code, mostly ansi c. The estimate is that this code would take nearly 61 man-years of effort. The cost at $40k a year would be almost $2.5 million! With an average of just under 24 developers, the project could be done in two and a half years.

I worked for a company a while ago that built niche-market software for the daycare industry. They had an application that took 2-3 people around 5 years to build. It was Visual C code, very complex, needed a lot more reliability than Slash ‘Em, was similar in size (probably closer to 200k lines of code), and had a horrible design process in which the boss would change his mind about which features he wanted fairly regularly, sometimes scrapping large sections of code. That project took at most 15 man-years to produce. To me, the claim that Slash ‘Em was that much bigger is a great reason to make the argument that linux isn’t worth a tenth what Wheeler claims it is. Good OS? Sure. But worth a billion dollars??

Linux and the gigabuck

I’m just not sure how anybody could buy Wheeler’s absurd claim that Linux would cost over a billion dollars to produce. Sloccount is interesting for sure, particularly for getting an idea of one project’s complexity compared to another project. But using the time and dollar estimates is a joke.

Wheeler’s own BS writeup proves how absurd his claims are: Linux 6.2 would have taken 4500 man-years to build, while 7.1, released a year later, would have taken 8000 man-years. I’m aware that there was a lot of new open source in the project, and clearly a small team wasn’t building all the code. But to claim that the extra 13 million lines of code are worth 3500 years of effort, or 400 million dollars…. I dunno, to me that’s just a joke.

And here’s the other thing that one has to keep in mind: most projects are not written 100% in-house. So this perceived value of Linux due to the use of open source isn’t exclusive to Linux or open source. At every job I’ve had, we have used third-party code, both commercial and open source, to help us get a project done faster. At my previous job, about 75% of our code was third-party. And in one specific instance, we paid about a thousand dollars to get nearly 100,000 lines of C and Delphi code. The thing with licensing code like this is that the company doing the licensing isn’t charging every user the value of their code – they’re spreading out the cost to hundreds or even thousands of users so that even if their 100k lines are worth $50k, they can license the code to a hundred users at $1000 a pop. Each client pays 2% of the total costs – and the developmers make more money than the code is supposedly worth. And clearly this saves a ton of time for the developer paying for the code in question.

If you ignore the fact that big companies can use open source (or commercially-licensed code), you can conjure up some amazing numbers indeed.

I can claim that Bloodsport Colosseum is an additional 45 months of effort simply by counting just the ruby gems I used (action mailer, action pack, active record, active support, rails, rake, RedCloth, and sqlite3-ruby). Suddenly BC is worth over $175k (remember, labor is still $40k a year and I am still assuming a low-effort project) due to all the open source I used to build it.

Where exactly do we draw the line, then? Maybe I include all of Ruby’s source code since I used it and its modules to help me build BC. Can I now claim that BC is worth more than a million dollars?

Vista is twice as good as Linux!

As a final proof of absurdity, MS has a pretty bad track record for projects taking time, and the whole corporate design/development flow slowing things down. Vista is supposed to be in the realm of 50 million lines of code. Using the same methods Wheeler used to compute linux’s cost and effort, we get Vista being worth a whole hell of a lot more:

Total physical source lines of code:                    50,000,000
Estimated Development Effort in Man-Years:              17,177
Estimated cost (same salaries as linux estimate,        $2.3 billion
  $56,286/year, overhead=2.4)

To me these numbers look just as crazy as the ones in the Linux estimate, but MS being the behemoth it is, I’m not going to try and make a case either way. Just keep in mind that MS would have had to dedicate almost 3,000 employees to working on Vista full-time in order to get 17,177 years of development done in 6.

The important thing here is that by Wheeler’s logic, Vista is actually worth more than linux. By a lot.

Linux fanatics are raving idiots

So all you Linux zealots, I salute you for being so fiercely loyal to your favorite OS, but coming up with data like this (or simply believing in and quoting it) just makes linux users appear a ravenous pack of fools. Make your arguments, push your OS, show the masses how awesome Linux can be. But make sound arguments next time.

digg this!

3 Replies to “Sloccount’s sloppy slant – or – how to manipulate programming projects the Wheeler Way”

  1. Believing so blindly in data which cannot be proven to either be true, nor entirely false, makes Linux fanatics (of which I am one) who quote this as the Bible (which I do not) look almost as bad as Mac fanatics, who adhere to the praises of a platform based on no information whatsoever. I enjoyed reading this. I use sloccount pretty consistently. I have fun looking at the dollar value and wishing that my projects were actually worth that, though they’re never nearly worth it. Anyway, good read.

  2. Oh and I can’t seem to get sloccount to count HTML. I’ve tried –addlangall and –addlang html but no dice. Any suggestions?

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.