Sloccount is my newest Awesome Software Discovery. It’s a great idea, but is
far too simple to do what it claims: estimate effort and expense of a product
based on lines of code. And really, I wouldn’t expect it to be that great.
The model used to estimate effort is certainly not the author’s fault, as it
isn’t his model. But that idiot (David Wheeler) doesn’t just say it’s a neat
idea – he actually uses this horrible parody of good
software to “prove” that linux is worth a billion dollars.
For the record, I prefer Linux for doing any kind of development. I hate
Windows for any development that isn’t highly visual in nature (Flash, for
instance) and doesn’t require Windows or Mac, and Macs are out of my price range
for a computer that doesn’t run many games. So Linux and I are fairly good
friends. I just happen to be sane about my liking of the OS. (Oh, and BSD is
pretty fracking sweet, too, but Wheeler didn’t evaluate it, so neither will I.)
The variables
To show the absurdity of sloccount, here’s a customized command line that
assumes pretty much the cheapest possible outcome for a realistic project.
The project will be extremely easy for all factors that make sense in a small
business environment. We assume an Organic model, as it is the lowest-effort
mode and the most likely situation for developing low-cost software.
Basically I’m assuming a very simple project with very capable developers.
I’m not assuming the highest capabilities when it comes to the dev team
because some of that stuff is just nuts: the whole team on a small project
just isn’t likely to have 12+ years of experience and be in the top 10% of
all developers. But the assumptions here are still extremely high (the team
is at the 75th percentile in all areas, with 6-12 years of experience), yet pay
is very low all the same. This should show a pretty much best-case scenario.
Also, I’m setting overhead to 1 to indicate that in our environment we have
no additional costs – developers work from home on their own equipment, we
market via a super-cheap internet site or something (or don’t market at all
and let clients do our marketing for us), etc.
Other factors (from sloccount’s documentation):
- RELY: Very Low, 0.75
  - We are a small shop, we can correct bugs quickly, and our customers are very
    forgiving. Reliability is just not a priority.
- DATA: Low, 0.94
  - Little or no database to deal with. Not sure why 0.94 is the lowest value
    here, but it is, so I’m using it.
- CPLX: Very Low, 0.70
  - Very simple code to write for the project in question. We’re a small shop,
    man, and we just write whatever works, not whatever is most efficient or
    “cool”.
- TIME: Nominal, 1.00
  - We don’t worry about execution time, so this isn’t a factor for us. Assume
    we’re writing a GUI app where, most of the time, the app is idle.
- STOR: Nominal, 1.00
  - Same as TIME: we don’t worry about storage space or RAM. We let our users
    deal with it. Small shop, niche market software; if users can’t handle our
    pretty minimal requirements, that’s their problem.
- VIRT: Low, 0.87
  - We don’t do much changing of our hardware or OS.
- TURN: Low, 0.87
  - I don’t know what this means, so I’m assuming the best value on the grid.
- ACAP: High, 0.86
  - Our analysts are good, so we save time here.
- AEXP: High, 0.91
  - Our app experience is 6-12 years. Our team just kicks a lot of ass for being
    so underpaid.
- PCAP: High, 0.86
  - Again, our team kicks ass. Programmers are very capable.
- VEXP: High, 0.90
  - Everybody kicks ass, so virtual machine experience is again at max, saving
    us lots of time and money.
- LEXP: High, 0.95
  - Again, great marks here: programmers have been using the language for 3+
    years.
- MODP: Very High, 0.82
  - What can I say? Our team is very well-versed in programming practices, and
    makes routine use of the best practices for maintainable code.
- TOOL: Very High, 0.83
  - I think this is kind of a BS category, as the “best” system includes
    requirements gathering and documentation tools. In a truly agile, organic
    environment, a lot of this can be skipped simply because the small team
    (like 2-3 people) is so close to the codebase that they don’t have any need
    for complexities like “proper” requirements gathering. Those things on a
    small team can really slow things down a lot. So I’m still giving a Very
    High rating here to reflect speedy development, not to reflect the grid’s
    specific toolset. For stupid people (who shouldn’t even be reading this
    article), this biases the results against my claim, not for it.
- SCED: Nominal, 1.00
  - Not sure why Nominal is best here, but it’s the lowest-effort value, so it’s
    what I’m choosing. Dev schedules in small shops are often very flexible,
    so it makes sense to choose the cheapest option here.
So the effort factor we’ll hand to sloccount (all fifteen multipliers times the base organic coefficient) will be:
0.75 * 0.94 * 0.70 * 1.00 * 1.00 * # RELY - STOR
0.87 * 0.87 * 0.86 * 0.91 * 0.86 * # VIRT - PCAP
0.90 * 0.95 * 0.82 * 0.83 * 1.00 * # VEXP - SCED
2.3 # Base organic effort
= 0.33647 effort
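To make the product above easy to re-check, here is a small Python sketch that multiplies the fifteen multipliers from the list by the 2.3 organic base. It does not call sloccount. `BEST_CASE` and `effort_coefficient` are hypothetical names of my own.

```python
from math import prod

# Effort multipliers chosen above (COCOMO cost drivers, best-case team).
BEST_CASE = {
    "RELY": 0.75, "DATA": 0.94, "CPLX": 0.70, "TIME": 1.00, "STOR": 1.00,
    "VIRT": 0.87, "TURN": 0.87, "ACAP": 0.86, "AEXP": 0.91, "PCAP": 0.86,
    "VEXP": 0.90, "LEXP": 0.95, "MODP": 0.82, "TOOL": 0.83, "SCED": 1.00,
}

ORGANIC_BASE = 2.3  # base organic coefficient used throughout this article


def effort_coefficient(multipliers: dict[str, float], base: float = ORGANIC_BASE) -> float:
    """Return base * product of all effort multipliers (the value passed to --effort)."""
    return base * prod(multipliers.values())


coef = effort_coefficient(BEST_CASE)
print(f"{coef:.5f}")  # 0.33647
```

Swapping any single value in the dictionary and re-running shows how much that one factor moves the estimate.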
We’re also going to assume a cheap shop that pays only $40k a year to
programmers, because it’s a small company starting out. Or the idiot boss
only pays his kids fair salaries. Or something.
Command line:
sloccount --overhead 1 --personcost 40000 --effort 0.33647 1.05
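My reading of sloccount's documentation is that these options feed a simple calculation. `--effort F E` gives person-months = F × (KSLOC)^E, and cost = person-years × `--personcost` × `--overhead`. The function below is a hypothetical re-implementation of just that arithmetic, not sloccount's own code. The 10,000-line project in the usage line is made up for illustration.

```python
def sloccount_effort_and_cost(
    sloc: int,
    effort_factor: float = 2.4,      # sloccount's default --effort factor
    effort_exponent: float = 1.05,   # sloccount's default --effort exponent
    person_cost: float = 56_286.0,   # default --personcost, dollars per year
    overhead: float = 2.4,           # default --overhead
) -> tuple[float, float]:
    """Return (person-months, dollars) per the formulas in sloccount's docs (sketch)."""
    person_months = effort_factor * (sloc / 1000) ** effort_exponent
    cost = (person_months / 12) * person_cost * overhead
    return person_months, cost


# Hypothetical 10,000-line project run with the command-line settings above.
pm, dollars = sloccount_effort_and_cost(
    10_000, effort_factor=0.33647, person_cost=40_000, overhead=1
)
print(f"{pm:.2f} person-months, ${dollars:,.0f}")
```

The exponent is above 1, so estimated effort grows slightly faster than the line count. That matters more and more as codebases get large.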
Bloodsport Colosseum
For something simple like Bloodsport Colosseum, this is an overly high but
acceptable estimate. With HTML counted, the estimate is 5.72 man-months.
Without, it’s 4.18 man-months. We’ll go with the average since my HTML
counter doesn’t worry about comments, and even with rhtml having embedded
ruby, the HTML was usually easier than the other parts of the game. So this
comes to 4.95 months. That’s just about 21 weeks (4.95 months @ 30 days a
month, divided by 7 days a week = just over 21). At 40 hours a week that
would work out to 840 hours. I spent around 750 hours from start (design) to
finish. I was very unskilled with Ruby and Rails, so the fact that this
estimate is above my actual time means it’s certainly off (remember, I
estimated for people who were highly skilled), and a lot of the time I spent
on the project was replacing code, not just writing new code. But overall it’s
definitely an okay ballpark figure.
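The month-to-hour conversion is easy to fumble, so here it is spelled out in Python. The 30-day month and 40-hour week are the rough assumptions used above, not anything sloccount reports.

```python
# Average of the two sloccount estimates for Bloodsport Colosseum (man-months).
with_html, without_html = 5.72, 4.18
avg_months = (with_html + without_html) / 2   # 4.95

DAYS_PER_MONTH = 30   # rough conversion used in the text
HOURS_PER_WEEK = 40

weeks = avg_months * DAYS_PER_MONTH / 7       # a little over 21
hours = int(weeks) * HOURS_PER_WEEK           # whole weeks only, as in the text
print(f"{avg_months:.2f} months = {weeks:.1f} weeks = {hours} hours (vs. ~750 actual)")
```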
When you start adding more realistic data, though, things get worse.
If you simply assume the team’s capabilities are average instead of high
(which is about right for BC), things get significantly worse, even though the
rest of the factors stay the same:
0.75 * 0.94 * 0.70 * 1.00 * 1.00 * # RELY - STOR
0.87 * 0.87 * 1.00 * 1.00 * 1.00 * # VIRT - PCAP
0.90 * 0.95 * 0.82 * 0.83 * 1.00 * # VEXP - SCED
2.3 # Base organic effort
= 0.4999 effort
This changes our average from 4.95 man-months to 7.3 man-months, or about 31
weeks. That’s 1240 hours of work, well more than I actually spent. From
design to final release, including the 1,000-2,000 lines of code that were
removed and replaced (i.e., big effort for no increase in LoC), I spent about
40% less time than the estimate here.
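Effort in sloccount's model is just the coefficient times KSLOC raised to a fixed power. At a fixed line count, changing only the coefficient therefore scales every estimate by the same ratio. Here is a quick Python check using the figures above. The exact ratio gives about 7.35 man-months, which the text rounds to 7.3.

```python
best_case = 0.33647     # coefficient for the high-capability team
average_team = 0.4999   # same factors with ACAP/AEXP/PCAP at Nominal

# Effort = coefficient * KSLOC**1.05, so at a fixed line count the
# estimate scales linearly with the coefficient.
ratio = average_team / best_case
months = 4.95 * ratio                  # previous average of 4.95 man-months
weeks = months * 30 / 7                # 30-day months, as before
hours = int(weeks) * 40                # whole weeks at 40 hours each
savings = 1 - 750 / hours              # ~750 hours actually spent

print(f"x{ratio:.2f}: {months:.2f} man-months, {weeks:.1f} weeks, "
      f"{hours} hours; actual was {savings:.0%} less")
```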
…And for the skeptics, no, I’m not counting the rails-generated code, such
as scripts/*. I only included app/, db/ (migration code), and test/.
However, this still is “close enough” for me to be willing to accept that it’s
an okay estimate. No program can truly guess the effort involved in any given
project just based on lines of code, so being even remotely close is probably
good enough. The problem is when you look at less maintainable code.
Just for fun, you can look at the dev cost, which is $21k to $28k, depending
on whether you count the HTML. I wish I could have been paid that kind of
money for this code….
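The $21k to $28k range is reproduced if you assume it comes from the average-team coefficient applied to the with- and without-HTML estimates, at $40k a year with no overhead. That is my reading, since the best-case coefficient gives lower numbers. A Python sketch:

```python
ratio = 0.4999 / 0.33647        # average-team vs. best-case coefficient
monthly_cost = 40_000 / 12      # --personcost 40000 with --overhead 1

best_case_months = {"without HTML": 4.18, "with HTML": 5.72}
costs = {label: m * ratio * monthly_cost for label, m in best_case_months.items()}

for label, dollars in costs.items():
    print(f"{label}: ${dollars:,.0f}")
```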
Murder Manor
This app took me far less time than BC (no more than 150-200 hours). I was
more adept at writing PHP when I started this than I was at writing Ruby or
using Rails when I started BC. But the overall code is still far worse because
of my lack of proper OO and such. So I tweak the numbers again to reflect a
slightly skilled user of the language, but with worse practices, worse software
tools, and a slightly more complex product (the code was more complex, even
though BC as a project had more complex rules; ever wonder why I switched from
PHP for anything over a few hundred lines of code?):
0.75 * 0.94 * 0.85 * 1.00 * 1.00 * # RELY - STOR
0.87 * 0.87 * 1.00 * 1.00 * 1.00 * # VIRT - PCAP
0.90 * 0.95 * 1.00 * 1.00 * 1.00 * # VEXP - SCED
2.3 # Base organic effort
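The same product for Murder Manor, as a Python sketch. Only CPLX, MODP and TOOL differ from the best case, yet the coefficient roughly triples:

```python
from math import prod

# Murder Manor multipliers, in the same RELY..SCED order as the lines above.
murder_manor = [
    0.75, 0.94, 0.85, 1.00, 1.00,   # RELY, DATA, CPLX, TIME, STOR
    0.87, 0.87, 1.00, 1.00, 1.00,   # VIRT, TURN, ACAP, AEXP, PCAP
    0.90, 0.95, 1.00, 1.00, 1.00,   # VEXP, LEXP, MODP, TOOL, SCED
]
coef = 2.3 * prod(murder_manor)      # 2.3 = base organic coefficient
print(f"{coef:.4f}")                 # 0.8919
```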
WHOA. Effort jumps to 0.8919! New command line:
sloccount --overhead 1 --personcost 40000 --effort 0.8919 1.05
This puppy ends up being 3.4 months of work. That’s 14.5 weeks, or 580 hours
of work — around triple my actual time spent!
Looking at salary info is something I tend to avoid because as projects get
big, the numbers just get absurd. In this case, even with a mere 3500-line
project, the estimate says that in the environment of cheap labor and no
overhead multiplier, you’d need to pay somebody over $10k to rewrite that
game. Good luck to whatever business actually takes these numbers at face
value!
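The Murder Manor figures above follow from the same rough conversions: 30-day months, 40-hour weeks, and $40k a year with no overhead. A Python check:

```python
months = 3.4                          # sloccount's Murder Manor estimate
weeks = months * 30 / 7               # 30-day months, as before
hours = weeks * 40                    # 40-hour weeks
actual_low, actual_high = 150, 200    # hours actually spent
cost = months * 40_000 / 12           # --personcost 40000, --overhead 1

print(f"{weeks:.1f} weeks, about {hours:.0f} hours "
      f"({hours / actual_high:.1f}x to {hours / actual_low:.1f}x actual), ${cost:,.0f}")
```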
But these really aren’t the bad cases. Really large codebases are where
sloccount gets absurd.
Big bad code
Slash ’em is a great test case. It isn’t OO, is highly complex, and has
enough areas of poor code that I feel comfortable using values for average-
competency programmers. So here are my parameters, in depth:
- RELY: Very Low, 0.75
  - Free game, so there’s not really any need to be highly reliable.
- DATA: Nominal, 1.00
  - The amount of data, in the form of text-based maps, data files, oracle files,
    etc., is pretty big, so this is definitely 1.00 or higher.
- CPLX: Very High, 1.30
  - Complex as hell: the codebase supports dozens of operating systems, and has
    to keep track of a hell of a lot of data in a non-OO way. It’s very painful
    to read through and track things down.
- TIME: High, 1.11
  - NetHack was originally built to be very speedy on extremely slow
    systems. There are tons of hacks in the code to speed up execution
    even today, possibly to accommodate pocket PCs or something.
- STOR: Nominal, 1.00
  - I really can’t say for sure if Slash ’Em is worried about storage space. It
    certainly isn’t worried about disk, as a lot of data files are stored in a
    text format. But I don’t know how optimized it is for RAM use, so I choose
    the lowest value here.
- VIRT: Nominal, 1.00
  - Since the app supports so many platforms, this is higher than before. I
    only chose Nominal because once a platform is supported, it doesn’t appear
    that its drivers change regularly, if at all.
- TURN: Low, 0.87
  - Again, I don’t know what this means, so I’m assuming the best value on the
    grid.
- ACAP: Nominal, 1.00
- AEXP: Nominal, 1.00
- PCAP: Nominal, 1.00
- VEXP: Nominal, 1.00
  - Okay experience with the virtual machine support.
- LEXP: Nominal, 1.00
  - Mediocre language experience.
- MODP: Nominal, 1.00
  - The code isn’t OO, which for a game like this is unfortunate, but overall
    the code is using functions and structures well enough that I can’t really
    complain about a lot other than the lack of OO.
- TOOL: Nominal, 1.00
  - Again, nominal here: the devs may have used tools for developing things, but
    I really can’t be sure. I know there isn’t any testing going on, so I can
    be certain that 1.00 is the best they get.
- SCED: Nominal, 1.00
  - The NetHack and Slash ’Em projects are unfunded, and have never (as far as
    I can tell) worried about a release schedule. Gotta choose the cheapest
    value here.
Total:
0.75 * 1.00 * 1.30 * 1.11 * 1.00 * # RELY - STOR
0.87 * # TURN (the rest are 1.00)
2.3 # Base organic effort
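Every multiplier that is 1.00 drops out of the product, so the Slash ’Em coefficient only needs the four non-nominal factors. A Python sketch:

```python
from math import prod

# Only the multipliers that differ from 1.00 for Slash'EM.
non_nominal = {"RELY": 0.75, "CPLX": 1.30, "TIME": 1.11, "TURN": 0.87}
coef = 2.3 * prod(non_nominal.values())   # 2.3 = base organic coefficient
print(f"{coef:.3f}")                      # 2.166
```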
Total is now 2.166 effort. New command line, still assuming cheap labor and
no overhead:
sloccount --overhead 1 --personcost 40000 --effort 2.166 1.05
Slash ’Em is a big project, no doubt about it. But the results here are
laughable at best. The project has 250k lines of code, mostly ANSI C. The
estimate is that this code would take nearly 61 man-years of effort. The
cost at $40k a year would be almost $2.5 million! With an average of just
under 24 developers, the project could be done in two and a half years.
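Here is roughly how sloccount gets from the coefficient to man-years, schedule and team size. As I understand its documentation, the default `--schedule 2.5 0.38` gives calendar months = 2.5 × (person-months)^0.38, and average staff = person-months / calendar months. The function name `estimate` is hypothetical. Because 250 KSLOC is a rounded size, this lands a little under the ~61 man-years quoted above.

```python
def estimate(ksloc: float, effort_factor: float, effort_exponent: float = 1.05,
             schedule_factor: float = 2.5, schedule_exponent: float = 0.38,
             person_cost: float = 40_000, overhead: float = 1.0) -> dict[str, float]:
    """COCOMO-style arithmetic as described in sloccount's documentation (sketch)."""
    person_months = effort_factor * ksloc ** effort_exponent
    schedule_months = schedule_factor * person_months ** schedule_exponent
    return {
        "person_years": person_months / 12,
        "schedule_years": schedule_months / 12,
        "developers": person_months / schedule_months,
        "cost": person_months / 12 * person_cost * overhead,
    }


# 250 KSLOC is the rounded size quoted above, so expect slightly lower figures.
for key, value in estimate(250, effort_factor=2.166).items():
    print(f"{key}: {value:,.1f}")
```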
I worked for a company a while ago that built niche-market software for the
daycare industry. They had an application that took 2-3 people around 5 years
to build. It was Visual C code, very complex, needed a lot more reliability
than Slash ’Em, was similar in size (probably closer to 200k lines of code),
and had a horrible design process in which the boss would change his mind
about which features he wanted fairly regularly, sometimes scrapping large
sections of code. That project took at most 15 man-years to produce. To me,
the claim that Slash ’Em would take about four times that effort is a great
reason to argue that Linux isn’t worth a tenth of what Wheeler claims it is.
Good OS? Sure. But worth a billion dollars??
Linux and the gigabuck
I’m just not sure how anybody could buy Wheeler’s absurd claim that Linux
would cost over a billion dollars to produce. Sloccount is interesting
for sure, particularly for getting an idea of one project’s complexity
compared to another project. But using the time and dollar estimates is
a joke.
Wheeler’s own BS writeup proves how absurd his claims are: Red Hat Linux 6.2
would have taken 4500 man-years to build, while 7.1, released a year later,
would have taken 8000 man-years. I’m aware that there was a lot of new open
source in the project, and clearly a small team wasn’t building all the code.
But to claim that the extra 13 million lines of code are worth 3500 man-years
of effort, or 400 million dollars…. I dunno, to me that’s just a joke.
And here’s the other thing that one has to keep in mind: most projects are not
written 100% in-house. So this perceived value of Linux due to the use of
open source isn’t exclusive to Linux or open source. At every job I’ve had,
we have used third-party code, both commercial and open source, to help us get
a project done faster. At my previous job, about 75% of our code was third-party.
And in one specific instance, we paid about a thousand dollars to get
nearly 100,000 lines of C and Delphi code. The thing with licensing code like
this is that the company doing the licensing isn’t charging every user the
value of their code – they’re spreading out the cost to hundreds or even
thousands of users so that even if their 100k lines are worth $50k, they can
license the code to a hundred users at $1000 a pop. Each client pays 2% of
the total cost, and the developers make more money than the code is
supposedly worth. And clearly this saves a ton of time for the developer paying
for the code in question.
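The licensing arithmetic in that paragraph, using its illustrative numbers. The $50k value is a hypothetical figure from the text, not a real price.

```python
code_value = 50_000        # hypothetical value of the 100k-line library
license_fee = 1_000        # price per licensee
licensees = 100

share = license_fee / code_value          # each client pays 2% of the value
revenue = license_fee * licensees         # $100,000 collected
print(f"each client pays {share:.0%}; vendor collects ${revenue:,} "
      f"({revenue / code_value:.0f}x the code's value)")
```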
If you ignore the fact that big companies can use open source (or
commercially-licensed code), you can conjure up some amazing numbers indeed.
I can claim that Bloodsport Colosseum is an additional 45 months of effort
simply by counting the Ruby gems I used (action mailer, action pack,
active record, active support, rails, rake, RedCloth, and sqlite3-ruby).
Suddenly BC is worth over $175k (remember, labor is still $40k a year and I am
still assuming a low-effort project) due to all the open source I used to
build it.
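The text doesn't spell out how the "over $175k" figure breaks down. One reading that fits is the 45 gem-months at $40k a year plus BC's own $21k to $28k estimate from earlier. This Python sketch assumes that reading:

```python
monthly_cost = 40_000 / 12           # $40k/year, overhead 1
gems_months = 45                     # sloccount estimate for the listed gems
bc_cost_range = (21_000, 28_000)     # BC's own estimate from earlier (assumed)

gems_cost = gems_months * monthly_cost                 # about $150,000
totals = [gems_cost + bc for bc in bc_cost_range]      # about $171k to $178k
print(f"gems alone: ${gems_cost:,.0f}; with BC: ${totals[0]:,.0f} to ${totals[1]:,.0f}")
```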
Where exactly do we draw the line, then? Maybe I include all of Ruby’s source
code since I used it and its modules to help me build BC. Can I now claim
that BC is worth more than a million dollars?
Vista is twice as good as Linux!
As a final proof of absurdity, consider Vista. MS has a pretty bad track record
of projects running long, with the whole corporate design/development flow
slowing things down. Vista is supposed to be in the realm of 50 million lines of code.
Using the same methods Wheeler used to compute linux’s cost and effort, we
get Vista being worth a whole hell of a lot more:
Total physical source lines of code: 50,000,000
Estimated Development Effort in Man-Years: 17,177
Estimated cost (same salaries as Linux estimate, $56,286/year, overhead=2.4): $2.3 billion
To me these numbers look just as crazy as the ones in the Linux estimate, but
MS being the behemoth it is, I’m not going to try and make a case either way.
Just keep in mind that MS would have had to dedicate almost 3,000 employees
to working on Vista full-time in order to get 17,177 man-years of development
done in six calendar years.
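The Vista numbers above can be reproduced by treating all 50 million lines as one project. This uses sloccount's default `--effort 2.4 1.05` (per its documentation, as I read it) and the $56,286 salary with 2.4 overhead. A Python sketch:

```python
sloc = 50_000_000
person_months = 2.4 * (sloc / 1000) ** 1.05   # sloccount's default --effort 2.4 1.05
person_years = person_months / 12
cost = person_years * 56_286 * 2.4           # same salary and overhead as above
staff_for_six_years = person_years / 6       # full-time staff needed over 6 years

print(f"{person_years:,.0f} person-years, ${cost / 1e9:.2f} billion, "
      f"~{staff_for_six_years:,.0f} full-time staff over 6 years")
```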
The important thing here is that by Wheeler’s logic, Vista is actually worth
more than linux. By a lot.
Linux fanatics are raving idiots
So all you Linux zealots, I salute you for being so fiercely loyal to your
favorite OS, but coming up with data like this (or simply believing in and
quoting it) just makes Linux users look like a ravenous pack of fools. Make your
arguments, push your OS, show the masses how awesome Linux can be. But make
sound arguments next time.