Mark Kollasch's Incomparable Software-Making Weblog

Go ahead and try to compare it. You can't.

11 February 2010

Predicting Fault Incidence using Software Change History

...is the title of a paper, published July 2000 in IEEE Transactions on Software Engineering, by Todd L. Graves, Alan F. Karr, J. S. Marron, and Harvey Siy. They propose a method for estimating the number of faults in a program based on data collected about the development process rather than just the program itself, built on the insight that bugs do not arise in a vacuum, but must at some point be added to the program. (They also examined a number of metrics of "complexity" that were common at the time. Most of these correlated well with each other and with size metrics such as lines of code, suggesting circumstantially that process measures are more meaningful predictors than artifact measures.)

Specifically, they studied an enormous telephone-switching program from Bell Labs, written in C and a domain-specific language, and used the source control history as their data set. Seems sensible, right? And in the end, their statistical model did demonstrate more predictive power than sheer lines of code alone, without even considering any attribute of the software itself.

The most useful predictive measure that they examined was the average age of each change in a module, weighted by the size of the change; the older, the less buggy. Obviously, this doesn't mean that the best way to fix bugs is to let your code gather dust in a repository for a few years. Rather, what it means is that the longer a piece of code has been in use and under development, the more likely it is that any bug in it has been found and corrected. It amounts to an endorsement of thorough testing.
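To make that measure concrete, here is a minimal sketch of a size-weighted average age. It is only my reading of the description above, not the paper's actual model: the function name, the (date, lines changed) tuples, and the choice to measure age in days are all assumptions made for illustration.

```python
from datetime import date

def weighted_average_age(changes, today):
    """Average age of a module's changes in days, weighted by change size.

    `changes` is a list of (change_date, lines_changed) pairs. Both the
    shape of this data and the use of lines changed as the weight are
    assumptions for this sketch, not details taken from the paper.
    """
    total_size = sum(size for _, size in changes)
    if total_size == 0:
        return 0.0  # no recorded changes, so there is nothing to average
    weighted_days = sum((today - changed_on).days * size
                        for changed_on, size in changes)
    return weighted_days / total_size

# A hypothetical module: one small old change and one large recent change.
module_history = [
    (date(2000, 1, 1), 100),
    (date(2009, 1, 1), 300),
]
print(weighted_average_age(module_history, date(2010, 1, 1)))
```

Because the recent change is three times the size of the old one, it drags the average toward the young end, which under the paper's finding would mark the module as more likely to harbor faults.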


A correlation was found between the number of changes made in one version and the number of faults found in the next. It is self-evident that bugs cause changes, but this finding also suggests that changes cause bugs. This is similar to the above finding, but its implications are more circular. Like another strong predictor, that the number of faults in previous versions correlates well with the number of faults in the current version, it seems merely an affirmation of ontological inertia (buggy code tends to remain buggy), or even a tautology (buggy code is buggy).

Interestingly, and with implications for the open source movement, there was no correlation whatsoever between fault incidence in a module and the number of developers who committed changes to that module (and, in a program of this size and age, that number sometimes exceeded one hundred). The authors noted that "too many cooks" did not spoil the proverbial broth, but their data also show that many eyes did not make all bugs shallow.
