For a while now I've been trying to reconcile my skepticism about most automated testing with the importance many smart people (including hiring-type folks!) place on it. Part of the problem is that I've never seen it done really well, so at this point I lack the experience to find a way to those safe, refreshing waters between
- the Scylla of tests that only exercise trivial functionality and
- the Charybdis of tests that blow up even with the most well-meaning of refactorings.
I keep on pondering... though I live in fear that I'll never know if I'm just rationalizing my own laziness and prejudice for "making stuff!" over "engineering", or if I can be confident enough in my own experience to really justify a stance that doesn't love unit tests.
Thinking about my own coding process: I break what I want the software to do into almost ridiculously small pieces, so I have a super tight code-run-inspect-results loop. (I really hate situations where a whole aggregate of code is not working but I have no idea where my assumptions are incorrect. Pretty much all debugging is finding out which of the assumptions you made in code form is wrong, and when.) So as I code up a complex bit, I write a lot of "runner" code that exercises what I'm doing, so that I can get to the point of looking at results as quickly as possible.

This might be where I part ways from the Flock: I view this code as scaffolding to be thrown away once the core is working, but the Unit Testing faithful would have me change it into a test that can be run at a later date. There are two challenges to that: one is that most of my scaffolding relies on my human judgement to see if it's right or wrong; the other is that my scaffolding is designed for the halfway points of my code, not the completed thing. Parlaying it into a test for the final result that gives a yay or a nay seems tough; doing so in a way that survives refactorings and also does a good job of evaluating the UI aspect (often a big part of what I'm working on) seems almost impossible.
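To make the distinction concrete, here's a minimal sketch (all names hypothetical, nothing from a real project): the scaffolding version just prints intermediate results for me to eyeball, while the unit-test version has to compress that judgement into a mechanical yes-or-no assertion.

```python
import unittest

def summarize_orders(orders):
    """Toy stand-in for the 'complex bit' under construction."""
    totals = {}
    for customer, amount in orders:
        totals[customer] = totals.get(customer, 0) + amount
    return totals

# --- Scaffolding: run it, eyeball the output, throw it away later. ---
if __name__ == "__main__":
    sample = [("alice", 10), ("bob", 5), ("alice", 2)]
    # Human judgement decides whether {'alice': 12, 'bob': 5} "looks right".
    print(summarize_orders(sample))

# --- What the faithful would have me turn that into: a yay-or-nay test. ---
class TestSummarizeOrders(unittest.TestCase):
    def test_totals_accumulate_per_customer(self):
        sample = [("alice", 10), ("bob", 5), ("alice", 2)]
        self.assertEqual(summarize_orders(sample), {"alice": 12, "bob": 5})
```

Notice that the test version only works once I've decided what "right" looks like for the finished function; it says nothing about the halfway states the scaffolding was built to expose.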
Some of my skepticism, too, comes from the idea that... small bits of code tend to be trivial. It's only when they're working together that complexities arise, chaos gets raised, and assumptions get challenged. So I'm more of a fan of component testing. And it's great that you've made your code so modular that you can replace the database call with a mock object, but... you know, when I think about the problems I've seen on products I've built, it's never stuff in these little subroutines or even the larger components. It's because the database on production has some old wonky data from three iterations ago that never got pushed to the DBs the programmers are using. It's because IE8 has this CSS quirk that IE9 doesn't. In other words, stuff that automation is just terrible at finding.
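For what it's worth, here's the kind of mock-object test I'm talking about, sketched with Python's standard unittest.mock and hypothetical names. It passes beautifully against an idealized fake database, which is exactly why it can never see the wonky three-iterations-old data sitting on production.

```python
from unittest.mock import Mock

def active_user_emails(db):
    """Return emails of active users; `db` is anything with a query() method."""
    rows = db.query("SELECT email, active FROM users")
    return [email for email, active in rows if active]

def test_active_user_emails():
    # The mock hands back exactly the tidy rows we tell it to --
    # no legacy columns, no NULLs left over from old schema migrations.
    fake_db = Mock()
    fake_db.query.return_value = [("a@example.com", True), ("b@example.com", False)]
    assert active_user_emails(fake_db) == ["a@example.com"]
```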
Two other things add to the challenge:
- A coder can only write tests for things they can imagine going wrong. But if they could imagine it going wrong, they would have coded around it anyway.
- It's very hard to get someone to find something they don't really want to find (i.e., a bug in their own code). This is why I only put limited faith in coders doing their own testing, at least when it's time to get real.
So, this is where I am now. But I'm defensive about it, worried it makes me look like a hack... and to some extent I can be a hack, sometimes, but I'm also a hack who is good at writing robust, extensible, relatively easy-to-understand code. Anyway, to defend myself, I've sometimes paraphrased this idea that "any sufficiently powerful testing system is as complex (and so prone to failure!) as the system it's trying to test." But I may have just found the original source of this kind of thinking, or at least one version of it... it comes from part of a talk Joel Spolsky gave at Yale:
In fact what you'll see is that the hard-core geeks tend to give up on all kinds of useful measures of quality, and basically they get left with the only one they can prove mechanically, which is, does the program behave according to specification. And so we get a very narrow, geeky definition of quality: how closely does the program correspond to the spec. Does it produce the defined outputs given the defined inputs.

The problem, here, is very fundamental. In order to mechanically prove that a program corresponds to some spec, the spec itself needs to be extremely detailed. In fact the spec has to define everything about the program, otherwise, nothing can be proven automatically and mechanically. Now, if the spec does define everything about how the program is going to behave, then, lo and behold, it contains all the information necessary to generate the program! And now certain geeks go off to a very dark place where they start thinking about automatically compiling specs into programs, and they start to think that they've just invented a way to program computers without programming.

Now, this is the software engineering equivalent of a perpetual motion machine. It's one of those things that crackpots keep trying to do, no matter how much you tell them it could never work. If the spec defines precisely what a program will do, with enough detail that it can be used to generate the program itself, this just begs the question: how do you write the spec? Such a complete spec is just as hard to write as the underlying computer program, because just as many details have to be answered by spec writer as the programmer. To use terminology from information theory: the spec needs just as many bits of Shannon entropy as the computer program itself would have. Each bit of entropy is a decision taken by the spec-writer or the programmer.

So, the bottom line is that if there really were a mechanical way to prove things about the correctness of a program, all you'd be able to prove is whether that program is identical to some other program that must contain the same amount of entropy as the first program, otherwise some of the behaviors are going to be undefined, and thus unproven. So now the spec writing is just as hard as writing a program, and all you've done is moved one problem from over here to over there, and accomplished nothing whatsoever.

This seems like a kind of brutal example, but nonetheless, this search for the holy grail of program quality is leading a lot of people to a lot of dead ends.
I would really appreciate feedback from veterans of the testing wars here: either people who see where I'm coming from, or who vehemently disagree with me, or, best yet, who see where I'm coming from but can guide me to the Promised Land, where testing feels like a good way of knowing code is working and of communicating with other developers, rather than an endless burden of writing everything twice and still having the thing flop in production.