11 August 05 - 19:58 Bug Between Keyboard and Chair
James Bach has written a practical, useful, excellent entry on
investigating intermittent bugs (or other types of problems, but we software types are more likely to be interested in investigating software defects).
A lot of the advice applies not just to intermittent problems, but to persistent ones as well: the ones that stubbornly resist our best efforts to solve them. For instance, bugs that you stay up all night chasing, fixing one thing after another, each time thinking you've nailed it but seeing it pop up again, undefeated, like a jack-in-the-box.
James' advice is in crisp checklist style, so it will come in handy as a way to jog your brain if you ever come across one of these pesky problems. Note how many of the items relate not so much to the "objective" definition of the problem, but to the role of the observer - that's you.
Most of the time, we approach debugging from a purely objective standpoint. For instance, we use tactics like the binary chop: delete half the code, if the bug still occurs it must be in the remaining half, otherwise it was in the deleted half. (Works fine, except when the bug stops occurring if you delete either half; or, more irritating still, keeps occurring whichever half you delete.) We list hypotheses and devise tests that would invalidate them, and so on.
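Written down as a procedure, the binary chop might look roughly like the Python sketch below. The names `binary_chop` and `bug_occurs` are made up for illustration, and the sketch assumes a single culprit whose presence alone triggers the bug. It tests both halves at each step, so the two awkward cases mentioned above surface as errors rather than as a wrong answer.

```python
from typing import Callable, List, Sequence


def binary_chop(units: Sequence[str],
                bug_occurs: Callable[[List[str]], bool]) -> str:
    """Narrow a list of code units down to the one that triggers the bug.

    `bug_occurs(kept)` is a hypothetical check: it runs the program with
    only the units in `kept` and reports whether the bug still shows up.
    Assumption: exactly one unit is responsible, independently of the rest.
    """
    candidates = list(units)
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first, second = candidates[:mid], candidates[mid:]
        in_first = bug_occurs(first)
        in_second = bug_occurs(second)
        if in_first and in_second:
            # The bug keeps occurring whichever half is deleted.
            raise RuntimeError("bug occurs in both halves")
        if not in_first and not in_second:
            # The bug stops occurring if either half is deleted.
            raise RuntimeError("bug vanishes in both halves")
        candidates = first if in_first else second
    return candidates[0]
```

Checking both halves costs twice as many runs as the textbook version, but it is what makes the "either half" and "whichever half" situations visible at all.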
These are all useful tactics. But problems don't exist in isolation - wherever there's a problem, there must be a problem-solver. In many situations that I recall, an intermittent or a persistent problem directly had something to do with
my efforts to solve it.
(If you have war stories of that kind to share, feel free to use that "Comments" link...)
- Testing -
10 August 05 - 19:05 You don't hack the code, why hack the process ?
We rely a lot on software tools already, so why not for user stories ? Why do you make such a big deal of using index cards for user stories, rather than an electronic document ?
No big deal. It's just that the properties of the two media are different. This means that people will behave differently when using one as opposed to the other.
By comparison, think of reading books on paper vs. on-screen. There are practical, economic and affective differences. Practical: it's easier to carry the paper book around, to flip idly through the pages, etc. Economic: if you lose the book, it's cheap to replace, and cheap to use for other purposes (doorstop, fire starter, whatever). Affective: many people prefer the "feel" of paper books.
All this is a matter of preferences that are rather easily shifted, but nevertheless it has a very real and tangible effect on behaviour: people treat stuff on paper differently than they treat stuff inside computers.
Similarly, index cards imply different behaviours than an electronic equivalent. "Looking at the stories", when they are on cards, involves getting up from your seat, going to the wall display where they are arrayed, perhaps taking them down from the wall temporarily before pinning them back. On your way to the wall you'll be looking around the office at your coworkers, perhaps waving "hi" or having a word with someone in passing.
Contrast the computerized "user story": you open the Excel spreadsheet, look at it, close the document. Economical in effort - you have hardly moved a muscle. You have also not had a conversation, or a chance to notice your coworkers. Of course these are subtle effects - but does that mean they will have negligible consequences ?
Now, your responsibility, when you take the decision of tracking stories in software, is to *think* about what effects it will have on how people estimate schedule and effort, how they prioritize by value, how easily they will converse about story details, and so on.
Here's a suggestion. Take a blank sheet of paper. Draw a vertical line down the middle, and a horizontal line across it. The left-hand side is "user stories on index cards". In the top quadrant, write at least five important positive effects they could have on your project. In the bottom, write five important negatives. Do the same thing on the right-hand side, for electronic stories. Then repeat the whole thing within a group with your teammates. Once you've done this, decide what to do. A few months later, re-evaluate the decision, again in a group. Did it bring the expected benefits ? Did it have the expected shortcomings ?
There are other ways of making a considered decision about cards vs. electronic. My point is that it's not enough to think to yourself "just because Beck says so, doesn't mean cards are really important, so I'll just use software, because a tool is cool".
If you do that, please don't go around saying you have "adapted XP". You have merely hacked around with it. And hacking is precisely what one would want to do less of.
The question at the top paraphrases a newsgroup posting I wanted to reply to - except for some reason the nntp server keeps bouncing my posts back. Responding here serves my purposes just as well...
- Extreme Programming -
09 August 05 - 15:27 Fitnesse, I almost knew you
I've been doodling Customer Tests for a "virtual GoBan", in response to an
old UseNet thread recently resurrected that aimed to compare TDD with a more
formal Design By Contract approach.
As mere doodles, the tests are not executable. I'm writing them using the
tool I favor for all documentation: the ASCII text editor. Reading the tests,
one should form a good idea of what the software does; take away a basic
understanding of the board and stones, the rules of the game, and a "theory
of operations" - some model of how the software knows to do what it does.
Here, for instance, is one of the first half-dozen tests, just starting to
lead up to interesting functionality:
# When playing a game, players take turns playing; black always starts
BOARD 2
START

. .
. .
BLACK'S TURN
That style of testing came naturally - I'm sending "commands" to the
application (before the blank line), then somehow getting it to display
its state (game board and an indication of whose turn it is).
Turning the example into an executable test is a matter of parsing the string,
splitting it at the blank line, treating the first part as commands and
verifying that what the program "outputs" at the end is the same as the last
part.
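A rough Python sketch of that idea follows. Everything in it is an assumption made for illustration: the `GoBan` class is a hypothetical stand-in for the application that understands only the two commands used in the example, and `run_doodle` is an invented name for the parsing step. The doodle is written with a blank line between the commands and the expected display.

```python
DOODLE = """\
# When playing a game, players take turns playing; black always starts
BOARD 2
START

. .
. .
BLACK'S TURN
"""


class GoBan:
    """Hypothetical application stub: knows only BOARD <n> and START."""

    def __init__(self):
        self.size = 0
        self.turn = None

    def execute(self, command: str) -> None:
        parts = command.split()
        if parts[0] == "BOARD":
            self.size = int(parts[1])
        elif parts[0] == "START":
            self.turn = "BLACK"  # black always starts
        else:
            raise ValueError(f"unknown command: {command}")

    def display(self) -> str:
        # One row of dots per board line, then whose turn it is.
        rows = [" ".join("." * self.size) for _ in range(self.size)]
        return "\n".join(rows + [f"{self.turn}'S TURN"])


def run_doodle(text: str, app) -> bool:
    """Split a doodled test at its blank line, run it, compare the output."""
    lines = [line.rstrip() for line in text.strip().splitlines()
             if not line.lstrip().startswith("#")]  # drop comment lines
    blank = lines.index("")          # first blank line separates the parts
    commands, expected = lines[:blank], lines[blank + 1:]
    for command in commands:
        app.execute(command)
    return app.display() == "\n".join(expected)
```

Calling `run_doodle(DOODLE, GoBan())` then plays the commands against a fresh board and reports whether the final display matches the last part of the doodle.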
At this point I had a minor brainstorm. I remembered that one of the most
charming Wikis around, Sensei's Library, has a nifty markup format for Go
boards, that can render just about any Go configuration or fragment thereof
to a neat-looking PNG image. This would be particularly appealing for tests
meant to serve as documentation & specification.
Thinking of Wikis, I was primed to think of Fitnesse. Fitnesse is to
acceptance tests what Sensei's Library wiki engine is to Go boards; it's a
domain-specific Wiki. As a Wiki, it provides a central repository for documents
that multiple people can edit collaboratively. As an acceptance test engine,
it provides a way to exercise application code against human-readable
examples, and verify them automatically.
So, I invested some time integrating Fitnesse and the Go markup renderer.
And ran into a few enlightening problems.
The first thing I had to do was choose a type of Fixture. In Fit (the
acceptance test engine underlying Fitnesse), Fixtures are somewhat like an
inventory of test grammars to choose from - they are different ways of
representing "if this happens to the program, I want it to respond with that".
There is ColumnFixture, which is useful when the application has several
inputs, and one or more outputs: "if I set the input values X, Y and Z
just so, then it should come back with exactly the speed of light."
There is RowFixture, used to test the result of queries ("Give me
all X such that Y.") And there is ActionFixture, which corresponds roughly
to the style above: a series of commands, and verifications.
ActionFixture is not, from my present uninformed point of view, the best
feature of Fit; from the little I've seen, Fit seems to do better at
representing regular inputs and outputs, à la ColumnFixture. For script-like
tests (which is what ActionFixtures appear to be), some of the other
acceptance testing tools around might even feel more flexible; Fit's major
constraint (as well as major advantage) is the table-like structure of test
code. It looks great as documentation, but is more cumbersome to write.
The next problem was in converting my test boards to PNG images. What I
wanted was this: when you looked at a test for the virtual GoBan, you'd see
the commands in the left column, and the resulting boards on the right. I had
come across "custom data types" in the Fit documentation: the implied promise
was that, beyond strings and integers, I could supply actual Domain Objects
in Fit cells. What I needed to do was provide a way of turning my objects
into strings, and parsing them from strings. Easy enough.
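To make the "strings in, strings out" idea concrete, here is a small Python sketch of such a round trip. It is not Fit's API: the `Board` class, its `parse` method and the use of `X` and `O` for black and white stones are all assumptions chosen for illustration.

```python
class Board:
    """Hypothetical domain object: a square grid of '.', 'X' and 'O'."""

    def __init__(self, rows):
        self.rows = [list(row) for row in rows]

    @classmethod
    def parse(cls, text: str) -> "Board":
        # One board line per text line, points separated by whitespace.
        rows = [line.split() for line in text.strip().splitlines()]
        if any(len(row) != len(rows) for row in rows):
            raise ValueError("board must be square")
        return cls(rows)

    def __str__(self) -> str:
        return "\n".join(" ".join(row) for row in self.rows)

    def __eq__(self, other) -> bool:
        return isinstance(other, Board) and self.rows == other.rows
```

With this in place, `str(Board.parse(text))` gives back the same layout for a well-formed board, which is all the comparison in a test cell would need.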
It didn't work - and couldn't work.
The problem is that Fitnesse never calls the custom test code provided in
the developer's own classes - at least not until you press the Test button,
to actually execute the tests. Before that point, all it does is render its
specialized Wiki markup to tables, rows and cells. If I wanted to spice up
the rendering before test execution, I had to make changes to the Wiki
engine.
It turned out that Fitnesse does provide a way of extending its Wiki markup
formats; subclassing (yuck) a WikiWidget class and hinting at its name in
a "plugins.properties" file will do the trick. So, I went about providing
a WikiWidget to do just that.
That's when I ran into the next problem.
It is natural for my Go board representations to span multiple lines: you
see a Go board even when you look at the ASCII. Rendering it to an image
is a bonus, but not necessary. The problem is that the table markup format
in Fitnesse doesn't at all deal well with cell data that spans multiple
lines. Each table row must fit in one line; if it doesn't, the rest of the
line is ignored and the Wiki displays it as if it wasn't part of the table
at all.
I looked around for a fix. I browsed the Fitnesse source code, the Fit source
code, and the tests for both. I noticed something interesting while I was
there: there is quite a lot of code to Fit and Fitnesse, but it's still
fairly easy to navigate. Some of that is due to the code being well-factored;
some of that is due to the tests providing convenient entry points, close to
the code, for such explorations; and some of that is due to the splendid
and ever improving navigation capabilities in modern IDEs. (Today I was
using Eclipse, but I'm sure IDEA would have been fine.)
Still, what emerged at the end of the day was that I was basically screwed,
short of opening up the Fitnesse code itself for some radical surgery.
The only thing, apparently, that can span multiple lines is a markup format
for "literal" text (not interpreted at all by the Wiki engine); this is
handled by a WikiWidget called PreProcessorLiteralWidget.
Now, by that stage I was no longer doodling but frankly hacking. So, with
little remorse, and safe in the knowledge that I'd just toss the whole thing
when I was done, I copied and pasted, wholesale, the code of PreProcessorLiteralWidget
into my own plugin widget.
It didn't work. (It couldn't work.)
It turns out that PreProcessorLiteralWidget doesn't pull this trick by its
lone self. It gets an assist from the rendering engine, which grants it the
special status of being the only widget capable of processing "literals".
There was a slight sense of outrage at this. From the outside, the "built in"
plugins appear to be just the same, and to be treated just the same, as the
ones written by Fitnesse extension developers. And perhaps they are, all of
them - but for one exception I'm now aware of, PreProcessorLiteralWidget.
So, I sat back and took stock. Fitnesse is a Wiki specialized for acceptance
testing, which wraps the testing tool Fit, whose strengths are complementary
to those of tools geared to script-style tests. I seem to be working toward
script-style tests, written in an improvised markup format which conflicts
with what Fitnesse uses. No big deal, just a missed encounter - and I now know more about Fit and Fitnesse.
Still, I'm feeling just a little bit more confident in one of my prejudices
about acceptance testing: it's better to start with the tests (even doodled
tests) than to start with the testing tool. Let your tests tell you whether
the tool is suitable - not the other way round.
Doodle first - hack later.
- Extreme Programming -
02 August 05 - 14:31 Software sphexishness
I'm with Thomas Gagne when he says:
The next paradigm in programming should be modeled after how humans deal with procedure and exceptions, and learn from it. HUMANS are incredibly flexible and adaptable, and can respond in real-time to change.
Definitely a profound idea. I'm far from sure that we're anywhere near ready for that - we're encumbered by ideas about cognition and computing that make it difficult to even think of stating the problem that way.
It's not just humans who are flexible - so are many other systems geared to survival, or at least stability. Stable systems show equifinality - they get to the same end state irrespective of starting conditions (or perturbations) within some range of tolerance. Species in an ecosystem are intricately interdependent, but most ecosystems can afford to lose one species.
The software systems we're able to design also show interdependence, but when one module fails (even a small one) too often the whole thing fails. The famous Ariane crash is a good example of this fragility; the bug that was the ultimate cause of the crash had caused a fault that was entirely irrelevant, at the time it occurred, to keeping the rocket in the air. A more proximate cause of the crash was that the main guidance computer interpreted an error message as if it had been a command to the boosters.
Think about it. That's a bit like a car driver hearing someone sneeze... But rather than disregard the "utterance" as irrelevant, the driver is compelled to assign meaning to it. Arbitrarily, the driver decides it means "turn right all the way". As errors go, it's totally bizarre.
And yet so many systems "designed" under the prevailing assumptions as to what constitutes "design" exhibit this kind of behaviour, this mix of awesome intelligence and utter stupidity. (I'm reminded of the quality Doug Hofstadter calls "Sphexishness".) The lesson ? It's time to confront the hypothesis that there's something wrong with prevailing assumptions as to what constitutes "design". For instance, the idea that emergent behaviour makes systems inherently unsafe, whereas "designed" behaviour is inherently safer.
- Software Development -