Wednesday, June 13, 2012

Green Patches


// Start with a quote relevant to today’s subject... even if you’re not sure who said it originally.
// To do: find out who said it (T.M. 06/2012)
Documentation is like sex: when it is good, it is very, very good; and when it is bad, it is better than nothing.
- Dick Brandon, possibly.


// First paragraph, introducing the subject and attempting to use humour.
// @return LOL on success, WTF on failure
The problem with the above quote is that it seems to be correct.
It completely neglects, however, how unpleasant it is to have somebody shove their... documentation... where you don’t want it. Bad sex is better than nothing? I don’t think so.

// Second paragraph, talking about the purpose of documentation, attempting humour again.
Why is it that documentation, when it’s good, is very, very good? It’s not because it’s an amazing piece of prose with cunning criticism about modern society. It’s not because the rhymes and meter make it a poem so beautiful as to make grown men weep. It won’t even summon Cthulhu from the depths if you chant it under a full moon.
No, the purpose of documentation is to help you understand the code. If it allows you to understand something that you’d be unable to understand otherwise, then it’s very, very good indeed.
But not all documentation is like that. // Ominously stating the obvious

Some documentation, like the comments I’ve liberally inserted into the above text, just doesn’t fit this purpose. Those comments don’t explain anything that you need to have explained to you, they don’t tell you anything you don’t already know, and in at least one case they’re a bit senseless. But they do make the text messy and harder to read. In particular, because documentation tends to be in a different language (English and C++ are not the same language), it’s like trying to listen to one of those foreign politicians and his real-time translator simultaneously.

Comments almost always break the text. A well-written function is close enough to a coherent paragraph that you can read and understand it, but then someone randomly inserts a sentence in a foreign language right in the middle of it. You can probably read it, of course, but it’s just so much more difficult than it should have been.

Now, before you stone me to death, I’d like to point out that I’m not, I repeat, not saying that all documentation is bad and must be avoided.
I’m just saying that documentation, like most overhyped but potentially harmful things, should be kept to a bare minimum. Some things require an explanation, and some things don’t. Consider this absolutely real piece of code defining data members inside some class (C++):

// Handle to window
HANDLE m_hwnd;

// Resource ID
UINT m_resourceID;

Please allow my good friend Nicolas to summarize the obvious reaction:

// True story
For some reason, many people appear to think that this sort of documentation is not just beneficial, but mandatory, so let’s consider it seriously. We’ll start with the first data member.
Option 1: this gets read by a person who knows what a handle to a window is. This person surely knows that this usually gets abbreviated to hwnd, and it even says HANDLE, and he doesn’t need the documentation.
Option 2: this gets read by a person who doesn’t know what a handle to a window is. This person reads the data member’s name and type, looks up at the documentation, realizes that it’s completely useless, and decides that it’s time for another coffee break.
I hope you can see that there is no third option.
The second data member is even more fun: its comment actually contains less information than the code it tries to document, because it leaves out the type.
So I hope everybody would agree that these comments contribute nothing to the code’s clarity.

Do these comments do any harm though? Let’s see.
As I’ve pointed out before, shorter is better. This documentation bloats the text by 100% (not counting the empty lines). In a real class, with 30 data members instead of 2, this would take 90 lines: a comment, a declaration, and an empty line for each member.
90 lines! Can you imagine trying to find anything inside something that huge? Sure, our monitors are getting bigger and our pixels are getting smaller, but that’s still way more than you can fit on a monitor, so you’d have to scroll while trying to handpick the relevant bits /* attempting to locate the relevant bits */ out of a huge list /* inside a long list */ that repeats everything twice /* that repeats each word two times */.
Was the last sentence fun to read?

Now suppose you write those same data members like normal humans write normal lists. Suddenly it’s only 30 lines. Does that feel too long to conveniently skim through? No problem, now that it’s no longer mind-numbingly long you can easily sort it by type, category, degrees of separation from Kevin Bacon, or anything else, and add little labels for each sub-list, and still have most of it (and probably all labels) fit on your monitor. The result: a short list in one language that a human can read.
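To make that concrete, here is a rough sketch of what such a sorted, labelled list might look like. Only m_hwnd and m_resourceID come from the example above; every other member, the DialogMembers name, and the two type aliases are invented for illustration (real code would get HANDLE and UINT from the Windows headers).

```cpp
// Stand-ins so the sketch compiles anywhere; real code would include <windows.h>.
using HANDLE = void*;
using UINT = unsigned int;

// A struct rather than a class, only to keep the sketch short.
struct DialogMembers {
    // Windows
    HANDLE m_hwnd = nullptr;
    HANDLE m_parentHwnd = nullptr;   // hypothetical member

    // Resources
    UINT m_resourceID = 0;
    UINT m_iconID = 0;               // hypothetical member
    UINT m_menuID = 0;               // hypothetical member

    // State
    bool m_isVisible = false;        // hypothetical member
    bool m_isModal = false;          // hypothetical member
};
```

Three labels for seven members, and each label says something that the names and types underneath it don’t already say.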

The same idea applies to documenting inside functions. We all know that functions should be kept as short as possible... but too often people decide to waste a huge chunk of your monitor’s precious real estate with code like this:

// try to open file, check the return value
if (!File.Open("nuclear_codes.txt")) {
    // oops, failed!
    return E_FAILED_MISERABLY;
}

// True story; the names of files and symbols have been altered to protect their identities.
Since this, too, is apparently considered good practice, let’s overanalyze again:
The first comment tells us that File.Open attempts to open a file, and that putting it inside an if checks the return value. Thank you, kind programmer, for this valuable info. Please scroll back up to Nicolas.
The second comment, though whimsical and cheerful, tells us that if File.Open returned false, it means that it failed. Which, assuming I have never programmed before, I could not possibly guess from the context (return E_FAILED_MISERABLY).
Obviously, this last example wouldn’t work if you replaced File.Open with foo and E_FAILED_MISERABLY with 3. But this code right here simply does not require any documentation. It’s self-explanatory.
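Here is a small sketch of that foo-and-3 thought experiment, with the same logic written twice. FakeFile, STATUS_OK and LoadNuclearCodes are made up for this sketch, and FakeFile only pretends to open anything (it fails on an empty path) so that the example stands on its own; it is not the File class from the original code.

```cpp
// Hypothetical status codes; only E_FAILED_MISERABLY appears in the original.
enum Status { STATUS_OK = 0, E_FAILED_MISERABLY = 3 };

// Hypothetical stand-in for a file class. It does no real I/O.
struct FakeFile {
    // Pretends to open a file; fails only for a null or empty path.
    constexpr bool Open(const char* path) const {
        return path != nullptr && path[0] != '\0';
    }
};

// Opaque version: nothing here tells you what happens or why.
constexpr int foo(const char* p) {
    FakeFile f{};
    if (!f.Open(p)) {
        return 3;
    }
    return 0;
}

// Descriptive version: identical logic, and the names do the explaining.
constexpr Status LoadNuclearCodes(const char* path) {
    FakeFile file{};
    if (!file.Open(path)) {
        return E_FAILED_MISERABLY;
    }
    return STATUS_OK;
}
```

The first version is the one that tempts people to reach for comments. The cheaper fix is usually to rename things until it looks like the second.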

That, in a nutshell, is what code should strive to be.
Instead of worrying about how to carefully document every expression in your code, worry about how to write each expression in a way that does not require too much documentation. Save the documentation for ideas, not expressions. Why does this algorithm work? Why does this code do this odd thing? What does this function assume about its input? What is the purpose of this module/block? These are all things worth documenting, and as you can see, none of them would logically result in green patches of comments in the middle of your code.
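As a rough illustration of comments that document ideas, here is a binary search sketch. FindSorted and its parameters are names I made up for the example. It has exactly two comments: one states what the function assumes about its input, and the other explains the one line that looks odd.

```cpp
#include <cstddef>

// Returns the index of `target` in `values`, or `count` if it is absent.
// Assumes `values` is sorted in ascending order; otherwise the result is meaningless.
constexpr std::size_t FindSorted(const int* values, std::size_t count, int target) {
    std::size_t low = 0;
    std::size_t high = count;
    while (low < high) {
        // Written as low + (high - low) / 2 rather than (low + high) / 2
        // so that the sum cannot overflow on very large ranges.
        const std::size_t middle = low + (high - low) / 2;
        if (values[middle] < target) {
            low = middle + 1;
        } else if (values[middle] > target) {
            high = middle;
        } else {
            return middle;
        }
    }
    return count;
}
```

Neither comment repeats what the code says. Delete either one and the reader loses something that the code alone could not have told them.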
