Code Matters: What's In A Name (part 1)

“Real names tell you the story of the things they belong to in my language, in the Old Entish as you might say.”
- Treebeard, Lord of the Rings

Do you know Muhammad?
Come on, you must know Muhammad. It’s one of the most common names in the world, not to mention the name of several celebrities. I’m sure that anyone reading this knows at least one Muhammad. But when I asked the question you had no way of knowing which Muhammad I was referring to, so you can’t know whether or not you know him. Worse yet, if I had asked for any details about him, this simple misidentification could have led to a lot of confusion.
This could have been resolved if only I added another identifying detail. Perhaps the word “prophet”.
Alright then. Let’s try something else.

Do you know Pratibha Devisingh Patil?
This is a much less common name. It also came with a middle and last name, to make it more specific. I believe there are about a billion people, give or take a few hundred million, who would immediately know the person I’m referring to. But I’m pretty sure most of the people reading this have no idea who this person might be, and neither did I until a few minutes ago, when I googled “who is the current president of India”.

This is related to coding, of course. A great deal of coding is about giving things names, and one of the most significant things that make the average piece of code difficult to read is bad names. Variables need names, functions need names, types need names. I hope that the above examples showed that making a name mean the same thing to two people is not trivial, and therefore deserves some thought.

The main thing to notice in the above examples is this: context is everything. After all, every word in every language is just a name for something. “Dog” is simply the name we English speakers have agreed to use for a certain type of animal, but obviously to anyone who doesn’t speak English it’s at best a meaningless syllable, and at worst a completely different word.

The name Muhammad would probably contain enough information if used in a conversation between two people who share an acquaintance by that name. More importantly, as soon as the identity of Muhammad is clear to the other side, the name suddenly takes on a whole new meaning, and contains a lot of information. If you know Muhammad, you probably know that he’s relatively tall and has black hair and wears glasses, and you might also know that he’s 38 years old and has a dog. If he’s more than a mere acquaintance, you might also know that his favorite ice cream flavor is strawberry, and he often plays football on Tuesday nights. Pretty impressive compression algorithm, fitting all of that information into one word!

The name Pratibha Devisingh Patil showed us, however, that simply adding information does not necessarily improve matters. Adding the middle and last name told us nothing more than the middle and last name. It gave you no hint about her height or hair colour at all. In fact, someone who isn’t from that part of the world might not even guess from the name that it’s a her. If you don’t know the person, the name is not going to help you one bit.
The middle and last name only help to make the name more specific, so that if you decide to search for information you can tell this person apart from other Pratibhas.

So, contrary to what appears to be the common Entish opinion, a name probably shouldn't tell the whole story, because that would take a while. Humans, being mortals, are always in a hurry and therefore keep trying to make names shorter. You barely see any Jonathans or Alexanders around anymore; they’ve all been turned into Johns and Alexes (or Sashas, if you’re from that general area). If you know Jonathan, then “John” is enough to tell you the whole story. If you don’t, then you’ll have to be told the whole story anyway, and adding a few letters or words won’t make a difference. Israel’s prime minister might technically be Binyamin Netanyahu, but when one Israeli mentions “Bibi” to another Israeli, rest assured that everybody knows whom it refers to and using the full name won’t add any value.
From all of the above we can reach some pretty unambiguous conclusions about which parts of a name are necessary and which are redundant. In addition, we have overwhelming evidence that wild untamed humans, living in their natural habitats (i.e. soft furniture in front of a large LCD monitor), display a clear tendency to make names as short as humanly possible. Now, let’s see if the conclusions we just reached can be applied to actual programming.

Muhammad is a bit like i.
Everybody knows i. When you see a variable named i, you can immediately conclude that a) you’re probably inside a loop, b) this loop probably uses i as its counter, and c) i is probably an integer, probably unsigned, and probably used as the index for some collection (i stands for “index”, after all) or just a counter. It’s just like mentioning your coworker Muhammad to another coworker who knows him: he will immediately, and with high confidence, call up everything he knows about Muhammad, because in the context of talking to you, the name Muhammad means that certain Muhammad that you both know. If he doesn’t know Muhammad, well, just imagine your confusion when you see someone using i outside of a loop. It’s all in the context.
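To make that concrete, here is a minimal sketch in Python (chosen just for illustration; the function and its name are made up for this example). Inside the loop, i needs no introduction: it is the counter and the index into the collection.

```python
def index_of_first_negative(values):
    """Return the index of the first negative value, or -1 if there is none.

    Hypothetical helper, written only to show `i` in its natural habitat.
    """
    # `i` is the loop counter and the index into `values`; the loop is
    # all the context a reader needs to know that.
    for i, value in enumerate(values):
        if value < 0:
            return i
    return -1


print(index_of_first_negative([3, 0, -2, -7]))
```

Nothing about the loop gets clearer if i is renamed to something longer.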

This is why I’m baffled to see people iterate over a list with a variable named ui_index. That’s 8 times as many characters as i, and does not contain a single bit of extra information. Translated back to people’s names, this is the equivalent of asking your coworker “Have you seen Muhammad, the relatively tall guy with the black hair and glasses, today?”. He already knows that stuff. He works with Muhammad, and now he just thinks you’re a bit eccentric for saying odd stuff like that. If you have more than one Muhammad in your team at work, well, that’s what last names (which are often used in such cases instead of the first name) and nicknames are for, and they’re more effective than describing the guy every time you mention him.
Back to the code: if you have 2 nested loops and need 2 variables, would the name ui_index actually help? It’s a pretty safe bet that the other variable is also an unsigned integer and an index*, and ui_index2 simply doesn’t provide valuable extra information. In fact, it just adds confusion because you now have to look carefully to tell the difference between the variables, since they’re 89% identical. And the names don’t tell you anything about the contents, since each of them could be the index of practically anything. If you’re thinking of adding that to the names as well, then you’ve just said “Hey, have you seen Muhammad, the relatively tall guy with the black hair and glasses, who’s 38 years old and has a dog, and whose favorite ice cream flavor is strawberry, today?” to your coworker, and he’s probably smiling at you nervously while dialing the number for security.
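Here is a rough side-by-side of the nested-loop case, again in Python and again with made-up function names. The second version uses j for the inner index, which is my assumption about the usual convention rather than something argued above. Both functions do exactly the same thing; only the names differ.

```python
def count_zeros_verbose(grid):
    """Count the zeros in a list of rows, using the long index names."""
    count = 0
    for ui_index in range(len(grid)):
        for ui_index2 in range(len(grid[ui_index])):
            # You have to look twice to see which index is which.
            if grid[ui_index][ui_index2] == 0:
                count += 1
    return count


def count_zeros_short(grid):
    """Same logic, with the conventional short names i and j."""
    count = 0
    for i in range(len(grid)):
        for j in range(len(grid[i])):
            if grid[i][j] == 0:
                count += 1
    return count


grid = [[0, 1, 2], [3, 0, 0]]
print(count_zeros_verbose(grid), count_zeros_short(grid))
```

The longer names carry no information that the loops themselves don’t already provide, and the two of them are much harder to tell apart at a glance than i and j.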

Pratibha Devisingh Patil, on the other hand, is like FACTMapperUtils.ReplaceEUWithMRU(string) (no offense).
Woah, look at that name! It’s full of letters and words! Surely your average Entish programmer must be delighted with such a detailed name!
Wrong. Even Treecontrol, the Entish programmer I just made up, would be baffled, because he has no idea what this Mapper is, what the replacement implies in practice, what side effects it might have, or what has to be initialized for it to work. Since he’s an Ent, he might try to solve this by replacing all those initials with full names, which just shows you why Ents don’t make good programmers. I can tell you that FACT is Fast Characterization Tool, EU is Electronic Unit, and MRU is Magnetic Receiving Unit, but none of those details helps in this case, because each of those still means nothing. And, of course, if you expand all those initials then you get an even longer and more confusing name that has no extra information.

This brings us to the essence of names.
Names are just identifiers. Everyone who works on the FACT project knows what FACT is, but there is no reason to expect anybody else to know it. In fact, it wouldn’t surprise me to hear that there are people who know what FACT is without knowing what the initials stand for. And why shouldn’t they? When you hear someone mention the USA you don’t have to mentally replace it with the United States of America in order for the sentence to make sense. USA is, by now, a name in its own right.
In short, if you’re out of the loop about the relevant subject, a longer name will just confuse you instead of helping. If you need to ask someone, whether it’s a coworker or Google, about what something means, you only need the name to be specific enough; beyond that point any additional letter is a waste of time.

In conclusion, the purpose of names is to create distinction and identity. A name should act like a pointer, containing minimal information but pointing you unambiguously in the direction of the rest of the information, whether that information is in Google, another source code file, or in your own head.
The only two things to worry about are:
1. Is the name distinct enough in the context? Shouting “hey kid!” might work if there’s only one kid in a 2km radius, but in a full classroom it just isn’t specific enough.
2. Is the target audience likely to associate this name with the entity you’re referring to? If you have a Spanish friend that nobody else in the room has ever heard of, just saying “I wonder what Jesus is doing right now” might be confusing.
The purpose of a name is not description, or you end up being an Ent. Which is fine by me, I’m not a racist or anything, just keep in mind that by the time you finish introducing yourself, I’ll be releasing version 2.1 of my software.
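The two checks in the list above can be sketched as a lookup, with the name acting as the pointer and the context holding the rest of the information. This is only a toy illustration: the resolve function and the sample data are hypothetical, with the details about Muhammad borrowed from earlier in the post.

```python
def resolve(name, context):
    """Follow a name to the one entity it points to in this context.

    Hypothetical helper: raises LookupError if the name fails either check.
    """
    matches = [entity for entity in context if entity["name"] == name]
    if not matches:
        # Check 2 fails: the audience can't associate the name with anyone.
        raise LookupError(f"nobody here is known as {name!r}")
    if len(matches) > 1:
        # Check 1 fails: the name is not distinct enough in this context.
        raise LookupError(f"{name!r} is ambiguous here ({len(matches)} matches)")
    return matches[0]


# Made-up contexts for the example.
team = [
    {"name": "Muhammad", "age": 38, "has_dog": True},
    {"name": "Dana", "age": 29, "has_dog": False},
]
classroom = [{"name": "kid"}, {"name": "kid"}, {"name": "kid"}]

print(resolve("Muhammad", team)["age"])
```

Calling resolve("kid", classroom) raises a LookupError, and no amount of extra description packed into the name would fix that; a more distinct name would.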

* Unless you’re the sort of programmer who uses iterators instead. That’s a different case because of the way the variable gets used, and might justify a more detailed name in a nested loop.

Saturday, June 9, 2012
