Code Matters

Saturday, August 4, 2012

Another Dimension

Since I keep saying that shorter is better, and at the same time that code needs to be readable, I figured I should dedicate a post to the other dimension of your monitor, which tends to be overlooked when people think about code readability. That's right - an entire (not very long) post about the horizontal placement of your code!

First of all, indentation. It pains me that I feel I have to bring it up at all, but it seems that the subject of indentation is not obvious to all experienced, professional programmers out there.
The ~~common~~ correct way of doing it is adding a tab (2-4 spaces usually) at the beginning of the line every time you enter a new nested block, and a block should be defined as anything that counts as a local scope in C. That is to say, if your if() has only one line after it and no braces, it's still a block and needs to be indented. The cases of a switch(), on the other hand, aren't blocks; a switch is just a fancy multiple goto statement inside one big nested block, and there's no reason to waste 2 tabs on it as some people determinedly do.
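
To make the rule concrete, here's a small sketch of how I'd lay it out. The Monster enum and the describe() function are made up for this example; only the indentation matters.

```cpp
#include <cassert>
#include <string>

// Made-up enum, only here to show the layout.
enum Monster { VAMPIRE, ZOMBIE, WEREWOLF };

std::string describe (Monster m, int count)
{
    assert (count > 0);
    std::string text;

    // The whole switch body is one nested block: the case labels stay level
    // with the switch, and the statements under them get a single tab.
    switch (m) {
    case VAMPIRE:
        text = "vampire";
        break;
    case ZOMBIE:
        text = "zombie";
        break;
    default:
        text = "monster";
        break;
    }

    // One line and no braces, but still a block, so it gets indented.
    if (count > 1)
        text += "s";

    return text;
}
```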

One thing that can ruin this simple indentation method is C/C++ preprocessor directives, which, as you know, don't get indented because they're not part of any block per se. Many a function has been butchered by a sudden spike in the indentation, like so:

            ...
            if (x>9000) {
                fire (LASER);
#ifdef _DEBUG
                log ("It's over 9000!!!");
#endif
            }
            ...

That just isn't pretty, and not as easy to read.
Why not do something like this instead?

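// (a plain "#define DEBUG_ONLY //" won't do: the comment is stripped before the macro is defined)
#ifdef _DEBUG
#define DEBUG_ONLY(...) __VA_ARGS__
#else
#define DEBUG_ONLY(...)
#endif

            ...
            if (x>9000) {
                fire (LASER);
                DEBUG_ONLY (log ("It's over 9000!!!"));
            }
            ...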

Better, don't you think?
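And if you tend to have _DEBUG blocks longer than one line and don't want to specify DEBUG_ONLY in every line (for some reason), you could still keep the pretty indentation. A macro can't expand into an #ifdef or an #endif, so the markers have to open and close an ordinary block instead. Everything between them still has to compile in a release build, but it never runs there:

#ifdef _DEBUG
#define DEBUG_START if (true) {
#else
#define DEBUG_START if (false) {
#endif
#define DEBUG_END }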

Fortunately for this post, there are a few other good uses for your Tab key.
One thing I'm surprised I rarely see is code formatted to look like a table. Whenever you have a series of statements that are very similar in some way, they could become more readable that way. Be it an array initialization, the declaration of data members, or just several lines that do similar things (e.g. the code at the end of this post), making it look like a table helps the reader understand the structure and the similarity between the lines, while at the same time drawing attention to the differences between them. For example:

int m_cVampires;
int m_cZombies;
CWeapon m_weapon;
string m_name;

Just looks less nice than:

int       m_cVampires;
int       m_cZombies;
CWeapon   m_weapon;
string    m_name;

That's a very simple example, of course. This method would be much more useful in this sort of code:

AddButton (ICON_NEW,         "New File",   "Ctrl+N",   x+=32);
AddButton (ICON_OPEN,        "Open File",  "Ctrl+O",   x+=32);
AddButton (ICON_SAVE,        "Save File",  "Ctrl+S",   x+=32);
AddButton (ICON_PRINT,       "Print",      "Ctrl+P",   x+=48);
AddButton (ICON_BOLDTEXT,    "Bold",       "Ctrl+B",   x+=32);
AddButton (ICON_ITALICTEXT,  "Italic",     "Ctrl+I",   x+=32);
...

In my opinion, the above piece of code looks much better than it would have without the tabs.

A third use for all that empty space on your fancy wide screen is documentation, and this tends to work well combined with the previous idea; when you have a list of things, and you're keeping the names in your code properly short, you might like to add a few green words next to each item for clarification, for example:

int       m_cVampires;     // number of vampires on the screen
int       m_cZombies;      // number of zombies on the screen
CWeapon   m_weapon;        // player's current weapon
string    m_name;          // player's name

That saves you a whole lot of space compared to the common method of writing each of those things above the item it describes, and in my personal opinion is also easier on the eyes.

I'm afraid these are all the relevant ideas I have at the moment, but I think that even those could be very helpful. Since wide screens have become the standard, instead of the long screens that would probably be better for anyone who uses their computer to read or write things, we might as well figure out a way to use each line to its full potential.

Got any more space-saving ideas? Feel free to comment!

Saturday, July 21, 2012

Tomorrow's Programmers

Lately I find myself thinking a lot about how, given the chance, I'd attempt to help the next generation of programmers become better at it. It's pretty natural, considering my deep hatred towards what currently passes for education in this field. I figured that since there's no chance of these ideas becoming reality (or earning money), I'll just try to write down some of the thoughts tumbling inside my head.
Mind you, this isn't my usual ravings about how I think things should be done; this is what I, personally, with the benefit of hindsight as well as professional experience, wish I had when I was younger.
This is going to be long.

The biggest problems with the current state of education in this field are pretty obvious. A person who both loves to program and is good at it would rarely pick a career path that pays 80% less than actual programming. Also, someone, somewhere, has managed to convince everyone that the most important skill for programmers to learn is abstract math, rather than, you know, programming. Combine that with the fact that this sort of thing is really, really hard to teach, and you've got yourself a problem.

I don't know how to fix the first 2 problems, so I'll mostly try to tackle the third. The only way to learn about programming is to program, a lot, and seriously. All this "computer science" business is like trying to make someone a good athlete by teaching them a lot about anatomy, nutrition, maybe physics; yeah, it might allow a good athlete to become an excellent athlete, but unless they spend several hours a day practicing the actual sport, they will never become athletes at all. This is the wrong place to start.

What I'm imagining here is based mostly on a lot of programming work. You might be able to code a nifty little game in 8 hours and sell it to everyone who owns an iPhone and become a millionaire, and that's admirable. But one day you will start working on a large scale project, and you're going to find a completely new set of problems. These problems aren't impossible, some not even very hard, but they will take you a few years of work to solve for yourself, which, in my opinion, begs the question: why waste several years in school?
I guess this idea would be best to implement on high school kids; they're definitely old enough to be good at it, and yet they might actually have the necessary time.

So the central points of my idea are these:

Get a group of teenagers who like programming and have done it before (these prerequisites are non-negotiable).
Divide into small groups (2-3), each of which will spend 2 years working on a project of their choice.
Set certain goals in advance, but also keep expanding the project as you go along.
Some lectures can teach about the relevant stuff programmers need to know but the academy currently ignores, e.g. source control, bug tracking, and so on.
Let the big group work together a lot, whether in code reviews, random testing, ideas for new features, and so on.
Tell everyone to complain about everything they can, e.g. how hard it was to code this feature or to test that change.
Create unnecessary challenges of the sort that might come up in a real project.
If some form of grading is absolutely required, grade the ability to face realistic problems, such as efficiently implementing new features, finding and fixing bugs, etc.

I suppose the unusual prerequisites deserve to be explained first.

High schools, colleges and universities are already full of people who hate everything about programming but know that that's the easiest way to get a high paying job at a good company. Good for them. They all grow up to be terrible programmers, of course, because software just isn't the sort of thing you can excel at without enjoying the process. I see no reason to make a special effort of this sort for people who aren't going to be great either way and probably don't even care about it.

As for having programmed before, it really doesn't have to be serious, just evidence that a person enjoys programming enough to do it even if they don't have to. Also this plan doesn't really apply to people with zero experience, since it aims at creating serious software over a long time. I don't know how long it would take to bring a person who has never programmed before up to a level where they can start gradual work on a large project, but in the scope of 2 years for the whole thing it probably would be too much.

The whole thing should, I think, be a part of high school. Plenty of people start messing with programming earlier than that, so I'm sure that by that point some would have what it takes to start working on something bigger.

Teamwork is a terrible and evil thing, and also inevitable. That's fine, there's a limit to what a person can achieve on their own, especially while still young and inexperienced. That's why I think that this sort of thing should be done in small teams: to allow the project to be bigger, to keep people from working alone, and to teach them from an early point about various aspects of programming as part of a team (source control, modularity, communication, interfaces, and just plain teamwork).

Two years seems like an adequate time for creating a large project of some sort and experiencing some part of its natural life cycle: it leaves time for serious design, developing version 1.0, testing, and then plenty of time for maintenance, repairs, and adding interesting features. Also it seems like the sort of time frame that can fit into high school. An important bonus is the fact that plenty of problems simply never have time to appear in short projects, and they can be the toughest ones.

Each group should come up with their own idea for a project. It should be something that they find interesting and fun, something that can be done by a small group in the given time frame, and something with a lot of room for expansion. It can really be just about anything: a new social network, operating system, real time strategy game, an application that uses your cellphone camera to solve your math homework, anything at all. As long as it can keep the people working on it interested for the project's duration, can have a version 1.0 in under a year, and can be expanded and enhanced in enough ways to fill a second year, it's good.

I imagine that 2-3 weeks should be enough for each group to come up with a basic idea. Just a few sentences, a mission statement of sorts, something that you can use to get a vague idea of what you're doing, what it should look like.

Since the purpose of this project is not to make money or fill any particular role in anyone's life, designing everything in advance isn't right. It would also make things much less fun. Besides, it's not like it works in real life; designs always have plenty of missing details, neglected edge cases, and good old mistakes. Features get added and dropped, as do various requirements. Might as well start as simple as you can and iterate a lot.

If you're developing a game, you could add more levels, characters, improved graphics.
If you're developing a website, you could support more browsers, improve security, redesign.
You could decide to add support for unusual hardware, have fake advertisement space, anything at all.
And there is no reason to plan any of it in advance. You can finish building your site which sells hamsters online (free delivery!) and once you're done, decide that you might as well branch on to ferrets and guinea pigs, because you have absolutely no financial goal to meet, no shareholders to please, no bosses to argue with.

The purpose is the journey, rather than the destination.
There is no harm in finishing the journey every other month, and then simply picking a new direction to travel in for some time.

If you want this to be like some sort of class in high school, some lectures will probably have to happen. Just as well, there is material to cover that might be relevant for everyone, regardless of what project they've decided to work on.

The academy currently seems to neglect many things that are crucial to any real software development. In my opinion, even the ones that you could in theory expect students to pick up on their own get ignored due to the highly demanding academic schedule.
In a nutshell, I'm referring to the sort of things you have in the Joel Test. A few of these points aren't relevant, but source control, builds, bug tracking, specs, schedules, unit testing, backups, even tools, these are things you can only hope to learn on your first job because (in my experience) nobody will mention them before that point. If we're already fantasizing, might as well change that.

I suppose that at many points the people involved in this will sit together in a classroom, just like any other high school class. There's plenty of things for them to do then.

Showing the various projects to everyone, explaining what they're good for, and maybe getting hit with nasty criticism, are all things you might as well learn early. Having more people to point out flaws, or praise impressive achievements, would improve the whole thing. Not to mention how good innocent bystanders are for finding surprising bugs, and doing usability testing. Maybe even deliberately try to break each other's security.

Beyond that, people could be each other's pretend-customers and ask for features and changes and discuss schedules for them. This would allow you to generate surprising ideas and feature requests, and simulate a real project's development better than any number of tired old teachers. Interaction with a large number of various people can bring a lot of life into a project in ways you'll just never find in the old and boring academy.

Saying that people need to complain a lot must have seemed out of place. Let me explain.
Perfectionism is a flaw in certain situations, there's no doubt about it. It can get you stuck on things you might as well ignore, and suffer needlessly from things that seem normal to others.
On the other hand, the first step on the road to making something better is saying "this could be better". This whole blog is a result of perfectionism; the few companies I've worked for so far haven't gone bankrupt yet, but I find it infuriating when I have to spend an entire week working on a task that has no right to take more than a few hours.

In short, according to my life philosophy, it's up to us to notice problems (or mere imperfections) and strive to fix them. Look at the number of programming languages out there, each one of them is a result of someone looking at an existing language and saying "Well, it's OK, but I really wish it could do X". (Yeah, alright, except for LISP).

This is why everyone working on this project should take the time to learn ways to get around pesky little things that slow them down. A feature was hard to code? Think of what could have been done to make it easier. It runs too slowly on Windows 3.1? Find out why and see if you can (and should?) optimize it. The code is too ugly, some bugs managed to slip through the rigorous testing, the debug/test cycle is too slow, mention each problem and see how you can solve it, and let other people point out things you might not have considered problems and try to solve them as well.

These things are best done at an early age, I think. Humans are disturbingly good at adapting to things, it's best to come up with solutions before you grow to accept the problems as a natural part of life.

One of the things the idea relies on, as I've mentioned earlier, is that the people doing the project are into the whole thing. It's good, because you can abuse that a bit.
A seasoned programmer can probably think of a dozen things that can make a real project hellishly annoying, and the same programmer might retroactively realize what could have been done to make it a bit less hellish. Since the purpose of this whole thing is to prepare programmers before they get to face these problems in the wild, it might be beneficial to simulate some such problems.

It could be anything, really. A demand to have a stable build that you can release as fast as humanly possible. An accidental change in the code that causes a horrible bug that now needs to be found and fixed. An unexpected opportunity to earn a gazillion dollars (in small change) by creating a side version containing limited features. The sudden invention of a new technology and the chance to capitalize on it in this very project. Faulty hardware, hard-to-reproduce bug, zombie apocalypse, anything that would pose a reasonable challenge (fine, maybe not zombie apocalypse).

Just ask some random programmers about big, preferably unexpected problems they've faced at some point in their career and try to apply them to the existing projects.
The harder the training, the easier the battle.

Since I've been thinking of all this in the context of a high school class, there's no doubt that someone might want to stick an exam or two in it, because grades are everything. Luckily, you can get grades without exams!

Since we're trying to teach real-life stuff here, it makes sense to get grades on things that matter in real life.
Releasing a good polished product matters in real life. Being able to respond to change requests quickly and efficiently matters in real life. The actual criteria would have to be improvised based on what the actual projects are, I suppose, but I'm sure that for any project you could come up with an extra feature that wouldn't take months to implement, but would be a good test of how well the project is made in terms of modularity and flexibility.

But on the whole, this should be about finding the middle ground between fun and education, and not about grades. As I said at the start, there is just no point trying to excel at something you don't enjoy. That's why I think everyone should pick and design their own projects, that's why there should be social interaction, and that's why there shouldn't be too much time wasted on designing in advance.

Blimey, did you actually read through all of that? You're crazy, man!

Well, these are just the main points of the many, many things that run through my head in this context.
In my 4 years of learning comp sci in school and too many years of university, the subject of actually developing software was practically taboo. In the first lecture of one of the intro courses they actually told us: "Computer Science is no more about computers than astronomy is about telescopes" (wrongly attributed to Dijkstra, I believe). What the hell? You can study theoretical astronomy for a lifetime and know less about Mars than some kid with a telescope. What use is that to anyone?

The current method does not train good programmers. "Computer science" mathematicians have their place, yes, and it can be safely away from computers just like in the above quote. But the world needs many programmers and not so many mathematicians, and that needs to be taken into account.
I have never once seen a problem that I could solve using fancy algorithms, and yet I wasted all this time in pointless classes learning about linked lists and BFS and deterministic automata. I could and should have spent that same time writing software, learning the tools of the trade, preparing to do the job I intend to spend my life doing.

All this math can fit into advanced courses, but first things first: teach programmers how to program.

Saturday, July 14, 2012

Variables and Other Demons

if (someverylong.variable.name.m_bSomething == true)
{
    another.ridiculously.longname.m_bSomething =
        another.ridiculously.longname.m_bSomething  &&
        true;
}
else
{
    another.ridiculously.longname.m_bSomething  =
        another.ridiculously.longname.m_bSomething  &&
        false;
}

This is a real piece of code, except for the variable names which have been altered to protect their identities. The original names were the same length though.
Now, since I suffer from a rare case of Compulsive Refactoring Disorder, whenever I run into a sub-optimal piece of code I immediately try to make it better. I take medication for it, but still every day is a struggle. Anyway, this one turned out to be interesting.

This code doesn't immediately scream "I'm wrong". I mean, it's a bit heavy on the eyes, but that stuff just happens when you have huge structs within huge structs. What's a programmer to do?
Spoiler alert: this code is wrong. If you can see the solution right now, you win 2 points!

I'll use a magic trick here which I've learned from arch-druids versed in the arts of mystical runes.
This one arcane rune is called ampersand, and when properly used it grants you the power to command a demon as if you knew its True Name, by binding the beast to a name of your choosing. And it works on variables, too!
I'm going to rename these two demons X and Y, and also drop their protective braces temporarily. Check this out.

bool &X = someverylong.variable.name.m_bSomething;
bool &Y = another.ridiculously.longname.m_bSomething;

if (X == true)
    Y = Y && true;
else
    Y = Y && false;

Can you see it yet? You get 1 point if you do. And if you don't:

            X==true    X==false
Y==true     true       false
Y==false    false      false

Well, that looks familiar, doesn't it?
What I'm trying to say is that the entire monstrosity up there sums up to:

Y = Y && X;

This expression actually has fewer elements than the original code has lines.
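
If you don't trust the table, you can let the compiler check it for you. This is just a quick sketch: the function names are mine, and X and Y are passed by value here instead of being bound to the real variables.

```cpp
#include <cassert>

// The original if/else, with the short names.
bool originalVersion (bool X, bool Y)
{
    if (X == true)
        Y = Y && true;
    else
        Y = Y && false;
    return Y;
}

// The one-line replacement.
bool shortVersion (bool X, bool Y)
{
    Y = Y && X;
    return Y;
}

// Walks all four rows of the truth table; the assert fires if the
// two versions ever disagree.
void checkAllRows ()
{
    const bool values[] = { false, true };
    for (bool X : values)
        for (bool Y : values)
            assert (originalVersion (X, Y) == shortVersion (X, Y));
}
```

Four rows is small enough to check by hand, of course, but the same trick works just as well when the condition has five variables instead of two.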

I admit that the names X and Y aren't very informative and might not even fit the occasion.
But the purpose of this live example was to show how something like long variable names can be enough to effectively blind you to the code's meaning and purpose. In this case as soon as the names were made shorter, it was much easier to see what actually happens, and the inherently suspicious bits were much more visible than before.

If you have any interesting examples of your own, don't hesitate to comment!

Monday, July 9, 2012

Collective Nouns

Most people probably know that in English, groups of various nouns have various collective nouns. Some people might even know this list of supernatural collective nouns.

Personally, I feel that the wide field of software rightfully deserves some such nouns of its own, and have decided to start compiling (pun intended) a list of what seem like good ideas.

I hope to see some comments with nouns that should be added to the list!

A list of arrays

A thicket of if-statements

A feast of bytes

A dynasty of classes (inherited)

An incest of classes (virtually inherited)

An oblivion of compiler warnings

An Ikea of databases

A stutter of DWORDs

A distraction of humans (fem.)

An array of lists

A murder of macros

A mess of objects

A crash of pointers

An overflow of recursive functions

A trouble of regular expressions

A spool of strings

An arrogance of templates

A knot of threads

An uncertainty of variables (esp. data members)

A landfill of variables (uninitialized)

A babble of words

Saturday, July 7, 2012

Another Optimization

When you finish your project and decide to optimize it, you usually look for that little loop where the processor inevitably spends 80% of the run-time, and find a way to make it faster. If it's much slower than it needs to be, you're likely to go as far as using a different language for just that part of the code.
This logic should be applied to programmers too, not just processors.
Now, the numbers aren't consistent, but you're almost sure to spend at least 30% of your development time on debugging. Often more. Much, much more. Some people put it as high as 90% even. That's a loop you should really strive to optimize at all costs.
Let's talk about debugging then.

We all get stuck in the compile/debug cycle (AKA edit/compile/test cycle).
The compiler is usually beyond our control (and often fairly quick for minor changes).
The editing depends on a lot, but very often you just want to make many tiny changes and test their effect, or you think you can fix a bug with just one tiny change, so that's pretty quick too.
Then you hit the testing part, where you have to run the program, enter your username and password and maiden name and social security number and shoe size, then you loyally watch the loading screen for several minutes while the computer resets the hardware, initializes the data structures, synchronizes the clocks, refills the shark tank, contacts the server, and... wait, what was I trying to check again?

This is all a waste of time. Fortunately, it's possible to avoid it. Here are some methods to do that:

Debug Mode: I don't mean selecting Debug in that little combo box and then having the privilege of breakpoints, line-by-line debugging, and those few extra log prints that depend on the _DEBUG flag. That's fine for some things, but not many. Instead, you can introduce a new running mode for your application, and use it to allow yourself to do things that no regular users should ever be able to do:

Skip some initialization processes. If the application normally reads a gigantic database from a remote server but all you want to debug is the feature that snaps the toolbar to the left side, no reason to start reading databases.
Run things in the wrong order. If you're working on a game and want to iterate on the design of level 2, it's ridiculous to have to run through level 1 each time. It's equally ridiculous to have to run through the first quarter of level 1 just to iterate on the second quarter.
Skip other things. Splash screens, username and password prompts, data verification, confirmation messages, anything that isn't related to what you're doing.
"Break" various components in run-time to simulate errors. If you want a truly robust application, you should be able to test a whole array of errors, and making these errors actually happen in the right moment might require a lot of effort.
There's nothing inherently wrong with extra log printing when _DEBUG is defined, actually.
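
A rough sketch of what the switches for such a mode might look like. Everything here is an assumption made for illustration: the DebugOptions struct, its fields, and the argument names like "--skip-login" are invented, and a real application would probably want to make sure none of this ships in the release build.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical switches for a developer-only running mode.
struct DebugOptions {
    bool skipLogin    = false;   // don't ask for username and password
    bool skipDatabase = false;   // don't load the gigantic remote database
    bool failNetwork  = false;   // pretend the network is down
    int  startLevel   = 1;       // jump straight to this level
};

// Fills DebugOptions from arguments such as "--skip-login" or "--level=2".
// Unknown arguments are ignored so regular options can share the command line.
// A malformed level number makes std::stoi throw.
DebugOptions parseDebugOptions (const std::vector<std::string> &args)
{
    DebugOptions opt;
    const std::string levelPrefix = "--level=";

    for (const std::string &arg : args) {
        if      (arg == "--skip-login")     opt.skipLogin    = true;
        else if (arg == "--skip-database")  opt.skipDatabase = true;
        else if (arg == "--fail-network")   opt.failNetwork  = true;
        else if (arg.compare (0, levelPrefix.size (), levelPrefix) == 0)
            opt.startLevel = std::stoi (arg.substr (levelPrefix.size ()));
    }

    assert (opt.startLevel >= 1);
    return opt;
}
```

The rest of the program then checks these flags at the few places where it would normally log in, load the database, or start from level 1.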

Data Injection: Assuming your program does more than calculate a large prime number, there is usually some input involved. Input comes from various sources, and 4 out of 5 times you really don't want to depend on what these sources happen to be doing at the moment.

Simulate devices. If your program monitors a nuclear reactor, but your company only has 4 of them and they're all being temporarily used by another team, you better have a Fake Nuclear Reactor Simulation Module or you literally can't work until they're done.
Simulate servers. It's fine that you're rewriting the 3D display of a spaceship's position around Neptune, but that's no reason to wait for 5 minutes for the data from the space station every time you want to check that the wings are rendered in just the right shade of magenta. Prepare fake input and a way to read it.
Reproduce results and bugs. The guys at QA found out that your cat-tracking software crashes only when the cat chases the laser pointer just so. Instead of convincing the test cat to do the exact same thing dozens of times until you find the memory leak, simply save a recording of the Psychoscopic Feline Monitor data and you'll be able to work on the exact same data sets with zero effort until you locate the problem.
Just save time. Yes, of course users have to supply a username and password, and enter a Captcha correctly, and give a urine sample, just to log onto the system. As a developer you may want to log on hundreds of times per day, and even if it only takes you a few seconds, the wasted time accumulates. Besides, who enjoys typing the same thing again and again?
Unit tests, but that's pretty obvious.
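
One possible way to set this up, sketched under the assumption that the input is a stream of numeric readings. The ReadingSource interface, the RecordedSource class and the highestReading() function are all invented for this example; the point is only that the code being debugged talks to an interface and never knows where the data really comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Anything that can hand the program its next reading.
class ReadingSource {
public:
    virtual ~ReadingSource () {}
    virtual bool next (double &value) = 0;   // false when there is no more data
};

// Plays back readings that were saved earlier (or typed in by hand),
// so the same data set can be replayed as many times as needed.
class RecordedSource : public ReadingSource {
public:
    explicit RecordedSource (const std::vector<double> &readings)
        : m_readings (readings), m_pos (0) {}

    bool next (double &value) override
    {
        if (m_pos >= m_readings.size ())
            return false;
        value = m_readings[m_pos++];
        return true;
    }

private:
    std::vector<double> m_readings;
    std::size_t         m_pos;
};

// The code being debugged only sees the interface, so it cannot tell
// a real reactor from a recording.
double highestReading (ReadingSource &source)
{
    double value   = 0;
    bool   hasData = source.next (value);
    assert (hasData);                        // this sketch expects at least one reading

    double highest = value;
    while (source.next (value))
        if (value > highest)
            highest = value;
    return highest;
}
```

The real device would get its own class implementing the same interface, and a debug switch or a config file would decide which one gets created.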

Debugging UI: There is nothing wrong with investing some time in windows and forms that the end-user will never see. I think the odds are decent that you'll spend more time working on the program than the average user will spend using it, anyway.

Watch the unwatchable. There's plenty of reasons why putting a breakpoint in the code just to check the value of a variable is impossible (multithreading being the most obvious one), but there's nothing stopping you from displaying this data somewhere on the screen, in whichever format you want (you hear that, everyone-who-ever-used-a-string-in-C++?).
Bypass the "compile" part of the cycle. You just want every component of the GUI to have the perfect color, why should you recompile for every tiny change when you can just have a nice console with sliders and color pickers and such?
If you're using the right language, you can simply change the code while running.
Change data while running. This allows you to undo changes and reset states without having to re-run the whole thing, or easily simulate specific conditions that are too much effort to create manually like a real user would.
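
The "change data while running" part doesn't need much machinery. Here's a minimal sketch, assuming all the values you want to play with are floats; the Tweakables class and the names used with it are hypothetical, and the actual console or sliders that would call set() are left out.

```cpp
#include <cassert>
#include <map>
#include <string>

// Registry of named values that a debug console (or a slider) can change
// while the program is running.
class Tweakables {
public:
    // Registers a variable under a name; the variable must outlive the registry.
    void add (const std::string &name, float &variable)
    {
        assert (m_values.count (name) == 0);   // each name is registered once
        m_values[name] = &variable;
    }

    // Called by the debugging UI. Returns false for an unknown name.
    bool set (const std::string &name, float value)
    {
        std::map<std::string, float *>::iterator it = m_values.find (name);
        if (it == m_values.end ())
            return false;
        *it->second = value;
        return true;
    }

private:
    std::map<std::string, float *> m_values;
};
```

Register the toolbar's red component as "toolbar.red" once at startup, and from then on typing a new number into the console changes the color without another trip through the compiler.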

Yes, every one of these ideas is time consuming to some extent. But if your project is going to last more than a few months then the advantages will make up for it pretty quickly. Not to mention that some things you build for one project might then be reused for many other projects over many years.

There are probably many other tips and tricks out there, but these are the ones I could think of today. If anyone knows other useful methods, I'd love to hear about them.