Tuesday, September 14, 2010

Some Musings on Redundancy

I was talking about the DRY principle before. Don't Repeat Yourself. From reading programming books, it seems like redundancy is the worst possible sin, for which one should be immediately banished to the ninth ring of hell (though it seems like it's frequently committed in practice). But I have to wonder if redundancy is always a bad thing.

Well, it's pretty obvious that redundancy isn't always a bad thing. Leaving the world of computer science, and looking at engineering, redundancy is frequently a good thing. To take an extreme case, life support systems on spacecraft are multiply redundant. Which is good, because life is awfully fragile in low-Earth orbit. Even here on Earth, redundancy in engineering tends to be a good thing. It has the downside of costing more, but has the advantage of keeping a single failure from destroying everything.

But redundancy in engineering has little (if anything) to do with redundancy in programming. So, let's look at something a little more information based -- linguistics. Language is chock-full of redundancy. Here's a simple example: "I run", "He runs". What's with that extra s? We know who the sentence is talking about from the pronoun. Why bother with subject-verb agreement at all? Because the world is a noisy place. There's always some amount of background noise around. Frequently, people talk to each other in the middle of a crowd, where everyone else is talking too, which is a pretty incredible feat if you think about it. And in a noisy environment, some amount of spoken information is going to be lost in the background. A little bit of redundancy can help you make sure you actually heard what you thought you heard. This form of redundancy still has its cost: it takes longer to get a complete message across.

And after this cross-discipline trek, I'll finally step back into programming, from human languages to programming languages. I've been learning some Groovy for work. Groovy is a language that's built on top of Java. It does nearly everything Java does and adds in some cool features of its own. One of the things it does, and part of its core philosophy, is to remove Java's unnecessary and redundant fluff. For example, Java requires a semicolon at the end of every statement. But most of the time, a single statement is on a single line. So, Groovy lets you use a new line to end the statement. You can still use the semicolon if you want, but it's not necessary. It's redundant. Stuff like that is all over the place. It makes the code shorter, but it also makes it (at least for someone new to Groovy) more difficult to understand. Finding a method's return type, for example, is no longer necessarily as simple as looking at the method's declaration. It's still unambiguous, but it's harder to find.

And not only is it harder for a human, it makes mistakes more difficult for the compiler to catch. If you have information stored in two places, and you change one (intentionally or accidentally), the compiler can alert you to the inconsistency. If you changed it intentionally, you'll be reminded to change the other. If you changed it accidentally, you'll be reminded to fix it. If the information is stored in only one place, the compiler has no way of checking if you really meant the change. Here's an example with Python (since I haven't been using Groovy enough to encounter a good example of this yet). In Python, functions can be treated just like any other variable. I was writing a program in which I wanted to get the result of one function (which took no arguments) and then pass that to another function. Simple code like this:

    x = funcA; funcB(x);

See the problem? It should have looked like this:

    x = funcA(); funcB(x);

Those parentheses after funcA make a big difference. Without them, funcA itself is passed to funcB instead of the result of funcA. If you consider a theoretically ideal language which has absolutely no redundancy (brainfuck comes close), then any arbitrary string would compile and run. Which means if you make a single typo, it will still run, it just won't do what you want it to do.
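To make the parentheses mistake concrete, here's a small sketch of it. The function bodies are made up for illustration (the original funcA and funcB did something else, and I've renamed them func_a and func_b to follow Python naming habits). I'm assuming the second function just formats whatever it's handed, which is the case where the typo slips through silently.

```python
def func_a():
    """Hypothetical stand-in for funcA: takes no arguments, returns a value."""
    return 21


def func_b(value):
    """Hypothetical stand-in for funcB: formats whatever it is given."""
    return f"result: {value}"


def run_with_typo():
    x = func_a        # no parentheses: x is the function object itself
    return func_b(x)  # runs without error, but formats the function, not 21


def run_correctly():
    x = func_a()      # parentheses call the function: x is its return value
    return func_b(x)


if __name__ == "__main__":
    print(run_with_typo())   # a string describing the function object
    print(run_correctly())   # result: 21
```

Both versions run to completion. Nothing in the language flags the first one, because a function is a perfectly valid value to assign and pass around. If func_b had done arithmetic on its argument instead, the typo would at least have blown up at runtime with a TypeError.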

I was gonna talk more about other forms of redundancy in programming. Higher level stuff in the overall design of the program rather than the nuts and bolts of the language. But this post is plenty long enough as it is, so I'll wrap up with conclusions now. Is redundancy a bad thing? Not necessarily. The advantages can outweigh the disadvantages. But there are always disadvantages. In most of my examples, the costs were pretty small compared to the benefits. But, especially in the higher level, more abstract stuff, the costs can be significant. So, I'll bring up another coding principle: Code by intention. If you're going to do something redundant, do it for a good reason. Do it intentionally.
