Monday, September 20, 2010

The Road to San Francinattle

A problem I've encountered in making games and random map generators is giving places names. If it's a randomly generated map, then it needs randomly generated names. Doing it by hand would defeat the purpose. Another possibility would be to compile a large list of city names and then have the program pick names randomly. But that would less interesting as it could never come up with a new name. It would be great if a program could know the rules of what would make for good city naming. But since I'm not about to invent artificial intelligence, something a bit simpler is in order.

And this weekend, I came up with the perfect solution. A Markov chain random text generator. A Markov chain is a random process in which the next state is dependent only on the current state. In the case of a Markov chain random text generator, the current state is the last n characters (or words) generated. My program works with characters, but the concept applies exactly the same to words.

A Markov chain text generator first reads some sample text. From this it builds kind of probability table of which letters follow which other letters more or less frequently. Then it generates the letters (considering only the last n letters generated), weighting them so the more common letters and letter combinations occur more frequently.

My particular generator is slightly different than most others because it knows how to begin and end. Most other Markov text generators are used to make large blocks of text based off of entire books. Mine is to be used for short names. The beginning and end matter more, and it's good to have a name that ends in a logical way such as "ton" or "ville" instead of just being cut off in the middle. So I treat the end of the name as a character the same as the rest. It gets put in the chart the same as other and it gets generated the same as others, except that when it's generated, the name is done and it stops generating more.

Here's an example of how it works, using a name it actually generated: San Francinattle. For early testing, I used a list of the 100 largest cities in the US, including San Francisco, Cincinatti and Seattle. I'll only consider those three for simplicity. More names affects the probabilities involved, but not the principles. In this example, I'm using order = 3.

For the first letter, two names have "S" and one has "C", so it has a 66.7% chance of generating "S". With just "S" it has a 50-50 shot of producing "e" or "a" next. With our limited dataset, once "Sa" is generated, the only next option is 'n'. That continues until we've generated "San Franci". The order is 3, so it only considers "nci" when generating the next letter. That letter combination was in Cincinatti, in addition to San Francisco. It doesn't care about what came before this, so it has a 50-50 chance of generating an "s" or an "n", and in this case it went with "n". Now with this combination of letters, Cincinatti is the only game in town, so it has to stick with it for the next four letters, but when the last generated letters are "att" then it matches Seattle. At which point, it finishes up the city name.

And so, a brand new city name, inspired by other city names, and following English spelling convention.

After some successful test runs, I compiled a list of over 19,000 US city names (information provided by your friendly neighborhood census bureau (and a little bit of programming to strip out unnecessary parts)).

Here are some of the names generated with order = 1:
Orrdak Lanson
Pachelug
Mior Juengsperes Beleckbumonty
Qubrt Blle
Nesoala Isvid
Splllay
War Hilahe Crgeyrtosn
Rionnengh Fr
Ibbon
Sove
Order = 2:
St. Cimandsonisond
Rippnevillsbortham
Mase
Vinge
Easton
Blue Haveroe
Peenvilley
Mintond
Forwing
Willinbold
Wod
Bly
New Citeradermon
Order = 3:
Tonkeen
Chur
Big Vall
Pleadis
Summitol
McRae
Roscomb
McBainfieldersvillenwood
Meside
Uphalittles
Glens
South Burleve Porth
Runa
Hess Hillage Placket
Belph
Warway
Order = 4:
Washton
Brook
Iona
Denver
Graham
Buncombes
Dewey
Mario
Dolanagan
Onster
Hampton
Magner
Greenview
Bee City
Saling Gretna
It's amazing how few orders are between unpronounceable gobbledygook and reproducing sample names exactly. Orders 2 and 3 seem pretty good to me though. It produces names which are easy to imagine could be real, but probably aren't.

I also got a list of about 2,000 German cities. Here are some names generated using those (order = 3):
Alzenbach
Coswintern
Völklingen
Ludwig
Kötheim
Güstralsrode
Löwentha
Pappenhüttenburg
Kalbernburg
Wolgau
Geislingen

Tuesday, September 14, 2010

Some Musings on Redundancy

I was talking about the DRY principle before. Don't Repeat Yourself. From reading programming books, it seems like redundancy is the worst possible sin, for which one should be immediately banished to the ninth ring of hell (though it seems like it's frequently committed in practice). But I have to wonder if redundancy is always a bad thing.

Well, it's pretty obvious that redundancy isn't always a bad thing. Leaving the world of computer science, and looking at engineering, redundancy is frequently a good thing. To take an extreme case, life support systems on spacecraft are multiply redundant. Which is good, because life is awfully fragile in low-Earth orbit. Even here on Earth, redundancy in engineering tends to be a good thing. It has the downside of costing a more, but has the advantage of preventing one thing going wrong from destroying everything.

But redundancy in engineering has little (if anything) to do with redundancy in programming. So, let's look at something a little more information based -- linguistics. Language is chock-full of redundancy. Here's a simple example: "I run", "He runs". What's with that extra s? We know who the sentence is talking about from the pronoun. Why bother with noun-verb agreement at all? Because the world is a noisy place. There's always some amount of background noise around. Frequently, people talk to each other in the middle of crowd, where everyone else is talking too, which is a pretty incredible feat if you think about it. And in a noisy environment, some amount of spoken information is going to be lost in the background. And a little bit of redundancy can help you make sure you actually heard what you thought you heard. This form of redundancy still has its cost: it takes longer to get a complete message across.

And after this cross-discipline trek, I'll finally step back into programming, from human languages, to programming languages. I've been learning some Groovy for work. Groovy is a language that's built on top of Java. It does everything Java does and adds in some cool features of its own. One of the things it does, and part of its core philosophy, is to remove Java's unnecessary and redundant fluff. For example, Java requires a semicolon at the end of every statement. But most of the time, a single statement is on a single line. So, Groovy lets you use a new line to end the statement. You can still use the semicolon if you want, but it's not necessary. It's redundant. Stuff like that is all over the place. It makes the code shorter, but it also makes it (at least for someone new to Groovy) more difficult to understand. Finding a method's return type, for example, is no longer, necessarily, simply looking at the method's declaration. It's still unambiguous, but it's harder to find.

And not only is it harder for a human, it makes mistakes more difficult for the compiler to catch mistakes. If you have information stored in two places, and you change one (intentionally or accidentally), the compiler can alert you to the inconsistency. If you changed it intentionally, you'll be reminded to change the other. If you changed it accidentally, you'll be reminded to fix it. If the information is stored in only one place, the compiler has no way of checking if you really meant the change. Here's an example with Python (since I haven't been using Groovy enough to encounter a good example of this yet). In Python, functions can be treated just like any other variable. I was writing a program in which I wanted to get the result of one function (which took no arguments) and then pass that to another function. Simple code like this: x = funcA; funcB(x); See the problem? It should have looked like this: x = funcA(); funcB(x); Those parentheses after funcA make a big difference. Without them, funcA itself is passed to funcB instead of the result of funcA. If you consider a theoretically ideal language which has absolutely no redundancy (brainfuck comes close), then any arbitrary string would compile and run. Which means if you make a single typo, it will still work, it just won't do what you want it to do.

I was gonna talk more about other forms of redundancy in programming. Higher level stuff in the overall design of the program rather than the nuts and bolts of the language. But this post is plenty long enough as it is, so I'll wrap up with conclusions now. Is redundancy a bad thing? Not necessarily. The advantages can outweigh the disadvantages. But there are always disadvantages. In most of my examples, the costs were pretty small compared to the benefits. But, especially in the higher level, more abstract stuff, the costs can be significant. So, I'll bring up another coding principle: Code by intention. If you're going to do something redundant, do it for a good reason. Do it intentionally.

Wednesday, September 1, 2010

Today is the First Day of Autumn

Why? Because I say so and I really like autumn. That's why.

There are those who say that solstices and equinoxes mark the beginning of each season. In which case we're still three weeks away from autumn. Others say that solstices and equinoxes mark the midpoint of each season.

Nonsense, I say! Solstices and equinoxes have to do with the alignment of Earth's tilt to the Sun. Seasons have to do with the weather and the temperature. The two are related, but not the same.

If you say solstices mark the beginning of seasons, then a week before Christmas isn't winter yet, which seems kind of ridiculous. If you say equinoxes mark the midpoint, then Valentine's Day is already spring, which is even more ridiculous.

And more importantly, there is no first day of any season. As my brother and I were discussing on Facebook, everything is continuous, and seasons are a prime example of that. It's not like there is one single day where the temperature drops from 30°C to 20°C and all the trees change color. It's a gradual transition, like red turning to orange in a rainbow. Any line of demarcation is going to have to be arbitrary.

And since it has to be arbitrary, it might as well line up with another, well established arbitrary date-point. In this case, the first of the month.

And so, I decree: The first day of autumn is September 1st. By extension, the first day of winter is December 1st, the first day of spring is March 1st and the first day of summer is June 1st. And it lines up much better with the weather that way.

Tuesday, August 17, 2010

AlDraw, Version 2.0

AlDraw is the program I use to make things like this:
(By the way, I just uploaded a bunch onto my picasa album)

I've been working on improving it. I've decided I want to do a major overhaul, and make it version 2.0.

There are three big things I want to focus on:

  • Improving code design. -- When I first wrote it, I didn't really give any thought or attention to making it good code and easily maintainable and extensible. As I gradually expanded it, I rewrote the worst parts to make it more flexible. But for the scale of changes I want, I don't think that's going to be enough. Also, I want to be able to show the code to another programmer and not have them think "What the hell were you doing?"
  • Improving usability. -- This is probably the worst thing about the program right now, because I'm the only one who uses it. There are a lot of things about the program that make perfect sense to me, but wouldn't to anyone who just started using it, because they didn't write it, or get used to its shortcomings as they worked to fix them. If anyone is willing to help test it, let me know.
  • Add features. -- Copy and paste to make repeated designs like tiling easier. Make zooming and panning more fluid. Add colors. Pretty straightforward.
Also, as I want this to be a big step forward, now would be a good time to rename it. I would welcome any suggestions. Possible ideas: keep AlDraw, GeomeDraw, Constructomatic, Euclidomizer...

Sunday, August 15, 2010

The Problem of Evil

A lot of Christians claim that God is all-powerful, all-knowing and perfectly good. And yet, earthquakes, hurricanes, tsunamis, tornadoes, floods, fires and all sorts of other natural disasters happen everyday which destroy infrastructure and kill people.

I'll let a Greek philosopher (probably not actually Epicurus, though usually attributed to him) make the connection.

Is God willing to prevent evil, but not able?
Then he is not omnipotent.
Is he able, but not willing?
Then he is malevolent.
Is he both able and willing?
Then whence cometh evil?
Is he neither able nor willing?
Then why call him God?
One of the most common solutions to the problem of evil is to say God has "mysterious ways" and these acts serve some greater purpose. But God is supposed to be omnipotent. Not just really, really powerful - all-powerful. If he were all-powerful, he could achieve the same purposes without the killing and destruction.

Some people say that God can't be morally judged the same as humans, but I don't see why not. Whether something is moral or not isn't determined by the actor's power or knowledge. (Note to self: write a post going into more detail about this.)

One of the most... interesting... solutions to this problem that I've seen is that natural disasters are caused by human evil and sinfulness and for God to prevent them would interfere with our freewill. Of course, there was no explanation of how humans cause natural disasters, or how God would be interfering with freewill by eliminating them.

And then of course, there is the solution that there is no such entity that is omnipotent, omniscient and omnibenevolent. Personally, I find that to be the most parsimonious solution.

Monday, August 9, 2010

Questioning

I know you won't believe me, but the highest form of Human Excellence is to question oneself and others.
-Socrates

In my previous post I talked about why truth is a good thing. Continuing that, questioning is also good. Because questioning is one of, and perhaps the single most important tool in finding the truth.

Not everything you believe is true. Not everything I believe is true. In order to discard our false beliefs, we must first find them. And the only way to do that is to question our beliefs.

Consider the modern fable of the five monkeys. The monkeys continued believing that it was a bad thing to try to get the bananas, even though conditions had changed. Just because something was true doesn't mean it still is.

This is why I so strongly support the freedom of speech and more generally the freedom of belief. The only way for the truth to prevail is for it to be critically examined.

Wednesday, July 28, 2010

Truth is Important

Truth is important.

It may sound obvious. That's what we've been taught since childhood. Lying is bad; honesty is good. But our parents and teachers are not infallible. They could have been wrong. Lying and bullshitting certainly seems quite prevalent in the behavior of those at the top.

Also, a major component of philosophy is to check your assumptions. Wrong assumptions lead to wrong conclusions, which is bad philosophy. So, are we sure that the truth is a good thing?

My answer is yes. First for practicality. If you're pursuing another goal, the truth will only help you get there. If you're trying to make people happy, you need to know what will actually make them happy. If you do something that you believe will make people happy, but are mistaken, you will achieve the exact opposite of your goal. And this is true of any goal. Even if your goal is to dissemble and mislead, you'll be able to do it better if you know the truth.

But beyond that, I feel that truth is good in and of itself. I can't really articulate why. It's a non-rational preference, the same way preferring pleasure to pain or happiness to unhappiness is non-rational. And I feel it's a very important preference, of the magnitude of pleasure or happiness.