Searching for Tao, 7573 Kilometres from Home

I've been rolling code in a professional capacity since 1999, and a quick look at my CV will tell you that I've been around. I've worked for 19 companies in 18 years (yes, there was a lot of overlap), while I did some travelling and gained some experience. I have seen some fantastically horrible code, and I think it's time that I compile a list of what not to do.

There are a lot of books and websites dedicated to good software design, but very few that actually preach good habits, even fewer that point out the really atrocious ones. That's the point of this page. As I run into them, I'll be building a collection here. Maybe one day I'll turn it into a book, but for now, this is it.

Note that the code samples here will all likely be in Python, because it's what I code in most of the time, but they will apply to most object-oriented languages.

Variable Names

I can't believe that this kind of thing is as common as it is among adult programmers. It's like we all need to take a kindergarten class again, with our teacher calmly repeating: "Use your words".

Variables are containers (or pointers) to data, and that data has a nature of it's own. Maybe it's a file name, or blog comment, or byte count for stream parser. Whatever it is, naming it appropriately counts. Take this code snippet for example:

Bad

		for t in lrts:
			t.switch("left")

Even if you stare at that snippet for a good long time, and even if you have more context around it, there's no way any non-psychic can know that we're talking about trains here.

Good

		for train in light_rail_trains:
			train.switch("left")

Suddenly, the code becomes clear. New developers can sit down at their console and know immediately what this code is doing. The take-away should be that if a new developer has to ask you what your code does, it's wrong.

Comments

The very next thing people say in defence of the above code is typically: "But that's why you comment it!". This is wrong-headed. Comments are a means to give an overview of what a lot of code is doing, not a a play-by-play of the code. Comments, if they exist at all, should be limited to short, concise paragraphs, preceding legible, understandable code that can be parsed easily without having to read every line.

Bad

		while za.tmp < 100:  # Wait for the pizza to reach 100C
			za.bake()  # .bake() increases temperature by 1C

Good

		while pizza.temperature < Pizza.COOKED_TEMPERATURE:
			pizza.bake()

The first is both pointlessly cryptic, and pointlessly verbose, while the second example is self-documenting.

Database Structure

This only really applies to people working with a relational database, but it's so very important. Honestly, if you don't pay proper attention to your database schema early on and throughout your development process, you deserve to have your programmer card taken away.

Most of what they teach you in school about SQL-style storage is about normalisation. Normalisation is important, but too many people stop there and don't consider things like column types. Worse still, some devs get it in their head that splitting their data across multiple tables using 1:1 ratios is always a good idea. It rarely is.

We'll use a transactions system here, because these are all-too-often overly convoluted, leading to far too much pain down the road. Assume that we want to be able to keep track of every transaction made, and still have easy-access to a running balance value.

Bad

		transactions
			id           int   Primary key
			user         int   Foreign key: references the users table
			delta        int   Amount changed
			description  text

		balances
			id           int  Primary key
			transaction  int  Foreign key: references the transactions table
			balance      int  Balance after this transaction

		current_balances
			balance  int  Foreign key: references the balances table
			user     int  Foreign key: references the users table

There are so many things wrong with this layout it's hard deciding where to begin:

The current_balances table is completely irrelevant. All it does is force a join whenever you want to get a user's balance.
The balances table provides no useful information aside from the balance, which could just as easily been rolled into the transactions table, or better yet, the users table.

Better

		transactions
			id           int   Primary key
			user         int   Foreign key: references the users table
			delta        int   Amount changed
			balance      int   The running balance at this point in the transaction
			description  text

This does away with the redundant tables and keeps everything where you need it, but it leaves one bit of redundancy: the balance column. It's really just the result of a replaying of the transaction log. Still, this is a step in the right direction.

Best

		users
			...
			balance  int  A calculated amount, populated whenever a new transaction is created.

		transactions
			id           int   Primary key
			user         int   Foreign key: references the users table
			delta        int   Amount changed
			description  text

Typically, the only reason you want the running balance for a user is that you want to know how much cash that user has right now without having to do an expensive join against the potentially lengthy transactions table. Therefore, your best option would be to automatically write that value into the user profile whenever a transaction is created. This way, no joins are required to get the piece of information you use most often, and you do away with needless complexity. All this costs is a regular update, which can be achieved by manually running a query, or setting up a signal in your framework, or even using database triggers.

Constants

For gods' sake, use them. Seriously, what good is this sort of thing?

		if person.status == 1:
			# Promote them
		elif person.status in (2,3,4):
			# Fire them

How is anyone supposed to know what 4 is, without grepping for "4" backward through your code? This is a much better idea:

		# MyModel.py
		class Person:
			STATUS_AWESOME = 1
			STATUS_ASLEEP  = 2
			STATUS_HIGH    = 3
			STATUS_IDIOT   = 4


		# somefile.py
		from MyModel.Person import STATUS_AWESOME, STATUS_ASLEEP, STATUS_HIGH

		if person.status == STATUS_AWESOME:
			# Promote them
		elif person.status in (STATUS_ASLEEP, STATUS_HIGH, STATUS_IDIOT):
			# Fire them

This illustrates the premise that good code is self-documenting. Stick to that, and you're usually alright.

Not Invented Here (NIH)

Most modern languages support some sort of library of reusable modules. In Python-land, it's pip, Ruby has gem, Javascript has npm, etc. Whatever the resource may be, please use it. There's nothing like sitting down to start work on a new project to find that someone has decided to write custom versions of code that already exists out there.

To give you a rough idea, I've seen company-specific implementations of a database API, database migrations, SSL, and Cron -- all in the same project. Imagine the overhead of having to train new people on the special way we do X here, not to mention the fact that community-maintained modules are very likely to be better tested than whatever someone-in-your-office-who-no-longer-works-there hacked together under a deadline.

Impossibly Long Functions

It doesn't matter what language you're using. If you're regularly writing functions or methods that exceed 100 lines, you're doing it wrong. If you're writing if/else blocks inside if/else blocks inside for loops, inside if/else blocks, you're doing it wrong. Short code is readable, and more importantly, testable code.

Bad


		if food_type == "cake":

			for ingredient in ingredients:
				if ingredient.type == "solid":
					my_food.solid_ingredients.add(ingredient)
				elif ingredient.type == "liquid":
					my_food.liquid_ingredients.add(ingredient)
				else:
					my_food.magic_ingredients.add(ingredient)
			my_food.bake()

		elif food_type == "sauce":

			bowl1 = Bowl()
			for ingredient in ingredients:
				if not ingredient.to_be_warmed:
					bowl1.add(ingredient)
			bowl1.mix()

			bowl2 = Bowl()
			for ingredient in ingredients:
				if ingredient.to_be_warmed:
					bowl2.add(ingredient)
			bowl2.mix()
			bowl2.heat()

			return bowl1 + bowl2

		elif food_type == "meat":
				...

Better

		def make_cake(ingredients):
				# Do cake stuff

		def make_sauce(ingredients):
				# Do sauce stuff

		def make_meat(ingredients):
				# Do meat stuff

		food_mapping = {
			"cake": make_cake,
			"sauce": make_sauce,
			"meat": make_meat,
		}
		food_mapping[food_type](ingredients)

Even Better

		class Food:

			def __init__(self, ingredients):
				self.ingredients = ingredients

			@classmethod
			def get_by_type(name):
				return {
					"cake": Cake,
					"sauce": Sauce,
					"meat": Meat,
				}[name]

			def make():
				raise NotImplementedError()

		class Cake(Food):

			def make():
				for ingredient in self.ingredients:
					self._process_ingredient(ingredient)
				self.bake()

			def _process_ingredient(ingredient):
				if ingredient.type == "solid":
					...
				elif ingredient.type == "liquid":
					...
				...

			def bake():
				# Do whatever

		class Sauce(Food):

			def make():
				# Do sauce stuff

		class Meat(Food):

			def make():
				# Do meat stuff

		Food.get_by_type(food_type).make()

Of course you can (and should) break up the other nested if/else and for blocks into separate methods if they're complex at all. This is what makes the 3rd option so much better: you can move that logic into a series of private methods so your code remains readable and encapsulated from the rest of your project.

How Not to Code

Variable Names

Bad

Good

Comments

Bad

Good

Database Structure

Bad

Better

Best

Constants

Not Invented Here (NIH)

Impossibly Long Functions

Bad

Better

Even Better