Projects /How Not to Code

I've been rolling code in a professional capacity since 1999, and a quick look at my CV will tell you that I've been around. I've worked for twelve companies in thirteen years, while I did some travelling and gained some experience. I have seen some fantasically horrible code, and I think it's time that I compile a list of what not to do.

There are a lot of books and websites dedicated to good software design, but very few that actually preach good habits, even fewer that point out the really atrocious ones. That's the point of this page. As I run into them, I'll be building a collection here. Maybe one day I'll turn it into a book, but for now, this is it.

Note that the code samples here will all likely be in Python, because it's what I code in most of the time, but they will apply to most object-oriented languages.

Variable Names

I can't believe that this kind of thing is as common as it is among adult programmers. It's like we all need to take a kindergarten class again, with our teacher calmly repeating: "Use your words".

Variables are containers (or pointers) to data, and that data has a nature of it's own. Maybe it's a file name, or blog comment, or byte count for stream parser. Whatever it is, naming it appropriately counts. Take this code snippet for example:

Bad

    for t in lrts:
        t.switch("left")
  

Even if you stare at that snippet for a good long time, and even if you have more context around it, there's no way any non-psychic can know that in that we're talking about trains here.

Good

    for train in light_rail_trains:
        train.switch("left")
  

Suddenly, the code becomes clear. New developers can sit down at their console and know immediately what this code is doing. The take-away should be that if a new developer has to ask you what your code does, it's wrong.

Comments

The very next thing people say in defence of the above code is typically: "But that's why you comment it!". This is wrong-headed. Comments are a means to give an overview of what a lot of code is doing, not a a play-by-play of the code. Comments, if they exist at all, should be limited to short, concise paragraphs, preceding legible, understandable code that can be parsed easily without having to read every line.

Bad

    while za.tmp < 100:  # Wait for the pizza to reach 100C
        za.bake()  # .bake() increases temperature by 1C
  

Good

    while pizza.temperature < Pizza.COOKED_TEMPERATURE:
        pizza.bake()
  

The first is both pointlessly cryptic, and pointlessly verbose.

Database Structure

This only really applies to people working with a relational database, but it's so very important. Honestly, if you don't pay proper attention to your database schema early on and throughout your development process, you deserve to have your programmer card taken away.

Most of what they teach you in school about SQL-style storage is about normalisation. Normalisation is important, but too many people stop there and don't consider things like column types. Worse still, some devs get it in their head that splitting their data across multiple tables using 1:1 ratios is always a good idea. It rarely is.

We'll use a transactions system here, because these are all-too-often overly convoluted, leading to far too much pain down the road. Assume that we want to be able to keep track of every transaction made, and still have easy-access to a running balance value.

Bad

    transactions
      id           int   Primary key
      user         int   Foreign key: references the users table
      delta        int   Amount changed
      description  text

    balances
      id           int  Primary key
      transaction  int  Foreign key: references the transactions table
      balance      int  Balance after this transaction

    current_balances
      balance  int  Foreign key: references the balances table
      user     int  Foreign key: references the users table
  

There are so many things wrong with this layout it's hard deciding where to begin:

  • The current_balances table is completely irrelevant. All it does is force a join whenever you want to get a user's balance.
  • The balances table provides no useful information aside from the balance, which could just as easily been rolled into the transactions table, or better yet, the users table.

Better

    transactions
      id           int   Primary key
      user         int   Foreign key: references the users table
      delta        int   Amount changed
      balance      int   The running balance at this point in the transaction
      description  text
  

This does away with the redundant tables and keeps everything where you need it, but it leaves one bit of redundancy: the balance column. It's really just the result of a replaying of the transaction log. Still, this is a step in the right direction.

Best

    users
      ...
      balance  int  A calculated amount, populated whenever a new transaction is created.

    transactions
      id           int   Primary key
      user         int   Foreign key: references the users table
      delta        int   Amount changed
      description  text
  

Typically, the only reason you want the running balance for a user is that you want to know how much cash that user has right now without having to do an expensive join against the potentially lengthy transactions table. Therefore, your best option would be to automatically write that value into the user profile whenever a transaction is created. This way, no joins are required to get the piece of information you use most often, and you do away with needless complexity. All this costs is a regular update, which can be achieved by manually running a query, or setting up a signal in your framework, or even using database triggers.

Constants

For gods' sake, use them. Seriously, what good is this sort of thing?

    if person.status == 1:
        # Promote them
    elif person.status in (2,3,4):
        # Fire them
  

How is anyone supposed to know what 4 is, without grepping for "4" backward through your code? This is a much better idea:

    # MyModel.py
    class Person(object):
        STATUS_AWESOME = 1
        STATUS_ASLEEP  = 2
        STATUS_HIGH    = 3
        STATUS_IDIOT   = 4

    # somefile.py
    from MyModel.Person import STATUS_AWESOME, STATUS_ASLEEP, STATUS_HIGH
    if person.status == STATUS_AWESOME:
        # Promote them
    elif person.status in (STATUS_ASLEEP, STATUS_HIGH, STATUS_IDIOT):
        # Fire them
  

This illustrates the premise that good code is self-documenting. Stick to that, and you're usually alright.

Not Invented Here (NIH)

Most modern languages support some sort of library of reusable modules. In Python-land, it's pip, Ruby has gem, Javascript has npm, etc. Whatever the resource may be, please use it. There's nothing like sitting down to start work on a new project to find that someone has decided to write custom versions of code that already exists out there.

To give you a rough idea, I've seen company-specific implementations of a database API, database migrations, SSL, and Cron -- all in the same project. Imagine the overhead of having to train new people on the special way we do X here, not to mention the fact that community-maintained modules are very likely to be better tested than whatever someone-in-your-office-who-no-longer-works-there hacked together under a deadline.