I've been rolling code in a professional capacity since 1999, and a quick look at my CV will tell you that I've been around. I've worked for 19 companies in 18 years (yes, there was a lot of overlap), while I did some travelling and gained some experience. I have seen some fantastically horrible code, and I think it's time that I compile a list of what not to do.
There are a lot of books and websites dedicated to good software design, but very few that actually preach good habits, even fewer that point out the really atrocious ones. That's the point of this page. As I run into them, I'll be building a collection here. Maybe one day I'll turn it into a book, but for now, this is it.
Note that the code samples here will all likely be in Python, because it's what I code in most of the time, but they will apply to most object-oriented languages.
I can't believe that this kind of thing is as common as it is among adult programmers. It's like we all need to take a kindergarten class again, with our teacher calmly repeating: "Use your words".
Variables are containers (or pointers) to data, and that data has a nature of it's own. Maybe it's a file name, or blog comment, or byte count for stream parser. Whatever it is, naming it appropriately counts. Take this code snippet for example:
for t in lrts: t.switch("left")
Even if you stare at that snippet for a good long time, and even if you have more context around it, there's no way any non-psychic can know that we're talking about trains here.
for train in light_rail_trains: train.switch("left")
Suddenly, the code becomes clear. New developers can sit down at their console and know immediately what this code is doing. The take-away should be that if a new developer has to ask you what your code does, it's wrong.
The very next thing people say in defence of the above code is typically: "But that's why you comment it!". This is wrong-headed. Comments are a means to give an overview of what a lot of code is doing, not a a play-by-play of the code. Comments, if they exist at all, should be limited to short, concise paragraphs, preceding legible, understandable code that can be parsed easily without having to read every line.
while za.tmp < 100: # Wait for the pizza to reach 100C za.bake() # .bake() increases temperature by 1C
while pizza.temperature < Pizza.COOKED_TEMPERATURE: pizza.bake()
The first is both pointlessly cryptic, and pointlessly verbose, while the second example is self-documenting.
This only really applies to people working with a relational database, but it's so very important. Honestly, if you don't pay proper attention to your database schema early on and throughout your development process, you deserve to have your programmer card taken away.
Most of what they teach you in school about SQL-style storage is about normalisation. Normalisation is important, but too many people stop there and don't consider things like column types. Worse still, some devs get it in their head that splitting their data across multiple tables using 1:1 ratios is always a good idea. It rarely is.
We'll use a transactions system here, because these are all-too-often overly convoluted, leading to far too much pain down the road. Assume that we want to be able to keep track of every transaction made, and still have easy-access to a running balance value.
transactions id int Primary key user int Foreign key: references the users table delta int Amount changed description text balances id int Primary key transaction int Foreign key: references the transactions table balance int Balance after this transaction current_balances balance int Foreign key: references the balances table user int Foreign key: references the users table
There are so many things wrong with this layout it's hard deciding where to begin:
current_balances
table is completely irrelevant. All it
does is force a join whenever you want to get a user's balance.
balances
table provides no useful information aside from
the balance, which could just as easily been rolled into the
transactions
table, or better yet, the users
table.
transactions id int Primary key user int Foreign key: references the users table delta int Amount changed balance int The running balance at this point in the transaction description text
This does away with the redundant tables and keeps everything where you
need it, but it leaves one bit of redundancy: the balance
column. It's really just the result of a replaying of the transaction log.
Still, this is a step in the right direction.
users ... balance int A calculated amount, populated whenever a new transaction is created. transactions id int Primary key user int Foreign key: references the users table delta int Amount changed description text
Typically, the only reason you want the running balance for a user is that
you want to know how much cash that user has right now without
having to do an expensive join against the potentially lengthy
transactions
table. Therefore, your best option would be to
automatically write that value into the user profile whenever a transaction
is created. This way, no joins are required to get the piece of
information you use most often, and you do away with needless complexity.
All this costs is a regular update, which can be achieved by manually
running a query, or setting up a signal in your framework, or even using
database triggers.
For gods' sake, use them. Seriously, what good is this sort of thing?
if person.status == 1: # Promote them elif person.status in (2,3,4): # Fire them
How is anyone supposed to know what 4
is, without grepping
for "4" backward through your code? This is a much better idea:
# MyModel.py class Person: STATUS_AWESOME = 1 STATUS_ASLEEP = 2 STATUS_HIGH = 3 STATUS_IDIOT = 4 # somefile.py from MyModel.Person import STATUS_AWESOME, STATUS_ASLEEP, STATUS_HIGH if person.status == STATUS_AWESOME: # Promote them elif person.status in (STATUS_ASLEEP, STATUS_HIGH, STATUS_IDIOT): # Fire them
This illustrates the premise that good code is self-documenting. Stick to that, and you're usually alright.
Most modern languages support some sort of library of reusable modules.
In Python-land, it's pip
, Ruby has gem
,
Javascript has npm
, etc. Whatever the resource may be,
please use it. There's nothing like sitting down to
start work on a new project to find that someone has decided to write
custom versions of code that already exists out there.
To give you a rough idea, I've seen company-specific implementations of a database API, database migrations, SSL, and Cron -- all in the same project. Imagine the overhead of having to train new people on the special way we do X here, not to mention the fact that community-maintained modules are very likely to be better tested than whatever someone-in-your-office-who-no-longer-works-there hacked together under a deadline.
It doesn't matter what language you're using. If you're regularly writing
functions or methods that exceed 100 lines, you're doing it wrong. If
you're writing if/else
blocks inside if/else
blocks inside for
loops, inside if/else
blocks,
you're doing it wrong. Short code is readable, and more
importantly, testable code.
if food_type == "cake": for ingredient in ingredients: if ingredient.type == "solid": my_food.solid_ingredients.add(ingredient) elif ingredient.type == "liquid": my_food.liquid_ingredients.add(ingredient) else: my_food.magic_ingredients.add(ingredient) my_food.bake() elif food_type == "sauce": bowl1 = Bowl() for ingredient in ingredients: if not ingredient.to_be_warmed: bowl1.add(ingredient) bowl1.mix() bowl2 = Bowl() for ingredient in ingredients: if ingredient.to_be_warmed: bowl2.add(ingredient) bowl2.mix() bowl2.heat() return bowl1 + bowl2 elif food_type == "meat": ...
def make_cake(ingredients): # Do cake stuff def make_sauce(ingredients): # Do sauce stuff def make_meat(ingredients): # Do meat stuff food_mapping = { "cake": make_cake, "sauce": make_sauce, "meat": make_meat, } food_mapping[food_type](ingredients)
class Food: def __init__(self, ingredients): self.ingredients = ingredients @classmethod def get_by_type(name): return { "cake": Cake, "sauce": Sauce, "meat": Meat, }[name] def make(): raise NotImplementedError() class Cake(Food): def make(): for ingredient in self.ingredients: self._process_ingredient(ingredient) self.bake() def _process_ingredient(ingredient): if ingredient.type == "solid": ... elif ingredient.type == "liquid": ... ... def bake(): # Do whatever class Sauce(Food): def make(): # Do sauce stuff class Meat(Food): def make(): # Do meat stuff Food.get_by_type(food_type).make()
Of course you can (and should) break up the other nested
if/else
and for
blocks into separate methods if
they're complex at all. This is what makes the 3rd option so much better:
you can move that logic into a series of private methods so your code
remains readable and encapsulated from the rest of your project.