I've been rolling code in a professional capacity since 1999, and a quick look at my CV will tell you that I've been around. I've worked for twelve companies in thirteen years, while I did some travelling and gained some experience. I have seen some fantastically horrible code, and I think it's time that I compile a list of what not to do.
There are a lot of books and websites dedicated to good software design, but very few that actually preach good habits, even fewer that point out the really atrocious ones. That's the point of this page. As I run into them, I'll be building a collection here. Maybe one day I'll turn it into a book, but for now, this is it.
Note that the code samples here will all likely be in Python, because it's what I code in most of the time, but they will apply to most object-oriented languages.
I can't believe that this kind of thing is as common as it is among adult programmers. It's like we all need to take a kindergarten class again, with our teacher calmly repeating: "Use your words".
Variables are containers (or pointers) to data, and that data has a nature of it's own. Maybe it's a file name, or blog comment, or byte count for stream parser. Whatever it is, naming it appropriately counts. Take this code snippet for example:
for t in lrts: t.switch("left")
Even if you stare at that snippet for a good long time, and even if you have more context around it, there's no way any non-psychic can know that in that we're talking about trains here.
for train in light_rail_trains: train.switch("left")
Suddenly, the code becomes clear. New developers can sit down at their console and know immediately what this code is doing. The take-away should be that if a new developer has to ask you what your code does, it's wrong.
The very next thing people say in defence of the above code is typically: "But that's why you comment it!". This is wrong-headed. Comments are a means to give an overview of what a lot of code is doing, not a a play-by-play of the code. Comments, if they exist at all, should be limited to short, concise paragraphs, preceding legible, understandable code that can be parsed easily without having to read every line.
while za.tmp < 100: # Wait for the pizza to reach 100C za.bake() # .bake() increases temperature by 1C
while pizza.temperature < Pizza.COOKED_TEMPERATURE: pizza.bake()
The first is both pointlessly cryptic, and pointlessly verbose.
This only really applies to people working with a relational database, but it's so very important. Honestly, if you don't pay proper attention to your database schema early on and throughout your development process, you deserve to have your programmer card taken away.
Most of what they teach you in school about SQL-style storage is about normalisation. Normalisation is important, but too many people stop there and don't consider things like column types. Worse still, some devs get it in their head that splitting their data across multiple tables using 1:1 ratios is always a good idea. It rarely is.
We'll use a transactions system here, because these are all-too-often overly convoluted, leading to far too much pain down the road. Assume that we want to be able to keep track of every transaction made, and still have easy-access to a running balance value.
transactions id int Primary key user int Foreign key: references the users table delta int Amount changed description text balances id int Primary key transaction int Foreign key: references the transactions table balance int Balance after this transaction current_balances balance int Foreign key: references the balances table user int Foreign key: references the users table
There are so many things wrong with this layout it's hard deciding where to begin:
current_balancestable is completely irrelevant. All it does is force a join whenever you want to get a user's balance.
balancestable provides no useful information aside from the balance, which could just as easily been rolled into the
transactionstable, or better yet, the
transactions id int Primary key user int Foreign key: references the users table delta int Amount changed balance int The running balance at this point in the transaction description text
This does away with the redundant tables and keeps everything where you need
it, but it leaves one bit of redundancy: the
It's really just the result of a replaying of the transaction log. Still,
this is a step in the right direction.
users ... balance int A calculated amount, populated whenever a new transaction is created. transactions id int Primary key user int Foreign key: references the users table delta int Amount changed description text
Typically, the only reason you want the running balance for a user is that
you want to know how much cash that user has right now without
having to do an expensive join against the potentially lengthy
transactions table. Therefore, your best option would be to
automatically write that value into the user profile whenever a transaction
is created. This way, no joins are required to get the piece of information
you use most often, and you do away with needless complexity. All this
costs is a regular update, which can be achieved by manually running a
query, or setting up a signal in your framework, or even using database
For gods' sake, use them. Seriously, what good is this sort of thing?
if person.status == 1: # Promote them elif person.status in (2,3,4): # Fire them
How is anyone supposed to know what
4 is, without grepping
for "4" backward through your code? This is a much better idea:
# MyModel.py class Person(object): STATUS_AWESOME = 1 STATUS_ASLEEP = 2 STATUS_HIGH = 3 STATUS_IDIOT = 4 # somefile.py from MyModel.Person import STATUS_AWESOME, STATUS_ASLEEP, STATUS_HIGH if person.status == STATUS_AWESOME: # Promote them elif person.status in (STATUS_ASLEEP, STATUS_HIGH, STATUS_IDIOT): # Fire them
This illustrates the premise that good code is self-documenting. Stick to that, and you're usually alright.
Most modern languages support some sort of library of reusable modules.
In Python-land, it's
pip, Ruby has
npm, etc. Whatever the resource may be,
please use it. There's nothing like sitting down to
start work on a new project to find that someone has decided to write
custom versions of code that already exists out there.
To give you a rough idea, I've seen company-specific implementations of a database API, database migrations, SSL, and Cron -- all in the same project. Imagine the overhead of having to train new people on the special way we do X here, not to mention the fact that community-maintained modules are very likely to be better tested than whatever someone-in-your-office-who-no-longer-works-there hacked together under a deadline.