Searching for Tao, 7573 Kilometres from Home

December 31, 2016 12:38 +0000 | Family Grandpa Programming 1

I built a thing for my family this Christmas and I wanted to post about it briefly.

If you're one of the few people dedicated enough to follow this blog, you'll know that my grandfather died last year, and that he was sort of the family videographer. What you likely don't know however is that this year, on my trip home I acquired his entire collection of DVDs that he'd been accumulating over the years.

This some really old stuff:

Around the Christmas tree when I was 3 or 4 years old
My dog learning tricks for the first time
My parent's wedding
My graduation
My mother as a child in Romania
My grandparents, so much younger, with friends in Romania
My niece, Violet

It was an amazing collection spanning 4 generations over 39 DVDs, and I spent a few days on that trip home ripping every last one of the disks onto a portable hard drive so I could take the raw data home for a special project.

Well that project is now finished, so for those of you who don't care about the technical aspects, here's the link. I shared the URL with my family by email on Christmas day since I was on the other side of the world for the holiday festivities this year, but all in all, it seems to have gone over well.

My father has suggested that I expand on the collection with my own videos in the future -- I may just do that, though I'm more of a still photos guy. We'll see.

The Technical

This whole thing was a HUGE pain in the ass, so I want to document the process, perhaps if only for future websurfers looking to do something similar.

The Problem

The videos were in DVD format. Thankfully, it was digital, but it's certainly not web-friendly. The video data needed to be ripped from the disks and compressed into a web-friendly format that was high-quality enough to preserve the video, but in a file small enough to stream to Canada-quality internet connections.

Also, the DVDs were terribly organised and not indexed in any way. The disks often had multiple title tracks, sometimes duplicate tracks, and there were tracks that just contained garbage data.

Oh, and there was a time constraint. I only had the disks for a few days when I was in Canada. I wasn't going to take them back to the UK with me.

The Process

It was basically done in three stages:

Raw DVD > .iso file > .webm file

The .iso file step was just a clean & easy way to back up all of the DVDs without having to worry about accidentally missing something while I was hurriedly trying to get through them all in Canada. By turning 39 DVDs into 39 files on a USB drive, I could be sure that I wouldn't accidentally lose data during the ripping process.

As it turns out, this was a good plan, since it took a few weeks of tinkering with this project before I realised that some disks had multiple titles on them.

The creation of the .iso files was easy. I just put the disk in the USB DVD drive I brought with me and typed this:

$ dd if=/dev/dvd of=/path/to/usb/hard-drive/disk-00.iso

Waited about 20min, then took the disk out, and repeated this... 39 times.

The creation of the actual video file on the other hand was the big problem. There are lots of sites out there that claim to tell you how to do this, and very few of them have anything helpful. I think that this is because the end goal is rarely understood up front. Sometimes people are trying to encode DVDs into a high quality file for local playback, and the settings for that are rather different from what someone would want to do to encode for a web-friendly format.

There's also a wide variety of tools out there, most of which are buggy, unsupported, don't have a port for Gentoo, or just plain suck. The most common recommendation I found was for Handbrake, which is an impressive GUI for ripping videos but for me:

It didn't encode files that were high enough quality given the file size
It didn't make web-friendly formats. Even when you tick the box to make it web-friendly, the output file doesn't stream in Firefox. I didn't test other browsers.
It was terribly slow to find all the tracks, apply the settings I wanted and then wait to see if things panned out. There's no command-line interface to make things easier.

All of this lead to a lot of frustration and weeks of tinkering, finally leading me to a site that gave me the magic ffmpeg incantation to generate a web-friendly file:

$ ffmpeg \
  -i /path/to/input.mp4 \
  -vpre libvpx-720p \
  -pass 1 -passlogfile ffmpeg-18 -an -f webm \
  -y /path/to/output.webm && \
  ffmpeg -i \
  /path/to/input.mp4 \
  -vpre libvpx-720p \
  -pass 2 -passlogfile ffmpeg-18 -acodec libvorbis -ab 100k -f webm \
  -y /path/to/output.webm

Of course this assumed a .mp4 input file, and I wanted to rip straight from the .iso, so after much digging, I discovered that ffmpeg has a means of concatenating (chaining) video inputs and it can read straight from a DVD's .VOB file. With this nugget of knowledge, all I had to do was mount the .iso locally and compile a list of files conforming to this naming convention:

VIDEO_TS/VTS_01_#.VOB

With that information, I wrote a quick shell script that ended up generating a great big queue file of commands that look a lot like this:

ffmpeg -i \
'concat:/mnt/grandpa/18/VIDEO_TS/VTS_01_1.VOB|/mnt/grandpa/18/VIDEO_TS/VTS_01_2.VOB|/mnt/grandpa/18/VIDEO_TS/VTS_01_3.VOB|/mnt/grandpa/18/VIDEO_TS/VTS_01_4.VOB|/mnt/grandpa/18/VIDEO_TS/VTS_01_5.VOB' \
-vpre libvpx-720p -pass 1 -passlogfile ffmpeg-18 -an -f webm \
-y /home/daniel/Projects/Grandpa/htdocs/vid/18.webm && \
ffmpeg -i \
'concat:/mnt/grandpa/18/VIDEO_TS/VTS_01_1.VOB|/mnt/grandpa/18/VIDEO_TS/VTS_01_2.VOB|/mnt/grandpa/18/VIDEO_TS/VTS_01_3.VOB|/mnt/grandpa/18/VIDEO_TS/VTS_01_4.VOB|/mnt/grandpa/18/VIDEO_TS/VTS_01_5.VOB' \
-vpre libvpx-720p \
-pass 2 -passlogfile ffmpeg-18 -acodec libvorbis -ab 100k -f webm \
-y /home/daniel/Projects/Grandpa/htdocs/vid/18.webm

Unfortunately, ffmpeg doesn't really do threading very well, and the prevailing advice out there appears to be that you should just thread the process yourself rather than ask ffmpeg to try to use all your CPUs itself. For this bit, I wrote a very simple paralleliser in Python and magically, all of the cores on my super machine could crunch Grandpa's videos, 16 at a time.

Finally, I wrapped the whole thing in a simple script that mounted all of the .isos simultaneously and then ran the paralleliser, and ran that in a tmux session so I could get on a plane and Fly to Greece while my computer did its thing for two days.

While I was in Athens, I spent a day or two fiddling with the site itself, getting video.js to work the way I wanted it to and playing with Select2 to try and get an interface that the non-technical people in my family could follow. I wish I had better skills in this area 'cause frankly, the site is kinda ugly, but at least it's functional now.

So that's it. I hope that one day, someone will find this stuff useful. The ffmpeg incantations were especially difficult to find and assemble, so I figure that'll help someone eventually.

July 17, 2016 19:23 +0000 | Programming 3

Back in 2010, Stephanie came to me with an idea that she'd been batting around with a friend of hers. Now that Gowalla was dead, we were all looking for a replacement and she had the idea to create one of our own: a geolocational game for finding and collecting monsters.

The mechanics were simple: there are little monsters all over the real world, and you could use your phone to find them, fight them, and capture them. Importantly, the monsters are native to their environment: water monsters only appear near water, darkness monsters only at night, bosses only when Jupiter is above the horizon, etc.

I thought it was a brilliant idea, and started hacking on a server-side implementation. It turns out that this is a lot harder than it sounds. There were all sorts of concepts that were totally new to me:

GIS extensions for PostgreSQL to handle geographical queries
Timezone information (which varies widely) by geographic point
National border information had to be loaded and queried for every request
Ladder logic for all of the monsters' experience levelling
Figuring out the position of various celestial bodies based on the time & place
Figuring out how to tell whether the user was actually near water or green space (that was fun)
Building a vagrant box (this is pre-Docker) so Stephanie could have an environment to work with

All of this was solved, by me, over the course of many years, but I never had enough time to put into it, and there was of course the considerable barrier that neither of us had any illustrative talent. When 2015 rolled around and Nintendo announced that it was going to build Pokémon Go in partnership with Niantic, I accepted defeat and stopped tinkering with it altogether.

Except... Pokémon Go kinda sucks. More to the point, it doesn't scratch the itch Stephanie and I wanted scratched when we started Spirithunter (working title). Niantic decided to go the multiplayer route, which is (a) more complicated, and (b) pretty much sucks the fun out of the game for many of us. If you want to collect monsters and don't fancy getting your butt kicked by "professional" players out there with all of the best gear and time in the world, then with Pokémon Go you're pretty much out of luck.

By its nature, Pokémon Go forces people to compete with each other rather than just passively have fun, and I think that sucks.

So that more-or-less leaves Spirithunter as still viable really. I think I'm going to give it another whack over the next week or so. I believe that last I left it it was running Django 1.5 with an older version of DRF so I'll upgrade those first. Then I'll see about getting it up & running on my Linode.

I've also unlocked the repository so that other people can take a look if they're curious. At present, the copyright is still mine, though I might just GPL it if I decide not to pursue this as a corporate venture.

Anyway, if you're reading this and think that you'd like to contribute -- especially if you have artistic skills and want to see your art brought to life in an indie game -- drop me a line. There's 7 billion people in the world, more than enough to support two geolocational games, so why cant ours be one of them?

May 15, 2013 22:20 +0000 | Blogger Programming 1

So I hope you've noticed by now, but this site has been completely rewritten.

Would you believe that this site is almost 10 years old? I've been running a personal blog since December of 2003. The first few incarnations were written in PHP, and the most recent two were both written in Python, using Django as the framework.

Some notes about this version:

Complete markdown support in blog posts and comments. Honestly, I got tired of writing HTML blog posts. The text was getting to be illegible so I wrote this version to make my life easier. Then I added markdown support to the comments so you can make prettier comments. Go ahead, give it a shot :-)
A prettier image gallery - The layout is prettier, with larger thumbnails, and some fun layout tricks too. It might be a bit buggy with non-standard image sizes, but I'll work those out.
A smarter image gallery - Now supporting multi-upload, and it'll auto-generate the various sizes... but only if it has to. If I want to photoshop some images before I post them, this makes it so much easier.
Prettier interface and shiny icons!
Expanded projects pages

...and that's about it. You'd think that given the fact that it took me so long to write, that there'd be more to it, but this is it. Well, that and the code is just... better :-) For those who might be interested, here's a list of some of the packages this site is using:

Everywhere
- Python
- Django
Frontend
- Twitter Bootstrap - A smooth and consistent set of styles
- Font Awesome - Sexy icons
- Showdown - a markdown implementation in Javascript
Backend
- django-crocodile - See my GitHub project page for more information
- django-autocomplete-light
- django-extensions
- markdown
- django-model-utils
- tweepy

And sadly, I have yet to add a Django Pony anywhere yet. I'll do that soon though. There's also a few kinks to work out with the image uploader, hence the delay in posting my Warsaw/Krakow shots. I'll get to them soon enough though.

Anyway, if you haven't already, please poke through the site and let me know if you run into any problems. Now that I've made it (slightly) easier to post, I hope to be writing more in the future.

November 15, 2010 22:19 +0000 | Drupal Programming Software 14

I've been doing Drupal development on-and off for nearly three years now and it's always been frustrating. I'm a pretty vocal and animated kind of person too, so my co-workers soon came to know me as the anti-Drupal guy, which can be pretty rough when your employer has chosen to standardise on the platform. Now that I'm finally out of the Drupal world, I wanted to write a little about the platform, specifically speaking to its weaknesses and failures.

My hope here is two fold: (a) that this post serve as a means of communicating to the thousands of frustrated developers out there that they're not alone in their pain, and (b) that perhaps some of this post will help development shops choose Drupal where appropriate and other technologies when it is not.

For the Drupal fan(girl|boy)s, I ask only that you try to read this with an open and constructive mind. While I may rant and curse about Drupal in my Twitter feed, I've tried very hard to make this an unemotional, hopefully useful post about something I've spent a lot of time thinking about and working with.

Drupal Ideology

It seems to be a mantra within the community: "You don't even need to write code". The Drupal ideology is user-centric, choosing ease-of-use over performance at every turn. There's nothing wrong with this of course, so long as your goal is to let unskilled people make websites. However if your priority is a performant application capable of handling a lot of traffic, you're going to have a number of problems.

Some examples of prioritising user-focus over performance:

Silent failures are the bane of any developer's existence. It's important to know when a variable isn't defined, or that writing a record to the database failed, or that a file didn't upload properly. Drupal suppresses such messages by default, and as a result nearly every contrib module in the community is so riddled with errors and warnings that development with these messages enabled is near impossible.
Views, the de-facto standard way to store and retrieve data from your database, writes queries to the database, so that in order to perform a query against the database, you must first fetch the query from the database. Similar inefficiencies can be found in other "standard" modules like CCK and Panels.
Drupal relies almost entirely on caching in order to function at all. Without caching, a method usually reserved for high to extreme traffic situations, Drupal can't handle even a small number of concurrent visitors. Indeed, some projects I've seen have taken more than 10minutes to load a single page, even in development where there was only one connection in use.

Drupal Magic

It's a term celebrated by many in the community. The idea being that Drupal does a mountain of work for you, so you don't have to worry about it. The only problem is that when you're trying to build a finely-tuned application, most of this magic either gets in the way, or even works against you. You get 80% of the way there with Drupal and its contrib modules, and then spend three months fighting the whole application, undoing the damage it's done, just to get what you need out of your website.

The hook-dependent system requires and fosters this anti-pattern. Re-using code often means unpredictable, site-wide changes. A property is written in module X, overwritten in module Y, and altogether removed in module Z, and there's no way to be certain that these functions will execute in a predictable order.

This problem is notably worse when it comes to new developers on a project, since they will undoubtedly not be privy to the magic that is running under the hood, and will have a difficult time discovering it on their own. To those who will answer this with "the project simply needs better documentation", I respectfully suggest that a good code base is easy to understand, and doesn't require a manual that is usually out of date.

To work with Drupal Magic is to attempt to produce useful code against an unordered, uncontrolled, grep-to-find-what-is-going-on-dependent architecture.

Drupal Community

For all the victories in community engagement Drupal has achieved (a massive, diverse and engaged membership), it's the glaring failures that make the whole project a miserable situation for developers. I've already mentioned the standardising on inefficient modules, but I haven't talked about the mountains of really horribly written code yet. Drupal Core, for what it does, is pretty efficient, but too many contrib modules are written by inexperienced developers, or are simply incapable of scaling to enterprise-level capacity. The result of this is that non-developers (managers, sometimes even clients) will point to the functionality of module X and insist: "don't redesign the wheel, just use that", and you spend the next three weeks trying to work around the poor design of said module, eventually being forced to write garbage that talks to garbage.

Often the perceived strength of the community is Drupal's greatest weakness. Drupal is promoted based on its theoretically infinite feature set, but the reality is that in order to use every one of those contrib modules in your site, the memory footprint will be massive, the stability suspect, and the performance abysmal. And gods help you if you try this on a site with millions of users or a similar number of content nodes.

Drupal Establishment

None of this is a problem however if Drupal is used where its features and shortcomings are both understood and accepted as the nature of the platform. Drupal is a great tool in some situations and a horrible burden in others. Sadly, this has not yet sunk in with many of the decision-makers in the web development community. Drupal is being used and promoted as a solution hammer, with every potential development project, a Drupal-shaped nail.

This has a number of negative outcomes, the most dangerous of which is a lack of skill diversity in developers. Companies that insist on Drupal-centric development are in fact promoting ignorance of alternatives that might do a better job and that hurts everyone. Unless developers at these companies take it upon themselves to spend time outside of their 8-12 hour work day to write code for a different platform or language, this Drupal dependency will force their non-Drupal skills to atrophy, limiting their ability to produce good code in the future.

Conclusion

I'm finally at the end of my admittedly unenthusiastic involvement in the Drupal community. Whether the Drupal shops out there read this isn't really up to me, but I hope that this manages to help some people re-evaluate their devotion to the platform. Comments are welcome, so long as they're constructive (I moderate everything), but I'm not going to get into a shouting match on the Internet. If you think I'm wrong, we can talk about it in 5 years.

November 13, 2009 17:51 +0000 | Programming Python Software 0

I wrote something like this some time ago, but this version is much better, if only because it's in python. Basically, it's a script that highlights standard input based on arguments passed to it.

But how is that useful? Well imagine that you've dumped the contents of a file to standard output, maybe even piped it through grep, and/or sed etc. Oftentimes you're still left with a lot of text and it's hard to find what you're looking for. If only there was a way to highlight arbitrary portions of the text with some colour...

Here's what you do:

$ cat somefile | highlight.py some strings

You'll be presented with the same body of text, but with the word "some" highlighted everywhere in light blue and "strings" highlighted in light green. The script can support up to nine arguments which will show up in different colours. I hope someone finds it useful.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys,re

colours = [
    "\033[1;34m", # light blue
    "\033[1;32m", # light green
    "\033[1;36m", # light cyan
    "\033[1;31m", # light red
    "\033[1;33m", # yellow
    "\033[0;32m", # green
    "\033[0;36m", # cyan
    "\033[0;33m", # brown
    "\033[1;35m", # pink
    "\033[0m"     # none
]

args = sys.argv[1:]

# Strip out arguments exceeding the maximum
if len(args) > 9:
    print("\n%sWARNING: This script only allows for a maximum of 9 arguments.%s\n\n" % (colours[4], colours[9]), file=sys.stderr)
    args = args[0:8]

while True:
    line = sys.stdin.readline()
    colour = 0
    for arg in args:
        line = re.sub(
            r"(%s)" % (arg),
            "%s%s%s" % (colours[colour], "\g<1>", colours[9]),
            line
        )
        colour = colour + 1
    if line == '':
        break
    try:
        print(line.rstrip("\n"))
    except:
        pass

July 17, 2009 00:01 +0000 | Programming Software Twitter 0

Wil Wheton posted to Twitter today a request for an easy way to fetch all of one's tweets and store them locally. Someone might want to do that if they want a personal archive, or if they're interested in porting their data over to a Free implimentation like Laconica. Whatever your reasoning, here's a quick and dirty way to do it:

for i in {1..999}; do
  curl -s "http://twitter.com/statuses/user_timeline.xml?screen_name=your_screen_name&count=200&page=$i" | grep '<text>' | sed -e 's/^ *<text>\(.*\)<\/text>/\1/'
  sleep 2
done

Just hit "ctrl-c" when you hit your first post ever.

July 08, 2009 22:25 +0000 | PHP Programming Python 0

I wrote something rather fun today and I thought that I'd share it here. It's a Python module that you can use to interact with PHP products. Specifically, it's a reproduction of PHP's http_build_query() and parse_ini_file() functions that act as PHP does according to PHP's own way of doing things.

This means that if you've written an API server (as we have) in PHP that makes use of things like the above, you can interact with it using Python as your scripting language with little effort.

Examples:

from php import parse_ini_file

config = parse_ini_file("/path/to/config.ini")
print config["sectionName"]["keyName"]

This would give you the value for keyName in the section called sectionName in your config.ini file.

from php import http_build_query

somedata = {
  "keyname": "valuename",
  "otherkey": 123,
  "anotherkey": [1,2,3,{"seven": "eight"}]
}
print http_build_query(somedata)

This would give you:

otherkey=123&keyname=valuename&anotherkey[1]=2&anotherkey[0]=1&anotherkey[3][seven]=eight&anotherkey[2]=3&

The code was fun to write, and I'm guessing that it'll be useful to others so I'm posting it here. If you do end up using it, lemme know by posting a comment here eh?

You can download it here: php.py.

When I mentioned this to some other coworkers, they pointed out that I'm not the only one trying to get some of PHP's odd functionality into Python. Another developer has mimicked PHP's serialize() functions in the form of a Python module. I wonder if there are any other cases where this kind of stuff might be useful.

January 09, 2009 19:10 +0000 | PHP Programming 2

I stumbled upon an ugly PHP bug today and thought that I would share. While PHP is supposed to be a untyped language, this isn't always the case. The following code snippet for example does not do what you might expect:

switch ($output->status)
{
  case 0: $output->status = 'fail'; break;
  case 1: $output->status = 'ok';   break;
  case 2: $output->status = 'stub'; break;
}

With this code, passing in a string such as "ok", $output->status is set to "fail". This is due to what I assume to be a bug in PHP's lack of keeping everything untyped. For some reason, it would seem that PHP parses $output->status as an integer (therefore all strings return as 0) and then compares them to the list. If however you change the cases to strings:

switch ($output->status)
{
  case '0': $output->status = 'fail'; break;
  case '1': $output->status = 'ok';   break;
  case '2': $output->status = 'stub'; break;
}

Everything works as expected. Pretty lame if you ask me, but there it is.

October 20, 2008 17:54 +0000 | Geek Stuff Programming 3

Big thanks to Corey for brightening my day with this one:

Disbelief
- "Who wrote this!?
Anger
- "I'm not cleaning this up!"
Bargaining
- "Okay, we'll fix up this module if you promise we'll just rewrite everything else."
Depression
- "This is never going to get any better."
Acceptance
- "I'll just create a wrapper..."

September 24, 2008 16:01 +0000 | Maps PHP Programming Software 3

I just read a nifty post on monkeycycle about how to geocode an spreadsheet with free tools from Google and Yahoo and it occurred to me that this is probably the kind of thing people go looking for so I thought that I'd post my latest shiny new bit of code here.

I call it a cascading geocoder. The idea being that most of the time, a single geocoding service is pretty good, but sometimes it goes down, and other times it can't understand the address. For the purposes of the project I'm working on, this wasn't permissible, so I wrote some code that attempts to code an address first with Google, then if that fails, it uses geocoder.ca's engine.

It's fully object oriented and very clean. It's also GPL. Download it here if you're interested ^_^