Blog

September 26, 2024 20:11 +0000  |  Django Docker Python Software 3

Update: there's now a companion repo to serve as a demonstration of the below.

You'd think that this topic would have been done to death, but given that every job I've started in the past 10+ years has used Docker differently (if at all) to varying degrees of success, I feel like we need some sense of consensus around how to Do This Right™. And, as someone with an ego the size of a small moon, I'm here to decide what's right... on my own blog anyway.

The argument here is that the use of Docker and various tooling shouldn't be unique to any particular project, that this sort of thing should be so standard it's both common and boring to even think about. My experience tells me that we're not there yet though, so this is just me making the case for what I think constitutes a good setup.

Thesis

The main argument I want to make is that if you're using Docker, you don't write software anymore, but rather you build immutable images. Images that are developed locally, tested in CI, and deployed to production, but importantly, the image at each of these stages is exactly the same: reproducible at every turn. There's little value in developing and testing one thing if what you roll out to production is something else entirely. Say it with me:

Test what you develop. Deploy what you test.

There are a lot of advantages to this model, the strongest of which is probably simplicity. It greatly reduces the number of tricks & hacks you need to make your various environments work, thereby reducing surprises (and extra code!) at all stages. It does, however, require a shift in how you might be used to building things. That's ok though. We'll get through this together ;-)

The 12 Factor Model

Maybe you've heard about it and maybe you haven't, but the 12-factor app is a design pattern that's uniquely suited to Docker-based systems. The gist is that your application's code and behaviour shouldn't change based on its environment. At its simplest, stuff like this:

if ENVIRONMENT == "prod":
    do_something_only_production_does()

...shouldn't happen. How exactly do you test that? How do you even demo it locally? You're introducing unpredictable behaviour into your application that can only be revealed in production.
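
To make this concrete, here's a minimal sketch of the alternative (the setting names are hypothetical, not from any real project): whatever differs between environments comes from the environment itself, and the code just reads it.

import os

# Hypothetical settings: the values vary per environment, the code path
# doesn't.  Every environment (laptop, CI, production) exercises the same
# logic, so there's nothing that can only be revealed in production.
PAYMENT_API_URL = os.environ["PAYMENT_API_URL"]                # no fallback: fail loudly
PAYMENT_TIMEOUT = int(os.environ.get("PAYMENT_TIMEOUT", "5"))  # seconds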

Often external services like ElasticSearch or Kafka are a culprit here. We think to ourselves: "This is development. We don't need an industrial grade service to handle the little stuff I'm doing on my laptop." ...and so we stub out these services with dummy functions or bypass them altogether with hacks like the above.

The problem of course is that in doing so you've cemented a blind spot into your application. Your software behaves differently in one environment than it does in another, so when it breaks in production, you have no reliable way to debug it, let alone catch it before it reaches production in the first place.

The work-around for this is to reproduce your production environment locally as best you can — a feat that wasn't really possible pre-Docker. Maybe you could stand up a database with your application, but a queue, Kafka, or a built-in-house-by-another-team-API-server? Without containers, that's a fool's errand.

But we have containers, so let's use them! When you're developing locally, you can use Docker Compose to spin up everything you use in production. Sure, you won't have 64 instances of the webserver and 128 workers, but one of everything relevant is likely enough, and if you ever want to track down some of the ugly race conditions and async issues, you can always increment the replicas for a single service to 2 or 3.

So what does that look like? Well here's a sample compose.yaml file for a common pattern you see in Djangoland:

name: myproject

## Note that we're hard-coding credentials here with some very insecure values.
## That's ok, because these values are set to something sensible in production.
## What's important is that (a) the way your application behaves is consistent
## (looks for credentials and uses them to do something), and (b) the values are
## all known, so standing this baby up requires no special knowledge (more on
## this in the next section).

x-base: &base
  build:
    context: .
  environment:
    DEBUG: 'True'
    ALLOWED_HOSTS: '*'
    SECRET_KEY: secret
    DB_URL: postgres://postgres:postgres@database/postgres
    CACHE_URL: redis://redis:6379/0
    QUEUE_URL: redis://redis:6379/1
    ELASTICSEARCH_URL: http://elasticsearch:9200
    ELASTICSEARCH_USER: elastic
    ELASTICSEARCH_PASS: elastic
    BULK_UPLOAD_BUCKET: mybucket
    BOTO3_ENDPOINT: http://localstack:4566/
    AWS_ACCESS_KEY_ID: XXX
    AWS_SECRET_ACCESS_KEY: XXX
  working_dir: /app/src
  volumes:
    - .:/app
  restart: on-failure
  depends_on:
    - redis
    - database
    - the_other_database
    - elasticsearch
    - localstack

services:

  # The database.  Our project is using PostgreSQL 16 in production, so
  # that's what we use here.
  database:
    image: postgres:16-alpine
    restart: always
    environment:
      POSTGRES_PASSWORD: postgres

  # There's another database running an older version that we pinky-swear
  # we're going to update soon, but For Right Now™ it's doing its own thing
  # with behaviour that differs from our primary db.
  the_other_database:
    image: postgres:11-alpine
    restart: always
    environment:
      POSTGRES_PASSWORD: postgres

  # We cache things, but we also might use this to queue stuff.
  redis:
    image: library/redis:7.4-alpine

  # Apparently this project needs some industrial strength search magic.
  elasticsearch:
    image: elasticsearch:8.15.1
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms256m -Xmx1g"
      - xpack.security.enabled=false
      - ELASTIC_PASSWORD=elastic

  # Production is on AWS, so this is how we simulate S3
  localstack:
    image: localstack/localstack:1.2
    environment:
      SERVICES: s3:5002
      LOCALSTACK_HOSTNAME: 127.0.0.1
      BULK_UPLOAD_BUCKET: mybucket
    stop_grace_period: 1s

  # Our webserver
  web:
    <<: *base
    ports:
      - '${PORT:-8000}:8000'
    command: /app/src/manage.py runserver '0.0.0.0:8000'

  # No point in having a queue if you don't have workers for it.  Note that we
  # can set ${WORKERS} in our environment to spin up multiple workers, but it
  # defaults to 1.  This is handy when you want to test out how async stuff
  # might handle a large workload.
  worker:
    <<: *base
    deploy:
      replicas: ${WORKERS:-1}
    depends_on:
      - redis
      - database
    command: /app/src/manage.py rqworker default

It looks daunting, but what you see here is a declaratively-defined, canonical way for you to stand up A Very Complicated Project with everything you need to make your application work just as it does in production (whether using all of these tools together is a good idea or not is a whole other conversation). Critically, now you can do this on anyone's laptop, and even in CI (we'll get to that).

The big win here is that you don't have to be thinking things like "In production, my app will do this, but in development that feature's not enabled, so this other thing will happen". No: if your app uses ElasticSearch, then your local development experience can now account for all the quirks of interacting with that system. More importantly, your tests can be written to interact with them too.
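
For completeness, here's a rough sketch of how the Django side might consume the variables the compose file sets. It isn't lifted from a real settings.py: it hand-rolls the DB_URL parsing where you might prefer a small helper library, and it assumes a Django version new enough to ship the built-in Redis cache backend.

# settings.py (a sketch): everything comes from the environment, so the same
# file works on a laptop, in CI, and in production.
import os
from urllib.parse import urlsplit

SECRET_KEY = os.environ["SECRET_KEY"]
DEBUG = os.environ.get("DEBUG", "False") == "True"
ALLOWED_HOSTS = os.environ.get("ALLOWED_HOSTS", "").split(",")

_db = urlsplit(os.environ["DB_URL"])  # e.g. postgres://postgres:postgres@database/postgres
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": _db.path.lstrip("/"),
        "USER": _db.username,
        "PASSWORD": _db.password,
        "HOST": _db.hostname,
        "PORT": _db.port or 5432,
    }
}

CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": os.environ["CACHE_URL"],
    }
}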

It's important to remember though that this isn't religion. Big services like Oracle or Kafka for example can be beasts to stand up locally and may only have very tangential relevance to your project. In such cases it may make more sense to take on the risk of stubbing-out the service for development and tests, but these really should be the exception.

Developer Tooling

This is where I tend to run into the most pushback on this pattern, but it's also the part that can greatly reduce headaches. Are you ready? Your immutable image includes everything you need for development: linters, tests, and debugging modules. I will sometimes even include a few useful system tools like netcat or ping, as well as a fancy prompt.

None of these things are necessary for production. They are, at best, image bloat, adding anywhere from 100 to 200 MB of code that's never exercised in the wild. Why, then, would we want to include it?

Part of it goes back to the develop → test → deploy paradigm. If these tools exist in one stage, they must exist in all of them unless you're willing to deploy something you didn't test. Mostly though, this pattern saves you a pile of pain around getting everyone developing on your project with as few problems as possible.

One of the first questions I always ask on a new job is "How do you run the tests?". This is because everyone does it differently and many companies bake assumptions into the process that depend on things like your OS or Python version, etc. As a Python developer, I see instruction manuals like this all too often:

  1. Install homebrew
  2. brew install <some system dependencies>
  3. Install Pyenv
  4. Switch your Python version to 3.10
  5. Set up a virtualenv and pip install -r dev-requirements.txt
  6. Run our special script that assumes you're running everything on a Mac. It'll set some environment variables, create some folders and make a bunch of assumptions about the location of various files and their permissions on your system.

If you're on Windows or Linux of course, you're typically met with a shrug, but even if you are on a Mac, this is a rather demanding ask: make a bunch of changes to your entire system with homebrew (good luck if you've got multiple projects with different version requirements), and more importantly, you're not testing the image. What you deploy is a Debian image whose system dependencies are whatever apt install <package name> produced, while you developed against whatever homebrew gave you. Those system dependencies (and the way they interact with your Python dependencies) are unique to each environment, and by developing outside of the controlled one, you're inviting surprises.

Here's a fun example of this that I ran into on a recent job. The company was mostly comprised of Mac-based developers, but there was a small group of Linux nerds as well of which I was a part. The project we were working on required an antiquated version of a system-level XML parsing library, so in order to even build the project's Python dependencies, you needed to have version 1.x of this library installed on your system. This was fine for the Macs, since there was apparently no way to upgrade this library anyway, but for some of the Linux machines (like my beloved Arch) we had to pin the version of this library to the old value, lest our package manager upgrade it and break our project.

It doesn't stop there though. We also had to install a bunch of spy/management-ware on our machines which (as you might have guessed by now) complained at you if you were running an old version of any package. So every time I had to rebuild the Python dependencies, I had to downgrade this XML library, install the dependencies, then re-upgrade it so I could actually use my computer without it barking at me about updates. Good times.

It's just so needlessly complicated. Imagine if those complicated instructions were as simple as:

  1. docker compose up

The Python version is guaranteed to be the same as production. The OS and system dependencies are also guaranteed to be the same. We don't care what OS you're running, or what version of Python you might have locally, and we're not asking you to change anything about your system to make this project go. You just say: "start the project in its own environment".

It's liberating how simple this is. It's even more exciting when you consider that you can now run unit tests for multiple languages (your NodeJS-based front-end will have its own set of special behaviour constrained to its container) and you can run integration tests between the various services -- no demands or expectations made on the host computer whatsoever.
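
As a taste of what that buys you, here's what a tiny integration test might look like. It assumes redis-py is installed in the image and that CACHE_URL is injected by the compose file above; nothing on the host machine matters.

import os

import redis  # assumes redis-py is part of the image's dependencies


def test_cache_round_trip():
    # Talks to the same Redis the application uses, via the URL the compose
    # file injects.  No mocks, no services installed on the host.
    client = redis.Redis.from_url(os.environ["CACHE_URL"])
    client.set("integration-smoke-test", "ok")
    assert client.get("integration-smoke-test") == b"ok"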

The same goes for the linters. What version of ruff did you install on your laptop? How 'bout mypy? Is this up to date with the project requirements? What if running the linters was as easy as:

$ docker compose exec web /scripts/run-linters

How much less hassle would you and every new developer on your project have? How many hours of senior developers debugging junior setups could be reclaimed? This alone is a goal worth working toward: a one-line way to ensure Shit Just Works™.
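
The script itself doesn't have to be anything clever, and the author's actual version isn't shown here; it could just as easily be a few lines of shell. A sketch along these lines does the job, and the only thing that really matters is that it runs the tool versions baked into the image:

#!/usr/bin/env python3
# /scripts/run-linters (a sketch, not the actual script): run each linter
# with the versions installed in the image and exit non-zero if any fail.
import subprocess
import sys

COMMANDS = [
    ["ruff", "check", "."],
    ["mypy", "."],
]

status = 0
for command in COMMANDS:
    print("$", " ".join(command))
    status |= subprocess.run(command).returncode

sys.exit(status)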

The CI

The nice thing about building an accurate, consistent, and reproducible environment for local development is that you can reuse that environment in the CI. I've seen this far too many times: the dev team has put in all the effort to dockerise their project and may even use Docker both in development and production, but for some reason they run the tests in whatever broken Ubuntu container their favourite CI provider offers. The whole CI process looks something like this:

  1. Check out the code into an Ubuntu container
  2. Build a virtualenv
  3. Use apt to install one or two things
  4. Install the Python dependencies into the virtualenv with something like pip or uv
  5. Stand up a Docker container of Postgres (What version? Probably "latest".)
  6. Run the tests
  7. Build the Debian or Alpine-based Docker image
  8. Push it to the repo for deployment

This is madness. You just tested all of your code in an entirely different Linux distro using different versions of system dependencies and ran your tests against only one of your external dependencies (the database), and there's no guarantee that it was even the right version. Then you throw all that out and build an entirely separate Docker image with different... everything and you deploy that.

To top this all off, it's extra work! Someone had to sit down and write those instructions above, accounting for the differences between development and production and trying to make a best-effort middle road between the two. Someone has to keep an eye on CI environment versions, and worry about keeping dependencies in sync between the Docker container and what's installed in CI.

Why do this to yourself when you've already gone through the effort of building a working Compose environment? Just use that! You can build your image, then test that your image behaves as it should, and then push that image out for deployment.

Here's what that looks like with GitLab's CI system:

## We use "Docker in Docker", since GitLab CI drops you into a Dockerised
## environment to begin with
services:
  - docker:dind

stages:
  - build_and_test

build_and_test:
  services:
    - docker:dind
  variables:
    DOCKER_HOST: tcp://docker:2375
    DOCKER_DRIVER: overlay2
  stage: build_and_test
  image: docker

  # We log into the Docker repo to pull down the latest image that was
  # successfully built.  That way when we build our image here, we can
  # take advantage of layer caching which reduces our build time to
  # seconds.
  before_script:
    - docker login --username $CI_REGISTRY_USER --password $CI_REGISTRY_PASSWORD $CI_REGISTRY
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - docker build --cache-from $CI_REGISTRY_IMAGE:latest --build-arg BUILDKIT_INLINE_CACHE=1 --tag $CI_REGISTRY_IMAGE:latest .

  # This is a simple project, so we're running the linters followed by the
  # tests in the same job.  We could split this up, but I'm trying to keep
  # it easy to follow.
  script:
    - docker compose run web sh -c "/scripts/run-linters"
    - docker compose run web sh -c "/scripts/run-tests"
    - docker push $CI_REGISTRY_IMAGE:latest

Note that we're running exactly the same stuff here as we did in development. That's by design. It should be easy and obvious to a developer just what's running in CI and how they might reproduce a problem locally, but more than anything else, they should be able to know:

"If the tests pass locally, they'll pass in CI"

There's something deeply satisfying about knowing this with a high degree of confidence. It's also a lot cheaper for a company that pays a SaaS CI provider.

Why You Should Care

The big advantage to this model is simplicity.

  • No special documentation for getting your development environment working.
  • No special documentation for running the tests or linting.
  • No weird hacks to run the tests in CI.
  • It works everywhere, on any system, without hassle.
  • It doesn't depend on (and therefore potentially interfere with) the host system.

This is a massive net reduction in code and documentation as well as in confusion and headaches for the entire development team. How does the project work? The same way, regardless of where it's being used. You reduce the chances that you'll miss something in production or CI because development is as close to production as is technically possible and that saves you a mountain of pain as well.

Mostly though, it'll make me happy. Just once I'd like to sit down at a project that doesn't require me to globally install a bunch of tools and libraries, or open some network ports, or use homebrew, or Pyenv, or any other virtualisation-before-we-had-containers nonsense. I Have A Dream my friends, and that dream is that one day I'll sit down at a new project and the instructions will be this simple:

To stand up the project:

    $ docker compose up

To run the tests:

    $ docker compose exec web /scripts/run-tests

Or just use our alias:

    $ make tests

Until that day comes though, I'm going to keep pointing people to this blog post.

May 30, 2024 04:27 +0000  |  Django Python 0

It's taken the better part of six months, working a few hours in the evenings when I can scratch the time together, but my latest project is finally finished.

Named for the famous Tim Berners-Lee quote, django-cool-urls is a little library that lets you link to a web page or embedded video from your site, and should that link ever die (the site removed the page, or just died altogether, etc.), your site will swap out the external link for a local copy.

Just swap out this:

<a href="https://example.com/">...</a>

for this:

<a href="{% cool_url 'https://example.com/' %}">...</a>

I hear this sort of thing is great for SEO, but I mostly wrote it 'cause I was tired of going over old blog posts that linked to things that no longer exist, leaving a post stripped of context.

So, after hacking something together to work inside my site, I broke it out into a proper Django module mostly 'cause I thought it might be useful to others... well that and I like to build pretty things, and this code is very pretty.

Anyway, it's all up there now, GPL-licensed for the world to use or ignore. Check it out if you're so inclined:

December 30, 2023 22:53 +0000  |  Blogger Python 0

71 files changed, 2242 insertions(+), 1405 deletions(-)

It's been a very long time since I started working on supporting video in my former image gallery, but it's finally finished. This site has had a substantial overhaul, dropping the old easy-thumbnails library in favour of rolling my own thumbnailer that stores the thumbnail locations on the Media object. I also employed some light-touch polymorphism to support rendering out a page of media to include both images and video. There were a bunch of backflips required (tinkering with ffmpeg) to extract metadata from videos as well as to thumbnail them, and the geometry I had to fiddle with to make it look just right wasn't fun either.
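
For the curious, the ffmpeg side of that boils down to calls along these lines. This is a simplified sketch rather than the site's actual thumbnailer: ffprobe hands back the metadata as JSON, and ffmpeg grabs a single frame to use as the thumbnail.

import json
import subprocess


def video_metadata(path: str) -> dict:
    # Ask ffprobe for the container and stream metadata as JSON.
    result = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def video_thumbnail(path: str, out: str, at_second: int = 1) -> None:
    # Seek to the given second and write a single frame as the thumbnail.
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(at_second), "-i", path,
         "-frames:v", "1", out],
        check=True,
    )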

I also dropped the old js Packery library in favour of CSS grid and then spent literally weeks testing migrations since we're talking about around 77GB of images and video that I not only didn't want to lose, but I also wanted to interrogate further for higher quality metadata. Finally, my 16-core, 64GB desktop machine was getting taxed to its limits.

I don't know how well this is going to perform on the Raspberry Pi Kubernetes cluster though. Simple image thumbnailing works just fine, but video transcoding on arm64? It's going to be interesting.

Anyway, it was a shittone of work, so I thought it worth posting about. Chances are, you won't see any of the videos unless you login since it's all of my kid anyway :-) If none of the above makes sense to you, don't feel bad. This is a very nerdy subject.

June 23, 2017 16:12 +0000  |  Django Python 0

I sunk 4 hours of my life into this problem yesterday so I thought I might post it here for future frustrated nerds like myself.

If you're using django-debreach and Django REST Framework, you're going to run into all kinds of headaches regarding CSRF. DRF will complain with CSRF Failed: CSRF token missing or incorrect., and if you're like me, you'll be pretty confused, because there's nothing obviously wrong with the request. My token was being sent, but it appeared longer than it should have been.

So here's what was happening and how I fixed it. Hopefully it'll be useful to others.

Django-debreach encrypts the csrf token, which is normally just fine because it does so as part of the chain of middleware layers in every request. However, DRF doesn't respect the csrf portion of that chain. Instead it sets csrf_exempt() on all of its views and then relies on SessionAuthentication to explicitly call CSRFCheck().process_view(). Normally this is ok, but with a not-yet-decrypted csrf token, this process will always fail.

So to fix it all, I had to implement my own authentication class and use that in all of my views. Basically all this does is override SessionAuthentication's enforce_csrf() to first decrypt the token:

# SessionAuthentication comes from DRF; CSRFCryptMiddleware is django-debreach's
# middleware (import path per the django-debreach releases of that era).
from debreach.middleware import CSRFCryptMiddleware
from rest_framework.authentication import SessionAuthentication


class DebreachedSessionAuthentication(SessionAuthentication):

    def enforce_csrf(self, request):
        faux_req = {"POST": request.POST}

        # Let django-debreach decrypt the token, then hand the request back
        # to DRF's normal CSRF enforcement.
        CSRFCryptMiddleware().process_view(faux_req, None, (), {})
        request.POST["csrfmiddlewaretoken"] = faux_req["csrfmiddlewaretoken"]

        SessionAuthentication.enforce_csrf(self, request)
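
To actually use it, DRF lets you plug the class in per view or project-wide. A quick sketch (the dotted path in the settings version is hypothetical):

# Per view...
from rest_framework.views import APIView

class MyView(APIView):
    authentication_classes = [DebreachedSessionAuthentication]

# ...or globally, in settings.py:
REST_FRAMEWORK = {
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "myproject.authentication.DebreachedSessionAuthentication",  # hypothetical path
    ],
}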

Of course, none of this is necessary if you're running Django 1.10+ and already have Breach attack protection, but if you're stuck on 1.8 (as we are for now) this is the best solution I could find.

September 17, 2015 18:42 +0000  |  Django Python 0

I ran into something annoying while working on my Tweetpile project the other day and it just happened to me today on Atlas. Sometimes, removing code can cause explosions with migrations -- even when they've already been run.

Example:

  • You've created a new class called MyClass.
  • It subclasses models.Model
  • It makes use of a handy mixin you wrote called MyMixin:

    class MyClass(MyMixin, models.Model):
        # stuff here
    
  • You create a migration for it, run it, commit your code and congratulate yourself on code well done.

  • Months later you come back and realise that the use of MyMixin was a terrible mistake, so you remove it.
  • Now migrations don't work anymore.

Here's what happened:

Creating a migration that depends on non-Django-core stuff to assemble the model (think mixins that add fields, or the use of custom fields, etc.) means that the migration file has to import those modules in order to run. This is a problem because every time you run manage.py migrate, Django loads all migration files into memory, and if those files import now-non-existent modules, everything breaks.
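
To see why, here's a hypothetical example of the kind of migration Django generates when a custom field (or a field-adding mixin) is involved. Note the module-level import of the project's own code: delete that module later and every migrate run breaks, even though this migration was applied ages ago.

# myapp/migrations/0002_add_widget.py (hypothetical)
from django.db import migrations

import myapp.fields  # the custom code you later decide to remove


class Migration(migrations.Migration):

    dependencies = [("myapp", "0001_initial")]

    operations = [
        migrations.AddField(
            model_name="myclass",
            name="widget",
            field=myapp.fields.WidgetField(max_length=100),
        ),
    ]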

Solution:

It's an ugly one, but so far it's the only option I've found: manually collapsing the migration stack. Basically, you make sure you've run all of the migrations to date, then delete the offending classes, delete all of the migration files, and create a fresh initial migration:

$ cd /project/root/
$ ./manage.py migrate
$ rm -rf myapp/migrations/*
$ touch myapp/migrations/__init__.py
[ modify your code to remove the offending fields/mixins ]
$ ./manage.py makemigrations myapp

Now run this in your database:

DELETE FROM django_migrations WHERE app = 'myapp' AND name <> '0001_initial';
UPDATE django_migrations SET applied = NOW() where app = 'myapp';

The new single migration created won't be importing the removed classes, so everything will be ok, and you have the added benefit of not having so many migrations to import. Note however that this may cause problems with migrations from other apps that may have been created dependent on your now-deleted migrations, so this may start you down a rabbit-hole if you're unlucky.

I hope this helps someone in the future should this sort of thing present itself again.

October 04, 2010 01:41 +0000  |  Blogger Django Python Software 8

I haz a new site! I've been hacking at this for a few months now in my free time and it's finally in a position where I can replace the old one. Some of the features of the old site aren't here though, in fact this one is rather limited by comparison (no search, no snapshots, etc.) but the underlying code is the usual cleaner, better, faster, more extendable etc. so the site will grow beyond the old one eventually.

So, fun facts about this new version:

  • Written in Python, based on Django.
  • 317133 lines of code
  • Fun libraries used:
    • Flot (for the résumé skillset charts)
  • Neat stuff I added:
    • A new, hideous design!
    • A hierarchical tagging system
    • A custom image resizing library. I couldn't find a use for the other ones out there.
    • The Konami Code. Try it, it's fun :-)
  • Stuff that's coming:
    • Search
    • Mobile image upload (snapshots)
    • The image gallery will be up as soon as the shots are done uploading.

Anyway, if you feel so inclined, please poke around and look for problems. I'll fix them as soon as I can.

August 10, 2010 12:16 +0000  |  Blogger Django PHP Python 1

For those who have been demanding that I post something, anything, (*cough* Noreen *cough*) I apologise for the delay, but it won't be long now. I've been using all this time to write a new version of my site, done up in Python/Django. The next version will be a watered-down version of this one (on account of the complete rewrite) but will grow with time.

I may also decide to abandon all attempts at making it pretty... 'cause well... I suck at that :-)

January 03, 2010 12:07 +0000  |  Django Facebook Python Software TheChange.com Web Development 2

This is going to be a rather technical post, coupled with a smattering of rants about Facebook so those of you uninterested in such things might just wanna skip this one.

As part of my work on my new company, I'm building a synchroniser for status updates between Twitter, Facebook, and our site. Eventually, it'll probably include additional services like Flickr, but for now, I'm just focusing on these two external systems.

A Special Case

Reading this far, you might think that this isn't really all that difficult for either Twitter or Facebook. After all, both have rather well-documented and heavily used APIs for pushing and pulling data to and from a user's stream, so why bother writing about it? Well for those with my special requirements, I found that Facebook has constructed a tiny, private hell, one in which I was trapped for four days over the Christmas break. In an effort to save others from this pain, I'm posting my experiences here. If you have questions regarding this setup, or feel that I've missed something, feel free to comment here and I'll see what I can do for you.

So, let's start with my special requirements. The first stumbling block was the fact that my project is using Python, something not officially supported by Facebook. Instead, they've left the job to the community, which has produced two separate libraries with different interfaces and feature sets.

Second, I wasn't trying to synchronise the user streams. Instead, I needed push/pull rights for the stream on a Facebook Page, like those created for companies, politicians, famous people, or products. Facebook claims full support for this, but in reality it's quite obvious that these features have been crowbarred into the overall design, leaving gaping holes in the integration path.

What Not to Do

  • Don't expect Facebook to do the right/smart thing. Everything in Facebookland can be done in one of 3 or 4 ways and none of them do exactly what you want. You must accept this.
  • Don't try to hack Facebook into submission. It doesn't work. Facebook isn't doing that thing that makes sense because they forgot or didn't care to do it in the first place. Accept it and deal. If you try to compose elaborate tricks to force Facebook's hand, you'll only burn 8 hours, forget to eat or sleep in the process and it still won't work.

What to Do

Step 1: Your basic Facebook App

If you don't know how to create and set up a basic canvas page in Django, this post is not for you. Go read up on that and come back when you're ready.

You need a simple app, so for starters, get yourself a standard "Hello World" canvas page that requires a login. You can probably do this in minifb, but PyFacebook makes this easy since it comes with handy Django method decorators:

# views.py
from django.http import HttpResponse, HttpResponseRedirect
import facebook

@facebook.djangofb.require_login()
def fbCanvas(request):
    return HttpResponse("Hello World")

Step 2: Ask the User to Grant Permissions

This will force the user to add your application before proceeding, which is all fine and good but that doesn't give you access to much of anything you want, so we'll change the view to use a template that asks the user to click on a link to continue:

# views.py
from django.shortcuts import render_to_response
from django.template import RequestContext
import facebook

@facebook.djangofb.require_login()
def fbCanvas(request):
    return render_to_response(
        "social/canvas.fbml",
        {},
        context_instance=RequestContext(request)
    )

Note what I mentioned above, that we're asking the user to click on a link rather than issuing a redirect. I fought with Facebook for a good few hours to get this to happen all without user-input and it worked... sometimes. My advice is to just go with the user-clickable link. That way seems fool-proof (so far).

Here's our template:

<!-- canvas.fbml -->
<fb:header>
    <p>To enable the synchronisation, you'll need to grant us permission to read/write to your Facebook stream.  To do that, just <a href="http://www.facebook.com/connect/prompt_permissions.php?api_key=de33669a10a4219daecf0436ce829a2e&v=1.0&next=http://apps.facebook.com/myappname/granted/%3fxxRESULTTOKENxx&display=popup&ext_perm=read_stream,publish_stream,offline_access&enable_profile_selector=1">click here</a>.</p>
</fb:header>

See that big URL? It's option #5 (of 6) for granting extended permissions to a Facebook App for a user. It's the easiest to use and hasn't broken for me yet (Numbers 1, 2, 3 and 4 all regularly complained about silly things like not having the app installed when this was not the case, but your mileage may vary). Basically, the user will be directed to a page asking her to grant read_stream, publish_stream, and offline_access to your app on whichever pages or users she selects from the list of pages she administers. Details for modifying this URL can be found in the Facebook Developer Wiki.

Step 3: Understanding Facebook's Hackery

So you see how, in the previous section, adding enable_profile_selector=1 to the URL tells Facebook to ask the user which pages she'd like to grant these shiny new permissions to? Well that's nifty and all, but they don't tell you which pages the user selected.

When the permission questions are finished, Facebook does a POST to the URL specified in next=. The post will include a bunch of cool stuff, including the all-important infinite session key and the id of the user doing all of this, but it doesn't tell you anything about the choices made. You don't even know which page ids were in the list, let alone which ones were selected to have what permissions. Nice job there, Facebook.

Step 4: The Workaround

My workaround for this isn't pretty, and worse, depends on a reasonably intelligent end-user (not always a healthy assumption), but after four days cursing Facebook for their API crowbarring, I could come up with nothing better. Basically, when the user returns to us from the permissioning steps, we capture that infinite session id, do a lookup for a complete list of pages our user maintains and then bounce them out of Facebook back to our site to complete the process by asking them to tell us what they just told Facebook. I'll start with the page defined in next=:

# views.py
@facebook.djangofb.require_login()
def fbGranted(request):

    from cPickle import dumps as pickle
    from urllib  import quote as encode

    from myproject.myapp.models import FbGetPageLookup

    return render_to_response(
        "social/granted.fbml",
        {
            "redirect": "http://mysite.com/social/facebook/link/?session=%s&pages=%s" % (
                request.POST.get("fb_sig_session_key"),
                encode(pickle(FbGetPageLookup(request.facebook, request.POST["fb_sig_user"])))
            )
        },
        context_instance=RequestContext(request)
    )

# models.py
def FbGetPageLookup(fb, uid):
    return fb.fql.query("""
        SELECT
            page_id,
            name
        FROM
            page
        WHERE
            page_id IN (
                SELECT
                    page_id
                FROM
                    page_admin
                WHERE
                    uid = %s
            )
    """ % uid)

The above code will fetch a list of page ids from Facebook using FQL and, coupling it with the shiny new infinite session key, bounce the user out of Facebook and back to your site, where you'll use that info to re-ask the user about which page(s) you want them to link to Facebook.

Step 5: Capture That page_id

How you capture and store the page id is up to you. For me, I had to create a list of organisations we're storing locally and let the user compare that list of organisations to the list of Facebook Pages and make the links appropriately. Your process will probably be different. Regardless of how you do it, just make sure that for every page you wish to synchronise with Facebook, you have a session_key and page_id.

Step 6: Push & Pull

Because connectivity with Facebook (and Twitter) is notoriously flaky, I don't recommend doing your synchronisation in real-time unless your use-case demands it. Instead, run the code via cron, or better yet as a daemon operating on a queue, depending on the amount of data you're playing with. However you do it, the calls are the same:

import facebook
from django.conf import settings

# Setup your connection
fb = facebook.Facebook(settings.FACEBOOK_API_KEY, settings.FACEBOOK_SECRET_KEY)
infinitesessionkey = "your infinite session key from facebook"
pageid             = "the page id the user picked"
message            = "the status update you want to push"

# To push to Facebook:
fb(
    method="stream_publish",
    args={
        "session_key": infinitesessionkey,
        "message":     message,
        "target_id":   "NULL",
        "uid":         pageid
    }
)

# To pull from Facebook:
fb(
    method="stream_get",
    args={
        "session_key": infinitesessionkey,
        "source_ids": pageid
    }
)["posts"]

Conclusion

And that's it. It looks pretty complicated, and... well, it is. For the most part, Facebook's documentation is pretty thorough; it's just that certain features like this page_id thing appear to have fallen off their radar. I'm sure that they'll change it in a few months though, which will make my brain hurt again :-(

November 13, 2009 17:51 +0000  |  Programming Python Software 0

I wrote something like this some time ago, but this version is much better, if only because it's in python. Basically, it's a script that highlights standard input based on arguments passed to it.

But how is that useful? Well imagine that you've dumped the contents of a file to standard output, maybe even piped it through grep, and/or sed etc. Oftentimes you're still left with a lot of text and it's hard to find what you're looking for. If only there was a way to highlight arbitrary portions of the text with some colour...

Here's what you do:

$ cat somefile | highlight.py some strings

You'll be presented with the same body of text, but with the word "some" highlighted everywhere in light blue and "strings" highlighted in light green. The script can support up to nine arguments which will show up in different colours. I hope someone finds it useful.

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys,re

colours = [
    "\033[1;34m", # light blue
    "\033[1;32m", # light green
    "\033[1;36m", # light cyan
    "\033[1;31m", # light red
    "\033[1;33m", # yellow
    "\033[0;32m", # green
    "\033[0;36m", # cyan
    "\033[0;33m", # brown
    "\033[1;35m", # pink
    "\033[0m"     # none
]

args = sys.argv[1:]

# Strip out arguments exceeding the maximum
if len(args) > 9:
    print("\n%sWARNING: This script only allows for a maximum of 9 arguments.%s\n\n" % (colours[4], colours[9]), file=sys.stderr)
    args = args[:9]

while True:
    line = sys.stdin.readline()
    if line == "":  # EOF
        break
    for colour, arg in enumerate(args):
        line = re.sub(
            r"(%s)" % arg,
            "%s%s%s" % (colours[colour], r"\g<1>", colours[9]),
            line
        )
    try:
        print(line.rstrip("\n"))
    except BrokenPipeError:
        pass

July 08, 2009 22:25 +0000  |  PHP Programming Python 0

I wrote something rather fun today and I thought that I'd share it here. It's a Python module that you can use to interact with PHP products. Specifically, it's a reproduction of PHP's http_build_query() and parse_ini_file() functions that behave just as PHP's do, quirks and all.

This means that if you've written an API server (as we have) in PHP that makes use of things like the above, you can interact with it using Python as your scripting language with little effort.

Examples:

from php import parse_ini_file

config = parse_ini_file("/path/to/config.ini")
print(config["sectionName"]["keyName"])

This would give you the value for keyName in the section called sectionName in your config.ini file.

from php import http_build_query

somedata = {
  "keyname": "valuename",
  "otherkey": 123,
  "anotherkey": [1,2,3,{"seven": "eight"}]
}
print(http_build_query(somedata))

This would give you:

otherkey=123&keyname=valuename&anotherkey[1]=2&anotherkey[0]=1&anotherkey[3][seven]=eight&anotherkey[2]=3&

The code was fun to write, and I'm guessing that it'll be useful to others so I'm posting it here. If you do end up using it, lemme know by posting a comment here eh?

You can download it here: php.py.

When I mentioned this to some other coworkers, they pointed out that I'm not the only one trying to get some of PHP's odd functionality into Python. Another developer has mimicked PHP's serialize() functions in the form of a Python module. I wonder if there are any other cases where this kind of stuff might be useful.