Searching for Tao

Facebook Page Synchronisation

This is going to be a rather technical post, coupled with a smattering of rants about Facebook, so those of you uninterested in such things might just wanna skip this one.

As part of my work on my new company, I'm building a synchroniser for status updates between Twitter, Facebook, and our site. Eventually, it'll probably include additional services like Flickr, but for now, I'm just focusing on these two external systems.

A Special Case

Reading this far, you might think that this isn't really all that difficult for either Twitter or Facebook. After all, both have rather well-documented and heavily used APIs for pushing and pulling data to and from a user's stream, so why bother writing about it? Well for those with my special requirements, I found that Facebook has constructed a tiny, private hell, one in which I was trapped for four days over the Christmas break. In an effort to save others from this pain, I'm posting my experiences here. If you have questions regarding this setup, or feel that I've missed something, feel free to comment here and I'll see what I can do for you.

So, let's start with my special requirements. The first stumbler was the fact that my project is using Python, something not officially supported by Facebook. Instead, they've left the job to the community, which has produced two separate libraries (PyFacebook and minifb) with different interfaces and feature sets.

Second, I wasn't trying to synchronise the user streams. Instead, I needed push/pull rights for the stream on a Facebook Page, like those created for companies, politicians, famous people, or products. Facebook claims full support for this, but in reality it's quite obvious that these features have been crowbarred into the overall design, leaving gaping holes in the integration path.

What Not to Do

  • Don't expect Facebook to do the right/smart thing. Everything in Facebookland can be done in one of 3 or 4 ways and none of them do exactly what you want. You must accept this.
  • Don't try to hack Facebook into submission. It doesn't work. Facebook isn't doing that thing that makes sense because they forgot or didn't care to do it in the first place. Accept it and deal. If you try to compose elaborate tricks to force Facebook's hand, you'll only burn 8 hours, forget to eat or sleep in the process and it still won't work.

What to Do

Step 1: Your basic Facebook App

If you don't know how to create and set up a basic canvas page in Django, this post is not for you. Go read up on that and come back when you're ready.

You need a simple app so for starters get yourself a standard "Hello World" canvas page that requires a login. You can probably do this in minifb, but PyFacebook makes this easy since it comes with handy Django method decorators:

# views.py
from django.http import HttpResponse
import facebook.djangofb  # binds "facebook" and makes sure the djangofb submodule is loaded

@facebook.djangofb.require_login()
def fbCanvas(request):
    return HttpResponse("Hello World")

Step 2: Ask the User to Grant Permissions

The require_login decorator will force the user to add your application before proceeding, which is all fine and good, but that alone doesn't give you access to much of anything you want. So we'll change the view to use a template that asks the user to click on a link to continue:

# views.py
from django.shortcuts import render_to_response
from django.template import RequestContext
import facebook.djangofb  # binds "facebook" and makes sure the djangofb submodule is loaded

@facebook.djangofb.require_login()
def fbCanvas(request):
    return render_to_response(
        "social/canvas.fbml",
        {},
        context_instance=RequestContext(request)
    )

Note what I mentioned above: we're asking the user to click on a link rather than issuing a redirect. I fought with Facebook for a good few hours to get this to happen without any user input and it worked... sometimes. My advice is to just go with the user-clickable link. That way seems fool-proof (so far).

Here's our template:

<!-- canvas.fbml -->
<fb:header>
    <p>To enable the synchronisation, you'll need to grant us permission to read/write to your Facebook stream.  To do that, just <a href="http://www.facebook.com/connect/prompt_permissions.php?api_key=de33669a10a4219daecf0436ce829a2e&v=1.0&next=http://apps.facebook.com/myappname/granted/%3fxxRESULTTOKENxx&display=popup&ext_perm=read_stream,publish_stream,offline_access&enable_profile_selector=1">click here</a>.</p>
</fb:header>

See that big URL? It's option #5 (of 6) for granting extended permissions to a Facebook App for a user. It's the easiest to use and hasn't broken for me yet (numbers 1, 2, 3 and 4 all regularly complained about silly things like not having the app installed when this was not the case, but your mileage may vary). Basically, the user will be directed to a page asking her to grant read_stream, publish_stream, and offline_access to your app on whichever pages or users she selects from the list of pages she administers. Details for modifying this URL can be found in the Facebook Developer Wiki.
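
If you'd rather not hand-edit that URL, here's a minimal sketch of building it from its parts. The parameter names are the ones in the link above; the function name and the "YOUR_API_KEY" placeholder are my own inventions for illustration. It's written for Python 3 (on Python 2, urlencode lives in urllib), and it percent-encodes the commas and the next= URL more aggressively than the hand-written link does, which should be equivalent once decoded.

```python
from urllib.parse import urlencode

PROMPT_URL = "http://www.facebook.com/connect/prompt_permissions.php"


def build_permission_url(api_key, next_url, permissions):
    """Return the permissions-prompt URL for the given app and callback.

    Hypothetical helper: only the query parameter names come from the post.
    """
    params = [
        ("api_key", api_key),
        ("v", "1.0"),
        ("next", next_url),                    # where Facebook sends the user afterwards
        ("display", "popup"),
        ("ext_perm", ",".join(permissions)),   # comma-separated extended permissions
        ("enable_profile_selector", "1"),      # let the user pick pages as well
    ]
    return PROMPT_URL + "?" + urlencode(params)


url = build_permission_url(
    "YOUR_API_KEY",
    "http://apps.facebook.com/myappname/granted/?xxRESULTTOKENxx",
    ["read_stream", "publish_stream", "offline_access"],
)
```

Building the URL this way mostly saves you from typos when you add or drop a permission later.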

Step 3: Understanding Facebook's Hackery

So you see how, in the previous section, adding enable_profile_selector=1 to the URL tells Facebook to ask the user which pages she'd like to grant these shiny new permissions to? Well, that's nifty and all, but Facebook doesn't tell you which pages the user selected.

When the permission questions are finished, Facebook does a POST to the URL specified in next=. The POST will include a bunch of cool stuff, including the all-important infinite session key and the id of the user doing all of this, but it doesn't tell you anything about the choices made. You don't even know what page ids were in the list, let alone which ones were selected to have what permissions. Nice job there, Facebook.
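
To make that concrete, here's a small sketch of pulling out the two values you do get. The field names fb_sig_session_key and fb_sig_user are the ones used in the view code later in this post; the helper itself and the sample values are hypothetical, and it does no signature checking of its own.

```python
def extract_grant(post):
    """Pull the session key and user id out of Facebook's callback POST.

    `post` is any dict-like object, such as Django's request.POST.
    Note what is absent: nothing here says which pages were chosen.
    """
    session_key = post.get("fb_sig_session_key")
    uid = post.get("fb_sig_user")
    if not session_key or not uid:
        raise ValueError("callback is missing fb_sig_session_key or fb_sig_user")
    return {"session_key": session_key, "uid": int(uid)}


# Made-up values, purely for illustration
grant = extract_grant({"fb_sig_session_key": "abc123", "fb_sig_user": "42"})
```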

Step 4: The Workaround

My workaround for this isn't pretty, and worse, depends on a reasonably intelligent end-user (not always a healthy assumption), but after four days cursing Facebook for their API crowbarring, I could come up with nothing better. Basically, when the user returns to us from the permissioning steps, we capture that infinite session key, do a lookup for a complete list of pages our user maintains, and then bounce them out of Facebook back to our site to complete the process by asking them to tell us what they just told Facebook. I'll start with the page defined in next=:

# views.py
@facebook.djangofb.require_login()
def fbGranted(request):

    from cPickle import dumps as pickle
    from urllib  import quote as encode

    from myproject.myapp.models import FbGetPageLookup

    return render_to_response(
        "social/granted.fbml",
        {
            "redirect": "http://mysite.com/social/facebook/link/?session=%s&pages=%s" % (
                request.POST.get("fb_sig_session_key"),
                encode(pickle(FbGetPageLookup(request.facebook, request.POST["fb_sig_user"])))
            )
        },
        context_instance=RequestContext(request)
    )

# models.py
def FbGetPageLookup(fb, uid):
    return fb.fql.query("""
        SELECT
            page_id,
            name
        FROM
            page
        WHERE
            page_id IN (
                SELECT
                    page_id
                FROM
                    page_admin
                WHERE
                    uid = %s
            )
    """ % int(uid))  # int() keeps anything but a numeric id out of the FQL

The above code will fetch a list of page ids and names from Facebook using FQL and, coupling it with the shiny new infinite session key, bounce the user out of Facebook and back to your site, where you'll use that info to re-ask the user which page(s) they want to link.
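
One caveat if the view on your own site unpickles that pages parameter: unpickling anything that arrives in a URL lets whoever crafted the URL run code on your server. A possible alternative, sketched below for Python 3, is to pass the list as JSON instead. This is not what the code above does; the function names are hypothetical, and I'm assuming the FQL result is a list of dicts with page_id and name keys.

```python
import json
from urllib.parse import quote, unquote


def encode_pages(pages):
    """Serialise the page list into a string that is safe inside a query string."""
    slim = [{"page_id": p["page_id"], "name": p["name"]} for p in pages]
    return quote(json.dumps(slim), safe="")


def decode_pages(value):
    """Reverse encode_pages(); raises ValueError on malformed input.

    `value` is the still percent-encoded string. Frameworks such as Django
    already decode query values, in which case skip the unquote() step.
    """
    pages = json.loads(unquote(value))
    if not isinstance(pages, list):
        raise ValueError("expected a list of pages")
    try:
        return [{"page_id": int(p["page_id"]), "name": str(p["name"])} for p in pages]
    except (KeyError, TypeError) as exc:
        raise ValueError("malformed page list") from exc


# Made-up page, purely for illustration
token = encode_pages([{"page_id": 123, "name": "Café & Co", "extra": "ignored"}])
pages = decode_pages(token)
```

Either way, the session key and page list have passed through the user's browser, so treat both as untrusted input on arrival.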

Step 5: Capture That page_id

How you capture and store the page id is up to you. For me, I had to create a list of organisations we're storing locally and let the user compare that list of organisations to the list of Facebook Pages and make the links appropriately. Your process will probably be different. Regardless of how you do it, just make sure that for every page you wish to synchronise with Facebook, you have a session_key and page_id.
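
As a rough illustration of that last requirement, here's one way the stored link could look. Everything in it is hypothetical (the PageLink name, the organisation_id field, the shape of the user's choices); it only shows that each link ends up carrying both a session_key and a page_id, and that the chosen page is checked against the list Facebook returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageLink:
    organisation_id: int   # hypothetical local primary key
    page_id: int           # Facebook Page the organisation maps to
    session_key: str       # infinite session key captured in Step 4


def build_links(choices, session_key, known_page_ids):
    """Turn the user's choices into PageLink records.

    `choices` maps organisation_id -> page_id as picked by the user;
    `known_page_ids` is the set of page ids the FQL lookup returned.
    """
    links = []
    for organisation_id, page_id in sorted(choices.items()):
        if page_id not in known_page_ids:
            raise ValueError("page %s is not administered by this user" % page_id)
        links.append(PageLink(organisation_id, page_id, session_key))
    return links


# Made-up ids, purely for illustration
links = build_links({2: 20, 1: 10}, "abc123", {10, 20})
```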

Step 6: Push & Pull

Because connectivity with Facebook (and Twitter) is notoriously flaky, I don't recommend doing your synchronisation in real-time unless your use-case demands it. Instead, run the code via cron, or better yet as a daemon operating on a queue, depending on the amount of data you're playing with. However you do it, the calls are the same:
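
For the queue approach, a minimal sketch of a worker loop with retries might look like this. The drain function and its parameters are hypothetical, and I'm assuming network failures surface as OSError; substitute whatever exception your client library actually raises.

```python
import queue
import time


def drain(jobs, push, max_attempts=3, delay=0.0):
    """Call push(job) for every queued job, retrying on failure.

    Returns the jobs that still failed after max_attempts tries, so the
    caller can re-queue or log them.
    """
    failed = []
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            break
        for attempt in range(1, max_attempts + 1):
            try:
                push(job)
                break                              # success, move to the next job
            except OSError:
                if attempt == max_attempts:
                    failed.append(job)             # give up on this job for now
                else:
                    time.sleep(delay * attempt)    # simple linear back-off
    return failed
```

In a real daemon, push would wrap the Facebook call and delay would be a few seconds rather than zero.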

import facebook
from django.conf import settings

# Set up your connection
fb = facebook.Facebook(settings.FACEBOOK_API_KEY, settings.FACEBOOK_SECRET_KEY)
infinitesessionkey = "your infinite session key from facebook"
pageid             = "the page id the user picked"
message            = "the status update you want to publish"

# To push to Facebook:
fb(
    method="stream_publish",
    args={
        "session_key": infinitesessionkey,
        "message":     message,
        "target_id":   "NULL",
        "uid":         pageid
    }
)

# To pull from Facebook:
posts = fb(
    method="stream_get",
    args={
        "session_key": infinitesessionkey,
        "source_ids":  pageid
    }
)["posts"]

Conclusion

And that's it. It looks pretty complicated, and... well it is. For the most part, Facebook's documentation is pretty thorough, it's just that certain features like this page_id thing appear to have fallen off their radar. I'm sure that they'll change it in a few months though, which will make my brain hurt again :-(

Google Indexing According to Me

I know that I'm in Korea and I "should" be out seeing the sights, but the primary reason for my visit was less to see Seoul and more to see Shawna and just... relax. Since Shawna works during the day, I took yesterday morning and afternoon off to just do nothing, and today I'm catching up on my crazy-sized email backlog. I'll be going out around 11am though with a friend of Shawna's to do some exploring and pick up a temporary phone.

For the moment though, I just wrote a rather long email to my uncle to help him with his Google ranking, and figured that since this was the second time I've had to go through all of this with someone, it might be a good idea to post it all here for future reference. If you think that I've missed anything, please let me know and I'll update.

Google bases your page rank on a few things: linkage, content, and formatting. I believe that it's even in that order. I'll explain one at a time.

Linkage

Linkage is the number of links to your site, weighted by the ranking of the sites they come from. So for example, if "Bob's blog" links to you, that link is worth significantly less than one from Amazon.com or Slate. More links are better, and Google will even attribute the content of the origin site to your own. In other words, if a site about Pizza links to you, Google will assume that you have something to do with Pizza. So the best links to get are ones from *within your field* rather than from just anywhere, lest you dilute your rank with irrelevant associations.

Content

This is the easiest, but a lot of people miss it. First of all, so-called "rich media" isn't recognised by Google (or by pretty much any other search engine). Flash, YouTube, Silverlight etc. won't get read by Google, so don't make your site dependent on such formats. Instead, offer lots of relevant content with links to other sites, and make proper use of the keywords with which you want to be found.

For example, on my dad's site, he wanted to be found with the keyword "optical" but we never once used it on his site. Instead, we used "optician". As a result, he was #1 for "optician Kelowna" but had no mention for "optical".

It's also important to note that grammar matters. You can't just fill up the page with abnormal uses of the keywords for which you want to be indexed. Google pays very smart people a lot of money to write code that will recognise poor-grammar-as-planted-keywords, so don't mess with a good thing. The truth of it is that if you have a good site with relevant content, people will find you, link to you, and your rank will improve over time.

Format

Back when I was in school we were taught that the format of your code was relevant to your search ranking. I'm not sure how true this is anymore, but it's good practice nonetheless. Do put headers in header tags (<h1>..<h6>), put text in the alt="" portion of your <img> tags, and don't try to screw with Google by putting a bunch of keywords in a text block and then hiding it, whether by making the text the same colour as the background or by hiding the box altogether. They hate that, and if their scripts catch you, you risk being delisted.
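
If you want to check a page against those two rules, here's a rough sketch using Python 3's standard html.parser. The class and function names are my own, and it treats an empty alt="" as missing, which is stricter than you may want for purely decorative images.

```python
from html.parser import HTMLParser

HEADING_TAGS = ("h1", "h2", "h3", "h4", "h5", "h6")


class MarkupAudit(HTMLParser):
    """Collects heading tags and <img> tags that lack alt text."""

    def __init__(self):
        super().__init__()
        self.headings = []            # tag names in document order
        self.images_missing_alt = []  # src values of images without alt text

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in HEADING_TAGS:
            self.headings.append(tag)
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            self.images_missing_alt.append(attrs.get("src"))


def audit(html):
    """Return (headings, images_missing_alt) for a chunk of HTML."""
    parser = MarkupAudit()
    parser.feed(html)
    parser.close()
    return parser.headings, parser.images_missing_alt


sample = (
    '<h1>Opticians</h1><img src="a.png" alt="Shop front">'
    '<h2>Hours</h2><img src="b.png"><img src="c.png" alt="" />'
)
headings, missing = audit(sample)
```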

Lastly, a handy thing to do is to install Google Analytics. It will do fun stuff like track page hits by hour, week, and month as well as give you country of origin stats, search engine references etc. It's awesome and it's free (as in beer, not Freedom).
