Sunday, November 18, 2012

Goodbye Posterous. Hello Blogger.

I've been hosting my blog on Posterous since I started writing in 2008. Earlier in the year, they sold the business to Twitter with no guarantees that the service would be around long term.

I kept waiting for them to offer a tool to extract/port content and it hasn't arrived. So I decided to scratch an itch and do it myself. I ported my blog to Blogger this weekend and here's how I did it.

Problem scope

The blog content is the easy(ish) part, but if you want to preserve the links to the original content, it can be tricky. Depending on your search engine and social media traffic, those links can be valuable.

First, every blogging platform uses its own slug path structure. Second, if your blog is in a subdomain, you have to find a way to preserve that subdomain with the addresses on the new blogging platform. For example, the parking API post has two different URIs.

Posterous :

www.gregtracy.com/adding-parking-to-the-madison-api-homebrew

Blogger :

blog.gregtracy.com/2012/01/adding-parking-to-madison-api-homebrew.html

Migrating content

Content migration is a tedious process. But the current option for getting Posterous content to Blogger is well documented on the web.
  1. Import Posterous content into WordPress
  2. Export WordPress to an XML file
  3. Use wordpress2blogger to transform the XML file for Blogger import
  4. Import XML file into Blogger
This flow is documented in detail here. Lifehacker has also documented a process for migrating content using the auto-post feature inside Posterous, but it is tedious.

It doesn't necessarily end there, however. There are two glaring problems that make this process linger on forever. First, the content is poorly formatted, so you'll find yourself editing posts in Blogger to clean up formatting. Second, multimedia sometimes doesn't import correctly, or it still references files on Posterous or WordPress, so you need to upload that content to Blogger.

Someone will eventually build a migrator by mashing up the Posterous API with the Blogger API to make this all go away. In a different life, I might have taken the time to do it. But until someone does, the content migration is a little messy.

Building the redirector

This was the piece I was most interested in because it's simple and fun to build. 
  1. Use the Posterous API to get a list of all post URI slugs and their post dates
  2. Construct a slug map for Posterous URIs to Blogger URIs utilizing the dates and slugs
  3. Build an app that takes any Posterous URI slug and redirects the page to the Blogger URI
  4. Set up DNS so your old blog points to your app
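
Step 2 can be sketched roughly as follows. This is a minimal illustration, not the code used for this blog: the `posterousPosts` array, its `slug` and `date` field names, and the sample dates are all assumptions standing in for whatever the Posterous listing returns. Only the output shape (`date` and `blogger_slug`) follows the map described in this post.

```javascript
// Hypothetical input: one entry per Posterous post, with its slug and its
// publication date as an ISO "YYYY-MM-DD" string. Field names and the exact
// dates are made up for illustration.
const posterousPosts = [
  { slug: 'adding-parking-to-the-madison-api-homebrew', date: '2012-01-15' },
  { slug: 'awesome-post-title-wins', date: '2012-11-18' },
];

// Turn "2012-01-15" into the "/2012/01/" prefix that Blogger puts in post URIs.
function datePrefix(isoDate) {
  const [year, month] = isoDate.split('-');
  return '/' + year + '/' + month + '/';
}

// Build a draft slug map. blogger_slug starts out as a copy of the Posterous
// slug and still has to be corrected wherever Blogger's slug differs.
function buildSlugMap(posts) {
  const map = {};
  for (const post of posts) {
    map[post.slug] = { date: datePrefix(post.date), blogger_slug: post.slug };
  }
  return map;
}

const draftSlugMap = buildSlugMap(posterousPosts);
console.log(draftSlugMap);
```

Slicing the date string, rather than going through `Date`, keeps the year and month from shifting with the time zone of the machine that runs the script.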

Posterous listing

I wrote the following nodejs app to grab the full list of public posts from Posterous.


Blogger listing

If you can get access to the Blogger API, which requires a special request to the Blogger team, you can do the same thing I did with Posterous. The challenge is finding the ways Blogger changes slugs. For example, they remove "the" and "a" from most post slugs. 
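
If you don't have API access, a first guess at the Blogger slug can be generated and then corrected by hand. The sketch below assumes the only difference is the dropped "the" and "a"; the word list and the function name `guessBloggerSlug` are illustrative, and Blogger's real rules are likely more involved.

```javascript
// Words that Blogger appears to drop from slugs, per the observation above.
// This list is an assumption and almost certainly incomplete.
const DROPPED_WORDS = ['the', 'a'];

// Remove the dropped words from a hyphen-separated Posterous slug.
function guessBloggerSlug(posterousSlug) {
  return posterousSlug
    .split('-')
    .filter((word) => !DROPPED_WORDS.includes(word))
    .join('-');
}

console.log(guessBloggerSlug('adding-parking-to-the-madison-api-homebrew'));
// prints "adding-parking-to-madison-api-homebrew"
```

For the parking API post, the guess matches the slug Blogger actually assigned, but treat every generated slug as unverified until the redirect has been tested.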

Finding these differences is tedious, so I actually recommend making sure you get the mapping correct for your top ten blog posts and then just crowdsourcing the rest. Keep track of the page misses with logging (see below for my solution) and then update the map as you find mistakes. 

In the end, the map looks something like this:

{ 'awesome-post-title-wins' : { date : '/2012/11/' , blogger_slug : 'awesome-post-title' } }

With those details, you can get from any legacy slug - /awesome-post-title-wins - to any Blogger URI - /2012/11/awesome-post-title.html
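
As a rough sketch of that lookup, the function below joins the pieces of a map entry into a full Blogger URL. The host `blog.example.com` and the names `exampleSlugMap` and `toBloggerUrl` are placeholders, and the `.html` suffix is assumed from the Blogger URI of the parking API post shown earlier.

```javascript
const exampleSlugMap = {
  'awesome-post-title-wins': { date: '/2012/11/', blogger_slug: 'awesome-post-title' },
};

// Hypothetical new blog host; substitute your own Blogger domain.
const BLOGGER_HOST = 'http://blog.example.com';

// Return the full Blogger URL for a legacy path such as
// "/awesome-post-title-wins", or null when the slug is not in the map.
function toBloggerUrl(legacyPath, slugMap) {
  // Trim leading and trailing slashes to get the bare slug.
  const slug = legacyPath.replace(/^\/+|\/+$/g, '');
  // Own-property check so paths like "/constructor" do not match by accident.
  if (!Object.prototype.hasOwnProperty.call(slugMap, slug)) return null;
  const entry = slugMap[slug];
  return BLOGGER_HOST + entry.date + entry.blogger_slug + '.html';
}

console.log(toBloggerUrl('/awesome-post-title-wins', exampleSlugMap));
// prints "http://blog.example.com/2012/11/awesome-post-title.html"
```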

Redirector application

I chose to use Google App Engine to host my redirector, but you could use anything. The logic is very simple. Just look up the inbound request in your slug map and redirect to the Blogger URI. It looks a lot like the following.


My full implementation, including the slug map, can be found on GitHub. My implementation also includes a miss tracker, which is what I've used for understanding which redirects are failing. I'm persisting the routes and a counter for the number of occurrences to understand what's missing. I can find all of the misses using the datastore viewer in the App Engine dashboard.
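
For readers not on App Engine, the same idea can be sketched as a plain Node request handler. This is not the implementation from the repository: the in-memory `missCounts` map stands in for the datastore, `blog.example.com` is a placeholder host, and the choice of a 301 for hits and a 302 to the home page for misses is an assumption.

```javascript
const redirectMap = {
  'awesome-post-title-wins': { date: '/2012/11/', blogger_slug: 'awesome-post-title' },
};
const NEW_BLOG = 'http://blog.example.com'; // hypothetical Blogger host

// In-memory stand-in for the persisted miss counter: path -> number of misses.
const missCounts = new Map();

// Request handler with the (req, res) shape that Node's http.createServer expects.
function handleRedirect(req, res) {
  const path = req.url.split('?')[0]; // ignore any query string
  const slug = path.replace(/^\/+|\/+$/g, '');
  const entry = Object.prototype.hasOwnProperty.call(redirectMap, slug)
    ? redirectMap[slug]
    : null;

  if (entry) {
    // Known post: permanent redirect to the Blogger URI.
    res.writeHead(301, { Location: NEW_BLOG + entry.date + entry.blogger_slug + '.html' });
  } else {
    // Unknown path: count the miss, then fall back to the new blog's home page.
    missCounts.set(path, (missCounts.get(path) || 0) + 1);
    res.writeHead(302, { Location: NEW_BLOG + '/' });
  }
  res.end();
}
```

To try it locally, pass the handler to `require('http').createServer(handleRedirect).listen(8080)`. The counter resets on every restart, so a real deployment would persist it somewhere, as the datastore does here.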

The incidentals

There are a couple of miscellaneous resources that you'll need to add to your map file as well. 
  • RSS feed
  • favicon.ico
  • robots.txt
  • Apple touch icons
  • Posterous "tags" (Blogger "labels")
All of those can be redirected, but creating static files for your favicon and Apple touch icons allows you to personalize that content better on your new domain.
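
A sketch of how a few of these could be handled alongside the post slugs. Every path here is an assumption to verify against your own miss log: `/rss.xml` and `/tag/NAME` for the old Posterous side, and `/feeds/posts/default` and `/search/label/NAME` for the Blogger side. The names `staticRedirects` and `incidentalTarget` are illustrative.

```javascript
// Fixed legacy paths and where they could go on the new blog.
// Both sides of this table are assumptions; check them before relying on them.
const staticRedirects = {
  '/rss.xml': '/feeds/posts/default',
  '/robots.txt': '/robots.txt',
};

// Return the new path for a miscellaneous legacy path, or null if it is
// not one of the incidentals (for example, a file served statically).
function incidentalTarget(path) {
  if (Object.prototype.hasOwnProperty.call(staticRedirects, path)) {
    return staticRedirects[path];
  }
  // Posterous "tags" become Blogger "labels".
  const tagMatch = /^\/tag\/([^/]+)\/?$/.exec(path);
  if (tagMatch) return '/search/label/' + tagMatch[1];
  return null;
}

console.log(incidentalTarget('/tag/api'));
// prints "/search/label/api"
```

Blogger labels may not match the old tag names exactly (capitalization, for instance), so the tag rule is only a starting point.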

Domain mapping

Now that you have an application that redirects content to Blogger, you need to map that application to your old domain. In my case, I updated the DNS for www.gregtracy.com to point to my app. The combination of App Engine and Google Apps makes this very easy.

Inside your App Engine dashboard, select "Application Settings" and use "Add Domain" to map the application to your subdomain. These steps walk you all the way through DNS setup.

That's it. What details did I miss?  http://blog.gregtracy.com

2 comments:

  1. Nice work Greg! The transition explains why Google Reader suddenly re-listed a number of your posts (perhaps a dozen or so).

  2. The only problem I've found with both methods I've encountered is that the images themselves either stay on wordpress.com or (in the case where you autopublish to Blogger through Posterous) they stay on the Amazon CDN which will be no more shortly. There's no real way to do this.
