Sunday, November 6, 2011

Revisiting Google App Engine's pricing changes

This post revisits my earlier evaluation of Google App Engine's post-preview pricing changes and how they affected my project, SMSMyBus. As I noted, the app was projected to cost between $6 and $7 per day under the new platform pricing.

Since that post, I’ve been rolling out small, incremental changes to optimize the code and combat all of the known issues. I’m thrilled to report that I have the price down to $0/day. And I’m once again impressed by the snappy and reliable App Engine platform.

Looking back on the changes, I can say that I was doing some bad things, some abusive things, and App Engine was making some bad choices as well. But the end result proves that developers who optimize and make smart choices are rewarded. App Engine remains a great solution for my transit API service.

Here’s a history of the changes over the last two months that got me down to $0/day...

Platform Configuration

1. Instance allocation
The new pricing model charges applications based on their use of instances (the hardware resources where your application runs) rather than CPU utilization. A key to keeping your instance cost down is to simply reduce the number of instances that are spinning. Duh. So I grabbed the idle instance slider in the application settings and yanked it to the left. This doesn't prevent scaling; it just limits how many warm instances I'm billed for under normal traffic flow.
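For reference, here's roughly what that tradeoff looks like as configuration. In 2011 these were sliders in the Admin Console; later versions of App Engine expose the same knobs in app.yaml. A minimal sketch, assuming the automatic_scaling syntax:

    # app.yaml (sketch) - cap the idle instances you pay for;
    # traffic bursts can still scale out beyond this
    automatic_scaling:
      max_idle_instances: 1
      min_pending_latency: 3000ms  # let requests queue briefly before a new instance spins up

The pending-latency knob is the other half of the bargain: the longer you let requests wait, the fewer instances you pay for, at the cost of user-visible latency.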

2. Delete data
App Engine data storage (for your database) costs $0.008/GByte-day. Doesn’t sound too expensive, but I had been storing every single API call I had ever gotten. I thought it would be useful for API developers and for analytics. My drive to $0 outweighed that, however, so I deleted all of the history data and got under the free quota for storage.
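The cleanup itself was the easy part. A minimal sketch of the kind of batch delete involved, using the Python db API; RequestLog is a hypothetical stand-in for whatever model holds the history:

    from google.appengine.ext import db

    # Hypothetical model that logged every API call ever received
    class RequestLog(db.Model):
        path = db.StringProperty()
        requested = db.DateTimeProperty(auto_now_add=True)

    def purge_history(batch_size=500):
        # Fetch keys only (cheap) and delete in batches until nothing is left
        while True:
            keys = RequestLog.all(keys_only=True).fetch(batch_size)
            if not keys:
                break
            db.delete(keys)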

Application Configuration

3. Memcached the application's route listings
I was surprised to find that I wasn’t doing this already, but there it was. I have a data structure that maps bus routes and bus stops to scheduling data on the Metro website and it never changes. In some cases - like the static calls from the kiosk clients - I was looking up route listing details in the datastore once every minute!! Fail. I used memcache to keep the common queries in memory and avoid the extra datastore reads.
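The pattern is a standard read-through cache. A minimal sketch, assuming a hypothetical datastore lookup helper and key scheme:

    from google.appengine.api import memcache

    def get_route_listing(route_id, stop_id):
        # The route/stop mapping never changes, so cache it indefinitely
        key = 'routelisting:%s:%s' % (route_id, stop_id)
        listing = memcache.get(key)
        if listing is None:
            # Hypothetical datastore query - only runs on a cache miss
            listing = query_route_listing(route_id, stop_id)
            memcache.set(key, listing)
        return listing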

4. Limit access during off hours
One thing that never changes is when the Metro service is running. There are five-plus hours a day when the buses aren't on the street, but some clients are still asking for data. I stubbed out most of the API during these off hours, before the code ever gets close to making a datastore or memcache call.
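A minimal sketch of that guard, with the service window as an assumption (App Engine clocks run in UTC, so the hour has to be shifted to Madison time first):

    from datetime import datetime, timedelta

    OFF_HOURS = range(0, 5)  # assumption: no Metro service between midnight and 5am

    def in_service_hours():
        # Shift UTC to Central time (ignoring DST for brevity)
        local_hour = (datetime.utcnow() - timedelta(hours=6)).hour
        return local_hour not in OFF_HOURS

    def getarrivals(request):
        if not in_service_hours():
            # Bail out before any datastore or memcache work happens
            return {'status': '-1', 'description': 'Metro service is not running right now'}
        # ... normal arrival lookup continues here ...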

These four changes brought me down to $0.70 per day. Bam!

Algorithm Changes

5. Asynchronous screen grabs
If you don’t know, behind the API curtain is an ugly screen scraping task that extracts the arrival estimates from the Metro website. So when a client requests arrival data for a stop, the app goes off and requests multiple web pages, machine-reads the information and aggregates all of the results.

The original implementation of the SMS interface did this by creating multiple tasks (one for each route traveling through the respective stop). When a task ran, it stored the results in the datastore. An aggregator task would read those results out of the datastore and piece together the response to the caller.

When the API was created, I couldn't use background tasks because I had to respond with results in the same HTTP context. That's when I discovered a great feature: asynchronous URL fetch. This essentially let me grab all of the different Metro web pages at the same time. But when I implemented it, I continued to use the datastore as the mechanism for storing and retrieving results. This was just lazy. Under the old pricing, I had no incentive to change it other than the fact that it was a bit slow.
Under the new pricing model, this solution was very expensive. The API is continuously running this aggregation algorithm - constantly writing to and reading from the datastore for model instances that have a lifespan of under a minute!

I rolled out a change that removed the use of the datastore and instead aggregated and sorted the results in memory. This had a dramatic effect on my datastore read and write quotas, as well as on overall performance and latency for my users. The write operations improved most of all - you get penalized by an order of magnitude for this kind of churn because index updates count against your quota too.
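Here's the shape of the new code: fan out all of the page fetches asynchronously, then aggregate entirely in memory. A sketch, with the scraping helper and result fields as assumptions:

    from google.appengine.api import urlfetch

    def fetch_arrival_estimates(urls):
        # Kick off every Metro page request in parallel
        rpcs = []
        for url in urls:
            rpc = urlfetch.create_rpc(deadline=10)
            urlfetch.make_fetch_call(rpc, url)
            rpcs.append(rpc)

        # Collect and aggregate the results in memory - no short-lived
        # datastore entities, no index updates
        results = []
        for rpc in rpcs:
            response = rpc.get_result()
            if response.status_code == 200:
                results.extend(scrape_estimates(response.content))  # hypothetical parser
        return sorted(results, key=lambda r: r['minutes'])  # hypothetical field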

6. Dogfood
After optimizing the API, I realized that the original SMSMyBus apps (SMS, chat, email and phone interfaces for the Metro) were now the long pole. Those apps were implemented before the API existed, so they weren't benefiting from the API optimizations. Solution... re-implement them to use the SMSMyBus API.

It should have been done long ago, simply as a validation exercise for the API methods. Credit to the elegance and simplicity of the API - the port was simple and only took a couple of hours.
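To give a flavor of how thin the front-ends became, here's a sketch of the SMS handler's data path. The endpoint and parameter names here are assumptions for illustration, not the documented API:

    import json
    from google.appengine.api import urlfetch

    API_URL = 'http://www.smsmybus.com/api/v1/getarrivals'  # hypothetical endpoint

    def arrivals_for_sms(stop_id, dev_key):
        # The SMS front-end is now just another client of the public API
        url = '%s?key=%s&stopID=%s' % (API_URL, dev_key, stop_id)
        response = urlfetch.fetch(url)
        return json.loads(response.content)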

These two changes brought me down to $0.10/day. Badda-bing.

Appstats

7. Run Appstats on all application interfaces
The last stop on the optimization train was Appstats, a truly great tool in the App Engine toolbox. In just a matter of minutes, you can find the hidden datastore operations that are dragging you down. In my case, it led me to one area that wasn't being memcached at all, and it revealed another area that was simply using memcache incorrectly! Love this tool...
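Turning Appstats on is nearly free. For the Python runtime it's a couple of lines in appengine_config.py (this is the documented setup; the middleware records every RPC your request makes):

    # appengine_config.py
    def webapp_add_wsgi_middleware(app):
        # Wrap the WSGI app so Appstats records datastore, memcache
        # and urlfetch calls for every request
        from google.appengine.ext.appstats import recording
        return recording.appstats_wsgi_middleware(app)

Enable the appstats builtin in app.yaml and the reports show up under /_ah/stats.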

This change brought me down to $0.00/day. Winning.

Results
App Engine remains a great platform for developers who don't abuse it and who take the time to optimize their applications.

The SMSMyBus API now serves over 6,000 transit requests per day. It’s fast, reliable and flat out fun to use. I’m as proud as ever that I brought this to Madison.

Next step... find a way to fund my SMS users. :)

18 comments:

  1. Just get a VPS, the support alone is worth $20/mo. And when prices go up there's 100 other VPS to choose from. With $GOOG the knowledge gained in this post is down the toilet once you change providers. You guys are supposed to be smart, but most of yall n00bs!

  2. Can you explain a bit more about "So I grabbed the instance slider in the application settings and yanked it to the left." ? What is the tradeoff this makes?

  3. @PJ Brunet : i use a VPS (linode) for almost all my web-services but at some point i do wish it was as resilient to attacks and overload as google app engine

  4. Very cool. We're seeing something similar, although not really able to push memcache as hard as you can. Watching the thing scale to hundreds of instances under heavy load and then falling off all on its own is great. Ahhhh. Relief. PJ - have fun with your VPS mate - stick with what you know, it's for the best for everyone

  5. This is very interesting. I am going through a very similar process and this post is very useful, thanks. Could you please explain a bit more your 3rd point, "3. Memcached the application's route listings"? Thanks!

  6. Since you limited your instance count to 1, would it be possible to replace some use of memcache with a hashtable kept in the instance RAM? I guess that depends on how often that instance goes away and how much RAM you're allowed to use.

  7. Thanks for the hints. Also very motivating, since I struggle with the same problem trying to get my server costs down. I have the following additional ideas so far:
    - moving image caching to the website as base64 in localStorage
    - moving server logic to the website, using the app engine only for data storage
    I try to avoid new instances being brought up due to too many server calls.

  8. How about using Google Voice for SMS?

  9. I also spent a lot of time on App Engine like you before. my suggestion is go away as soon as you can... app engine is just a toy when compared to EC2

  10. @hupp app engine has a nice tuning parameter that lets you control the minimum number of warm instances (where your code is loaded and ready to go for bursts of traffic), but you pay for each of those instance hours. if you configure this to be one (1), the downside is higher latency on traffic that forces app engine to spin up new instances.

    @jose that change is application specific. i have a model in my datastore that maps bus routes and bus stops to scheduling data. it never changes, so there is no reason to query it again and again when the same request comes in. caching the query results using memcache allows my app to avoid thousands of datastore reads. I'll clarify this in the original post.

    @ajasmin i think this is possible, although without much reliability. i don't think there are any guarantees that a single instance is *the* instance that will stay up. just because the setting is one (1), it doesn't mean it's the only instance. as far as i know, there is no guarantee which instances GAE takes away when traffic slows. besides, memcache is so easy, and at this scale, free.

  11. @jackson i don't believe google voice has an API for sending and receiving SMS. did i miss something?

  12. @Greg there are unofficial ones. I have been using a ruby library to send from my startup's GV number and haven't had any problems so far. I have been doing it for a few months, but it is pretty low volume (maybe sending 250 msgs a day).

  13. Lots of great tips here, thanks for writing this up and sharing! I've posted a link to it on CoderBuddy.

  14. Google AppEngine with the dashboard / app stats / built-in services is perfect for developers, and it was my performance experiment last year. With a simple service and some caching improvements I increased the capacity of the free-to-use app engine from 50,000 up to 500,000 requests per day! http://united-coders.com/christian-harms/3-caching-steps-to-boost-your-webser...

  15. I have been trying a lot of things to reduce instance use since app engine went out of preview. Use of memcache helps a lot; I also removed a lot of urlfetch in python and moved it to javascript on the client side, which helped a lot too. Thanks for the appstats info - did not know about it. I have reduced from $4/day to $1/day now. My site http://gramfeed.com gets about 90,000 requests - is it a good idea to limit the number of instances to 1? I have it at auto and it goes up to 5 at times.

  16. The instance count depends a lot on your QPS and your users' tolerance for latency. You could limit the instances and also set the latency slider to limit the wait time for users. I am not a fan of the auto mode because it seems to be totally insensitive to cost. I recommend experimenting. Maybe you don't go all the way down to one, but instead experiment with two?

  17. @GregTracy do you happen to know: if I set it to 2, will it always use 2, or will it use 1 and a max of 2 in case of traffic?

  18. The slider controls the number of *idle* instances - the number of warm instances with your code already running - ready to take new requests. If you have billing enabled, app engine will keep spinning up new instances to manage bursts of traffic.
