Monday, September 5, 2011

App Engine's place as a developer playground

The Google App Engine developer community is a hot mess this week over the new pricing plan for the platform. And for good reason. Many developers are seeing their hosting expenses going up by as much as 500%.


If you're looking for a post that trashes the App Engine team, you can move along. You won't find it here. These guys are smart and considerate. If you spend any time interacting with them on Stack Overflow, over email or in person at Google I/O, you understand this. In fact, just by using the platform for a project you can appreciate their outlook and their passion for the product and its users. That's not to say they don't have room to improve. But enough with the negativity already!


Effects of new pricing on my projects


There have been a lot of people posting about their apps and revealing the effects of the new pricing on them. I wanted to do the same as a reference point. Note that my use of App Engine has primarily been for personal projects. Some have web front-ends, some have SMS interfaces, some are just based on background tasks and others come and go while I experiment with ideas or calendar events. I still think most of these experiments are well suited for App Engine, but I need to take a hard look at the more successful apps to figure out a long-term strategy because they are not scaling well with the new pricing plan.


I'll share two examples - both philanthropic projects - comparing the effects of the new pricing.


Astronomy Picture of the Day


I originally wrote this app in Perl as a grad student, and it was hosted at the University of Wisconsin. I decided to port the application as a vehicle for learning App Engine and Python, so it was the first app I ever wrote on the platform. It's primarily a background app. Every afternoon it runs a job that scrapes the contents of the APOD site, packages it into an email and sends it off to all subscribers. There's a simple web frontend that lets anyone sign up. There are currently 1900+ subscribers.


The app is free to run on the platform today and will cost $0.19/day - or $0.03 per user per year - after the price changes. 100% of those costs can be attributed to the use of the Mail API.


My only complaint about this app is that the change seems extreme. Going from 2,000 free emails to 100 feels like an attempt to curb the spamming community. And for the charity projects like this, all of the good net citizens are the losers with this change.


SMSMyBus


This app was originally built to provide a better interface for the Madison Metro bus service. It provides real-time arrival times for buses via SMS, chat, email and phone. But then it blossomed into a full-featured API for the Metro for other developers.


The app costs $0.01/day to operate today (excluding the SMS interface). It is estimated to cost $6.79/day after the price change. $2,478/year. Yah. That's a whopping 67,800% increase. Shebang.
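
For anyone who wants to check the arithmetic, here is a minimal sketch that reproduces the yearly cost and the percentage increase from the two daily figures quoted above. The function and variable names are made up for illustration.

```python
def percent_increase(old, new):
    """Return the percentage increase going from old to new."""
    return (new - old) / old * 100.0


# Daily costs in dollars, taken from the figures quoted in this post.
current_daily = 0.01
projected_daily = 6.79

yearly_cost = projected_daily * 365
increase = percent_increase(current_daily, projected_daily)

print("projected yearly cost: $%.0f" % yearly_cost)
print("increase: %.0f%%" % increase)
```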


Essentially all of the cost can be attributed to the main API call that returns arrival times at a particular bus stop - getarrivals - which some of the clients call repeatedly (like every two minutes). It is also where the confusion starts for me with respect to the new pricing.


Frontend instance hours


Frontend instance hours are projected to be $5.68/day, 84% of the bill. This represents the platform's transition from billing for CPU usage to billing for the time instances spend running. I get that they need to do this. They were using the wrong resource metric for monitoring before.


But how do I go from a $0.00 cost for resource consumption to $5.68/day?!? That kind of increment just feels insane. How about $0 to $0.50? Or $0 to $1?


Datastore writes


Datastore writes are projected to be $1.00/day, 14% of the bill. This is harder for me to resolve for a couple of reasons. First, I can't find any cost under the current pricing plan for these operations even though the app's profile is fairly consistent. So I struggle, conceptually, with how this has suddenly become an issue for the app.


Second, $1/day equates to 1M writes/day in the datastore and I simply can't figure out where all of those writes are coming from. My back-of-the-napkin math shows 40,000 writes. I'm totally baffled by this projection.
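
One possible source of the gap is the write-op accounting quoted in the comments on this post, where a put is billed as 1 entity + 1 kind index + 2 per indexed property + 1 per composite index value. The sketch below applies that formula. It assumes the 40,000 figure counts entity puts and reuses the commenter's example of 3 indexed properties and 1 composite index. It is not a profile of the real SMSMyBus models, and the function names are hypothetical.

```python
def write_ops_for_new_entity(indexed_properties, composite_index_values=0):
    """Write ops billed for putting one new entity.

    Follows the formula quoted in the comments:
    1 entity + 1 kind index + (1 ascending + 1 descending) per indexed
    property + 1 per composite index value.
    """
    return 1 + 1 + 2 * indexed_properties + composite_index_values


def daily_write_cost(puts_per_day, indexed_properties,
                     composite_index_values=0, dollars_per_million_ops=1.00):
    """Return (write ops per day, dollars per day).

    The default rate comes from the post's "$1/day equates to 1M writes/day".
    """
    ops = puts_per_day * write_ops_for_new_entity(
        indexed_properties, composite_index_values)
    return ops, ops / 1e6 * dollars_per_million_ops


# Illustrative inputs only: 40,000 puts/day, 3 indexed properties,
# 1 composite index.
ops_indexed, cost_indexed = daily_write_cost(40000, 3, 1)

# Same puts with every property marked indexed=False and no composite index.
ops_unindexed, cost_unindexed = daily_write_cost(40000, 0, 0)

print("indexed:   %d ops, $%.2f/day" % (ops_indexed, cost_indexed))
print("unindexed: %d ops, $%.2f/day" % (ops_unindexed, cost_unindexed))
```

Under these assumptions each put costs several write ops rather than one, so a modest number of puts can turn into a much larger billed figure.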


The rest of it


The rest of the projected cost is a combination of storage and datastore read operations. I can eliminate the former if I simply store less of the data I wanted to use for analytics. It saves me money, but in the end, ignoring some of the data hurts the developers that use the API.


Optimizing


Now it's my job to go in and take another stab at optimizing the code and start with the getarrivals API call. I thought I had good habits with this so I was a little embarrassed when I found an obvious hole in the query path for route listings. There's a fairly repetitive query that was not being memcached - oops! Now fixed.
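
The fix is the usual cache-aside pattern, sketched below. This is not the actual SMSMyBus code: the function name, key format and TTL are hypothetical, and DictCache is an in-memory stand-in so the example runs anywhere. On App Engine, the google.appengine.api.memcache module exposes get(key) and set(key, value, time=...) with the same shape and could be passed in as the cache.

```python
class DictCache(object):
    """In-memory stand-in exposing memcache-style get/set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Like memcache, return None on a miss.
        return self._data.get(key)

    def set(self, key, value, time=0):
        # Expiry is ignored in this stand-in.
        self._data[key] = value
        return True


def get_route_listing(stop_id, cache, run_query, ttl_seconds=3600):
    """Return the routes for a stop, hitting the datastore only on a miss."""
    key = "route_listing:%s" % stop_id
    routes = cache.get(key)
    if routes is None:
        routes = run_query(stop_id)
        cache.set(key, routes, time=ttl_seconds)
    return routes


# Usage: the (fake) query runs once, the second call is served from cache.
query_calls = []

def fake_query(stop_id):
    query_calls.append(stop_id)
    return ["route 2", "route 6"]

cache = DictCache()
first = get_route_listing("1101", cache, fake_query)
second = get_route_listing("1101", cache, fake_query)
print(len(query_calls))
```

One caveat of this sketch is that an empty result that is None would never be cached. Returning an empty list from the query avoids that.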


The second thing I'm experimenting with is the application's instance configuration. By default, I was letting the platform's scheduler determine my load patterns and create new instances whenever necessary. But I've made two changes. First, I took the scheduler out of 'auto' mode and set the maximum number of idle instances to one. Second, I cranked up the minimum latency for the pending request queue to 250ms. In theory, each of these changes should drive the cost down because I should be using less frontend instance time throughout the day.
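
To see why fewer running instances should lower the bill, here is a rough sketch of instance-hour billing. The hourly rate, the free daily quota and the instance counts are all placeholder assumptions chosen for illustration. They are not quoted from the pricing page or measured from the app.

```python
def daily_instance_cost(instance_hours, free_hours, dollars_per_hour):
    """Cost of one day of frontend instance time after the free quota."""
    billable_hours = max(0.0, instance_hours - free_hours)
    return billable_hours * dollars_per_hour


# Placeholder assumptions, not real pricing.
ASSUMED_RATE = 0.08        # dollars per instance hour
ASSUMED_FREE_HOURS = 24.0  # free instance hours per day

# Scheduler keeps four instances alive all day.
cost_four = daily_instance_cost(4 * 24.0, ASSUMED_FREE_HOURS, ASSUMED_RATE)

# Idle instances capped, averaging one and a half instances over the day.
cost_capped = daily_instance_cost(1.5 * 24.0, ASSUMED_FREE_HOURS, ASSUMED_RATE)

print("four instances:  $%.2f/day" % cost_four)
print("capped at idle 1: $%.2f/day" % cost_capped)
```

The point is only the shape of the calculation: the bill tracks how long instances stay up, not how much CPU the requests consume.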


Let's see what happens! As I do my part with optimization, I'd like to see the App Engine team do their part and move to the middle as well. :)


What to do next


I'm guessing that the App Engine platform simply priced things wrong the first time. I think the concept of a platform as a service that exploits existing Google infrastructure was a smart but geeky idea that was poorly modeled or had bad assumptions about its use/abuse. Ironically, the idea didn't scale well and they've been forced to admit that early assumptions on how to price it were just wrong.


The good part about this move... developers are forced to take a deep dive into optimization. It's something I've written about before and have been doing again since the clock started on the pricing changes. This not only makes for a better platform for Google and the projects sharing the resources, but it makes for a better net as a whole. Faster is better.


The bad part... 



  • Developers will be forced to dead pool worthy projects that don't have a business model. 

  • Developers will be forced to port apps to other platforms. That could be a painful pill to swallow when the apps aren't money-making projects.

  • Developers may be sacrificing analytics to avoid datastore bloat and access charges.


What the App Engine team should do about it



  • Provide better pricing structures for philanthropic and open source projects. App Engine is a great platform for these things and it provides a great playground for developers to support important projects at a low cost while also learning about a platform they can adopt for larger, commercial projects down the road. They've hinted at this but will they do it? - http://code.google.com/appengine/kb/postpreviewpricing.html#special_programs_...

  • Provide more runway for optimization. A couple of weeks to get the sleeves rolled up and optimize their apps just isn't enough time.

  • Provide better analysis tools to highlight problems.

  • Take baby steps. Must they really take these giant leaps in pricing?

  • Roll out Python 2.7 to support concurrent requests in Python projects.


In the process of writing this post I found some great resources...

9 comments:

  1. I am seeing the write issue as well and asked some people in the AppEngine IRC room what the problem might be. The answer is very simple: each time you write an entity to the datastore it also makes 1 write for each indexed property to the index. So if you have 20 properties that are indexed on your entity, a single put will result in 21 writes. I was very surprised by this myself. Then I looked at my model classes and saw that 90% of my properties didn't need to be indexed, so I added an indexed=False keyword parameter to those properties. I am hoping that in a couple of days, when the billing catches up to the changes, my problem goes away. Hope this helps you.

  2. We are also seeing a massive amount of extra datastore write operations (375M / day!) and are struggling to figure out where they are coming from. We have been adding indexed=False all over, but the changes are not having as much impact as we would have expected. Google needs to provide more tooling and visibility. E.g., today each request shows the cpu_ms consumed; it should also show datastore write ops, datastore read ops, etc. Appstats should be extended to break this down even further. Between lack of visibility and a 3-5 day lag on billing (which has the only metrics currently), it is very difficult to tune a large application.

  3. Hi Greg, be aware that the Python 2.7 support will be for HRD only. That means that if your applications are still on M/S, you will not have this option without a non-trivial data migration.

  4. @michael and @jason - thank you for this. that makes sense and now that you mention it, i remember reading that before. however, adding indexed=False is an instance-level setting, correct? so this only affects future datastore writes.

  5. @systempuntoout thanks for pointing that out. i'm scheming a way to migrate a portion of my app to HRD that would be less painful

  6. @greg - you set indexed=False on your model which will offer datastore write operations savings on those entities going forward. Note that your indexes still exist for entities that were put in the past, which will count against your storage and the entities will also come back in queries should you query against that attribute. To fully address this, you'd need to use MapReduce or something to get/put every entity. The same applies if you ever need to turn indexing back on.

  7. And this just in from Alfred Fuller here http://groups.google.com/group/google-appengine/browse_thread/thread/9a39d2a9... (I'm still not sure this accounts for our large number of write operations): A datastore write op != an entity put/delete. It is actually (entity + |index deltas|). For example, if you have a Kind with 3 indexed properties and 1 composite index, the number of write ops to put a new entity would be: 1 entity + 1 kind index + (1 ascending + 1 descending) * 3 indexed properties + 1 composite index value = 9 write ops. If you change a single property on an existing entity it will be: 1 entity + (2 ascending changes + 2 descending changes) * 1 indexed property changed + 2 composite index values changed = 7 write ops (the change must remove the old value and add the new value for each index). Deleting the entity will cost the same as creating it (9 write ops).

  8. Could you describe the traffic that produces the $5.68/day bill for SMSMyBus? How expensive are your requests, what's your max qps, etc.? That just seems improbably high to me. A good VPS will run you $20/month and I'd be impressed if you used up one of those. Are there any profiling tools available?

  9. Yes. It does seem improbably high, doesn't it? It just goes to show how high they hiked the prices. As I noted, the bulk of that cost came from the fact that the App Engine instance scheduler was spinning up as many as four instances for the app. In the new pricing model, you pay for all of the time those instances are running whether or not your QPS gets high. I've already gotten a lot of that time back with two things. First, I'm caching the crap out of the stop and route queries. Second, I removed the App Engine instance magic and just set the max idle count to one. The downside is that some calls can get some sub-optimal latencies. (They really need to roll out support for a version of Python that supports threading.) I have one more big refactor to do for one of the code paths that should eliminate the majority of my datastore writes...
