Can you trust Google Play statistics?

Last week I did a promotion for my Vim book on the Amazon Kindle publishing platform. It went very well (more on that tomorrow) and it reminded me how well Amazon is prepared to handle both sales and sales reporting.

That Amazon employs some of the best software architects, developers, and admins can be seen not only in their extensive AWS catalogue, but also in the Amazon Kindle Direct Publishing control panel. When someone buys my book using their Kindle account, the sales count is updated within a few minutes, which lets me monitor the success of my promotions. That is exactly what I expect from a company that brought cloud computing to the startup masses and moved its own backend to its own cloud.

Similar levels of service can be reasonably expected from Google Play. After all, Google is another well-managed cloud. Or so we tend to think. Until a few months ago, I did not watch the application installation statistics very closely. It was somebody else's job, and all I got was the total count (the clients had the raw data). Ever since I decided to finally embark on a serious Android-based project myself, I have been watching the stats more closely. And I noticed something I cannot understand. Let me explain.

First of all, I do not understand why Google cannot show me the number of installations in real time, or in a slightly delayed continuous fashion, like Amazon can. This should be doable given the intellectual power and the infrastructure available to Google. So why are those stats published only once a day?

SIM Info deltas

Second, I do not understand the differences in the installation statistics reported by Google itself. Let me use my simple utility, SIM Info, as an example. The stats provided by Google contain, among other things, the total number of users who have installed my app. In theory, the difference between the total number of users who had installed my app by the end of August 20 and the total by the end of August 21 ought to equal the number of users who installed my app on August 21.

SIM Info installations stats on Google Play

The math is simple. If 8,732 users had installed my app by the end of August 20, and Google tells me that a total of 9,285 users had installed it by the end of August 21, I can assume that 553 users installed my app on August 21. That does not seem to be true, as Google claims it was only 94 users. On most days the two figures differ by a few installs, but on August 21 the gap is a whopping 459!
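To make the arithmetic explicit, here is a minimal Python sketch of the check, using only the figures quoted above. The variable names are my own and purely illustrative; this is not the format in which Google delivers the stats.

```python
# Figures quoted above for SIM Info; variable names are illustrative only.
total_aug_20 = 8732          # cumulative user installs by the end of August 20
total_aug_21 = 9285          # cumulative user installs by the end of August 21
reported_daily_aug_21 = 94   # daily user installs Google reports for August 21

# What the daily figure should be if the cumulative totals are consistent.
expected_daily = total_aug_21 - total_aug_20

# Installs implied by the totals but missing from the daily figure.
discrepancy = expected_daily - reported_daily_aug_21

print(f"expected from totals: {expected_daily}")
print(f"reported by Google:   {reported_daily_aug_21}")
print(f"unexplained installs: {discrepancy}")
```

If the two series agreed, the last line would print 0.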

I do not want to say that somebody is cheating here, but the data as it is delivered now is not worth much, which is strange given that Google has plenty of time to collect, sort, and check it. I do realize the scale of the data stream Google has to deal with, and I am aware of issues like time drift, but these should not influence the validity of the data. SIM Info is just a simple app, but if I were to explain the performance of a VC-funded Android app to my investors, I would have trouble saying that my data can be trusted.

The numbers do not add up.

Update: August 24, 2012, 7:00 am GMT

Well, this is strange… the total number of user installs has been corrected, and today it is lower than yesterday. According to Google, by the end of August 22, 2012 my app had been installed by only 8,903 users, which is lower than the 9,285 users reported for August 21, 2012. A cumulative total should never go down.
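A cumulative count can be sanity-checked mechanically. The sketch below is a hypothetical helper (the name find_decreases and the list-of-pairs layout are my assumptions, not anything Google provides) that flags every day on which the total is lower than the day before, fed with the three totals mentioned in this post.

```python
from datetime import date

def find_decreases(totals):
    """Return (previous_day, day, drop) for every day whose cumulative
    total is lower than the previous day's total."""
    drops = []
    # Walk over consecutive pairs of (day, total) entries.
    for (prev_day, prev_total), (day, total) in zip(totals, totals[1:]):
        if total < prev_total:
            drops.append((prev_day, day, prev_total - total))
    return drops

# Cumulative user installs quoted in this post, in chronological order.
totals = [
    (date(2012, 8, 20), 8732),
    (date(2012, 8, 21), 9285),
    (date(2012, 8, 22), 8903),
]

for prev_day, day, drop in find_decreases(totals):
    print(f"{day}: total fell by {drop} compared with {prev_day}")
```

An empty result would mean the totals behave the way a running count of installs should.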

The number of daily installs does not add up either. If you subtract 9,285 from 8,903, you get -382, but Google reports 80 installs on August 24, 2012. The data is a mess.

I really do not know what to think of it. I wish someone would explain the algorithm used to compute those numbers. It does not look like the actual number of downloads, but more like some estimate, which is troubling.

I posted a question about this on Stack Overflow and got suggestions that I should use some sort of external monitoring service. That is excessive and potentially expensive for the user (network data access costs users money), so I'd rather avoid it in what is a simple utility. I might consider using such a solution in an app that requires internet access anyway.

Update: August 26, 2012, 7:00 pm GMT

I am not sure if it was my activity on the subject that caused Google to notice the problem, but they have posted a message to developers on the Google Play Developer Console.

And finally…

Update: August 29, 2012, 4:00 pm GMT

Google has adjusted their stats and fixed the numbers for August 21 and 22, 2012.

However, the daily number of installs still does not match the difference between the total number of user installs on two consecutive days.

Time will tell if Google fixes that. I think they should.

Update: September 12, 2012, 10:00 am GMT

Google has informed developers that their stats from September 6 are not correct. This seems to be a more serious problem than we originally thought.

PS. If you want to know what tool I use to make my analysis quicker, read the description of the script I wrote to parse Google Play stats.

PPS. If you want to learn Android programming, have a look at these Android programming books.