Category Archives: code

Writing Smarter Fabric Scripts: Telling the Difference Between Local and Remote

Have you ever written a Fabric script that works wonderfully well with remote hosts, only to realise that it has a problem running locally? If so, env.host is your friend.

#! /usr/bin/env python
# save this as smartfab.py
from fabric.api import local
from fabric.api import run
from fabric.api import env

def run_that_job():
    """get some info about your host"""
    cmd = "uname -a"
    if env.host:
        run(cmd)
    else:
        local(cmd)

If you run this script with the -H option, it will execute cmd using run(), but if you omit -H, cmd will be executed using local(). To see the difference for yourself, run

$ fab -f smartfab.py -H some_remote_host run_that_job

and compare results with:

$ fab -f smartfab.py run_that_job
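The same local-or-remote decision can be sketched without Fabric at all. This is only an illustration of the idea, not how Fabric works internally: build_command is a hypothetical helper of mine that returns the argument list you might hand to subprocess, wrapping the command in ssh when a host is given.

```python
def build_command(cmd, host=None):
    """Return an argv list: run cmd over ssh if host is set, locally otherwise."""
    if host:
        # Remote case: let the ssh client execute the command on the host.
        return ["ssh", host, cmd]
    # Local case: hand the command to the local shell.
    return ["sh", "-c", cmd]

print(build_command("uname -a", "some_remote_host"))
print(build_command("uname -a"))
```

The point is the same as in smartfab.py: one function, one command string, and a single test on the host decides where it runs.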

A CSDL Syntax File for Vim

It’s not often that you get to contribute to one of the oldest Open Source software projects, but that’s exactly what happened on March 7, 2013. Bram Moolenaar was kind enough to include my CSDL syntax file in the official Vim source code repository.

This means a lot to me as a long-time Vim user, and I’d like to thank Bram for letting me add my name to the list of contributors. I really appreciate it.

If you write CSDL, this file will help you do it in a more convenient way. Newly built Vim binaries should start including CSDL support soon, but if you don’t want to wait, you can grab the CSDL syntax file from the datasift-vim repository.

CSDL syntax highlighting works automatically for .csdl files, but you can force it in the following way:

  • Press Esc
  • Type :syn on
  • Press Enter
  • Press Esc
  • Type :set syntax=csdl
  • Press Enter

To achieve the same effect in gVim or MacVim, select Syntax -> Show filetypes in menu and then select Syntax -> C -> CSDL.

Quick backups with Fabric and Python

Backups… you know you have to make them, but you would so much rather pay someone else to do it. And you know what? You can outsource it to someone else and you won’t even have to pay for it. That’s right! Let a Python script do the hard work while you do something far more creative.

If you are using a shared hosting provider like DreamHost you are probably hosting a number of domains on there, because it is cheap, because it is easy, and because it makes sense to test things on an inexpensive server before you commit to a dedicated machine or something even more powerful.

This article is also available on Amazon Kindle. You may consider buying it, if you would like to keep it for your reference.

Quite often, those least expensive server options do not come with backup tools, and even if they do, you might want more control and a local copy, or maybe a remote copy that is done your way. For those times, and for many others, Fabric is the tool to go to. What is Fabric? The official documentation states that:

Fabric is a Python (2.5 or higher) library and command-line tool for streamlining the use of SSH for application deployment or systems administration tasks.

There were a lot of long words in there. We’re naught but humble pirates… What is it then?

From my point of view it is the easiest way to write Python scripts that automate system administration, network administration, and software deployment. Those tasks routinely use ssh, scp, and sudo and are very cumbersome to write in Bash. Writing them in Python results in much cleaner and more maintainable code. If you have ever put off automating certain tasks because it would be a nightmare to write them in Bash, Fabric will make you very happy indeed.

On the surface it looks like a pointless replacement for custom shell scripts and makefiles, but once you start using it, you realize that it simplifies just what needs to be simplified and leaves you to do what you want. And that’s what good tools are supposed to do.

Consider the simple task of making backups of directories on a shared server. If you were to do it by hand, you would use ssh to log into the server, make a note of the directory path (OK, you can skip that step if you already have that information), create a .tar.gz or .zip archive, log out, and then use the information you gathered to issue an scp command that copies the archive to the local machine.

If you were to write a Bash script to do that for you, you would need to test and re-test things and you’d quickly grow discouraged. Fabric lets you get these tasks done quickly and encourages saving them in a library of recipes known as a fabfile (an obvious play on words on the old Unix makefile). It is a Python script called fabfile.py that the fab command looks for by default in the present working directory. Here’s an example:

#!/usr/bin/python2.7
from fabric.api import local
from fabric.api import get
from fabric.api import put
from fabric.api import reboot
from fabric.api import run
from fabric.api import sudo
from fabric.context_managers import cd
import time

def remote_get_archived_dir(da):
    """Make a tar.gz archive of da (a directory under ~/)."""
    o = run("cd ~; tar -zcf ~/%s.tar.gz %s" % (da, da))
    ds = "%s_backup_%s" % (da, time.strftime("%Y%m%d-%H%M%S", time.gmtime()))
    sl = local("mkdir %s" % ds)
    sg = get("~/%s.tar.gz" % (da), ds)

You can use it to back up any directory under ~/ on the remote host into the present working directory on your local machine using the following command:

$ fab -u username -p password -H hostname remote_get_archived_dir:da

For example, if the directory you wanted to archive and copy locally was called force and was located on a host called luke.example.com, you’d use the following command:

$ fab -u hansolo -p pssst -H luke.example.com remote_get_archived_dir:force

And since each fabfile is a Python script, you can use all of the power of Python in such scripts.
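As a small illustration of that, here is the timestamped directory name from the task above pulled out into a plain Python function that can be tried without a server. The function name backup_dir_name and its optional when parameter are my own additions for this sketch; they are not part of Fabric or of the fabfile above.

```python
import time

def backup_dir_name(da, when=None):
    """Build a name such as force_backup_YYYYMMDD-HHMMSS (UTC)."""
    if when is None:
        # Default to the current time in UTC, as the fabfile does.
        when = time.gmtime()
    return "%s_backup_%s" % (da, time.strftime("%Y%m%d-%H%M%S", when))

# A fixed timestamp (the Unix epoch) makes the result reproducible.
print(backup_dir_name("force", time.gmtime(0)))
```

Keeping the naming logic in its own function makes it easy to check before you point the script at a real host.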

Fabric is not a part of the Python distribution, so you need to install it using the following command:

$ pip install fabric

Once you have done that, see for yourself how easy it is to do the stuff you had to do by hand.

PS. If you want to learn Python, have a look at these Python programming books.

How I analyze my Google Play Android App install stats with Python

Last week I wrote about the discrepancies I spotted between the number of the daily user installs and the differences between the total number of the user installs for two consecutive days. Those two values ought to match, but they don’t and it seems to be a very random process. Google have adjusted the stats for August 21, 2012 and August 22, 2012, but the smaller discrepancies are still there.

The stats I use are available from the Google Play Developer Console. You need to go there, find your app, click on Statistics and then click on all in the top right corner. Once the graphs refresh, click on Export as CSV and make sure all boxes are ticked. Download the ZIP file, unpack it and import it into your favorite spreadsheet.

This article is also available on Amazon Kindle. You may consider buying it, if you would like to keep it for your reference.

You can monitor those stats yourself daily, but it would take up to an hour of your time if you were to import that data into a spreadsheet by hand.

I use a simple Python script to pre-process the data before pasting it into a spreadsheet. It saves me a lot of time and I thought it might be a good idea to share that tool with other Android developers, in case you might want to use it. It is also a good example of how using a few standard Python modules can help you save time processing data.

Prerequisites: Python 2.7.1 or later.

You can check which version of Python you have installed on your system with the following command (do not type $):

$ python --version

Here is how I begin my script:

#!/usr/bin/python

The first line gives the command-line interpreter a hint about the location of the Python interpreter on your system. If Python is not located at /usr/bin/python, adjust the first line to match your system’s configuration. (This line is an example of what is known as the shebang in Unix scripts.)

Next, we tell Python to import the following four modules:

import argparse
import csv
import re
import zipfile

Here’s what they do:

argparse is used to parse command-line options and arguments, i.e. anything that is listed after the name of the script. It also displays syntax and usage information when the user makes a mistake or uses the -h option. (For more information on argparse consult the official documentation.)

csv is used to read and write Comma-Separated Values (CSV) files, the lowest-common-denominator file format used to exchange data between different spreadsheets. It is also the format in which Google Play publishes your app stats. (For more information on csv consult the official documentation.)

re is the module that implements regular expressions, necessary for filtering text. (For more information on re consult the official documentation.)

zipfile is used to read and write ZIP archives, such as those served by the Google Play Developer Console when you click on the Export as CSV link. It can access files inside archives without you having to explicitly unpack them. (For more information on zipfile consult the official documentation.)

I will describe those modules in a little more detail later. Let’s have a look at the default values set in the next section:

# define defaults

redate = re.compile('^[0-9]{8,8}$')

redate is a regular expression, a pattern that matches any numeric string expressed using digits 0-9. That string must contain exactly eight digits, e.g. 20120801, but not 20120108a.

To be precise, redate is an SRE_Pattern object returned by the compile() function defined in the re module. Compiling a pattern once and reusing the resulting object is the usual approach when you apply the same pattern many times to match, search, delete, or replace strings.
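If you would like to see the pattern at work before running the whole script, the following short check applies it to a few strings. The candidate strings other than the two mentioned above are my own examples.

```python
import re

redate = re.compile('^[0-9]{8,8}$')

for candidate in ["20120801", "20120108a", "2012080", "date"]:
    # match() returns a match object on success and None otherwise.
    print("%s %s" % (candidate, bool(redate.match(candidate))))
```

Only the first string consists of exactly eight digits, so it is the only one that should be reported as a match.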

rows = []

rows is a list that will store the data extracted from the APPID_overall_installs.csv file. (APPID is the ID of your Android application, e.g. com.example.myapp.)

rc = 0

rc is a helper variable. Its use will be explained later.

dui = 0

dui stores daily user installs, the numbers extracted from the daily_user_installs column from the APPID_overall_installs.csv file.

ddui = 0

ddui is described later.

tui = 0

tui stores total user installs, the numbers extracted from the total_user_installs column from the APPID_overall_installs.csv file.

dtui = 0

dtui is described later.

Once the defaults have been set, the script can begin parsing command-line options and arguments. To do that it needs to create an argument parser object:

# parse arguments

parser = argparse.ArgumentParser(description='Compare the total number of user app installs with the number of the daily user installs on Google Play.')

An argument parser is an object created by calling ArgumentParser() from the argparse module. Right now it is just an empty object that doesn’t do much, although if you were to run your script with the -h option, it would display the help text defined in the description argument of ArgumentParser().

Our script needs to know which ZIP file you wish to use data from and the app ID. We will pass them as arguments of the -f and -a options respectively. Definitions of those options are added to the parser object with calls to the add_argument() function:

parser.add_argument('-f', required=True, action='store', dest='fin',
                    help='the name of the ZIP archive file downloaded from your Google Play Developer Console')

parser.add_argument('-a', required=True, action='store', dest='appid',
                    help='app ID, e.g. com.example.myapp')

The first argument of add_argument() is the option string, e.g. '-f' defines the -f option; the second argument is required, which is set to True for every option that must be given for the script to function properly.

Next, you need to tell the parser what it should do with the arguments that follow the options. This is specified in the value of the action argument of add_argument(). Since we need to process those values later on, the script needs to store them somewhere. Hence action is set to 'store'.

Once we tell the parser what we want to do with the arguments to the options defined, we need to tell it where it should store those values. The names passed as the values of the dest argument will become the names of the attributes of the object returned by parse_args(). They are also displayed in uppercase in the syntax section printed when you run the script with the -h option or when you make a mistake.

The last argument, help, defines a short description of the purpose of the option and its argument.

Once all options have been defined, we need to run the parser on the command line:

args = parser.parse_args()
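To see where the values end up, you can feed parse_args() an explicit list instead of the real command line. This is a minimal sketch with shortened description and help strings; the file name and app ID are placeholders.

```python
import argparse

parser = argparse.ArgumentParser(description='Demo of the -f and -a options.')
parser.add_argument('-f', required=True, action='store', dest='fin',
                    help='the name of the ZIP archive file')
parser.add_argument('-a', required=True, action='store', dest='appid',
                    help='app ID, e.g. com.example.myapp')

# Passing a list to parse_args() stands in for the real command line.
args = parser.parse_args(['-f', 'com.example.myapp.zip', '-a', 'com.example.myapp'])

# The dest names are attributes of the returned object.
print("%s %s" % (args.fin, args.appid))
```

Calling parse_args() with no arguments, as the script does, reads the options from sys.argv instead.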

If all goes well, we should be able to read the ZIP archive:

# process the zip file

zf = zipfile.ZipFile(args.fin, 'r')

If the argument of the -f option (stored in args.fin) is a valid path to the ZIP file downloaded from Google Play, we should be able to read it. That’s what we request when we pass the 'r' argument to the ZipFile() function.

Knowing the app ID, we can splice together the name of the CSV file inside the ZIP archive represented by zf. The app ID is given as the argument of the -a option and stored in args.appid. The file object that zf.open() returns for that spliced name is passed as the first argument of reader(), and the second argument is delimiter, the character that separates cells in rows. It is set to , for the CSV files generated by Google Play.

# process the CSV file

cf = csv.reader(zf.open("%s_overall_installs.csv" % args.appid), delimiter=',')

The cf object represents the CSV file stored inside the archive represented by zf.
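If you do not have an export at hand, you can try the same zipfile and csv combination on an archive built in memory. This sketch is written for Python 3, where a file opened inside an archive is a binary stream and has to be wrapped as text before csv can read it; the script in this article targets Python 2.7, where that step is not needed. The app ID and the cell contents are placeholders, not real Google Play data.

```python
import csv
import io
import zipfile

appid = "com.example.myapp"  # placeholder app ID
member = "%s_overall_installs.csv" % appid

# Build a tiny ZIP archive in memory so the example needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr(member, "date,a,b\n20120801,1,2\n")

# Read the member back without unpacking the archive.
with zipfile.ZipFile(buf, 'r') as zf:
    with zf.open(member) as raw:
        # On Python 3 the member is a binary stream, so wrap it as text.
        text = io.TextIOWrapper(raw, encoding='utf-8', newline='')
        parsed = list(csv.reader(text, delimiter=','))

print(parsed)
```

Each element of parsed is a list of strings, one per cell, which is the same shape as the rows the script iterates over.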

We will now read the data, row by row, skipping empty rows and those where the first cell does not match the regular expression pattern stored in redate.

The rows that pass the tests are inserted at the beginning of the rows list, which is a way to reverse their order. We do it because it helps process the data later on.

for r in cf:

    if r == []:
        continue

    if not re.match(redate, r[0]):
        continue

    rows.insert(0, r)
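The same filtering and reversing can be tried on a handful of made-up rows. In this sketch I use a list comprehension followed by reverse() instead of repeated insert(0, r) calls; the result is the same, and the sample rows are invented for the example.

```python
import re

redate = re.compile('^[0-9]{8,8}$')

# Hypothetical parsed rows, newest first, with a header and an empty row.
parsed = [
    ['date', 'daily_user_installs'],
    ['20120803', '7'],
    [],
    ['20120802', '5'],
    ['20120801', '3'],
]

# Keep rows whose first cell is an eight-digit date, then reverse them.
rows = [r for r in parsed if r and redate.match(r[0])]
rows.reverse()

print([r[0] for r in rows])
```

The header row and the empty row are dropped, and the remaining rows come out oldest first.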

We now have a list of rows in reverse order, from the earliest to the latest stats. Now is the time to crunch data:

for r in rows:

    dtui = int(r[4]) - int(tui)
    ddui = int(r[3]) - int(dtui)

    if rc == 0:
        print r[0] + "," + r[3] + "," + r[4]
    else:
        print r[0] + "," + r[3] + "," + r[4] + "," + str(dtui) + "," + str(ddui)

    tui = r[4]
    rc = 1

Because we need data from the previous day to compute the difference between the total user installs of the app, the first row (the oldest entry) needs to be printed with some data missing. This is why we use the rc flag.

So, the first row of the output will contain just three cells:

date,daily_user_installs,total_user_installs

Starting with the second row, the output will consist of five cells per row:

date,daily_user_installs,total_user_installs,delta_total_user_installs,delta_daily_total_user_installs

where:

delta_total_user_installs = dtui, computed as the total number of user app installs for today minus the total number of user app installs for yesterday;

delta_daily_total_user_installs = ddui, computed as the daily number of user app installs minus delta_total_user_installs.
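Here is the same computation as a small, self-contained function that you can run on its own. It is a sketch rather than the script itself: format_rows is a name I made up for this example, it takes (date, daily, total) tuples instead of CSV rows, and the sample numbers are invented.

```python
def format_rows(rows):
    """Yield CSV lines from (date, daily, total) tuples ordered oldest first."""
    previous_total = None
    for date, daily, total in rows:
        if previous_total is None:
            # Nothing to compare the first day against: three cells only.
            yield "%s,%d,%d" % (date, daily, total)
        else:
            dtui = total - previous_total   # growth of the running total
            ddui = daily - dtui             # zero when the two stats agree
            yield "%s,%d,%d,%d,%d" % (date, daily, total, dtui, ddui)
        previous_total = total

# Made-up numbers, chosen so that the last day disagrees by one install.
sample = [("20120801", 10, 100), ("20120802", 12, 112), ("20120803", 9, 122)]
lines = list(format_rows(sample))
for line in lines:
    print(line)
```

A value of zero in the last column means the daily figure and the change in the total agree; anything else is a discrepancy worth looking at.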

I called my script gpstats.py.

To make it executable, you need to run the following command:

$ chmod 0755 ./gpstats.py

When you run gpstats.py it should produce five-column output. Redirect it into a file of your choice using the > symbol, preferably one with a .csv filename extension so you can later import it into your favorite spreadsheet, e.g.:

$ ./gpstats.py -f com.example.myapp.zip -a com.example.myapp > mystats.csv

The output file can be opened in any spreadsheet application.

You can download the script if you would like to see if you notice similar discrepancies in the stats reported by Google Play.

Have fun!

PS. If you want to learn Python, have a look at these Python programming books.

Can you trust Google Play statistics?

Last week I did a promotion for my Vim book on the Amazon Kindle publishing platform. It went very well (more on that tomorrow) and it reminded me how well Amazon is prepared to handle both sales and sales reporting.

That Amazon employs some of the best software architects, developers, and admins can be seen not only in their extensive AWS catalogue, but also in the Amazon Kindle Direct Publishing control panel. When someone buys my book using their Kindle account, the sales count is updated within a few minutes allowing me to monitor the success of my promotions. That is exactly what I expect from a company that brought cloud computing to the startup masses and moved their own backend to their own cloud.

Similar levels of service can be reasonably expected from Google Play. After all, Google is another well-managed cloud. Or so we tend to think. Until a few months ago, I did not watch the application installations statistics very closely, as it was somebody else’s job and all I got was the total count (the clients had the raw data), but ever since I decided to finally embark on a serious Android-based project myself, I got to watch the stats more closely. And I noticed something I cannot understand. Let me explain.

First of all, I do not understand why Google cannot show me the number of installations in real time or in a slightly delayed continuous fashion like Amazon can. This should be doable given the intellectual power and the infrastructure available to Google. So, why are those stats published once a day?

SIM Info deltas

Second, I do not understand the differences in the installations statistics reported by Google itself. Let me use my simple utility, SIM Info, as an example. The stats provided by Google contain, among other things, the total number of users who have installed my app. In theory, the difference between the total number of users who have installed my app by the end of August 20 and the total number of users who have installed my app by the end of August 21 ought to be equal to the number of the users who have installed my app on August 21.

SIM Info installations stats on Google Play

The math is simple: if 8,732 users had installed my app by the end of August 20 and Google tells me that a total of 9,285 users had installed it by the end of August 21, I can assume that 553 users installed my app on August 21. That does not seem to be true, as Google claims it was only 94 users. The data I get from Google usually shows differences of a few installs per day, but here the difference is a whopping 459!

I do not want to say that somebody is cheating here, but the data as it is delivered now is not worth much, which is strange given that Google has plenty of time to collect, sort, and check it. I do realize the scale of the data stream Google has to deal with and I am aware of issues like time drift, but these should not influence the validity of the data. SIM Info is just a simple app, but if I were to explain the performance of a VC-funded Android app to my investors, I would have trouble saying that my data can be trusted.
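For the record, here is that arithmetic spelled out in a few lines of Python, using only the figures quoted above. The variable names are mine.

```python
# Figures quoted in the text for SIM Info.
total_aug20 = 8732
total_aug21 = 9285
daily_aug21 = 94

implied = total_aug21 - total_aug20      # installs implied by the totals
discrepancy = implied - daily_aug21      # gap between the two statistics

print("implied=%d reported=%d discrepancy=%d" % (implied, daily_aug21, discrepancy))
```

If the two statistics were consistent, the discrepancy would be zero.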

The numbers do not add up.

Update: August 24, 2012, 7:00 am GMT

Well, this is strange… the number of the total user installs has been corrected and today it is lower than yesterday. According to Google, by the end of August 22, 2012 my app had been installed by only 8,903 users, which is lower than the 9,285 users reported for August 21, 2012, even though a cumulative total should never go down.

The number of the daily installs does not add up either. If you subtract 9,285 from 8,903, you get -382, but Google reports 80 installs on August 24, 2012. The data is a mess.

I really do not know what to think of it. I wish someone would explain the algorithm used to compute those numbers. It does not look like the actual number of downloads, but more like some estimate, which is troubling.

I posted a question about this on Stack Overflow and got suggestions that I should use some sort of external monitoring service, but this is excessive and potentially expensive for the user (network data access costs users money), so I’d rather avoid that in what is a simple utility. I might consider using such a solution in an app that requires internet access anyway.

Update: August 26, 2012, 7:00 pm GMT

Not sure if it was my activity on the subject that has caused Google to notice the problem, but they posted a message to developers on the Google Play Developer Console.

And finally…

Update: August 29, 2012, 4:00 pm GMT

Google have adjusted their stats and fixed the numbers for August 21 & 22, 2012.

However, the discrepancies between the difference between the total number of user installs for two consecutive days and the daily number of installs remain.

Time will tell if Google fixes that. I think they should.

Update: September 12, 2012, 10:00 am GMT

Google has informed developers that their stats from September 6 are not correct. This seems to be a more serious problem than we originally thought.

PS. If you want to know what tool I use to make my analysis quicker, read the description of the script I wrote to parse Google Play stats.

PS. If you want to learn Android programming, have a look at these Android programming books.

Your users will tell you what you wrote

Ask anyone who has written any type of non-game application and they will tell you that users often invent ways of using your code that you never dreamt of.

Case in point? Spreadsheets, which many people use not to do any sort of financial modelling, but to keep lists of things to do. I have a friend who plans conferences in OpenOffice.org Calc.

But you do not have to write huge pieces of code to experience that phenomenon. I recently published a free Android app that originally began life as a part of my test code. I wanted to know more about the SIM cards I was using to test my other apps.

When I published SIM Info I was convinced it would only be of interest to a small group of developers, but as it turns out, many users find it handy when they need to unlock their phones and tablets. Users tell me that operators ask them for information that is not displayed in the Android Settings menu, and that is when SIM Info becomes a very handy utility to have.

Based on that feedback, I added a way to share the information displayed by SIM Info via email, text message, or any other communication channel available on your Android device.

The moral of the story is you should just put your code out there and listen to your users. They will tell you what they think of your code and how it is useful to them. And that’s what counts.