Sales data analysis

Most online businesses today have reams of customer and order data that can provide the basis for sophisticated customer behaviour and attribute analysis.

Using freely available socioeconomic data from the ABS (http://www.abs.gov.au), it is possible to match your sales data (via postcodes) with demographic information to gain powerful insights into the people buying your products and their behaviour over time. With this information, you can develop highly targeted marketing programmes that can drive your marketing spend into the most effective channels, or gain a better understanding of your customer base and the types of products bought by different socioeconomic segments. Postcode matching against sales data also allows you to understand the performance of your business by geographic region.  The following description provides technical details on what's available, and how to match up your data. The next section suggests a few examples of this data analysis in action.

ABS Socioeconomic Data

Once every 5 years, the ABS conducts an Australia-wide census. Each household in the country is asked to respond to a survey, and this information is collated into summarised data sets. The most recent datasets cover the census conducted in 2011, and data for this census is now available. Of great interest to online businesses are socioeconomic indicators, and the ABS now provides postcode-keyed datasets that can linked easily to your sales or customer data. See the Postal Areas data cube at Postal Area SEIFA datacube to download the socioeconomic datacube, keyed by postcode. Socioeconomic status is measured by the ABS Index of Relative Socio-Economic Advantage and Disadvantage definition. See SEIFA definition to understand how socioeconomic status is measured in ABS surveys - it is important to understand this definition and its limitations before you perform analysis.

For standard matching of socioeconomic data to postcodes, get the 'Postal Area, Indexes, SEIFA 2011' .zip file from the ABS datacube page (link is above). This file contains a spreadsheet containing postcode, and the indices of relative socioeconomic advantage/disadvantage for each postcode. Multiple worksheets contain the same data listed by socioeconomic rank, and split up into the different indices. 

Non-postcode keyed ABS data

ABS also provides a set of datacubes keyed by other non-postcode area variables (eg. state/suburb, state or commonwealth electoral groupings etc.) - these will usually require some basic transformation to make them useful as they use different geographic keys, customised to the ABS's data collection methodology. You are unlikely to have your sales data keyed by any of these ABS keys, with the (important) exception of state.

Geographic Area Identification in ABS data - the SA1_MAINCODE

To identify geographic areas, the ABS uses a custom, master identifier key called SA1_MAINCODE. This key is an 11-digit number and represents a specific geographic area in Australia, usually less than a square kilometre in capital city suburbs. It is the lowest level geographic identifier provided by the ABS, and there is a many-to-one relationship between SA1_MAINCODES and postcode numbers. The median number of SA1_MAINCODES for each postcode is 11, but in the case of a few postcodes, there are a much larger number of constituent SA1_MAINCODES (eg. Toowoomba 4350 has 264, and Glen Waverly East has 154). There are a small number of SA1_MAINCODES that 'cut across' a couple (or more) postcodes, but in general, the many-to-one relationship between SA1_MAINCODES and postcode is a reasonable approximation.

Example PHP script for creating an ABS Socioeconomic data table in MySQL

The link below provides some example code to create and load an ABS socioeconomic database with postcodes. You will first need the ABS data from the ABS Data packs link above, put into .csv tab-delimited format. Unzip this file to retrieve the Excel spreadsheet file containing the data. For the script below, you will want the worksheet titled 'Table 1' which contains the Export this worksheet to a flat file (tab-delimited if you want to use the script below), then 'tidy' it up by removing summary information at top of sheet. Before running the script below, you should have a flat file that has just the colum headings and the data. You will need a postcode reference file if you want to look up the suburb names/states for postcodes: a link to this file is provided below.

PHP Code to setup ABS database and postcodes: loadABSData_php.txt

Australian postcodes .csv file: POSTCODES.csv

Examples of analysis that can be obtained

Data analysis Useful to...
Median socioecononic advantage index of customers by month Check, in rough terms whether sales are moving to an 'upmarket' or 'downmarket' customer base, over time.
Sales growth by state, or a particular set of postcodes at a point in time vs. previous periods Analyse the impact of marketing spend (where that spend can be allocated to geographic regions)
Distribution of socioeconomic indices, by product Assist in identifying demographic characteristcs of relevant customer segments (for individual products)
Median socioeconomic index of sales by season Identify socioeconomic characterists of customers/sales at different times of year, eg. Summer, Christmas, 'off'-season

Links

ABS: http://www.abs.gov.au

ABS Data packs: ABS data packs