Organising data in Excel for analysis

From time to time I get asked to help with a statistical analysis. When I ask for the data I often get a spreadsheet like this. OK, I can read this and sometimes make sense of it, but there are a number of important issues including:

  • how much work is involved in laying out these tables? For example, calculating averages and SEM requires multiply entering the relevant functions for each column for each group, or a lot of cutting and pasting of formulae and double checking that it worked as expected.
  • is this format easy to import into a statistics program? The answer to this is clearly NO!
  • How easy is it to regroup the data, filter the data, graph the data?  Not very easy at all!

With a little bit of learning, this can all be done so much easier. Here is a suggestion.

First, enter the data in columns with one row for each data value, like this.

The individual measurements are in column D in this example.

The corresponding treatment, animal code and time are recorded in columns A-C.

There is a bit of repetition (eg you need to copy the treatment codes over, in this case, 12 rows each, but excel makes replicating down or across easy (enter the value in the first cell of a range, select the range, press ctrl-D (copy down) or ctrl-R ( copy right). There is repetition of the animal numbers – ditto. Time, in this case, can be duplicated from the block of control to the Treatment1 block of rows. Easy. (or use Excel’s Power Query Editor to automate these steps and make it really easy – see Organising data in Excel for analysis part 2. There is a bit of learning to do, but it will save heaps of time in the long run.)

Now to analyse the data – Say you want basic means etc by treatments and times. Select the columns with the relevant data (in this case click in the A header for column A to select column A, then  hold down shift and click the D header for column D. Now you will have A-D selected.

From the Insert menu select Pivot Table. Click OK on the create pivot table dialog that pops up. This will create a pivot table with the selected data on a new worksheet.

The new worksheet will look like this. On the left you have the Pivot Table template that will be populated when you fill in the The Pivot table fields form on the right which appears whenever you select a cell within the pivot table.

In this case we wanted averages for each treatment and time. This is easy. In the the Pivot table fields form click and drag “treatment” from the top and release it in the “rows” area. The distinct values of the treatment column appear on the pivot table. Repeat this process to add “time” below treatment in the rows field. Now drag “measurement” down to the “Σ Values” area.

Your spreadsheet now looks like this:

In column A we now have a breakdown by treatment and time.

In Column B we have sum of Measurement. OOPS! we wanted average not sum. Easy. in the The Pivot table fields form, in the “Σ Values” area, Sum of measurement has a drop down icon.

Click this and select Value field settings. Here you can choose the desired function. Choose average, and the pivot table now shows the averages.

So far, so good. Now for sem. Sadly excel lacks a build in SEM function so we have to calculate it from StDev / sqrt(count). So, drag measurement to the “Σ Values” area and set it to show StDev from the Value Field Settings. And again drag measurement to the “Σ Values” area and set it to show Count from the Value Field Settings. Now we have this:

Now you have a pivot table with data grouped by treatment and time, with average, stdDev and count … select the table and Paste-Special :: Values to a new worksheet and you can then calculate the SEM easily. Enter the formula on the first row then copy the formula down and all the SEMs are done in one go.

As it is, the pivot table is a bit untidy. There are lots of things you can do. You can remove unneeded values – for example the “Blank” treatment group… Click on the “row Labels” header dropdown icon and unselect “blank”.

Don’t want subtotals etc… right click in the pivot table and select Pivot Table Options from the context menu.

Do you want a different layout. Try dragging “treatment” from rows to columns… have a play. Google tutorials on Pivot tables… they are very powerful ways to arrange and collate your data.

Want to visualise your collated data with a graph? Easy, Select your raw data columns and instead of choosing pivot table, choose Pivot Chart or Pivot Table and Pivot Chart. WIth a couple of key presses you can generate a quick graph to see what your data looks like. For example:

Now consider if you have data for, say, 10 genes at each treatment/time… Add a new column called Gene, and add 9 more blocks of data, one for each gene…

Now you can make pivot tables and pivot graphs with ease and with just a few clicks you can collate for each gene, filter the data, change graph types etc etc.

I have put a couple of examples below with a sample set of 3 genes.

Or how about a line chart showing differences between genes for the different treatments

Or how about just gene1 vs gene 3 looking at changes over time … it is all very easy if the data is organised the right way.

And did I mention that when the data is organised nicely, you can simply export the data as, say, a CSV file ready for import into your favourite statistics program for more detailed statistical analysis. Statistics programs like data in columns.

Want to find out more about pivot tables and graphs – look on the web for endless information. Here is one example that looks reasonably clear and easy

Have fun.