Using Excel Power Query to Unpivot and Transform Data for SPSS Analysis of Variance

spreadsheets-24956_640

Excel, of course, is quite capable of providing simple statistical summaries and analyses. However, there are occasions when bigger guns must be brought to bear on a statistical problem, and Excel data must be exported to a dedicated statistical package such as IBM’s SPSS. Since SPSS can directly read Excel files, should be s simple matter, right? Would be, except that Excel and SPSS each expect their data to be presented in different formats. We cannot transfer data from Excel to SPSS without also changing its layout and organization.

Here is an example of some sample data used earlier to illustrate ANOVA (analysis of variance) in Microsoft Excel.

Excel ANOVA
ANOVA performed in Excel.

This is the format Excel wants. Different drugs represented by different sets of rows; different doses in different columns. SPSS, on the other hand, will not know what to do with this format. SPSS relies almost exclusively on receiving data as individual cases. That is, each row represents a single measurement. In for the data in example there would be three times as many rows, since each of the three dosages must now be represented by additional rows. Furthermore, the “aspirin” and “acetaminophen” labels are not going to be sufficient for SPSS. The Excel user recognizes that “Aspirin” applies to 10 rows; SPSS is not that clever. Fortunately, Excel makes it easy to copy the labels multiple times.

Table Preparation

Rather than create a larger and more complex Power Query, we’ll take things stepwise for clarity

Let’s start by preparing an Excel table to provide input data. Copy the range used in the Excel ANOVA to a new worksheet and turn it into a table named InTable. After selecting the cell containing the drug name “Aspirin”, we grab the little square in the lower right-hand corner of the cell and drag it downward. We do the same for “Acetaminophen.” Label the first column “Drug.”

Figure02-ExcelANOVA

Unpivot the Data

We can now click on the Power Query tab of the Excel ribbon and choose “From Table.” If the data were in a range rather than a table, Power Query would turn it into a table, and the usual rules for table creation, e.g. no gaps between rows or columns, apply.

Once our data is imported into Power Query we are ready to go. There is no “Unpivot” choice in the Power Query ribbon. We must highlight the data columns and then right-click in the header and select the choice “Unpivot Columns” from the context menu. The data is now represented as one case = one row, the way SPSS lkes. We should be ready to click the “Close and Load” button to return our data to Excel. But we are not. Not yet.

ColumnContextMenu

Once again, in a real import, we would do everything in a single Power Query, but for clarity we will perform the necessary steps in a series of queries. We’ll name this first query “DrugDataUnpivot” and click “Close & Load.”

The new table should look something like this:

ResultingUpivotedTable

Another problem now becomes apparent; the values for the Dosage column are text data, not numeric values.

Replacing Text

Once again we click “From Table” on the Power Query ribbon. Right-click on the header for the dosage column and select “Replace Values.”

ReplaceValues

Replace mg with blank

In the Replace Values dialog, we indicate that we want to replace “mg” with nothing at all. The mg units are gone, but the column is still a text datatype.

mg gone still text

With the Dosage column still selected, drop down the Data Type list from the Transform tab of the Query Editor. Select “Whole Number.”

DatatypeMenu

After selecting “Whole Number” the data type for the dosage will be numeric, a requirement for ANOVA.

mg gone now integer

That’s Not Enough for SPSS?

For reasons known only to IBM, SPSS will not recognize group or treatment names in text. We must therefore replace “Aspirin” and “Acetaminophen” with code values (0 and 1 are as good as any). No problem, we now know how to do that. With the “Adjust Dosage Datatype” query as the starting position, select From Table yet again from the Power Query ribbon. Name this query OutTable_CodeGroups. Replace “Aspiring” with 0 and “Acetaminophen” with 1. Change the datatype to Whole Number. Click Close & Load.

It would be wise to rename the worksheet contain the new “out” table something like, well, OutTable. That way it will be easily recognizable as the sheet we want when we are loading it into SPSS.

Save and close the workbook.

Not really Excel, but…

The Open | Data dialog in IBM SPSS has Excel files as an alternative file type.

SPSSOpenData

Once we have navigated to and opened the Excel file, we will be asked to identify the worksheet or range containing the data of interest.

OpenExcelDataSource

Conclusion

While we have chosen a specific application of the unpivot transformation for our example, it illustrates that Power Query is a very general tool that aids greatly moving data between databases and a wide assortment of analytical tools.

Type to search blog.learningtree.com

Do you mean "" ?

Sorry, no results were found for your query.

Please check your spelling and try your search again.