In the previous instalment on querying economic data with SQL we saw a problem not uncommon in the world of worksheets-as-database. Errors and inconsistencies that seem trivial when looking at the data themselves become annoying impediments to creating the pivot table we want. In many cases, the traditional tools of the SQL database are well suited to the clean-up operation we need. In this post we will look at how SQL together with the Excel Data Model can repair the data.
We could, in principle, fix the problem data where it lies. This has the drawback that future edits might re-introduce the very same inconsistency. No, the better solution is to create a separate table in which each commodity description appears but once. To accomplish this, we will need to make an ADO connection to our own worksheet, as described in an earlier blog. (If you like, you can import this VBA module into your workbook.) Once we have a connection, we can use a SELECT DISTINCT query to get a list of individual commodity descriptions. It turns out that although there are 30,000+ rows of data, there are only 6 descriptions representing only 3 actual classifications. It is easy to edit this by hand. The rows with the problem characters can simply be deleted. One of the (almost) duplicate descriptions can be deleted as well. Now we are ready.
SELECT DISTINCT [Commodity Code], Commodity FROM CommodityTradeData
We note that when we use the ADO to insert rows into a worksheet we do not automatically obtain column headings. Since we wish to add this small table to our data model, we should add column headers now. Once the column headers are in place, we go to the Insert tab a turn this small set of rows into an Excel table, remembering, of course, to ensure the “my table has headers” box is checked. After creating the table, we’ll name it “CommodityCodes”.
At this point, we can add the new short list of commodity descriptions to our data model just as we have before, by going to the Connections button on the Data tab and choosing “Add to the Data Model…” from the dialog “Add” button menu.
Since we no longer need the column containing the original commodity descriptions, it would be best to delete it from the Excel table. The column will be deleted fro the Power Pivot representation when we update.
Of course, the original goal was a working, sensible pivot table, so let’s try again. Oops; after selecting both the trade value and the commodity from the Pivot Table Field List, I’m reminded that I forgot to define the relationship between the original data and the new CommodityCodes table. No problem now that we know how to do that. Drag-and-droppers can go to the Power Pivot Diagram View to create the relationship, it’s probably faster to click the create button in the field list warning and edit the relationship directly.
We can now include valid commodity groups in pivot tables! A finished version of the workbook is available here.
Often important relational data is made available over the web in Excel workbook format. To successfully explore this data using pivot tables, we must establish the same sort of relationships as are defined in databases. The advent of Power Pivot and linked tables made this process direct and practical. Now that Excel 2013 has more strongly integrated this functionality into the user interface, using standard database design practices for data stored in workbooks is easier than ever.
For more on Power Pivot, have a look at Learning Tree’s 2-day course – Power Pivot for Excel: Mining Data for Business Intelligence. Attend in-class at one of our global centers or online from the convenience of home or office!