As we noted in the previous post on how to use Power Pivot Linked Tables, a wealth of economic information is available for download from banks and government agencies, as well as the United Nations. Unfortunately, the structure of that data differs from source to source, and rarely does it meet our requirements as is. In this installation, we will see how to use the standard SQL query language to access data contained in Excel worksheets.
To accomplish this goal, we will connect to the Excel data using VBA and the ADO (ActiveX Data Objects). Though it seems a tad, well, perverse for a workbook to query its own data, it’s a perfectly valid technique and is often the shortest route to the data we want.
For this demonstration we will use data on the import and export of edible agricultural products in 2014, as tabulated by the UN. For those interested, the full data bank is accessible here. The 2014 data in Excel is available here. Since this example does not involve Power Pivot, the same workbook should be OK for Excel 2010 and Excel 2013.
Dim conn As ADODB.Connection
Dim rs As ADODB.Recordset
To use the ADO in our VBS code we will need to set a reference. The list of available COM libraries will likely contain many versions of the ADO; unless you have a very specific reason, there is probably no good reason to use anything except the most recent library. Just click the checkbox.
If you think this technique might be useful in a variety of different workbooks, you might like to code the VBA to be as general as possible.
Dim strFileName As String
Dim ConnectionString As String
strFileName = ThisWorkbook.FullName
ConnectionString = “Provider=Microsoft.ACE.OLEDB.12.0;Data Source=” _ & strFileName & “;Extended Properties=””Excel 12.0;HDR=Yes;IMEX=1″”;”
‘This provider is for 64-bit Excel.
‘The version number for Excel is not important
Set conn = CreateObject(“ADODB.Connection”)
Set rs = CreateObject(“ADODB.Recordset”)
By using ThisWorkbook.FullName we avoid hardcoding the workbook name into the VBA. Of course, if you are using 32-bit Excel, the connection will have to be modified. Mocrpspft’s documentation uses different Excel version numbers for different connection strings, but I have been unable to find a circumstance where it makes any difference. The important think is the Access database engine. Curiously, the version number for Excel seems to make no difference.
Microsoft.ACE.OLEDB.12.0;refers to the 64-bit Access engine, while Microsoft.Jet.OLEDB.4.0; references the 32-bit version.
It is now time to create SQL queries to obtain the subset of data in the form we wish. In this example, we will use the same group table as we used in the previous installment on Power Pivot linked tables. In this case, however, we are not using pivot tables, we will be taking matters into our own hands with SQL. Most of the core SQL statement structures work perfectly well in Excel. We simply substitute ranges or range names for what would be table names in a conventional relational database. In these examples the range name is literally “RangeName”. Obviously this would be a stupid name in a real workbook, but the name was chosen for these query examples to clearly indicate where your range names will be substituted. Excel table names, curiously, will not work. Here are some working examples:
strSQL = “SELECT * FROM [SheetName$A1:O23]” ‘Ranges work
strSQL = “SELECT * FROM tblTestData WHERE Reporter=’Albania'” ‘ sadly, table names don’t work
strSQL = “SELECT * FROM RangeName WHERE Reporter=’Albania'” ‘ Range names work
strSQL = “SELECT * FROM RangeName WHERE [Reporter ISO]=’ALB'” ‘ Brackets work for column names with spaces
strSQL = “SELECT CountryCode FROM GroupRangename WHERE GroupName=’ASEAN’ “
SQL Subqueries Work:
strSQL = “SELECT * FROM RangeName WHERE [Partner ISO] IN (SELECT CountryCode FROM GroupRangeName WHERE GroupName=’ASEAN’)”
SQL JOIN Syntax Works:
strSQL = “SELECT grn.GroupName, rn.* FROM RangeName rn INNER JOIN GroupRangeName AS grn ON rn.[Partner ISO] = grn.CountryCode”
For any SQL Server folks in the group it is worth noting that in these queries the”INNER” specification is not optional.
Once we have create the SQL query string, the only thing left to do is execute it and return the results to a new worksheet. Fortunately ADO makes this trivially easy.
rs.Open strSQL, conn
Dim ws As Worksheet
Set ws = Application.Sheets.Add
The CopyFromRecordset method is a delight; you don’t have to concern yourself with looping through rows and writing them individually. And if that weren’t enough, the method is extremely fast.
Don’t forget that after all is said and done Excel is not a relational database and does not automatically protect you against duplicate or invalid keys. It’s up to you to check to ensure that your results are correct!
Sometimes downloaded economics data is not quite in the structure we might have liked, but we can use SQL to extract and join column data to produce worksheet data in the form we want. Stay tune to my next post where we will learn to repair data using SQL & the Excel Data Model
If you’d like to become a VBA programmer, have a look at Learning Tree’s 4-day hands-on VBA Programming course.