The most direct method for reading SQL Server data into the statistical program R is with ODBC. If R is running on a Windows system, this is straightforward using the R package RODBC. On a Linux system, the installation of RODBC is more challenging.
Of course, RODBC requires a working ODBC system be present on the machine. An open-source implementation of ODBC suitable for Ubuntu, as well as most *nix systems, is provided by www.unixodbc.org. Unfortunately, there are many pitfalls and stumbling blocks on the path to a working installation. If you are installing unixODBC on an existing system you must remove all traces of any previous installation. My tests of SQL Server vNext on Linux are run on fresh virtual machines, so previous installations are not an issue. Nevertheless, successfully installing and running ODBC and RODBC is not easy.
If you wish to create a system with SQL Server vNext and R on Ubuntu 16.04, I encourage you to install ODBC and R first. The installation ofunixODBC on a fresh system is easy; the “standard” compilation and installation should work straight away.
After downloading and untarring theunixODBC source, simply run
sudo make install
After unixODBC is installed, R can be conventionally installed. The only thing that remains is to install the RODBC package necessary to connect with SQL Server. RODBC binaries can be installed using apt-get.
sudo apt-get install r-base-dev
I am not sure if the following is necessary, but I ran it on the “couldn’t hurt” philosophy:
sudo apt-get build-dep r-cran-rodbc
sudo apt-get install r-cran-odbc
If these commands executed without error, it should not be necessary to install the package from within R. We can confirm a successful installation by starting R and loading the library.
We can now install SQL Server vNext for Ubuntu using the method described in the previous blog. When we are finished, there should be three new folders in /etc, just as there were before.
Copy the Microsoft folder and the mssql-tools folder into a new folder in your home directory. Copy also the odbcinst.ini file from /etc.
Now that SQL Server has been installed, we can start R and load the RODBC library. This time it fails with a library does not exist error. We can try to reinstall the library.
sudo apt-get install r-cran-rodbc
This time the installation fails because of a missing dependency, libodbc1. We can reinstall.
sudo apt-get installl libcodbc1
sudo apt-get install r-cran-rodbc
Now, apt-get install r-cran-odbc should succeed. If, however, we return to the /opt directory, two previously installed folders, including the Microsoft SQL Server ODBC driver, have vanished.
This is why we copied two folders into the home directory. They can now be copied back. If it is missing, copy back odbcinst.ini, which is the file that tells ODBC about the physical location the Microsoft driver. The copy will, of course, have to be performed with administrator privilege. After copying, we should have a working RODBC package and an ODBC driver for SQL Server vNext on Linux.
Successful installation of ODBC and RODBC on Linux requires attention to detail. The installation can be eased somewhat by starting with a “clean slate”, that is, a fresh VM. I strongly encourage you to take careful notes during the installation documenting any problem with failed installation or missing dependencies.
In the next installation, we will fire up R and actually create some data frames from SQL Server data!