Setting Memory Limits for R in SQL Server 2016

In previous posts, we’ve looked at the potential for integrating the statistical language R into the soon-to-be-released SQL Server 2016. Developers and analysts are excited about the new possibilities – the response of administrators is considerably more restrained. Administrators are a conservative lot – it’s part of their job description. When learning of a new feature […]
Excel: Interpreting the Principal Components Analysis (PCA)

What Does Principal Component Analysis Tell Us? In the first installment, we indicated that the primary reason to do a principal component analysis (PCA) in Excel was to increase our own understanding. If your goal is the PCA itself, a better choice might be R, MATLAB, or a similar tool. Now that we […]
Version control for data scientists using Git and RStudio

Look at the working directory of the average data science project and you’ll see things like this: cust-churn.csv, cust-churn.R, cust-churn-old.csv, cust-churn-progess-meeting.R, cust-churn-working.R, cust-churn-working2.R, cust-churn-20160217.R, test.R, test-bk.R. Every time a change needs to be made, files are copied to preserve the working code. As changes will often be made to multiple files, it’s common to […]
SQL Server 2016: R Integration Redux

SQL Server 2016 has now reached Release Candidate 3 (RC3). One of the new features that continues to foster interest in the analytics and data mining community is the integration of SQL Server with the open-source statistics toolset R. If you fall into this category, I’m inclined to suggest you remain patient and wait for […]
A Problem with R

The value of R lies in the enormous quantity of code contributed by analysts and academic researchers over many years, providing a packaged solution not only for common analytical techniques but also for the esoteric and the obscure. The problem with R, and one that concerns many analysts dealing with large data volumes, is that […]
