Machine Learning using Spark and R
Mar 27, 2017
R is ubiquitous in the data science community. Its ecosystem of more than 8,000 packages makes it the Swiss Army knife of modeling applications. Similarly, Apache Spark has rapidly become the big data platform of choice for data scientists. Its ability to perform calculations relatively quickly (due to features like in-memory caching) makes it ideal […]
Version control for data scientists using Git and RStudio
May 20, 2016
Look at the working directory of the average data science project and you’ll see things like this: cust-churn.csv, cust-churn.R, cust-churn-good.zip, cust-churn-old.csv, cust-churn-progess-meeting.R, cust-churn-working.R, cust-churn-working2.R, cust-churn-20160217.R, test.R, test-bk.R. Every time a change needs to be made, files are copied to preserve the working code. As changes will often be made to multiple files, it’s common to […]