How Big is Big Data?

One of the questions I’m often asked is “How big does my data have to be before I need to start using big data tooling?” There’s no right answer to this—but that’s largely because it’s not the right question. Big Data is actually a pretty unhelpful term. It focuses attention exclusively on the volume of […]
Read More ›

Using a Pipeline Operator in R

I find myself programming more and more in a functional style these days. Obviously, R, F# and Scala encourage it—but I’m a heavy user of LINQ in C# and my JavaScript has been going that way for a while too. When programming in languages other than F#, I yearn for the pipeline operator (|>). The […]
Read More ›

How to call F# code from C#

Functional programming is popping up everywhere these days. Why? And should we care? One reason we are seeing more of it is the explosion of interest in parallel processing. Computing clusters (for big data analysis), general purpose GPUs and multi-core processors all use parallel execution to deliver their performance benefits. […]
Read More ›

How to Use Power Query to Import Hadoop Data into Excel

When Power Query was first introduced early in 2013 it was known as the Data Explorer. In some ways, Data Explorer was a better name. The primary job of Power Query is to enable Excel users to examine data, decide what values need to be imported into Excel, and then complete the import process. If […]
Read More ›

How to Predict Outcomes Using Random Forests and Spark

Random forests are an ensemble, or model of models, machine learning approach. The algorithm builds multiple decision trees, based on different subsets of the features in the data. Outcomes are then predicted by running observations through all the trees and averaging the individual predictions. Think wisdom of crowds. Spark’s machine learning library, MLlib, has support […]
Read More ›
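
The random forest idea described in the excerpt above can be sketched in a few lines of plain Python. This is only an illustrative toy, not Spark's MLlib implementation and not the code from the full post. It assumes a regression target, uses depth-1 trees ("stumps") in place of full decision trees to stay short, and adds bootstrap sampling of rows, which the excerpt does not mention. The names `fit_stump`, `predict_stump`, `fit_forest` and `predict_forest` and the sample data are all hypothetical.

```python
import random
from statistics import mean


def fit_stump(rows, targets, features):
    """Fit a depth-1 tree: choose the (feature, threshold) split with the lowest squared error."""
    best = None
    for f in features:
        for threshold in sorted({r[f] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            if not left or not right:
                continue  # a split must put rows on both sides
            lm, rm = mean(left), mean(right)
            error = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
            if best is None or error < best[0]:
                best = (error, f, threshold, lm, rm)
    if best is None:
        # No usable split: fall back to always predicting the mean target.
        m = mean(targets)
        return (features[0], float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, left_mean, right_mean = stump
    return left_mean if row[f] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    """Build many small trees, each on a random feature subset and a bootstrap sample of rows."""
    rng = random.Random(seed)
    all_features = list(range(len(rows[0])))
    forest = []
    for _ in range(n_trees):
        features = rng.sample(all_features, n_features)   # random subset of the features
        idx = [rng.randrange(len(rows)) for _ in rows]    # bootstrap sample (an added assumption)
        forest.append(fit_stump([rows[i] for i in idx], [targets[i] for i in idx], features))
    return forest


def predict_forest(forest, row):
    """Run the observation through every tree and average the individual predictions."""
    return mean(predict_stump(stump, row) for stump in forest)


# Made-up data: the target depends only on the first feature; the second is noise.
rows = [[x, (x * 7) % 10] for x in range(1, 11)]
targets = [10.0 if r[0] > 5 else 0.0 for r in rows]

forest = fit_forest(rows, targets)
low = predict_forest(forest, [1, 3])
high = predict_forest(forest, [9, 3])
print(low, high)
```

Because each tree sees only some of the features, trees built on the noise feature contribute little signal, and the averaged prediction is pulled toward the correct answer by the trees that did see the informative feature. That is the "wisdom of crowds" effect the excerpt refers to.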
