Choosing a machine learning (ML) platform can be a cause of considerable stress. It’s a major commitment you will have to live with for years, and an unwise decision might even affect your job prospects. Fortunately, asking yourself a few fundamental questions can simplify the decision considerably.
In recent years Microsoft has made a great effort to provide machine learning tools for both the Windows and Linux platforms. However, the preponderance of folks using Microsoft tools are running them on a Windows OS. If you run Microsoft tools on Linux, there are likely to be fewer resources to draw upon if you need help.
There are a number of different but closely related machine learning platforms provided by the folks from Redmond. Though not strictly speaking machine learning platforms, both Microsoft R Client and Microsoft R Server can provide advantages. Both include the valuable RevoScaleR packages. In R Client, these packages enable the use of two cores for many problems. In R Server, they can use many cores and can process large data volumes in chunks, but, of course, you must pay for these advantages. With the introduction of SQL Server 2017, Microsoft added Python to the capabilities of R Server and changed the name to Machine Learning Server to reflect these new capabilities.
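To make "processing large data volumes in chunks" concrete, here is a minimal sketch in plain Python. It is not the RevoScaleR API and makes no claim about how that package is implemented; the function names `read_in_chunks` and `chunked_mean_and_variance` are hypothetical. It only illustrates the general idea: summarize one chunk at a time and merge the summaries, so the full data set never has to sit in memory at once.

```python
import math
from typing import Iterable, Iterator, List, Tuple


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists holding at most chunk_size values each."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter, chunk
        yield chunk


def chunked_mean_and_variance(
    values: Iterable[float], chunk_size: int = 1000
) -> Tuple[float, float]:
    """Return (mean, population variance), holding only one chunk in memory."""
    count = 0    # number of values seen so far
    mean = 0.0   # running mean of all values seen so far
    m2 = 0.0     # running sum of squared deviations from the running mean
    for chunk in read_in_chunks(values, chunk_size):
        n = len(chunk)
        chunk_mean = math.fsum(chunk) / n
        chunk_m2 = math.fsum((x - chunk_mean) ** 2 for x in chunk)
        # Merge this chunk's summary into the running summary.
        delta = chunk_mean - mean
        total = count + n
        mean += delta * n / total
        m2 += chunk_m2 + delta * delta * count * n / total
        count = total
    if count == 0:
        raise ValueError("no data supplied")
    return mean, m2 / count
```

In practice the `values` argument would be a generator that reads rows from a file or database, which is what keeps memory use bounded by the chunk size rather than by the size of the data.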
We say “scripting language” because machine learning tools themselves are very likely to be written in C or C++ for purposes of performance. Will you be assembling these components into working systems using C# or F#? Python? R? Perhaps Java or Scala?
If your preferred language is a .NET language like C# or F#, you will likely want to focus on CNTK.
Python has become the de facto standard language for machine learning scripts. Virtually all ML platforms provide direct API support for Python.
R remains popular among people who require the techniques of what you might call “classic” statistics. Fortunately, while most of today’s machine learning platforms do not provide a direct interface for R, workers who prefer R are not out of luck. Keras provides a powerful and easy R API for many platforms.
Keras is not a machine learning platform itself, but rather a high-level API for a number of platforms. At present, the Keras API is available for CNTK, TensorFlow, and MXNet. R programmers who would like to use any of these platforms need only install Keras. The Keras R package will then permit ML development from RStudio or whatever IDE the R analyst might prefer.
The architecture of a machine learning system is often determined by the data itself. People interested in image recognition often face difficulties with the computational demands of complex algorithms. They are limited by CPU cycles. In contrast, workers performing sentiment analysis of tweets or newsfeeds may be limited by the sheer volume of data to be analyzed. Folks in this latter category may well benefit from Apache Spark.
Apache Spark is so often used in conjunction with Hadoop that many people think it must be used with Hadoop. This is not the case. Whether standing alone or integrated with Hadoop, Spark is useful anywhere the large volumes of data to be analyzed are spread across distributed servers. The default installation of Spark includes, among other things, an engine for executing distributed R and a machine learning library.
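The pattern that makes this kind of distributed analysis work can be sketched without Spark at all. The example below is plain Python, not Spark code, and every name in it (`partition`, `count_words`, `distributed_word_count`) is hypothetical. It assumes a list of short messages, such as tweets, and uses threads as a stand-in for separate servers: each partition is summarized independently, and the small partial results are then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def partition(items: Sequence[str], num_partitions: int) -> List[List[str]]:
    """Split items round-robin into num_partitions lists."""
    return [list(items[i::num_partitions]) for i in range(num_partitions)]


def count_words(messages: List[str]) -> Counter:
    """Summarize one partition: count lowercase whitespace-separated tokens."""
    counts: Counter = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


def distributed_word_count(messages: Sequence[str], num_partitions: int = 4) -> Counter:
    """Count words per partition concurrently, then merge the partial counts."""
    partitions = partition(messages, num_partitions)
    # Threads stand in for worker nodes; each one sees only its own partition.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    # Merge step: only the compact per-partition summaries are combined.
    total: Counter = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total
```

The design point is that only the per-partition summaries travel to the merge step, not the raw messages. That is why the approach suits workloads limited by data volume rather than by computation.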
Selecting a machine learning platform involves more than technical considerations. Will the platform continue to evolve as machine learning does? Are there books, blogs, and tutorials? Are there places to turn for help if needed?
TensorFlow from Google seems to be enjoying a great surge in popularity. While there are many aspects of TensorFlow where I think Google has fallen a little short, there are many valuable resources for learning TensorFlow. Other vendors have provided TensorFlow support within their own products. For example, Intel’s Myriad series of vision processing chips provide support for TensorFlow execution graphs (as well as support for Caffe).
Considering the high quality of many of today’s machine learning platforms, it would be difficult to make a choice that was truly poor. In my opinion, however, Microsoft’s ML Server and CNTK are the stronger choice for a Windows platform. On Linux, the availability of code examples, pre-trained models, and documentation would push me towards TensorFlow. In both cases, I would supplement the functionality of these platforms with Keras.