Intel’s distribution of Python, created in collaboration with Anaconda, is billed by the hardware giant as a distribution built for performance: specifically, the numerical performance demanded by scientific research and machine learning. Claims of Python executing at machine-language speeds get bandied about, but does Intel Python really deliver?
I won’t use the word “benchmark”, which, to my mind, suggests a scientific and carefully engineered set of performance measurements. However, there is certainly nothing wrong with taking Intel Python for a quick spin and seeing what happens.
For many data scientists, manipulating Pandas dataframes is daily routine, so I was disturbed to read a blog post suggesting that Pandas code runs slower under the Intel distribution. On its face, this makes little sense. Intel’s enhancements rely heavily on Python code calling fully compiled libraries for numerical algorithms. Pandas code would not be expected to take advantage of those enhancements, but neither should it be any worse. In my experiments, Pandas data manipulation under the Intel distribution was not noticeably different from the same work under the Anaconda distribution.
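For the record, the kind of dataframe manipulation I timed looks roughly like the following sketch: a group-by aggregation and a merge on a million-row frame. The column names and sizes are my own choices, not anything prescribed by either distribution.

```python
import numpy as np
import pandas as pd

# Build a million-row frame with a categorical key and a numeric value.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 100, size=1_000_000),
    "value": rng.random(1_000_000),
})

# Typical daily-routine operations: aggregate per key, then merge back.
agg = df.groupby("key", as_index=False)["value"].mean()
joined = df.merge(agg, on="key", suffixes=("", "_mean"))
```

Wrapping blocks like these in `timeit` under each distribution is all the “benchmarking” involved here.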
Matrix math constitutes a large proportion of machine learning code, and numpy is the “go-to” library used by most Python programmers for array manipulation. Running a numpy matrix multiplication routine on the Intel platform yielded an average performance improvement of about 20%. This improvement is more impressive than it might seem at first, since the numpy modules are already highly optimized.
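A quick-spin timing harness in the spirit of that comparison can be sketched as follows; the 2000×2000 matrix size and best-of-three measurement are my own choices, not anything the distributions require.

```python
import time
import numpy as np

# Multiply two dense matrices and report the best of three runs.
n = 2000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

times = []
for _ in range(3):
    t0 = time.perf_counter()
    c = a @ b
    times.append(time.perf_counter() - t0)

print(f"best of 3 runs: {min(times):.3f} s")
```

Run the same script under each distribution and compare the numbers; taking the minimum of several runs helps filter out interference from other processes.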
Numba is not an Intel library, but is included if you install the full Intel Python distribution as opposed to the core distribution. Numba provides a just-in-time (jit) compiler to convert Python code into machine-language code. In the past, I have encountered challenges installing and correctly configuring numba. I was delighted to discover that the Intel Python distribution installs numba seamlessly. When Intel’s distribution is done installing, numba is there and it works.
We shouldn’t expect numba to speed up numpy’s matrix multiplication, since in that case most of numpy’s work is being done in efficient compiled libraries, not in Python itself. We would, however, expect to see improvement in code where mathematical tasks are repeatedly performed within Python loops. The popular and perhaps overdone Mandelbrot set is an example of such code. Test code running with the numba “jitter” ran between seven and eight times faster than the same Python code without it.
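The Mandelbrot test can be sketched roughly as follows. The grid size, region, and iteration cap are my own choices, and as before the try/except lets the code run, slowly, even without numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):
        return func

@njit
def mandelbrot(width, height, max_iter):
    # Escape-time counts over the region [-2, 1] x [-1.5, 1.5]:
    # tight nested Python loops, the kind of code the jit accelerates most.
    image = np.zeros((height, width), dtype=np.int64)
    for row in range(height):
        for col in range(width):
            c = complex(-2.0 + 3.0 * col / width,
                        -1.5 + 3.0 * row / height)
            z = 0.0j
            count = 0
            for _ in range(max_iter):
                if z.real * z.real + z.imag * z.imag >= 4.0:
                    break
                z = z * z + c
                count += 1
            image[row, col] = count
    return image

img = mandelbrot(200, 200, 40)
```

Timing this function with and without the `@njit` decorator is what produced the seven-to-eight-times figure on my machine.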
Intel’s Data Analytics Acceleration Library (DAAL) is available as a separate library and does not require the Intel Python distribution. However, if you do the full install of Intel Python, DAAL comes along for the ride. This library consists of many of the functions commonly used in data analytics. It is not a Python library, but Intel provides a Python interface called DAAL4Py, which replaces the now-deprecated pyDAAL.
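As a flavor of what DAAL4Py code looks like, here is a K-Means sketch using daal4py’s documented two-step interface (centroid initialization, then clustering). The data are synthetic, the parameter values are my own, and the import is guarded so the snippet degrades gracefully where daal4py is not installed.

```python
import numpy as np

try:
    import daal4py as d4p
except ImportError:
    d4p = None  # daal4py ships with the full Intel Python install

data = np.random.rand(1_000, 3)

if d4p is not None:
    # Step 1: pick starting centroids with k-means++.
    init = d4p.kmeans_init(nClusters=3, method="plusPlusDense")
    centroids = init.compute(data).centroids
    # Step 2: iterate to convergence and assign each row to a cluster.
    result = d4p.kmeans(nClusters=3, maxIterations=50,
                        assignFlag=True).compute(data, centroids)
    labels = result.assignments  # cluster index per row
```

The compute-object style (configure, then call `compute`) is characteristic of DAAL4Py and mirrors the underlying DAAL C++ interface.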
OK, PlaidML is not directly related to Intel Python, but since we are speaking of Intel offerings of interest to data scientists and machine learning aficionados, PlaidML deserves a mention. PlaidML is a tensor compiler originally introduced by a company called Vertex.AI, and it has found new life now that Vertex.AI has been acquired by Intel. Briefly put, PlaidML takes high-level instructions from machine learning platforms such as Keras and ONNX and translates them into low-level instructions for different hardware targets: CPUs, NVIDIA CUDA GPUs, and GPUs that support OpenCL.
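Getting started follows the steps documented in PlaidML’s own README: install the Keras frontend, run the interactive device-setup tool, and optionally benchmark a model. The `train.py` script name below is just a placeholder for your own code.

```shell
pip install plaidml-keras plaidbench

# Interactively choose the accelerator (CUDA, OpenCL, or CPU) PlaidML should use
plaidml-setup

# Optional: benchmark a stock Keras model on the chosen device
plaidbench keras mobilenet

# Point Keras at the PlaidML backend when running your own code
KERAS_BACKEND=plaidml.keras.backend python train.py
```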
In contrast to my experience with Google’s TensorFlow, the PlaidML installation process seems to discover and use CUDA drivers easily and reliably on both Windows and Linux systems.
Intel is ensuring that data scientists and machine learning researchers continue to see improvements in performance. Will you train a neural network over your coffee break? Probably not. However, the latest Intel offerings and upgrades seem to be squeezing more performance out of the current generation of hardware, and they are well positioned to take advantage of the next.