A New Parallel Algorithm for Sinkhorn WordMovers Distance and Its Performance on PIUMA and Xeon CPU
Abstract
The Word Movers Distance (WMD) measures the semantic dissimilarity between two text documents by computing the cost of optimally moving all words of a source/query document to the most similar words of a target document. Computing WMD between two documents is costly because it requires solving an optimization problem that costs $O (V^3 \log(V)) $ where $V$ is the number of unique words in the document. Fortunately, WMD can be framed as an Earth Mover's Distance (EMD) for which the algorithmic complexity can be reduced to $O(V^2)$ by adding an entropy penalty to the optimization problem and solving it using the SinkhornKnopp algorithm. Additionally, the computation can be made highly parallel by computing the WMD of a single query document against multiple target documents at once, for example by finding whether a given tweet is similar to any other tweets of a given day. In this paper, we first present a sharedmemory parallel SinkhornKnopp algorithm to compute the WMD of one document against many other documents by adopting the $ O(V^2)$ EMD algorithm. We then algorithmically transform the original $O(V^2)$ dense computeheavy version into an equivalent sparse one which is mapped onto the new Intel Programmable Integrated Unified Memory Architecture (PIUMA) system. The WMD parallel implementation achieves 67x speedup on 96 cores across 4 NUMA sockets of an Intel Cascade Lake system. We also show that PIUMA cores are around 1.22.6x faster than Xeon cores on SinkhornWMD and also provide better strong scaling.
 Publication:

arXiv eprints
 Pub Date:
 July 2021
 arXiv:
 arXiv:2107.06433
 Bibcode:
 2021arXiv210706433J
 Keywords:

 Computer Science  Distributed;
 Parallel;
 and Cluster Computing;
 Computer Science  Hardware Architecture;
 Computer Science  Machine Learning;
 Computer Science  Performance
 EPrint:
 11 Pages. arXiv admin note: substantial text overlap with arXiv:2005.06727