stringdist-parallelization: Multithreading and parallelization in 'stringdist'



This page describes how stringdist uses parallel processing.

Multithreading and parallelization in stringdist

The core functions of stringdist are implemented in C. On systems where OpenMP is available, stringdist automatically takes advantage of multiple cores. The OpenMP section of the Writing R Extensions manual discusses on which systems OpenMP is available (at the time of writing, more or less anywhere except on OS X).

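By default, the number of threads to use is taken from options('sd_num_thread'). When the package is loaded, the value for this option is determined as follows:

- If the environment variable OMP_NUM_THREADS is set, this value is taken.
- Otherwise, the number of available cores is determined with parallel::detectCores(). If this fails, the number of threads is set to 1 (with a message). If the number of detected cores exceeds three, the number of used cores is set to n-1.
- If available, the environment variable OMP_THREAD_LIMIT is read and the number of threads is set to the lesser of OMP_THREAD_LIMIT and the number of cores detected.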

This procedure makes sure that on machines with n > 3 cores, n-1 cores are used. Some benchmarking showed that using all cores is often slower in such cases, probably because at least one of the threads then has to share a core with the operating system.
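The default can be inspected and overridden from an R session. The following is a minimal sketch, assuming stringdist is installed; the variable names are illustrative, and parallel::detectCores() is called here only to show the core count next to the chosen thread count.

```r
library(stringdist)

# Thread count that stringdist picked when the package was loaded
default_threads <- getOption("sd_num_thread")
print(default_threads)

# Number of cores R can detect (may be NA on some platforms)
print(parallel::detectCores())

# Override the session-wide default, then restore the previous value
old <- options(sd_num_thread = 2L)
print(getOption("sd_num_thread"))
options(old)
```

Setting the option changes the default for every later call in the session, which is why the sketch restores the old value at the end.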

Functions that use multithreading have an argument named nthread that controls the maximum number of threads to use. If you need to do large calculations, it is probably a good idea to benchmark the performance on your machine(s) as a function of nthread, for example using the microbenchmark package by Mersmann.
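Such a benchmark might look like the sketch below. It assumes that both stringdist and microbenchmark are installed; the helper rand_strings, the input sizes and the choice of the "lv" method are made up for illustration, and the timings will differ from machine to machine.

```r
library(stringdist)
library(microbenchmark)

set.seed(1)

# Hypothetical helper: n random lowercase strings of length len
rand_strings <- function(n, len = 10) {
  vapply(
    seq_len(n),
    function(i) paste(sample(letters, len, replace = TRUE), collapse = ""),
    character(1)
  )
}

a <- rand_strings(1000)
b <- rand_strings(1000)

# Time the same distance-matrix computation with 1 and with 2 threads
timings <- microbenchmark(
  one_thread  = stringdistmatrix(a, b, method = "lv", nthread = 1),
  two_threads = stringdistmatrix(a, b, method = "lv", nthread = 2),
  times = 5
)
print(timings)
```

Extending the list of expressions up to the number of available cores shows where adding threads stops paying off for a given workload.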


stringdist documentation built on Sept. 9, 2021, 5:08 p.m.