| memshare-package | R Documentation |
This project extends 'R' with a mechanism for efficient parallel data access by utilizing 'C++' shared memory. Large data objects can be accessed and manipulated directly from 'R' without redundant copying, providing both speed and memory efficiency.
If the user detaches the package, all handles are destroyed. This means that all variables of all namespaces are cleared, provided that no other 'R' session is still using them.
The three basic definitions are:
1. “Pages” are variables owned by the current compilation unit of the code (e.g., the 'R' session or terminal that loaded the DLL). Pages are implemented on Windows via 'MapViewOfFile' and on Unix via 'shm'+'mmap'.
2. “Views” are references to variables owned by another (or their own) compilation unit. Views are always 'ALTREP' wrappers for the pointers to the shared memory chunk.
3. “Namespaces” are character vectors of length 1 (called strings here) that identify the shared memory context in which shared variables are initialized.
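The three definitions can be seen together in a minimal sketch. It assumes that retrieveViews and releaseViews take the same (namespace, names) arguments as releaseVariables does in the examples of this page, and that the views come back as a named list. The namespace string "ns_sketch" is made up for illustration, and the memshare calls only run if the package is installed.

```r
# Sketch only: page -> view -> release, under the assumptions stated above.
col_sums <- NULL
if (requireNamespace("memshare", quietly = TRUE)) {
  ns <- "ns_sketch"                       # hypothetical namespace string
  M  <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)

  # The current session becomes the owner of a page named "M".
  memshare::registerVariables(ns, list(M = M))

  # A view is a reference to that page (assumed signature and return type).
  views    <- memshare::retrieveViews(ns, c("M"))
  col_sums <- colSums(views$M)            # ordinary R code reads the view

  # Release the view first, then give up ownership of the page.
  # views$M must not be touched after this point.
  memshare::releaseViews(ns, c("M"))
  memshare::releaseVariables(ns, c("M"))
}
col_sums
```

In a real workflow the view would typically be retrieved in a different R session than the one that registered the page; both steps are shown in one session here only to keep the sketch short.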
Safety
R itself is designed around a single-threaded C API, which means that internal R functions and memory management cannot be called safely from multiple threads at the same time. Each worker process created by the parallel package runs its own independent R interpreter, so ordinary R code is safe as long as all R-level operations happen inside one worker at a time.
However, shared-memory buffers created by memshare are visible to multiple
R sessions simultaneously. These buffers are intended to be read-shared:
many workers can read the same matrix or vector concurrently without conflict.
If you ever modify a shared object in place (e.g., X[1,1] <- 0), that write
immediately affects the shared buffer seen by other workers and can cause
data races if they also access the same region. To avoid this, treat shared
variables as read-only, or implement explicit synchronization mechanisms such as
interprocess locks or task partitioning that guarantees non-overlapping writes.
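Task partitioning with non-overlapping writes can be sketched in base R as follows. The example only builds and checks the partition of column indices; it does not touch memshare, and the column and worker counts are arbitrary illustration values.

```r
library(parallel)  # ships with R; provides splitIndices()

n_cols    <- 10L   # hypothetical number of columns in a shared matrix
n_workers <- 3L    # hypothetical number of workers

# One block of column indices per worker.
chunks <- splitIndices(n_cols, n_workers)

# The blocks must be pairwise disjoint and together cover every column,
# so that no two workers would ever write to the same region.
all_idx <- unlist(chunks)
stopifnot(
  anyDuplicated(all_idx) == 0L,
  setequal(all_idx, seq_len(n_cols))
)

chunks
```

Each worker would then be handed exactly one element of chunks and restrict any writes to those columns. Reads of other columns are still only safe if no other worker is writing them at the same time.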
Inside compiled code (e.g., via Rcpp, OpenMP, or TBB), threads may perform numerical computations on raw pointers to shared data, but must never call back into R (e.g., create R objects, print, evaluate R expressions, etc.) from those secondary threads. All communication with R must occur in the main thread of each worker process.
Resource lifecycle
Shared memory behaves differently from ordinary R objects because it exists outside the R garbage collector. Therefore, memshare provides explicit functions to manage it.
Each time you call retrieveViews, the package creates one or more
"handles" that link your R objects to the shared-memory segments. When you are
done using these views, always call releaseViews to remove those
handles. As long as any active view exists in any R session, the corresponding
shared memory remains allocated.
Memory can only be reclaimed once all views have been released.
The call to releaseVariables from the master session removes ownership
of the pages, but they are physically unmapped only when no process holds a view.
Detaching or unloading the memshare package automatically drops all handles,
but if another R session still has an open view, that memory remains in use until
it is released there as well. This design prevents dangling pointers and ensures
that shared data are not invalidated while still in use by another worker.
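One way to make sure views are always released, even when the worker code throws an error, is to pair retrieveViews with on.exit. The helper with_views below is hypothetical and not part of memshare; it again assumes the (namespace, names) signature for retrieveViews and releaseViews and a named list as the return value.

```r
# Hypothetical helper: run `fun` on the views and release them afterwards,
# whether `fun` returns normally or signals an error.
with_views <- function(namespace, names, fun) {
  if (!requireNamespace("memshare", quietly = TRUE)) {
    stop("memshare is not installed")
  }
  views <- memshare::retrieveViews(namespace, names)      # assumed signature
  on.exit(memshare::releaseViews(namespace, names), add = TRUE)
  fun(views)  # should return a plain R value, not the view itself
}

# Usage, only executed when memshare is available.
if (requireNamespace("memshare", quietly = TRUE)) {
  ns <- "ns_lifecycle"                                    # hypothetical namespace
  memshare::registerVariables(ns, list(v = c(1, 2, 3)))
  total <- with_views(ns, c("v"), function(views) sum(views$v))
  memshare::releaseVariables(ns, c("v"))                  # owner gives up the page
}
```

The callback returns a scalar rather than the view, so nothing that points into shared memory survives the call to releaseViews.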
OS notes
Windows. Uses Win32 file mappings (CreateFileMappingA(), MapViewOfFile()). Namespaces are automatically prefixed with "Local\\" to scope mappings per user session. Mapping sizes use 64-bit high/low DWORDs. Views are opened read-only by default (FILE_MAP_READ). Cleanup unmaps views (UnmapViewOfFile()) and then closes handles (CloseHandle()). Requires homogeneous architecture (e.g., all 64-bit R sessions). Antivirus/EDR tools may slow mappings.
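The high/low DWORD split of a mapping size is plain integer arithmetic, shown here in R for illustration. CreateFileMappingA() receives the size as two 32-bit values (dwMaximumSizeHigh and dwMaximumSizeLow); the helper split_dwords is hypothetical and only mirrors that arithmetic, it is not how memshare computes it internally.

```r
# Hypothetical helper: split a byte count into two 32-bit halves.
# R doubles represent integers exactly up to 2^53, which is ample here.
split_dwords <- function(size_bytes) {
  stopifnot(size_bytes >= 0, size_bytes < 2^53,
            size_bytes == floor(size_bytes))
  c(high = size_bytes %/% 2^32, low = size_bytes %% 2^32)
}

# A 50000 x 20000 double matrix needs 8 bytes per cell: 8e9 bytes in total,
# which no longer fits into a single 32-bit value.
size  <- 8 * 50000 * 20000
parts <- split_dwords(size)
parts

# Recombining both halves gives back the original size.
stopifnot(parts[["high"]] * 2^32 + parts[["low"]] == size)
```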
Linux. Uses POSIX shared memory (shm_open() + mmap()). Owners create with O_CREAT|O_EXCL|O_RDWR and PROT_READ|PROT_WRITE; views attach with O_RDONLY and PROT_READ. Owners call shm_unlink() only after all views are released. Shared memory size may be limited by /dev/shm or system SHM limits (e.g., shmmax).
macOS. Also uses POSIX shared memory (shm_open() + mmap()) with the same owner/view flags as Linux.
Unlike Linux, macOS implements POSIX shared memory via files in ‘/var/run/shm’ or ‘/private/var/shm’, which may
not exist by default and can require manual creation or permission adjustment. POSIX shm names must be short; a practical
upper bound is about 32 characters including the leading slash. Exceeding this limit raises a clear error. Shared memory
segments are visible only to processes of the same user unless permissions are relaxed. The maximum segment size could be smaller than on Linux. macOS does not automatically remove orphaned
segments if a process crashes; these must be manually cleaned using ipcs -m / ipcrm or by restarting the
system. If your R session or script crashes before releaseVariables() or releaseViews() runs, those shared memory segments may remain allocated on disk.
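Because of the name length limit on macOS, it can help to check candidate names before registering anything. The helper shm_name_ok below is hypothetical; it uses the practical bound of 32 characters including the leading slash mentioned above. How memshare derives the actual POSIX name from the namespace and variable name is not described here, so the check is only a conservative sketch.

```r
# Hypothetical helper: does a candidate POSIX shm name fit the practical
# macOS limit of about 32 characters, including the leading slash?
shm_name_ok <- function(name, limit = 32L) {
  full <- if (startsWith(name, "/")) name else paste0("/", name)
  nchar(full, type = "bytes") <= limit
}

shm_name_ok("ns_package")          # short name, fits
shm_name_ok(strrep("a", 31))       # 31 characters plus slash, just fits
shm_name_ok(strrep("a", 40))       # too long
```

Short namespace strings leave the most room for whatever the package appends to build the final segment name.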
Julian Maerte [aut, ctr] (ORCID: <https://orcid.org/0000-0001-5451-1023>), Romain Francois [ctb], Michael Thrun [aut, ths, rev, cph, cre] (ORCID: <https://orcid.org/0000-0001-9542-5543>)
Maintainer: Michael Thrun <m.thrun@gmx.net>
x = rnorm(100)
y = runif(100)
Mat = cbind(x, x, x)
res = memshare::memApply(
  X = Mat,
  MARGIN = 2,
  FUN = function(x, y) {
    cc = memshare::mutualinfo(x, y, isYDiscrete = TRUE,
                              na.rm = TRUE, useMPMI = FALSE)
    return(cc)
  },
  VARS = list(y = y),
  MAX.CORES = 1, # for testing purposes only single thread
  NAMESPACE = "namespaceID"
)
unlist(res)
## Not run:
#usually MAX.CORES>1 for application
#alternative usage with manual memory allocation:
## End(Not run)
Data = cbind(x, x, x)
namespace = "ns_package"
memshare::registerVariables(namespace, list(Data = Data, y = y))
res2 = memshare::memApply(
  X = "Data",
  MARGIN = 2,
  FUN = function(x, y) {
    cc = memshare::mutualinfo(x,
                              y,
                              isYDiscrete = TRUE,
                              na.rm = TRUE,
                              useMPMI = FALSE)
    return(cc)
  },
  VARS = c("y"),
  MAX.CORES = 1,
  # for testing purposes only single thread
  NAMESPACE = namespace
)
unlist(res2)
memshare::releaseVariables(namespace, c("Data", "y"))