bdm.pakde: Perplexity-adaptive kernel density estimation

View source: R/bdm_main.R

bdm.pakde    R Documentation

Perplexity-adaptive kernel density estimation

Description

Starts the paKDE algorithm (second step of the mapping protocol).

Usage

bdm.pakde(
  bdm,
  ppx = 100,
  g = 200,
  g.exp = 3,
  mpi.cl = NULL,
  threads = 2,
  layer = 1
)

Arguments

bdm

A bdm data mapping instance.

ppx

The perplexity value used to compute similarities in the low-dimensional embedding. Default value is ppx = 100.

g

The resolution of the density space grid (g*g cells). Default value is g = 200.

g.exp

A numeric factor to avoid border effects. The grid limits are expanded so as to enclose the kernels of the most extreme embedded data points up to g.exp times σ. Default value is g.exp = 3, i.e. the grid limits are expanded so as to enclose a fraction 0.9986 of the probability mass of the most extreme kernels.

mpi.cl

An MPI (inter-node parallelization) cluster as returned by bdm.mpi.start(). Default value is mpi.cl = NULL, i.e. a 'SOCK' (intra-node parallelization) cluster is automatically generated.

threads

Number of parallel threads, to be chosen according to data size and hardware resources (i.e. number of cores and available memory). Default value is threads = 2.

layer

The ptSNE output layer. Default value is layer = 1.

Details

When computing the paKDE, the embedding area is discretized as a grid of g*g cells. To avoid border effects, the limits of the grid are expanded by default so as to enclose at least 0.9986 of the cumulative distribution function (3 σ) of the kernels of the most extreme mapped points in each direction.

The presence of outliers in the embedding can lead to undesired expansion of the grid limits. We can overcome this using lower values of g.exp. By setting g.exp = 0 the grid limits will be equal to the range of the embedding.
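The effect of g.exp on the grid limits can be illustrated with a minimal base R sketch. It is not the package's implementation: the function grid_limits(), the toy embedding Y and the per-point bandwidths sigma are all hypothetical, introduced only to show how the limits grow with g.exp and collapse to the range of the embedding when g.exp = 0.

```r
# Hypothetical sketch (not bigMap code): expand grid limits by g.exp * sigma
# around the most extreme embedded point in each direction.
set.seed(1)
Y <- matrix(rnorm(200), ncol = 2)      # toy 2D embedding, 100 points
sigma <- runif(nrow(Y), 0.1, 0.5)      # assumed per-point kernel bandwidths

grid_limits <- function(Y, sigma, g.exp = 3) {
  lims <- sapply(seq_len(ncol(Y)), function(d) {
    lo <- which.min(Y[, d])            # most extreme point on the low side
    hi <- which.max(Y[, d])            # most extreme point on the high side
    c(Y[lo, d] - g.exp * sigma[lo], Y[hi, d] + g.exp * sigma[hi])
  })
  rownames(lims) <- c("min", "max")
  lims                                 # 2 x ncol(Y): one column per dimension
}

lims3 <- grid_limits(Y, sigma, g.exp = 3)   # default expansion
lims0 <- grid_limits(Y, sigma, g.exp = 0)   # no expansion: range of the embedding
print(lims3)
print(lims0)
```

With a single outlier carrying a large bandwidth, lims3 can be much wider than lims0, which is the situation where a lower g.exp helps.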

The values g.exp = c(1, 2, 3, 4, 5, 6) enclose cdf values of approximately 0.8413, 0.9772, 0.9986, 0.99996, 0.99999 and 1.0, respectively.
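These figures can be checked with the standard normal cdf from base R, assuming Gaussian kernels so that the enclosed mass on the expanded side is Φ(g.exp). The variable names below are illustrative only; the listed values appear to be truncated rather than rounded, so the printed digits may differ slightly in the last place.

```r
# One-sided standard normal cdf at 1..6 standard deviations
g.exp <- 1:6
cdf <- pnorm(g.exp)
print(data.frame(g.exp = g.exp, cdf = format(cdf, digits = 8)))
```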

Value

A copy of the input bdm instance with a new element bdm$pakde (the paKDE output). Layers that have not been computed are marked with bdm$pakde[[layer]]$layer = 'NC'.

Examples


# --- load mapped dataset
bdm.example()
# --- run paKDE
## Not run: 
m <- bdm.pakde(ex$map, ppx = 200, g = 200, g.exp = 3, threads = 4)
# --- plot paKDE output
bdm.pakde.plot(m)

## End(Not run)

jgarriga65/bigMap documentation built on June 10, 2024, 7:05 a.m.