bdm.pakde: Perplexity-adaptive kernel density estimation
In jgarriga65/bigMap: Big Data Mapping

Description

Starts the paKDE algorithm (second step of the mapping protocol).

Usage

bdm.pakde(
  bdm,
  ppx = 100,
  g = 200,
  g.exp = 3,
  mpi.cl = NULL,
  threads = 2,
  layer = 1
)

Arguments

`bdm`	A `bdm` data mapping instance.
`ppx`	The value of perplexity to compute similarities in the low-dimensional embedding. Default value is `ppx = 100`.
`g`	The resolution of the density space grid (`g*g` cells). Default value is `g = 200`.
`g.exp`	A numeric factor to avoid border effects. The grid limits are expanded to enclose the kernel density of the most extreme embedded datapoints up to `g.exp` times `\sigma`. Default value is `g.exp = 3`, i.e. the grid limits are expanded to enclose 0.9986 of the probability mass of the most extreme kernels.
`mpi.cl`	An MPI (inter-node parallelization) cluster as returned by `bdm.mpi.start()`. Default value is `mpi.cl = NULL`, i.e. a 'SOCK' (intra-node parallelization) cluster is automatically generated.
`threads`	Number of parallel threads (chosen according to data size and hardware resources, i.e. number of cores and available memory). Default value is `threads = 2`.
`layer`	The ptSNE output layer. Default value is `layer = 1`.

Details

When computing the paKDE, the embedding area is discretized as a grid of g*g cells. To avoid border effects, the grid limits are expanded by default to 3 \sigma beyond the most extreme mapped points in each direction, so that they enclose at least 0.9986 of the cumulative distribution function of the kernels of those points.

The presence of outliers in the embedding can lead to undesired expansion of the grid limits. We can overcome this using lower values of g.exp. By setting g.exp = 0 the grid limits will be equal to the range of the embedding.
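The effect of g.exp on the grid limits can be illustrated with a minimal sketch. This is not the package's implementation: `grid_limits()` is a hypothetical helper, and the per-point bandwidths `sigma` are assumed to be already known (in paKDE they would be derived from the perplexity).

```r
# Hypothetical helper, NOT part of bigMap: illustrates how g.exp widens the grid.
# Y     : n x 2 matrix with the embedding coordinates
# sigma : numeric vector of length n with one kernel bandwidth per point (assumed given)
grid_limits <- function(Y, sigma, g = 200, g.exp = 3) {
  stopifnot(is.matrix(Y), ncol(Y) == 2, length(sigma) == nrow(Y))
  # Shift every point by g.exp * sigma in both directions (sigma recycles
  # down the rows of Y), then take the envelope over all points.
  lo <- apply(Y - g.exp * sigma, 2, min)
  hi <- apply(Y + g.exp * sigma, 2, max)
  # g grid coordinates per axis, spanning the expanded limits
  list(
    x = seq(lo[1], hi[1], length.out = g),
    y = seq(lo[2], hi[2], length.out = g)
  )
}

set.seed(1)
Y <- matrix(rnorm(200), ncol = 2)   # toy embedding with 100 points
sigma <- rep(0.5, nrow(Y))          # toy bandwidths, identical for all points

lim0 <- grid_limits(Y, sigma, g.exp = 0)  # limits equal the range of the embedding
lim3 <- grid_limits(Y, sigma, g.exp = 3)  # limits expanded by 3 * sigma

range(lim0$x)
range(lim3$x)
```

With g.exp = 0 the grid spans exactly the range of the embedding. With larger values a single outlier with a wide kernel can dominate the limits, which is why lowering g.exp helps in that case.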

The values g.exp = c(1, 2, 3, 4, 5, 6) enclose cdf values of approximately 0.8413, 0.9772, 0.9986, 0.99996, 0.99999 and 1.0, respectively.
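These figures match the standard normal cumulative distribution function evaluated at g.exp. Assuming Gaussian kernels (an assumption made here for illustration), they can be checked in base R with `pnorm()`:

```r
# Assumption: Gaussian kernels, so the mass enclosed up to g.exp * sigma
# on the expanded side is the standard normal cdf evaluated at g.exp.
g.exp <- 1:6
mass <- pnorm(g.exp)

# Show the values with enough digits to tell the larger ones apart
print(data.frame(g.exp = g.exp, cdf = format(mass, digits = 10)))
```

The values listed above are cut off after a few decimals, so the last one is shown as 1.0 although it is slightly below one.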

Value

A copy of the input bdm instance with a new element bdm$pakde holding the paKDE output. For layers that have not been computed, bdm$pakde[[layer]]$layer = 'NC'.

Examples


# --- load mapped dataset
bdm.example()
# --- run paKDE
## Not run: 
m <- bdm.pakde(ex$map, ppx = 200, g = 200, g.exp = 3, threads = 4)
# --- plot paKDE output
bdm.pakde.plot(m)

## End(Not run)

jgarriga65/bigMap documentation built on June 10, 2024, 7:05 a.m.