Bootstrap: Bootstrapping of Distance Matrix

View source: R/utils.R

Bootstrap    R Documentation

Bootstrapping of Distance Matrix

Description

Bootstrapping of Distance Matrix

Usage

Bootstrap(
  data,
  dist_mat_null,
  k = 10,
  kernel = c("gaussian", "euclidean"),
  normalization = c("cosine", "lognorm", "none"),
  normalize_factor = 10000,
  pca_dims = 0,
  norm_type = c("l1", "l2"),
  n_iters = 100,
  ratio = 0.05,
  t = 0,
  calc_perturb_mat = FALSE,
  n_cores = NULL,
  zero_percent = 0.7,
  ...
)

Arguments

data

An M x d matrix or data.frame with M rows of data points and d columns of features.

dist_mat_null

An M x M distance matrix calculated from the original data (null).

k

Number of nearest neighbors. Default is 10. See nn2 for details.

kernel

Kernel distance used:

  • gaussian, Gaussian distance kernel. See GaussianDist for details.

  • euclidean, Euclidean distance kernel. See EuclideanDist for details.

normalization

Normalization method used:

  • cosine, cosine normalization. See details Normalization.

  • lognorm, log normalization. See details Normalization.

  • none, normalization is not performed.

normalize_factor

Normalize factor used in log normalization. Default is 10000. See details Normalization.
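
As a rough illustration of the cosine and lognorm options, the base R sketch below scales each row to unit Euclidean length and, separately, applies a log normalization scaled by normalize_factor. The helpers cosine_normalize and log_normalize are hypothetical names, and the formulas are assumptions made for illustration; the package's Normalization function may differ in detail.

```r
# Hypothetical helpers, not part of FuseNet. Rows are data points.

# Assumed cosine normalization: divide each row by its Euclidean norm.
cosine_normalize <- function(x) {
  x <- as.matrix(x)
  row_norms <- sqrt(rowSums(x^2))
  row_norms[row_norms == 0] <- 1  # leave all-zero rows unchanged
  x / row_norms                   # recycles down columns, so row i is divided by row_norms[i]
}

# Assumed log normalization: scale each row to sum to normalize_factor,
# then apply log(1 + x).
log_normalize <- function(x, normalize_factor = 10000) {
  x <- as.matrix(x)
  totals <- rowSums(x)
  totals[totals == 0] <- 1        # avoid division by zero for empty rows
  log1p(x / totals * normalize_factor)
}

counts <- matrix(1:20, nrow = 4)  # toy 4 x 5 count matrix
cos_mat <- cosine_normalize(counts)
log_mat <- log_normalize(counts)
```

With normalization = "none", the data would be passed on without either transformation.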

pca_dims

Number of principal components used. Default is 0, in which case PCA is not performed.
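
The behavior described for pca_dims can be sketched with stats::prcomp. This is a minimal sketch under the assumption that the data are centered but not scaled before projection; reduce_dims is a hypothetical helper and not the package's implementation.

```r
# Hypothetical helper, not part of FuseNet.
reduce_dims <- function(x, pca_dims = 0) {
  x <- as.matrix(x)
  if (pca_dims <= 0) {
    return(x)  # pca_dims = 0: PCA is skipped and the data are returned as-is
  }
  # Keep only the first pca_dims principal component scores.
  prcomp(x, center = TRUE, scale. = FALSE, rank. = pca_dims)$x
}

set.seed(42)
data <- matrix(rnorm(50 * 8), nrow = 50)  # M = 50 points, d = 8 features
reduced <- reduce_dims(data, pca_dims = 3)
unchanged <- reduce_dims(data, pca_dims = 0)
```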

norm_type

Type of norm used:

  • l1, L1-like norm. See details L1Norm.

  • l2, L2-like norm. See details L2Norm.

n_iters

Number of bootstrapping iterations. Default is 100.

ratio

Fraction of features to be downsampled in the original data matrix. Default is 0.05 (5%).

t

Matrix power used for the distance matrix. Default is 0, in which case powering is not performed. See MatrixPower for details.
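
One possible reading of the t parameter is sketched below, assuming that "matrix power" means repeated matrix multiplication rather than an element-wise power. matrix_power is a hypothetical helper; consult MatrixPower for the package's actual definition.

```r
# Hypothetical helper, not part of FuseNet.
# Assumption: the power is taken by repeated matrix multiplication.
matrix_power <- function(mat, t = 0) {
  if (t <= 1) {
    return(mat)  # t = 0 (the default) leaves the matrix untouched
  }
  result <- mat
  for (i in seq_len(t - 1)) {
    result <- result %*% mat
  }
  result
}

m <- matrix(c(1, 1, 0, 1), nrow = 2)  # rows: (1, 0) and (1, 1)
powered <- matrix_power(m, t = 3)
same <- matrix_power(m, t = 0)
```

Note that in this sketch t = 0 returns the input unchanged, in line with the description above, rather than the identity matrix that a mathematical zeroth power would give.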

calc_perturb_mat

Whether to calculate the perturb matrix. Default is FALSE.

n_cores

Number of cores used. Default is NULL, which uses all available cores. See makeCluster for details.

zero_percent

Zero-entry fraction threshold. If the fraction of zero entries in a returned matrix is above this threshold, a sparse matrix is returned. Default is 0.7 (70%).
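
The thresholding rule for zero_percent can be sketched with the Matrix package. The helper maybe_sparse is hypothetical, and how the package itself performs the conversion is an assumption here.

```r
library(Matrix)

# Hypothetical helper, not part of FuseNet.
maybe_sparse <- function(mat, zero_percent = 0.7) {
  zero_fraction <- mean(mat == 0)       # share of entries equal to zero
  if (zero_fraction > zero_percent) {
    Matrix(mat, sparse = TRUE)          # convert to a sparse representation
  } else {
    mat                                 # keep the dense matrix
  }
}

mostly_zero <- matrix(0, nrow = 10, ncol = 10)
mostly_zero[1, 2] <- 1
mostly_zero[3, 4] <- 2
out <- maybe_sparse(mostly_zero)        # 98% zeros, above the 0.7 threshold
kept <- maybe_sparse(matrix(1, 3, 3))   # no zeros, stays dense
```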

...

Additional parameters passed to makeCluster.

Value

Returns a list with entries:

  • feature_weight, n x d binary matrix with n rows of bootstrap iterations and d columns of features where 0 means feature not sampled and 1 means sampled.

  • sample_weight, n x M matrix with n rows of bootstrap iterations and M columns of data points where each entry is the weight of that data point in that iteration.

  • perturb_mat, d x M matrix with d rows of features and M columns of data points where each entry represents the relative importance of a feature to a data point.

  • dist_mat, M x M distance matrix.
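
To make the shape of feature_weight concrete, the sketch below builds an n x d binary mask in base R. It assumes that each iteration samples round(ratio * d) features without replacement and marks them with 1; the package's actual sampling scheme may differ.

```r
set.seed(123)
n_iters <- 100   # number of bootstrap iterations (n)
d <- 40          # number of features
ratio <- 0.05    # fraction of features sampled per iteration

# Assumption: at least one feature is sampled in every iteration.
n_sampled <- max(1, round(ratio * d))

feature_weight <- matrix(0L, nrow = n_iters, ncol = d)
for (i in seq_len(n_iters)) {
  sampled <- sample.int(d, n_sampled)  # sampled feature indices for iteration i
  feature_weight[i, sampled] <- 1L     # 1 = sampled, 0 = not sampled
}
```

Each row then records which features took part in one bootstrap iteration, matching the 0/1 encoding described for feature_weight.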


stevexniu/FuseNet documentation built on May 16, 2022, 12:23 p.m.