dcsis: Performs distance correlation sure independence screening...


View source: R/dcsis.R

Description

Performs distance correlation sure independence screening \insertCite{li2012feature}{dcortools} with some additional options (such as calculating corresponding tests).

Usage

dcsis(
  X,
  Y,
  k = floor(nrow(X)/log(nrow(X))),
  threshold = NULL,
  calc.cor = "spearman",
  calc.pvalue.cor = FALSE,
  return.data = FALSE,
  test = "none",
  adjustp = "none",
  b = 499,
  bias.corr = FALSE,
  use = "all",
  algorithm = "auto"
)
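
The default for k in the signature above depends only on the number of rows of X. As a small illustration (the helper name `default_k` is hypothetical and not part of dcortools), the following base R sketch evaluates that expression for two sample sizes:

```r
# Default number of selected variables as a function of the sample size n,
# mirroring the default k = floor(nrow(X)/log(nrow(X))) from the signature.
# `default_k` is an illustrative helper, not a dcortools function.
default_k <- function(n) floor(n / log(n))

default_k(100)   # 21
default_k(1000)  # 144
```

With 100 observations, as in the example at the end of this page, the default therefore keeps 21 variables.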

Arguments

X

A data frame or matrix.

Y

A vector-valued response having the same length as the number of rows of X.

k

Number of variables that are selected (only used when threshold is not provided).

threshold

If provided, variables with a distance correlation larger than threshold are selected.

calc.cor

If set to "pearson", "spearman" or "kendall", a corresponding correlation matrix is additionally calculated.

calc.pvalue.cor

logical; if TRUE, a p-value based on the Pearson or Spearman correlation matrix is calculated using Hmisc::rcorr (not implemented for calc.cor = "kendall").

return.data

logical; specifies if the dcmatrix object should contain the original data.

test

Allows for additionally calculating a test based on distance covariance and specifies the type of test that is performed. "permutation" performs a Monte Carlo permutation test. "gamma" performs a test based on a gamma approximation of the test statistic under the null hypothesis. "conservative" performs a conservative two-moment approximation. "bb3" performs a quite precise three-moment approximation and is recommended when computation time is not an issue.

adjustp

If set to "holm", "hochberg", "hommel", "bonferroni", "BH", "BY" or "fdr", the corresponding adjusted p-values are additionally returned for the distance covariance test.

b

Specifies the number of random permutations used for the permutation test. Ignored for all other tests.

bias.corr

logical; specifies if the bias corrected version of the sample distance covariance \insertCite{huo2016fast}{dcortools} should be calculated.

use

"all" uses all observations, "complete.obs" excludes NAs, "pairwise.complete.obs" uses pairwise complete observations for each comparison.

affine

logical; indicates if the affinely transformed distance covariance should be calculated or not.

algorithm

Specifies the algorithm used for calculating the distance covariance.

"fast" uses an O(n log n) algorithm if the observations are one-dimensional and metr.X and metr.Y are either "euclidean" or "discrete", see also \insertCite{huo2016fast;textual}{dcortools}.

"memsave" uses a memory saving version of the standard algorithm with computational complexity O(n^2) but requiring only O(n) memory.

"standard" uses the classical algorithm. User-specified metrics always use the classical algorithm.

"auto" chooses the best algorithm for the specific setting using a rule of thumb.

"memsave" is typically very inefficient for dcsis and should only be applied in exceptional cases.
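
The two selection rules (k versus threshold) and the adjustp option can be illustrated with base R alone. The sketch below uses made-up distance correlations and p-values, not output of dcsis. The adjustment names listed for adjustp coincide with methods of stats::p.adjust; that dcsis relies on that function internally is an assumption, as is the order in which selected indices are returned.

```r
# Hypothetical distance correlations of six predictors with the response
# (made-up numbers for illustration, not produced by dcsis).
dcor_vals <- c(0.12, 0.55, 0.31, 0.08, 0.47, 0.26)

# Rule used when threshold is NULL: keep the k variables with the
# largest distance correlation.
k <- 3
selected_k <- sort(order(dcor_vals, decreasing = TRUE)[seq_len(k)])

# Rule used when threshold is provided: keep every variable whose
# distance correlation is larger than the threshold.
threshold <- 0.3
selected_thr <- which(dcor_vals > threshold)

# Hypothetical raw p-values, adjusted with stats::p.adjust using one of
# the method names accepted by adjustp.
p_raw  <- c(0.20, 0.001, 0.01, 0.60, 0.002, 0.03)
p_holm <- p.adjust(p_raw, method = "holm")
```

In this toy input both rules pick variables 2, 3 and 5; in general they differ, since k fixes the number of selected variables while threshold fixes the minimum strength of dependence.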

Value

A dcmatrix object with the following two additional slots:

selected

Indices of the selected variables.

dcor.selected

Distance correlation of the selected variables and the response Y.

References

\insertRef{berschneider2018complex}{dcortools}

\insertRef{dueck2014affinely}{dcortools}

\insertRef{huang2017statistically}{dcortools}

\insertRef{huo2016fast}{dcortools}

\insertRef{li2012feature}{dcortools}

\insertRef{szekely2007}{dcortools}

\insertRef{szekely2009brownian}{dcortools}

Examples

library(dcortools)
# 100 observations of 1000 predictors; Y depends on the first 50 columns
X <- matrix(rnorm(1e5), ncol = 1000)
Y <- sapply(1:100, function(u) sum(X[u, 1:50])) + rnorm(100)
a <- dcsis(X, Y)
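
To make the screening idea concrete, here is a minimal base R sketch of marginal distance correlation screening on a smaller simulated problem. It is not the dcortools implementation: `dcor_naive` is a hypothetical helper that uses the classical O(n^2) double-centering formula without bias correction, and the variables `selected` and `dcor.selected` merely mirror the slot names described under Value.

```r
set.seed(1)

# Naive O(n^2) sample distance correlation of two numeric vectors
# (double-centred distance matrices, no bias correction).
# Illustrative helper only, not a dcortools function.
dcor_naive <- function(x, y) {
  centre <- function(v) {
    d <- as.matrix(dist(v))  # pairwise Euclidean distances
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  }
  A <- centre(x)
  B <- centre(y)
  denom <- sqrt(mean(A * A) * mean(B * B))
  if (denom == 0) return(0)  # a constant input has zero distance variance
  sqrt(max(mean(A * B), 0) / denom)
}

# Small simulated problem: Y depends on the first 3 of 20 predictors.
n <- 100
X <- matrix(rnorm(n * 20), ncol = 20)
Y <- rowSums(X[, 1:3]) + rnorm(n)

# Marginal distance correlation of each column with Y, then keep the top k.
dc <- apply(X, 2, dcor_naive, y = Y)
k <- 5
selected <- order(dc, decreasing = TRUE)[seq_len(k)]
dcor.selected <- dc[selected]
```

Each predictor is scored on its own, so the cost grows linearly in the number of columns. This is why the choice of algorithm matters when X has many columns.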

edelmand21/dcortools documentation built on Nov. 18, 2020, 12:28 p.m.