AclustsCCA: Implement AclustsCCA

AclustsCCA R Documentation

Implement AclustsCCA

Description

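Implements an iterative penalized least squares approach to sparse canonical correlation analysis (SparseCCA), with a choice of penalty functions, applied to each cluster of CpG sites obtained by A-clustering. Optionally, the significance of each cluster is assessed by permutation tests with false discovery rate (FDR) control.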
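The alternating scheme behind iterative penalized least squares SparseCCA can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helpers soft_threshold, lasso_cd, and sparse_cca_sketch are hypothetical; the exposure step uses a plain coordinate-descent lasso with a fixed, hand-picked penalty lambda (how the package chooses its penalty is not shown here); the outcome step uses unpenalized OLS, mirroring the defaults Xmethod = "lasso" and Ymethod = "OLS"; and the starting values are the leading singular vectors of X'Y, in the spirit of init.method = "SVD".

```r
# Soft-thresholding operator S(z, lambda) used by the lasso update
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Hypothetical helper: lasso by cyclic coordinate descent, minimizing
# (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1 for centered X and y
lasso_cd <- function(X, y, lambda, n_sweeps = 100) {
  n <- nrow(X)
  b <- rep(0, ncol(X))
  col_ss <- colSums(X^2) / n
  r <- y                                     # residual when b = 0
  for (s in seq_len(n_sweeps)) {
    for (j in seq_len(ncol(X))) {
      r <- r + X[, j] * b[j]                 # partial residual excluding feature j
      b[j] <- soft_threshold(sum(X[, j] * r) / n, lambda) / col_ss[j]
      r <- r - X[, j] * b[j]
    }
  }
  b
}

# Hypothetical helper: alternating penalized least squares for the first
# pair of canonical vectors (lasso on the X side, OLS on the Y side)
sparse_cca_sketch <- function(X, Y, lambda = 0.1, max.iter = 100, conv = 1e-2) {
  X <- scale(X); Y <- scale(Y)               # standardize columns
  s <- svd(crossprod(X, Y))                  # SVD initialization on X'Y
  alpha <- s$u[, 1]; beta <- s$v[, 1]
  # rescale v so that the canonical variate M %*% v has unit variance
  unit_var <- function(M, v) v / sd(as.vector(M %*% v))
  for (iter in seq_len(max.iter)) {
    alpha_old <- alpha; beta_old <- beta
    # exposure step: lasso regression of the outcome variate Y beta onto X
    alpha <- lasso_cd(X, as.vector(Y %*% beta), lambda)
    if (all(alpha == 0)) stop("lambda is too large: all exposure weights are zero")
    alpha <- unit_var(X, alpha)
    # outcome step: OLS regression of the exposure variate X alpha onto Y
    beta <- unit_var(Y, qr.coef(qr(Y), as.vector(X %*% alpha)))
    # stop once neither canonical vector changes by more than conv
    if (max(abs(alpha - alpha_old), abs(beta - beta_old)) < conv) break
  }
  list(alpha = alpha, beta = beta, iter = iter,
       cancor = cor(as.vector(X %*% alpha), as.vector(Y %*% beta)))
}

# Usage on simulated data: a shared latent signal u drives X[, 1] and Y[, 1]
set.seed(1)
n <- 200
u <- rnorm(n)
X <- cbind(u + rnorm(n, sd = 0.5), matrix(rnorm(n * 4), n, 4))
Y <- cbind(u + rnorm(n, sd = 0.5), rnorm(n))
fit <- sparse_cca_sketch(X, Y, lambda = 0.1)
```

Alternating two regressions keeps each step a standard (penalized) least squares problem, which is what allows different penalties to be plugged in on the X and Y sides.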

Usage

AclustsCCA(
  clusters.list = NULL,
  X,
  Y,
  Z = NULL,
  X.resid = NULL,
  Y.resid = NULL,
  annot = NULL,
  dist.type = "spearman",
  Aclust.method = "average",
  dist.thresh = 0.2,
  bp.thresh.clust = 1000,
  bp.merge = 999,
  Xmethod = "lasso",
  Ymethod = "OLS",
  standardize = T,
  X.groupidx = NULL,
  init.method = "SVD",
  max.iter = 100,
  conv = 10^-2,
  maxnum = NULL,
  maxB = 10000,
  FDR.thresh = 0.05,
  h = hBH,
  permute = T,
  nthread = 2,
  test.stat = "cancors"
)

Arguments

clusters.list

A list of clusters of CpG sites obtained by A-clustering; each item is a cluster containing a set of probes. If NULL (the default), A-clustering is run inside AclustsCCA; alternatively, users can supply their own list of clusters.

X

An n by p exposure data matrix, where n is the sample size and p is the number of exposures.

Y

An n by q outcome data matrix, where n is the sample size and q is the number of outcomes.

Z

An n by r confounder data matrix, where n is the sample size and r is the number of confounders. If NULL, partial residuals are used for the SparseCCA analysis.
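One common way to adjust X and Y for confounders is to replace them by the residuals from a least squares regression on Z plus an intercept. The sketch below assumes that approach; residualize is a hypothetical helper, and exactly how AclustsCCA forms its partial residuals internally is not specified on this page.

```r
# Hypothetical helper: residualize every column of M on [1, Z] by least squares
residualize <- function(M, Z) {
  qr.resid(qr(cbind(1, Z)), as.matrix(M))
}

set.seed(2)
n <- 100
Z <- cbind(age = rnorm(n, 50, 10), sex = rbinom(n, 1, 0.5))
X <- matrix(rnorm(n * 3), n, 3) + 0.05 * Z[, "age"]   # exposures shifted by age
Y <- matrix(rnorm(n * 2), n, 2) + Z[, "sex"]          # outcomes shifted by sex
X_adj <- residualize(X, Z)
Y_adj <- residualize(Y, Z)
```

The residuals are orthogonal to the intercept and to every confounder, so any canonical correlation found between X_adj and Y_adj cannot be driven by a linear effect of Z.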

annot

A preloaded annotation file that includes the columns "IlmnID", "Coordinate_37", "Islands_Name", "Relation_to_Island", and "UCSC_RefGene_Name". Required only if clusters.list is NULL.

dist.type

The similarity distance function. Options are the correlation-based measures "spearman" (default) and "pearson", or "euclid" (Euclidean distance).
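The three dist.type options can be illustrated for a pair of CpG sites. This sketch assumes the correlation-based options are turned into distances as 1 minus the correlation, which may differ from the exact transformation used by A-clustering; site_distance is a hypothetical helper.

```r
# Hypothetical helper: dissimilarity between the methylation profiles of two sites;
# correlation-based options are ASSUMED to be converted as 1 - correlation
site_distance <- function(a, b, dist.type = c("spearman", "pearson", "euclid")) {
  dist.type <- match.arg(dist.type)
  switch(dist.type,
         spearman = 1 - cor(a, b, method = "spearman"),
         pearson  = 1 - cor(a, b, method = "pearson"),
         euclid   = sqrt(sum((a - b)^2)))
}

a <- c(0.10, 0.20, 0.35, 0.50, 0.80)
b <- a^2                    # monotone but nonlinear transform of a: same ranks
site_distance(a, b, "spearman")
site_distance(a, b, "pearson")
site_distance(a, b, "euclid")
```

Because Spearman correlation depends only on ranks, the two sites look identical under "spearman" but not under "pearson", which is one reason a rank-based default is robust to nonlinear but monotone relationships.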

Aclust.method

The linkage method used for clustering. Options are "single", "complete", or "average" (default).
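For reference, the standard definitions of the three linkage options, given a matrix D of pairwise site distances: single linkage uses the smallest between-cluster distance, complete linkage the largest, and average linkage the mean. linkage_distance is a hypothetical helper illustrating these textbook definitions, not code from AclustsCCA.

```r
# Hypothetical helper: distance between two clusters under a given linkage
linkage_distance <- function(D, idx1, idx2, method = c("average", "single", "complete")) {
  method <- match.arg(method)
  d <- D[idx1, idx2, drop = FALSE]    # all between-cluster site distances
  switch(method,
         single   = min(d),   # closest pair
         complete = max(d),   # farthest pair
         average  = mean(d))  # mean over all pairs
}

D <- as.matrix(dist(c(0, 1, 5, 7)))  # four sites on a line
sapply(c("single", "complete", "average"),
       function(m) linkage_distance(D, 1:2, 3:4, m))
# single 4, complete 7, average 5.5
```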

Xmethod

The penalty function for the exposure, i.e. the penalty used when regressing Y onto X. Options are "lasso" (default), "alasso" (adaptive lasso), "gglasso" (group lasso), and "SGL" (sparse group lasso).

Ymethod

The penalty function for the outcome, i.e. the penalty used when regressing X onto Y. Options are "lasso", "alasso", "gglasso", "SGL", and "OLS" (default; no penalty).

standardize

A logical flag indicating whether the exposure X and outcome Y are standardized prior to fitting the model. The default is TRUE.

X.groupidx

A vector of length p that indicates the grouping structure of the exposure X, as used by the group penalties ("gglasso" and "SGL").
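To illustrate how a grouping vector such as X.groupidx enters a group penalty, the sketch below applies group-wise soft-thresholding, the proximal step of the group lasso, to a vector of coefficients z. The helper group_soft_threshold, the example values, and the sqrt(group size) weight (a common choice) are assumptions for illustration; the gglasso and SGL packages use their own solvers.

```r
# Hypothetical helper: block soft-thresholding, the group lasso proximal step
group_soft_threshold <- function(z, groupidx, lambda) {
  out <- numeric(length(z))
  for (g in unique(groupidx)) {
    j <- which(groupidx == g)
    norm_g <- sqrt(sum(z[j]^2))         # Euclidean norm of group g
    w <- sqrt(length(j))                # group-size weight (assumed)
    if (norm_g > 0) out[j] <- max(0, 1 - lambda * w / norm_g) * z[j]
  }
  out
}

X.groupidx <- c(1, 1, 2, 2, 2, 3)       # p = 6 exposures in 3 groups
z <- c(0.9, -0.8, 0.05, 0.02, -0.03, 0.4)
group_soft_threshold(z, X.groupidx, lambda = 0.2)
# group 2 is set exactly to zero; groups 1 and 3 are shrunk toward zero
```

Unlike the plain lasso, whole groups enter or leave the model together, which is why a grouping vector of length p is needed for these penalties.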

init.method

Initialization method. Options are "lasso", "OLS", and "SVD" (default).

max.iter

The maximum number of SparseCCA iterations. The default is 100.

conv

The convergence tolerance (epsilon) for SparseCCA. The default is 10^-2 (i.e. 0.01).

maxnum

The maximum total number of permutations across all clusters.

maxB

The maximum number of permutations for a single cluster. The default is 10000.

FDR.thresh

False discovery rate (FDR) threshold. The default is 0.05.
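The Usage default h = hBH suggests a Benjamini-Hochberg (BH) style procedure for applying the FDR threshold. Assuming that, the BH step-up rule can be written in a few lines and checked against stats::p.adjust; bh_reject and the p-values are hypothetical. In the package the procedure is applied through mmctest to sequentially sampled permutation p-values, which this fixed-p-value sketch does not reproduce.

```r
# Hypothetical helper: BH step-up rule. Reject the k smallest p-values, where
# k is the largest i such that p_(i) <= i * FDR.thresh / m
bh_reject <- function(pvals, FDR.thresh = 0.05) {
  m <- length(pvals)
  o <- order(pvals)
  below <- which(pvals[o] <= seq_len(m) * FDR.thresh / m)
  reject <- logical(m)
  if (length(below) > 0) reject[o[seq_len(max(below))]] <- TRUE
  reject
}

p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216)
bh_reject(p, FDR.thresh = 0.05)
# same decisions as thresholding BH-adjusted p-values
p.adjust(p, method = "BH") <= 0.05
```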

permute

A logical flag indicating whether to run the permutation test. The default is TRUE.

nthread

The number of threads used to parallelize the permutation test and the SparseCCA fits across all clusters. The default is 2.

test.stat

The test statistic for the permutation test. Options are the canonical correlation ("cancors", default) or the tail probability ("tailprob").
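A simplified permutation test with a canonical-correlation statistic: permute the rows of Y to break the X-Y association, recompute the first canonical correlation, and compare with the observed value. As a stand-in, this sketch uses classical CCA from stats::cancor rather than a sparse fit, and a fixed number of permutations B rather than the sequential scheme of mmctest; perm_test_cancor is a hypothetical helper.

```r
# Hypothetical helper: permutation test for the first canonical correlation
perm_test_cancor <- function(X, Y, B = 999) {
  first_cancor <- function(Yp) cancor(X, Yp)$cor[1]   # largest canonical correlation
  observed <- first_cancor(Y)
  # permuting rows of Y destroys any X-Y association but keeps each margin intact
  null_stats <- replicate(B, first_cancor(Y[sample(nrow(Y)), , drop = FALSE]))
  # add-one correction keeps the Monte Carlo p-value strictly positive
  p.value <- (1 + sum(null_stats >= observed)) / (B + 1)
  list(observed = observed, p.value = p.value)
}

set.seed(3)
n <- 100
u <- rnorm(n)
X <- cbind(u + rnorm(n), rnorm(n))
Y <- cbind(u + rnorm(n), rnorm(n))
res <- perm_test_cancor(X, Y, B = 199)
```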

dist.thresh

A similarity distance threshold. Two neighboring clusters are merged into a single cluster if the similarity distance between them is above dist.thresh. The default is 0.2.

bp.thresh.clust

Optional maximum distance (in base pairs) between neighboring variables that still permits clustering them together. The default is 1000.
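The location constraint on A-clustering can be illustrated by splitting sorted CpG coordinates on one chromosome wherever two neighboring sites are more than a maximum base-pair distance apart (1000 in the default above); only sites within the same run could then be clustered together. split_by_gap is a hypothetical helper, and the sketch ignores the similarity-based merging rule.

```r
# Hypothetical helper: split sorted CpG coordinates (one chromosome) into runs
# in which consecutive sites are at most max.bp base pairs apart
split_by_gap <- function(coord, max.bp = 1000) {
  coord <- sort(coord)
  run_id <- cumsum(c(TRUE, diff(coord) > max.bp))   # new run after each large gap
  split(coord, run_id)
}

split_by_gap(c(100, 600, 1500, 4000, 4800, 9000), max.bp = 1000)
# runs: {100, 600, 1500}, {4000, 4800}, {9000}
```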

bp.merge

A distance in chromosomal location. Any set of methylation sites within an interval smaller than or equal to bp.merge will potentially be merged, depending on the similarity between the sites at the ends of the interval. The default is 999.

permute.tmp.filepath

A file path for saving intermediate permutation results.

Value

The function returns a list of six objects, in the following order:

  • clusters.list : A list of clusters of CpG sites obtained by A-clustering; each item is a cluster containing a set of probes. If A-clustering is not run inside AclustsCCA (i.e. clusters.list was supplied), NA is returned.

  • ALPHA.observed : A list of the estimated canonical vectors of length p corresponding to the exposure data X, one per cluster.

  • BETA.observed : A list of the estimated canonical vectors of length q corresponding to the outcome data Y, one per cluster.

  • cancors.observed : A vector of the estimated canonical correlations, one per cluster.

  • permutation.result : An mmctest object that contains the permutation results.

  • settings : The settings used for the analysis.


jennyjyounglee/AclustsCCA documentation built on June 15, 2022, 7:45 p.m.