ds.fastGWAS: Fast genome-wide association analysis (GWAS)


ds.fastGWAS    R Documentation

Fast genome-wide association analysis (GWAS)

Description

Performs a distributed GWAS using the methodology proposed in the paper https://doi.org/10.1186/1471-2105-14-166. This implementation has been adapted to DataSHIELD because it can rely on a model 0 provided by the ds.glm function, and the remaining steps mostly require computing colSums, which are non-disclosive. However, the results are not 100 [...] differences between cohorts. As the method does not rely on a meta-analysis, the results obtained in a distributed DataSHIELD environment are the same as they would be with all the data gathered on a single computer.
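
The role of colSums can be illustrated with a minimal base R sketch. It is not the package's implementation: the two cohort matrices and all variable names are made up for illustration. It only shows why per-cohort column sums, once added on the client, match the column sums of the pooled data.

```r
set.seed(1)

# Two hypothetical cohorts: rows are subjects, columns are SNPs coded 0/1/2
cohort_a <- matrix(sample(0:2, 5 * 4, replace = TRUE), nrow = 5)
cohort_b <- matrix(sample(0:2, 7 * 4, replace = TRUE), nrow = 7)

# Each server would share only its column sums, never subject-level rows
sums_a <- colSums(cohort_a)
sums_b <- colSums(cohort_b)

# Client side: add the per-cohort aggregates
combined <- sums_a + sums_b

# Reference: what a single computer holding all rows would compute
pooled <- colSums(rbind(cohort_a, cohort_b))

# Stops with an error if the two results differ
stopifnot(isTRUE(all.equal(combined, pooled)))
```

Because the sums are additive across cohorts, no meta-analysis step is needed to combine them.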

Usage

ds.fastGWAS(
  genoData,
  formula,
  family,
  do.par = FALSE,
  n.cores = NULL,
  snpBlock = 20000L,
  resample = 1L,
  datasources = NULL
)
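
A call might look like the sketch below. It assumes that dsOmicsClient is installed, that `conns` holds connections obtained after login, and that a genotype container has already been created on each server. The object name "gds.Data" and the variable names in the formula are placeholders, not names defined by this documentation. The call is wrapped in a hypothetical helper so that nothing is evaluated until live connections are supplied.

```r
# Hypothetical wrapper: the body only runs when called with live connections
run_fast_gwas <- function(conns) {
  dsOmicsClient::ds.fastGWAS(
    genoData = "gds.Data",                    # assumed server-side object name
    formula = "condition ~ covar1 + covar2",  # placeholder variable names
    family = "binomial",                      # case/control outcome
    snpBlock = 20000L,                        # documented default block size
    datasources = conns                       # connections obtained after login
  )
}
```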

Arguments

genoData

character Name(s) of the server-side object(s), each of which is a container for storing genotype data from a GWAS together with the metadata associated with the subjects (i.e. phenotypes and/or covariates) and SNPs

formula

character or formula Formula indicating the condition (left side) and other covariates to be adjusted for (i.e. condition ~ covar1 + ... + covarN). The fitted model is: snp ~ condition + covar1 + ... + covarN

family

character A description of the generalized linear model used. "binomial" is defined for case/control studies. Quantitative traits can be analyzed by using "gaussian"

do.par

bool (default FALSE) Whether to use parallelization on the servers. This requires the servers to have the doParallel package installed and to run a POSIX OS (Mac, Linux, Unix, BSD); Windows is not supported. Each genoData object is computed in parallel, so this is only useful when the genoData is divided by chromosome.

n.cores

numeric (default NULL) Number of cores to use when do.par is TRUE. If NULL, the number of cores used will be the maximum available minus one.

snpBlock

numeric (default 20000L) Block size for dividing the genotype data, equal to the number of SNPs used in each iteration. Depending on the servers' RAM, smaller or larger block sizes may perform faster; some testing is needed to find a suitable value.

datasources

a list of DSConnection-class objects obtained after login. If the <datasources> argument is not specified, the default set of connections will be used: see datashield.connections_default.
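
The argument descriptions above can be made concrete with a small base R sketch. It illustrates the documented semantics only and is not code from the package. The helper names `resolve_cores` and `split_into_blocks`, and the variable names asthma, age and sex, are hypothetical.

```r
# formula: a character string or formula with the condition on the left side.
# The documented per-SNP model moves the condition to the right-hand side.
f <- as.formula("asthma ~ age + sex")
vars <- all.vars(f)
condition <- vars[1]
covariates <- vars[-1]
per_snp <- reformulate(c(condition, covariates), response = "snp")
per_snp_text <- deparse(per_snp)

# n.cores: NULL means "maximum available minus one" (never fewer than one)
resolve_cores <- function(n.cores = NULL) {
  if (!is.null(n.cores)) return(as.integer(n.cores))
  available <- parallel::detectCores()
  if (is.na(available)) return(1L)  # detectCores() may return NA
  max(1L, as.integer(available) - 1L)
}

# snpBlock: SNP indices are processed in consecutive blocks of this size
split_into_blocks <- function(n_snps, snpBlock = 20000L) {
  idx <- seq_len(n_snps)
  unname(split(idx, ceiling(idx / snpBlock)))
}

blocks <- split_into_blocks(50000L, 20000L)
block_sizes <- lengths(blocks)
```

With 50000 SNPs and the default block size, this yields two full blocks and a final partial one. A smaller snpBlock produces more iterations, each holding fewer SNPs in memory.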

Value

data.frame With the results of the GWAS


isglobal-brge/dsOmicsClient documentation built on March 20, 2023, 3:52 p.m.