fastCNVassoc: Fast association analysis between a CNV or imputed SNPs and...

Description

This function performs association analyses between several genetic variants measured with uncertainty (CNVs or imputed SNPs) and a response, optionally adjusting for covariates (e.g. clinical covariates, stratification, ...). It uses the Newton-Raphson procedure with analytic likelihood derivatives and is written in C to speed up computation, which makes it feasible to analyse hundreds of thousands of variants. It can also use several cores to make the calculations even faster.
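To illustrate the fitting strategy named above, here is a minimal R sketch of Newton-Raphson with analytic derivatives for an ordinary logistic regression. It is not the package's C code and it ignores genotype uncertainty; the helper `newton_logistic`, its convergence rule (largest absolute step below `tol`) and the simulated data are all assumptions made for illustration. The `tol` and `max.iter` defaults simply mirror those listed under Usage.

```r
# Hypothetical helper (not part of CNVassoc): Newton-Raphson for plain
# logistic regression, using the analytic score and information matrix.
newton_logistic <- function(X, y, tol = 1e-06, max.iter = 30) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max.iter)) {
    p <- as.vector(1 / (1 + exp(-X %*% beta)))  # fitted probabilities
    score <- crossprod(X, y - p)                # analytic gradient
    info <- crossprod(X, X * (p * (1 - p)))     # analytic information matrix
    step <- solve(info, score)                  # Newton-Raphson update
    beta <- beta + as.vector(step)
    # Assumed stopping rule: largest absolute change in beta below tol.
    if (max(abs(step)) < tol) break
  }
  list(beta = beta, se = sqrt(diag(solve(info))), iter = iter)
}

# Simulated data, for illustration only.
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 0.8 * x))
fit <- newton_logistic(cbind(1, x), y)
ref <- glm(y ~ x, family = binomial)  # reference fit from base R
```

Comparing `fit$beta` with `coef(ref)` is a quick way to see that the hand-written iteration reaches the same estimates as `glm`.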

Usage

fastCNVassoc(probs, formula, data, model = "additive",
        family = "binomial", nclass = 3, colskip = 5, tol = 1e-06, 
        max.iter = 30, verbose = FALSE, multicores=0)

Arguments

probs

either a matrix containing the probabilities of genetic variants in IMPUTE format (i.e. each row represents a variant and each consecutive block of 'nclass' columns an individual), or the name of a file containing these data in whitespace-separated columns. The first 'colskip' columns hold the variant information (e.g. position, rs name, alleles, etc.).

formula

an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted. Covariates go on the right-hand side of ~; if the model has no covariates, write 1 there.

data

an optional data frame, list or environment (or object coercible by 'as.data.frame' to a data frame) containing the variables in the model. If not found in 'data', the variables are taken from 'environment(formula)'.

model

Genetic model to be tested. Possible values are "multiplicative" (model free, e.g. co-dominant) or "additive"; partial matching is allowed. Currently only the "additive" model is implemented.

family

a description of the error distribution and link function to be used in the model. This must be a character string naming a family function. Possible values are "binomial" and "weibull". Default value is "binomial".

nclass

integer specifying the number of possible alleles or genotypes. Default value is 3 (typically for SNPs).

colskip

integer specifying the number of columns to be skipped in order to read the probabilities. These columns may contain the SNP information (such as rs name, position, alleles and chromosome). Default value is 5 for IMPUTE format files.

tol

Tolerance for convergence in fitting model. Default value is 1e-06.

max.iter

Maximum number of iterations in fitting model. Default value is 30.

verbose

logical. If TRUE, the number of variants analysed so far is shown in the console. Default value is FALSE.

multicores

integer indicating the number of cores to be used. Parallel execution relies on the 'parallel' package. Default value is 0, meaning that only one core is used and the 'parallel' package is not required. On Windows, 'multicores' > 1 is not supported.
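As a hedged illustration of the platform restriction above, the following sketch picks a value for 'multicores' that falls back to 0 on Windows or when only one core is requested. The helper `choose_multicores` is hypothetical and not part of CNVassoc; it only uses `.Platform$OS.type` and `parallel::detectCores()` from base R.

```r
# Hypothetical helper (not part of CNVassoc): choose a 'multicores' value
# that respects the Windows restriction described above.
choose_multicores <- function(requested) {
  # 0 means "single core, 'parallel' package not required".
  if (.Platform$OS.type == "windows" || requested <= 1) return(0L)
  available <- parallel::detectCores()
  if (is.na(available)) return(0L)  # detectCores() may return NA
  as.integer(min(requested, available))
}

cores <- choose_multicores(4)
```

The resulting `cores` value could then be passed as `multicores = cores` in a call to fastCNVassoc.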

Value

A data.frame with the following variables:

- variant: consecutive integer from one to the number of analysed variants (CNVs or imputed SNPs), in the same order as in the 'probs' matrix.
- beta coefficient: log odds ratio for a binary response or log hazard ratio for a time-to-event response.
- se: standard error of the beta coefficient.
- zscore: ratio between beta and se.
- pvalue: p-value of the association between each genetic variant and the response, adjusted for covariates if any are included.
- iter: number of iterations needed to achieve convergence of the Newton-Raphson algorithm used to fit the model.
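The relationship between these columns can be sketched in a few lines of R. The data frame below is made up for illustration and is not real output; the column name `beta` and the use of a two-sided Wald test from the standard normal distribution for the p-value are assumptions, since the text above does not state how the p-value is computed.

```r
# Hypothetical rows mimicking the returned data.frame; values are
# illustrative only and do not come from a real analysis.
res_demo <- data.frame(variant = 1:3,
                       beta = c(0.10, -0.35, 0.62),
                       se = c(0.08, 0.15, 0.20))

# zscore is the ratio between beta and se.
res_demo$zscore <- res_demo$beta / res_demo$se

# Assumption: two-sided Wald p-value from the standard normal distribution.
res_demo$pvalue <- 2 * pnorm(-abs(res_demo$zscore))

# For a binary response, exp(beta) gives the odds ratio.
res_demo$OR <- exp(res_demo$beta)
```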

See the Examples section for an illustration of these outputs.

Note

The individuals in the probability matrix ('probs' argument) must be in the same order as in the response and covariate variables.

The 'subset' and 'na.action' arguments are not implemented. Therefore, no missing values are allowed in the probabilities, response or covariates.
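Because missing values are not handled, incomplete individuals have to be removed beforehand from both the phenotype data and the probability matrix. The sketch below shows one way to do this in base R, assuming 'probs' is held in memory as a matrix in the layout described under Arguments. The helper `drop_individuals` and the toy data are hypothetical, and missing values inside 'probs' itself would need separate handling.

```r
# Hypothetical helper (not part of CNVassoc): keep only the individuals in
# 'keep', together with the first 'colskip' variant-info columns.
drop_individuals <- function(probs, keep, nclass = 3, colskip = 5) {
  # Columns of individual i are colskip + (i - 1) * nclass + 1:nclass.
  cols <- as.vector(outer(seq_len(nclass), (keep - 1) * nclass + colskip, "+"))
  probs[, c(seq_len(colskip), cols), drop = FALSE]
}

# Toy data, for illustration only: 2 variants, 4 individuals.
N <- 4; nclass <- 3; colskip <- 5
probs <- cbind(matrix(0, nrow = 2, ncol = colskip),
               matrix(rep(c(1, 0, 0), 2 * N), nrow = 2, byrow = TRUE))
pheno <- data.frame(resp = c(0, 1, NA, 1),
                    covar1 = c(0.2, NA, 1.1, -0.4))

keep <- which(complete.cases(pheno))  # individuals without missing values
pheno_cc <- pheno[keep, ]
probs_cc <- drop_individuals(probs, keep, nclass, colskip)
```

Filtering both objects with the same `keep` index also preserves the matching order of individuals that the function requires.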

It is important to check whether the number of iterations has reached the maximum (30 by default). If it has, the results may not be reliable.
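A simple way to act on this note is to flag the variants whose 'iter' value equals the maximum. The data frame below is a made-up stand-in for the function's output, used only to show the check.

```r
max.iter <- 30  # same value as passed to the 'max.iter' argument

# Hypothetical results; values are illustrative only.
res_demo <- data.frame(variant = 1:4, iter = c(5, 30, 7, 30))

# Variants that hit the iteration limit and may not have converged.
not_converged <- res_demo$variant[res_demo$iter >= max.iter]
```

Variants listed in `not_converged` could be refitted with a larger 'max.iter' or inspected individually.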

The probability matrix ('probs') must have 'nclass' * N + 'colskip' columns, where N is the number of individuals.
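The column-count rule can be checked before running the analysis. The following sketch, with a hypothetical helper `check_probs` and toy data, verifies the expected number of columns and reports how far each individual's 'nclass' probabilities are from summing to 1. That probabilities sum to about 1 is an assumption made here for illustration; the source does not state it as a requirement.

```r
# Hypothetical checker (not part of CNVassoc) for an in-memory 'probs' matrix.
check_probs <- function(probs, N, nclass = 3, colskip = 5) {
  expected <- nclass * N + colskip
  if (ncol(probs) != expected) {
    stop("expected ", expected, " columns, found ", ncol(probs))
  }
  p <- as.matrix(probs[, (colskip + 1):expected, drop = FALSE])
  # For each variant (row), sum the nclass probabilities of every individual.
  sums <- t(apply(p, 1, function(r) colSums(matrix(r, nrow = nclass))))
  # Assumption: an individual's probabilities should sum to about 1.
  max(abs(sums - 1))
}

# Toy data, for illustration only: 2 variants, 4 individuals, 5 info columns.
N <- 4
toy <- cbind(matrix(0, nrow = 2, ncol = 5),
             matrix(rep(c(0.9, 0.1, 0), 2 * N), nrow = 2, byrow = TRUE))
dev <- check_probs(toy, N = N)
```

A large value of `dev` or an error from the column check suggests that 'nclass', 'colskip' or the number of individuals does not match the data.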

The file can also be in 'fst' format (see the fst package).

References

Subirana I, Gonzàlez JR. Genetic Association Analysis and Meta-Analysis of Imputed SNPs in Longitudinal Studies. Genet Epidemiol, 2013 Jul;37(5):465-77.
Gonzalez JR, Subirana I, Escaramis G, Peraza S, Caceres A, Estivill X and Armengol L. Accounting for uncertainty when assessing association between copy number and disease: a latent class model. BMC Bioinformatics, 2009;10:172.

See Also

CNVassoc, multiCNVassoc, CNVtest

Examples

library(CNVassoc)
library(survival)  # provides Surv() for the time-to-event example
require(parallel)  # only needed when multicores > 1
# Read imputed SNP probabilities from a file.
# Example from the SNPTEST software: 500 cases and 500 controls on 200 imputed
# SNPs.
fileprobs <- system.file("exdata/SNPTEST.probs", package = "CNVassoc")
# Build the response (500 controls and 500 cases).
resp <- rep(0:1, each = 500)
# Generate two covariates randomly.
N <- 1000
# continuous covariate
covar1 <- rnorm(N)
# categorical covariate
covar2 <- factor(sample(1:3, N, replace = TRUE), labels = c("A", "B", "C"))
# Under Windows OS, 'multicores' must be <= 1.
system.time(
  res <- fastCNVassoc(fileprobs, resp ~ covar1 + covar2,
                      family = "binomial", multicores = 0)
)
res
# Build a time-to-event response randomly.
set.seed(123456)
times <- rexp(N, 1)
cens <- rbinom(N, 1, 0.8)
system.time(
  res <- fastCNVassoc(fileprobs, Surv(times, cens) ~ covar1 + covar2,
                      family = "weibull", multicores = 0)
)
res

isglobal-brge/CNVassoc documentation built on May 30, 2019, 9:48 p.m.