Test for association between a trait and principal components of genotypes within a region, on summary statistics
formula |
referring to the column(s) in |
phenodata |
a data frame containing columns mentioned in |
genodata |
an object with genotypes to analyze. Several formats are allowed: |
kin |
a square symmetric matrix giving the pairwise kinship coefficients between analyzed
individuals. Under default |
nullmod |
an object containing parameter estimates under the null model. Setting |
regions |
an object assigning regions to be analyzed. This can be: |
sliding.window |
the sliding window size and step. Has no effect if |
mode |
the mode of inheritance: "add", "dom" or "rec" for additive, dominant or recessive mode, respectively. For dominant (recessive) mode genotypes will be recoded as AA = 0, Aa = 1 and aa = 1 (AA = 0, Aa = 0 and aa = 1), where a is a minor allele. Default mode is additive. |
ncores |
number of CPUs for parallel calculations. Default = 1. |
return.time |
a logical value indicating whether the running time should be returned. |
beta.par |
two positive numeric shape parameters in the beta distribution to assign weights for each genetic variant as a function of MAF in the default weights function (see Details). Default = c(1, 1). |
weights |
a numeric vector or a function of minor allele frequency (MAF) that assigns a weight to each genetic variant in the weighted kernels. Has no effect if one of the unweighted kernels was chosen. If NULL, the weights will be calculated using the beta distribution (see Details). |
var.fraction |
minimal proportion of genetic variance within region that should be explained by principal components used (see Details for more info). |
impute.method |
a method for imputation of missing genotypes. It can be either "mean" (default) or "blue". If "mean" the genotypes will be imputed by the simple mean values. If "blue" the best linear unbiased estimates (BLUEs) of mean genotypes will be calculated taking into account the relationships between individuals [McPeek, et al., 2004, DOI: 10.1111/j.0006-341X.2004.00180.x] and used for imputation. |
write.file |
output file name. If specified, output (as it proceeds) will be written to the file. |
... |
other arguments that could be passed to |
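The genotype recoding described for mode and the "mean" option of impute.method can be sketched in a few lines of R. This is an illustration only: the helper recode_genotypes is hypothetical, genotypes are assumed to be coded as counts of the minor allele (0, 1 or 2), and the order in which the package applies recoding and imputation is not stated here, so the two steps are shown separately.

```r
# Hypothetical helper, not part of the package: recode minor-allele counts
# (AA = 0, Aa = 1, aa = 2) according to the mode of inheritance.
recode_genotypes <- function(g, mode = c("add", "dom", "rec")) {
  mode <- match.arg(mode)
  switch(mode,
    add = g,                       # additive: keep allele counts
    dom = ifelse(g > 0, 1, 0),     # dominant: Aa and aa become 1
    rec = ifelse(g == 2, 1, 0))    # recessive: only aa becomes 1
}

g <- c(0, 1, 2, NA, 1)
g_dom <- recode_genotypes(g, "dom")  # 0 1 1 NA 1
g_rec <- recode_genotypes(g, "rec")  # 0 0 1 NA 0

# Simple mean imputation: replace missing values with the mean of the
# observed genotypes of the same variant.
g_imp <- g
g_imp[is.na(g_imp)] <- mean(g, na.rm = TRUE)
```

Missing genotypes stay missing after recoding because ifelse() propagates NA.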
The PCA test is useful for detecting association between a trait and the genetic variants of a region
when those variants are strongly correlated. The test is based on the spectral decomposition of the
genotype matrix. The number of top principal components (PCs) is chosen so that
they explain >= var.fraction of the region's variance.
By default, var.fraction = 0.85, i.e. the PCs used explain >= 85% of the region's variance.
If var.fraction = 1, the results of the PCA test and the MLR-based test are identical.
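The selection rule for the number of PCs can be sketched as follows. This is a minimal illustration on simulated genotypes, assuming the smallest number of PCs reaching the threshold is taken, and using prcomp() from base R with centered, unscaled genotypes; the package's internal preprocessing and decomposition may differ.

```r
set.seed(1)
n <- 200   # individuals
m <- 10    # variants in the region

# Simulated, correlated genotypes (minor-allele counts 0/1/2); illustration only.
latent <- rnorm(n)
G <- sapply(1:m, function(j) {
  p <- plogis(latent + rnorm(n, sd = 0.5)) * 0.6
  rbinom(n, 2, p)
})

var.fraction <- 0.85

# Principal components of the centered genotype matrix.
pca <- prcomp(G, center = TRUE, scale. = FALSE)

# Cumulative proportion of the region's variance explained by the top PCs.
explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Smallest k whose cumulative proportion reaches var.fraction
# (a small tolerance guards against rounding when var.fraction = 1).
k <- which(explained >= var.fraction - 1e-12)[1]

# Scores on the k selected PCs; these would serve as predictors of the trait.
pcs <- pca$x[, 1:k, drop = FALSE]
```

With var.fraction = 1 all PCs are kept, so the PCs span the same space as the original genotypes, which is consistent with the equivalence to the MLR-based test noted above.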
beta.par = c(a, b) can be used to set weights for genetic variants.
Given the shape parameters of the beta distribution, beta.par = c(a, b),
the weights are defined using the probability density function of the beta distribution:
W_{i} = B(a,b)^{-1} MAF_{i}^{a-1} (1 - MAF_{i})^{b-1},
where MAF_{i} is the minor allele frequency of the i^{th} genetic variant in the region,
estimated from the genotypes, and B(a,b) is the beta function. This way of defining weights
is the same as in the original SKAT (see [Wu, et al., 2011] for details).
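The weight formula corresponds to the beta density available in base R as dbeta(). The sketch below is illustrative: beta_weights is a hypothetical helper, not a function exported by the package, and the MAF values are made up.

```r
# Hypothetical helper: beta-density weights as a function of MAF.
beta_weights <- function(maf, beta.par = c(1, 1)) {
  dbeta(maf, beta.par[1], beta.par[2])
}

maf <- c(0.01, 0.05, 0.2, 0.5)   # made-up minor allele frequencies

# Default beta.par = c(1, 1): the density is flat, so all weights are equal.
w_default <- beta_weights(maf)

# beta.par = c(1, 25), a choice commonly used with SKAT, upweights rare variants.
w_rare <- beta_weights(maf, c(1, 25))

# The same weights written out term by term from the formula above.
a <- 1
b <- 25
w_manual <- maf^(a - 1) * (1 - maf)^(b - 1) / beta(a, b)
```

Under the default c(1, 1) every variant receives the same weight, so the choice of beta.par only matters when it departs from the default.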
A list with values:
results |
a data frame containing P values, numbers of variants and filtered variants for each of analyzed regions. It also contains the number of the principal components used for each region and the proportion of genetic variance they make up. |
nullmod |
an object containing the estimates of the null model parameters: heritability (h2), total variance (total.var), estimates of fixed effects of covariates (alpha), the gradient (df), and the total log-likelihood (logLH). |
sample.size |
the sample size after omitting NAs. |
time |
If |
Jolliffe, I.T. A note on the use of principal components in regression. J R Stat Soc Ser C 31, 300-303 (1982).
data(example.data)
## Run PCA with sliding window (default):
out <- PCA(trait ~ age + sex, phenodata, genodata, kin)
## Run PCA with regions defined in snpdata$gene and with
## null model parameters obtained in the first run:
out <- PCA(trait ~ age + sex, phenodata, genodata, kin,
    out$nullmod, regions = snpdata$gene)
## Run PCA parallelized on two cores (this will require
## 'foreach' and 'doParallel' R-packages installed and
## cores available):
out <- PCA(trait ~ age + sex, phenodata, genodata, kin,
    out$nullmod, ncores = 2)
## Run PCA with genotypes in VCF format:
VCFfileName <- system.file(
    "testfiles/1000g.phase1.20110521.CFH.var.anno.vcf.gz",
    package = "FREGAT")
geneFile <- system.file("testfiles/refFlat_hg19_6col.txt.gz",
    package = "FREGAT")
phe <- data.frame(trait = rnorm(85))
out <- PCA(trait, phe, VCFfileName, geneFile = geneFile,
    reg = "CFH", annoType = "Nonsynonymous")
## Run PCA with genotypes in PLINK binary data format:
bedFile <- system.file("testfiles/sample.bed",
    package = "FREGAT")
phe <- data.frame(trait = rnorm(120))
out <- PCA(trait, phe, bedFile)