View source: R/runSingleTraitGwas.R
runSingleTraitGwas  R Documentation 
runSingleTraitGwas
performs a singletrait Genome Wide Association
Study (GWAS) on phenotypic and genotypic data contained in a gData
object. A covariance matrix is computed using the EMMA algorithm (Kang et
al., 2008) or the NewtonRaphson algorithm (Tunnicliffe, 1989) in the
sommer
package. Then a Generalized Least Squares (GLS) method is used
for estimating the marker effects and corresponding pvalues. This is done
using either one kinship matrix for all chromosomes or different
chromosomespecific kinship matrices for each chromosome. Significant SNPs
are selected based on a user defined threshold.
runSingleTraitGwas( gData, traits = NULL, trials = NULL, covar = NULL, snpCov = NULL, kin = NULL, kinshipMethod = c("astle", "IBS", "vanRaden"), remlAlgo = c("EMMA", "NR"), GLSMethod = c("single", "multi"), useMAF = TRUE, MAF = 0.01, MAC = 10, genomicControl = FALSE, thrType = c("bonf", "fixed", "small", "fdr"), alpha = 0.05, LODThr = 4, nSnpLOD = 10, pThr = 0.05, rho = 0.5, sizeInclRegion = 0, minR2 = 0.5, nCores = NULL )
gData 
An object of class 
traits 
A vector of traits on which to run GWAS. These can be either
numeric indices or character names of columns in 
trials 
A vector of trials on which to run GWAS. These can
be either numeric indices or character names of list items in 
covar 
An optional vector of covariates taken into account when
running GWAS. These can be either numeric indices or character names of
columns in 
snpCov 
An optional character vector of snps to be included as covariates. 
kin 
An optional kinship matrix or list of kinship matrices. These
matrices can be from the 
kinshipMethod 
An optional character indicating the method used for
calculating the kinship matrix(ces). Currently "astle" (Astle and Balding,
2009), "IBS" and "vanRaden" (VanRaden, 2008) are supported. If a
kinship matrix is supplied either in 
remlAlgo 
A character string indicating the algorithm used to estimate
the variance components. Either 
GLSMethod 
A character string indicating the method used to estimate
the marker effects. Either 
useMAF 
Should the minor allele frequency be used for selecting SNPs
for the analysis. If 
MAF 
The minor allele frequency (MAF) threshold used in GWAS. A
numerical value between 0 and 1. SNPs with MAF below this value are not taken
into account in the analysis, i.e. pvalues and effect sizes are put to
missing ( 
MAC 
A numerical value. SNPs with minor allele count below this value
are not taken into account for the analysis, i.e. pvalues and effect sizes
are set to missing ( 
genomicControl 
Should genomic control correction as in Devlin and Roeder (1999) be applied? 
thrType 
A character string indicating the type of threshold used for the selection of candidate loci. See Details. 
alpha 
A numerical value used for calculating the LODthreshold for

LODThr 
A numerical value used as a LODthreshold when

nSnpLOD 
A numerical value indicating the number of SNPs with the
smallest pvalues that are selected when 
pThr 
A numerical value just as the cut off value for pValues for

rho 
A numerical value used a the minimum value for SNPs to be
considered correlated when using 
sizeInclRegion 
An integer. Should the results for SNPs close to significant SNPs be included? If so, the size of the region in centimorgan or base pairs. Otherwise 0. 
minR2 
A numerical value between 0 and 1. Restricts the SNPs included
in the region close to significant SNPs to only those SNPs that are in
sufficient Linkage Disequilibrium (LD) with the significant snp, where LD
is measured in terms of R^2. If for example 
nCores 
A numerical value indicating the number of cores to be used by
the parallel part of the algorithm. If 
An object of class GWAS
.
For the selection of candidate loci from the GWAS output four different
methods can be used. The method used can be specified in the function
parameter thrType
. Further parameters can be used to fine tune the
method.
The Bonferroni threshold, a LODthreshold of
log10(alpha / m), where m is the number of SNPs and alpha can be
specified by the function parameter alpha
A fixed LODthreshold, specified by the function parameter
LODThr
The n SNPs with with the smallest pValues are selected. n can
be specified in nSnpLOD
Following the algorithm proposed by Brzyski D. et al. (2017),
SNPs are selected in such a way that the False Discovery Rate (fdr) is
minimized. To do this, first the SNPs are restricted to the SNPs with a
pValue below pThr
. Then clusters of SNPs are created using a
two step iterative process in which first the SNP with the lowest pValue is
selected as cluster representative. This SNP and all SNPs that have a
correlation with this SNP of ρ or higher will form a cluster.
ρ can be specified as an argument in the function and has a default
value of 0.5, which is a recommended starting value in practice.
The selected SNPs are removed from the list and the procedure repeated until
no SNPs are left. Finally to determine the number of significant clusters
the first cluster is determined for which the pValue of the cluster
representative is not smaller than cluster_{number} * α / m where
m is the number of SNPs and alpha can be specified by the function parameter
alpha
. All previous clusters are selected as significant.
Astle, William, and David J. Balding. 2009. Population Structure and Cryptic Relatedness in Genetic Association Studies. Statistical Science 24 (4): 451–71. doi: 10.1214/09sts307.
Brzyski D. et al. (2017) Controlling the Rate of GWAS False Discoveries. Genetics 205 (1): 6175. doi: 10.1534/genetics.116.193987
Devlin, B., and Kathryn Roeder. 1999. Genomic Control for Association Studies. Biometrics 55 (4): 997–1004. doi: 10.1111/j.0006341x.1999.00997.x.
Kang et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics 178 (3): 1709–23. doi: 10.1534/genetics.107.080101.
Millet, E. J., Pommier, C., et al. (2019). A multisite experiment in a network of European fields for assessing the maize yield response to environmental scenarios  Data set. doi: 10.15454/IASSTN
Rincent et al. (2014) Recovering power in association mapping panels with variable levels of linkage disequilibrium. Genetics 197 (1): 375–87. doi: 10.1534/genetics.113.159731.
Segura et al. (2012) An efficient multilocus mixedmodel approach for genomewide association studies in structured populations. Nature Genetics 44 (7): 825–30. doi: 10.1038/ng.2314.
Sun et al. (2010) Variation explained in mixedmodel association mapping. Heredity 105 (4): 333–40. doi: 10.1038/hdy.2010.11.
Tunnicliffe W. (1989) On the use of marginal likelihood in time series model estimation. JRSS 51 (1): 15–27.
VanRaden P.M. (2008) Efficient methods to compute genomic predictions. Journal of Dairy Science 91 (11): 4414–23. doi: 10.3168/jds.20070980.
summary.GWAS
, plot.GWAS
## Create a gData object Using the data from the DROPS project. ## See the included vignette for a more extensive description on the steps. data(dropsMarkers) data(dropsMap) data(dropsPheno) ## Add genotypes as row names of dropsMarkers and drop Ind column. rownames(dropsMarkers) < dropsMarkers[["Ind"]] dropsMarkers < dropsMarkers[colnames(dropsMarkers) != "Ind"] ## Add genotypes as row names of dropsMap. rownames(dropsMap) < dropsMap[["SNP.names"]] ## Rename Chomosome and Position columns. colnames(dropsMap)[match(c("Chromosome", "Position"), colnames(dropsMap))] < c("chr", "pos") ## Convert phenotypic data to a list. dropsPhenoList < split(x = dropsPheno, f = dropsPheno[["Experiment"]]) ## Rename Variety_ID to genotype and select relevant columns. dropsPhenoList < lapply(X = dropsPhenoList, FUN = function(trial) { colnames(trial)[colnames(trial) == "Variety_ID"] < "genotype" trial < trial[c("genotype", "grain.yield", "grain.number", "seed.size", "anthesis", "silking", "plant.height", "tassel.height", "ear.height")] return(trial) }) ## Create gData object. gDataDrops < createGData(geno = dropsMarkers, map = dropsMap, pheno = dropsPhenoList) ## Run single trait GWAS for trait 'grain.yield' for trial Mur13W. GWASDrops < runSingleTraitGwas(gData = gDataDrops, trials = "Mur13W", traits = "grain.yield") ## Run single trait GWAS for trait 'grain.yield' for trial Mur13W. ## Use chromosome specific kinship matrices calculated using vanRaden method. GWASDropsMult < runSingleTraitGwas(gData = gDataDrops, trials = "Mur13W", traits = "grain.yield", kinshipMethod = "vanRaden", GLSMethod = "multi")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.