RGWAS.twostep: Perform normal GWAS (genome-wide association studies) first,...

View source: R/RGWAS.twostep.R

RGWAS.twostepR Documentation

Perform normal GWAS (genome-wide association studies) first, then perform SNP-set GWAS for relatively significant markers

Description

Perform normal GWAS (genome-wide association studies) first, then perform SNP-set GWAS for relatively significant markers

Usage

RGWAS.twostep(
  pheno,
  geno,
  ZETA = NULL,
  package.MM = "gaston",
  covariate = NULL,
  covariate.factor = NULL,
  structure.matrix = NULL,
  n.PC = 0,
  min.MAF = 0.02,
  n.core = 1,
  parallel.method = "mclapply",
  check.size = 40,
  check.gene.size = 4,
  kernel.percent = 0.1,
  GWAS.res.first = NULL,
  P3D = TRUE,
  test.method.1 = "normal",
  test.method.2 = "LR",
  kernel.method = "linear",
  kernel.h = "tuned",
  haplotype = TRUE,
  num.hap = NULL,
  test.effect.1 = "additive",
  test.effect.2 = "additive",
  window.size.half = 5,
  window.slide = 1,
  chi0.mixture = 0.5,
  optimizer = "nlminb",
  gene.set = NULL,
  map.gene.set = NULL,
  weighting.center = TRUE,
  weighting.other = NULL,
  sig.level = 0.05,
  method.thres = "BH",
  plot.qq.1 = TRUE,
  plot.Manhattan.1 = TRUE,
  plot.qq.2 = TRUE,
  plot.Manhattan.2 = TRUE,
  plot.method = 1,
  plot.col1 = c("dark blue", "cornflowerblue"),
  plot.col2 = 1,
  plot.col3 = c("red3", "orange3"),
  plot.type = "p",
  plot.pch = 16,
  saveName = NULL,
  main.qq.1 = NULL,
  main.man.1 = NULL,
  main.qq.2 = NULL,
  main.man.2 = NULL,
  plot.add.last = FALSE,
  return.EMM.res = FALSE,
  thres = TRUE,
  skip.check = FALSE,
  verbose = TRUE,
  verbose2 = FALSE,
  count = TRUE,
  time = TRUE
)

Arguments

pheno

Data frame where the first column is the line name (gid). The remaining columns should be a phenotype to test.

geno

Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position. Columns 4 and higher contain the marker scores for each line, coded as -1, 0, 1 = aa, Aa, AA.

ZETA

A list of covariance (relationship) matrix (K: m \times m) and its design matrix (Z: n \times m) of random effects. Please set names of list "Z" and "K"! You can use more than one kernel matrix. For example,

ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D))

Z.A, Z.D

Design matrix (n \times m) for the random effects. So, in many cases, you can use the identity matrix.

K.A, K.D

Different kernels which express some relationships between lines.

For example, K.A is additive relationship matrix for the covariance between lines, and K.D is dominance relationship matrix.

package.MM

The package name to be used when solving mixed-effects model. We only offer the following three packages: "RAINBOWR", "MM4LMM" and "gaston". Default package is 'gaston'. See more details at EM3.general.

covariate

A n \times 1 vector or a n \times p _ 1 matrix. You can insert continuous values, such as other traits or genotype score for special markers. This argument is regarded as one of the fixed effects.

covariate.factor

A n \times p _ 2 dataframe. You should assign a factor vector for each column. Then RGWAS changes this argument into model matrix, and this model matrix will be included in the model as fixed effects.

structure.matrix

You can use structure matrix calculated by structure analysis when there are population structure. You should not use this argument with n.PC > 0.

n.PC

Number of principal components to include as fixed effects. Default is 0 (equals K model).

min.MAF

Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score.

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'.

parallel.method

Method for parallel computation. We offer three methods, "mclapply", "furrr", and "foreach".

When 'parallel.method = "mclapply"', we utilize pbmclapply function in the 'pbmcapply' package with 'count = TRUE' and mclapply function in the 'parallel' package with 'count = FALSE'.

When 'parallel.method = "furrr"', we utilize future_map function in the 'furrr' package. With 'count = TRUE', we also utilize progressor function in the 'progressr' package to show the progress bar, so please install the 'progressr' package from github (https://github.com/HenrikBengtsson/progressr). For 'parallel.method = "furrr"', you can perform multi-thread parallelization by sharing memories, which results in saving your memory, but quite slower compared to 'parallel.method = "mclapply"'.

When 'parallel.method = "foreach"', we utilize foreach function in the 'foreach' package with the utilization of makeCluster function in 'parallel' package, and registerDoParallel function in 'doParallel' package. With 'count = TRUE', we also utilize setTxtProgressBar and txtProgressBar functions in the 'utils' package to show the progress bar.

We recommend that you use the option 'parallel.method = "mclapply"', but for Windows users, this parallelization method is not supported. So, if you are Windows user, we recommend that you use the option 'parallel.method = "foreach"'.

check.size

This argument determines how many SNPs (around the SNP detected by normal GWAS) you will recalculate -log10(p).

check.gene.size

This argument determines how many genes (around the genes detected by normal GWAS) you will recalculate -log10(p). This argument is valid only when you assign "gene.set" argument.

kernel.percent

This argument determines how many SNPs are detected by normal GWAS. For example, when kernel.percent = 0.1, SNPs whose value of -log10(p) is in the top 0.1 percent are chosen as candidate for recalculation by SNP-set GWAS.

GWAS.res.first

If you have already performed normal GWAS and have the result, you can skip performing normal GWAS.

P3D

When P3D = TRUE, variance components are estimated by REML only once, without any markers in the model. When P3D = FALSE, variance components are estimated by REML for each marker separately.

test.method.1

RGWAS supports two methods to test effects of each SNP-set for 1st GWAS.

"normal"

Normal GWAS (default).

"score"

Score test, much faster than LR, but sometimes overestimate -log10(p).

test.method.2

RGWAS supports two methods to test effects of each SNP-set for 2nd GWAS.

"LR"

Likelihood-ratio test, relatively slow, but accurate (default).

"score"

Score test, much faster than LR, but sometimes overestimate -log10(p).

kernel.method

It determines how to calculate kernel. There are three methods.

"gaussian"

It is the default method. Gaussian kernel is calculated by distance matrix.

"exponential"

When this method is selected, exponential kernel is calculated by distance matrix.

"linear"

When this method is selected, linear kernel is calculated by NOIA methods for additive GRM.

So local genomic relation matrix is regarded as kernel.

kernel.h

The hyper parameter for gaussian or exponential kernel. If kernel.h = "tuned", this hyper parameter is calculated as the median of off-diagonals of distance matrix of genotype data.

haplotype

If the number of lines of your data is large (maybe > 100), you should set haplotype = TRUE. When haplotype = TRUE, haplotype-based kernel will be used for calculating -log10(p). (So the dimension of this gram matrix will be smaller.) The result won't be changed, but the time for the calculation will be shorter.

num.hap

When haplotype = TRUE, you can set the number of haplotypes which you expect. Then similar arrays are considered as the same haplotype, and then make kernel(K.SNP) whose dimension is num.hap x num.hap. When num.hap = NULL (default), num.hap will be set as the maximum number which reflects the difference between lines.

test.effect.1

Effect of each marker to test for 1st GWAS. You can choose "test.effect" from "additive", "dominance" and "additive+dominance". you can assign only one test effect for the 1st GWAS!

test.effect.2

Effect of each marker to test for 2nd GWAS. You can choose "test.effect" from "additive", "dominance" and "additive+dominance". You also can choose more than one effect, for example, test.effect = c("additive", "aditive+dominance")

window.size.half

This argument decides how many SNPs (around the SNP you want to test) are used to calculated K.SNP. More precisely, the number of SNPs will be 2 * window.size.half + 1.

window.slide

This argument determines how often you test markers. If window.slide = 1, every marker will be tested. If you want to perform SNP set by bins, please set window.slide = 2 * window.size.half + 1.

chi0.mixture

RAINBOWR assumes the deviance is considered to follow a x chisq(df = 0) + (1 - a) x chisq(df = r). where r is the degree of freedom. The argument chi0.mixture is a (0 <= a < 1), and default is 0.5.

optimizer

The function used in the optimization process. We offer "optim", "optimx", and "nlminb" functions.

gene.set

If you have information of gene (or haplotype block), you can use it to perform kernel-based GWAS. You should assign your gene information to gene.set in the form of a "data.frame" (whose dimension is (the number of gene) x 2). In the first column, you should assign the gene name. And in the second column, you should assign the names of each marker, which correspond to the marker names of "geno" argument.

map.gene.set

Genotype map for 'gene.set' (list of haplotype blocks). This is a data.frame with the haplotype block (SNP-set, or gene-set) names in the first column. The second and third columns contain the chromosome and map position for each block. The forth column contains the cumulative map position for each block, which can be computed by cumsumPos function. If this argument is NULL, the map will be constructed by genesetmap function after the SNP-set GWAS. It will take some time, so you can reduce the computational time by assigning this argument beforehand.

weighting.center

In kernel-based GWAS, weights according to the Gaussian distribution (centered on the tested SNP) are taken into account when calculating the kernel if Rainbow = TRUE. If weighting.center = FALSE, weights are not taken into account.

weighting.other

You can set other weights in addition to weighting.center. The length of this argument should be equal to the number of SNPs. For example, you can assign SNP effects from the information of gene annotation.

sig.level

Significance level for the threshold. The default is 0.05.

method.thres

Method for detemining threshold of significance. "BH" and "Bonferroni are offered.

plot.qq.1

If TRUE, draw qq plot for normal GWAS.

plot.Manhattan.1

If TRUE, draw manhattan plot for normal GWAS.

plot.qq.2

If TRUE, draw qq plot for SNP-set GWAS.

plot.Manhattan.2

If TRUE, draw manhattan plot for SNP-set GWAS.

plot.method

If this argument = 1, the default manhattan plot will be drawn. If this argument = 2, the manhattan plot with axis based on Position (bp) will be drawn. Also, this plot's color is changed by all chromosomes.

plot.col1

This argument determines the color of the manhattan plot. You should substitute this argument as color vector whose length is 2. plot.col1[1] for odd chromosomes and plot.col1[2] for even chromosomes

plot.col2

Color of the manhattan plot. color changes with chromosome and it starts from plot.col2 + 1 (so plot.col2 = 1 means color starts from red.)

plot.col3

Color of the points of manhattan plot which are added after the reestimation by SNP-set method. You should substitute this argument as color vector whose length is 2. plot.col3[1] for odd chromosomes and plot.col3[2] for even chromosomes.

plot.type

This argument determines the type of the manhattan plot. See the help page of "plot".

plot.pch

This argument determines the shape of the dot of the manhattan plot. See the help page of "plot".

saveName

When drawing any plot, you can save plots in png format. In saveName, you should substitute the name you want to save. When saveName = NULL, the plot is not saved.

main.qq.1

The title of qq plot for normal GWAS. If this argument is NULL, trait name is set as the title.

main.man.1

The title of manhattan plot for normal GWAS. If this argument is NULL, trait name is set as the title.

main.qq.2

The title of qq plot for SNP-set GWAS. If this argument is NULL, trait name is set as the title.

main.man.2

The title of manhattan plot for SNP-set GWAS. If this argument is NULL, trait name is set as the title.

plot.add.last

If saveName is not NULL and this argument is TRUE, then you can add lines or dots to manhattan plots. However, you should also write "dev.off()" after adding something.

return.EMM.res

When return.EMM.res = TRUE, the results of equation of mixed models are included in the result of RGWAS.

thres

If thres = TRUE, the threshold of the manhattan plot is included in the result of RGWAS. When return.EMM.res or thres is TRUE, the results will be "list" class.

skip.check

As default, RAINBOWR checks the type of input data and modifies it into the correct format. However, it will take some time, so if you prepare the correct format of input data, you can skip this procedure by setting 'skip.check = TRUE'.

verbose

If this argument is TRUE, messages for the current steps will be shown.

verbose2

If this argument is TRUE, welcome message will be shown.

count

When count is TRUE, you can know how far RGWAS has ended with percent display.

time

When time is TRUE, you can know how much time it took to perform RGWAS.

Value

$D

Dataframe which contains the information of the map you input and the results of RGWAS (-log10(p)) which correspond to the map. -log10(p) by normal GWAS and recalculated -log10(p) by SNP-set GWAS will be obtained. If there are more than one test.effects, then multiple lists for each test.effect are returned respectively.

$thres

A vector which contains the information of threshold determined by FDR = 0.05.

$EMM.res

This output is a list which contains the information about the results of "EMM" perfomed at first in normal GWAS. If you want to know details, see the description for the function "EMM1" or "EMM2".

References

Kennedy, B.W., Quinton, M. and van Arendonk, J.A. (1992) Estimation of effects of single genes on quantitative traits. J Anim Sci. 70(7): 2000-2012.

Storey, J.D. and Tibshirani, R. (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci. 100(16): 9440-9445.

Yu, J. et al. (2006) A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 38(2): 203-208.

Kang, H.M. et al. (2008) Efficient Control of Population Structure in Model Organism Association Mapping. Genetics. 178(3): 1709-1723.

Kang, H.M. et al. (2010) Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 42(4): 348-354.

Zhang, Z. et al. (2010) Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 42(4): 355-360.

Endelman, J.B. (2011) Ridge Regression and Other Kernels for Genomic Selection with R Package rrBLUP. Plant Genome J. 4(3): 250.

Endelman, J.B. and Jannink, J.L. (2012) Shrinkage Estimation of the Realized Relationship Matrix. G3 Genes, Genomes, Genet. 2(11): 1405-1413.

Zhou, X. and Stephens, M. (2012) Genome-wide efficient mixed-model analysis for association studies. Nat Genet. 44(7): 821-824.

Listgarten, J. et al. (2013) A powerful and efficient set test for genetic markers that handles confounders. Bioinformatics. 29(12): 1526-1533.

Lippert, C. et al. (2014) Greater power and computational efficiency for kernel-based association testing of sets of genetic variants. Bioinformatics. 30(22): 3206-3214.

Examples





  ### Import RAINBOWR
  require(RAINBOWR)

  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno

  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)

  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- as.matrix(Rice_pheno[, trait.name, drop = FALSE])

  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map


  ### Estimate genomic relationship matrix (GRM)
  K.A <- calcGRM(genoMat = x)


  ### Modify data
  modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
                                 return.ZETA = TRUE, return.GWAS.format = TRUE)
  pheno.GWAS <- modify.data.res$pheno.GWAS
  geno.GWAS <- modify.data.res$geno.GWAS
  ZETA <- modify.data.res$ZETA


  ### View each data for RAINBOWR
  See(pheno.GWAS)
  See(geno.GWAS)
  str(ZETA)


  ### Perform two step SNP-set GWAS (single-snp GWAS -> SNP-set GWAS for significant markers)
  twostep.SNP_set.res <- RGWAS.twostep(pheno = pheno.GWAS, geno = geno.GWAS, ZETA = ZETA,
                                       kernel.percent = 0.2, n.PC = 4, test.method.2 = "LR",
                                       kernel.method = "linear", gene.set = NULL,
                                       test.effect.2 = "additive", window.size.half = 3,
                                       window.slide = 2, package.MM = "gaston",
                                       parallel.method = "mclapply",
                                       skip.check = TRUE, n.core = 2)

  See(twostep.SNP_set.res$D)
  ### Column 4 contains -log10(p) values for markers with the first method (single-SNP GWAS)
  ### Column 5 contains -log10(p) values for markers with the second method (SNP-set GWAS)


RAINBOWR documentation built on Sept. 12, 2023, 9:08 a.m.