
RGWAS.twostep.epi R Documentation

Perform normal GWAS (genome-wide association studies) first, then check epistatic effects for relatively significant markers

Description

Perform a normal (single-SNP) GWAS (genome-wide association study) first, then test epistatic effects for the relatively significant markers.

Usage

RGWAS.twostep.epi(
  pheno,
  geno,
  ZETA = NULL,
  package.MM = "gaston",
  covariate = NULL,
  covariate.factor = NULL,
  structure.matrix = NULL,
  n.PC = 0,
  min.MAF = 0.02,
  n.core = 1,
  parallel.method = "mclapply",
  check.size.epi = 4,
  epistasis.percent = 0.05,
  check.epi.max = 200,
  your.check = NULL,
  GWAS.res.first = NULL,
  P3D = TRUE,
  test.method = "LR",
  dominance.eff = TRUE,
  skip.self.int = FALSE,
  haplotype = TRUE,
  num.hap = NULL,
  optimizer = "nlminb",
  window.size.half = 5,
  window.slide = 1,
  chi0.mixture = 0.5,
  gene.set = NULL,
  map.gene.set = NULL,
  sig.level = 0.05,
  method.thres = "BH",
  plot.qq.1 = TRUE,
  plot.Manhattan.1 = TRUE,
  plot.epi.3d = TRUE,
  plot.epi.2d = TRUE,
  plot.method = 1,
  plot.col1 = c("dark blue", "cornflowerblue"),
  plot.col2 = 1,
  plot.type = "p",
  plot.pch = 16,
  saveName = NULL,
  main.qq.1 = NULL,
  main.man.1 = NULL,
  main.epi.3d = NULL,
  main.epi.2d = NULL,
  skip.check = FALSE,
  verbose = TRUE,
  verbose2 = FALSE,
  count = TRUE,
  time = TRUE
)

Arguments

pheno

Data frame whose first column contains the line names (gid). The remaining columns contain the phenotypes to test.

geno

Data frame with the marker names in the first column. The second and third columns contain the chromosome and map position. Columns 4 and higher contain the marker scores for each line, coded as -1, 0, 1 = aa, Aa, AA.
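The two input formats above can be sketched with a tiny hypothetical dataset. The column names (gid, marker, chrom, pos) and values are illustrative assumptions; only the column order and the -1/0/1 coding follow the description.

```r
# Hypothetical line names shared by pheno and geno
line.names <- paste0("line", 1:4)

# pheno: first column = line names (gid), remaining columns = phenotypes
pheno <- data.frame(gid = line.names,
                    trait1 = c(10.2, 11.5, 9.8, 12.1),
                    stringsAsFactors = FALSE)

# Marker scores coded -1, 0, 1 = aa, Aa, AA (rows = markers, columns = lines)
scores <- matrix(c(-1, 0,  1, 1,
                    1, 1, -1, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(NULL, line.names))

# geno: marker name, chromosome, map position, then one column per line
geno <- data.frame(marker = c("snp1", "snp2"),
                   chrom = c(1, 2),
                   pos = c(1500, 320),
                   scores,
                   check.names = FALSE,
                   stringsAsFactors = FALSE)
geno
```

The line names in columns 4 and higher of geno should match the gid column of pheno.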

ZETA

A list containing a covariance (relationship) matrix (K: m × m) and its design matrix (Z: n × m) for the random effects. The list elements must be named "Z" and "K". You can use more than one kernel matrix, for example:

ZETA = list(A = list(Z = Z.A, K = K.A), D = list(Z = Z.D, K = K.D))

Z.A, Z.D

Design matrices (n × m) for the random effects. When each line has exactly one observation (n = m), which is the case in many analyses, the identity matrix can be used.

K.A, K.D

Kernel matrices expressing different relationships between lines.

For example, K.A is an additive relationship matrix for the covariance between lines, and K.D is a dominance relationship matrix.
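A minimal sketch of building ZETA by hand, assuming one observation per line (so Z is the identity) and a simple kernel from column-centered scores. This kernel is an illustrative choice and is not claimed to match what calcGRM computes; the genotype data are random.

```r
# Hypothetical genotype scores coded -1/0/1 (rows = lines, columns = markers)
set.seed(1)
n <- 5
p <- 20
geno.mat <- matrix(sample(c(-1, 0, 1), n * p, replace = TRUE), nrow = n,
                   dimnames = list(paste0("line", 1:n), paste0("snp", 1:p)))

# Simple additive relationship matrix from column-centered scores
# (illustrative choice, not necessarily identical to calcGRM)
W <- scale(geno.mat, center = TRUE, scale = FALSE)
K.A <- tcrossprod(W) / p

# One observation per line, so Z is the n x n identity matrix
Z.A <- diag(n)
dimnames(Z.A) <- dimnames(K.A)

# The list elements must be named "Z" and "K"
ZETA <- list(A = list(Z = Z.A, K = K.A))
str(ZETA)
```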

package.MM

The package used to solve the mixed-effects model. Three packages are supported: "RAINBOWR", "MM4LMM", and "gaston". The default is "gaston". See EM3.general for more details.

covariate

An n × 1 vector or an n × p_1 matrix of continuous values, such as other traits or the genotype scores of specific markers. These covariates are included in the model as fixed effects.

covariate.factor

An n × p_2 data frame in which each column is a factor. RGWAS converts this data frame into a model matrix, which is then included in the model as fixed effects.
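The conversion described above can be illustrated with base R's model.matrix. The factor data are hypothetical, and dropping the intercept column (on the assumption that the mixed model already has its own intercept) is a choice made for this sketch, not necessarily what RGWAS does internally.

```r
# Hypothetical factor covariates for 6 lines
cov.fac <- data.frame(year = factor(c("2019", "2019", "2020", "2020", "2021", "2021")),
                      site = factor(c("A", "B", "A", "B", "A", "B")))

# Treatment-contrast model matrix; the intercept column is dropped because
# the mixed model is assumed to contain its own intercept
X.fac <- model.matrix(~ ., data = cov.fac)[, -1, drop = FALSE]
X.fac
```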

structure.matrix

A structure matrix calculated by population structure analysis, which can be used when the population is structured. Do not use this argument together with n.PC > 0.

n.PC

Number of principal components to include as fixed effects. The default is 0, which corresponds to the K model.
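As a rough illustration of what "principal components as fixed effects" means, the sketch below takes the first n.PC PC scores of a random genotype matrix with prcomp and appends them to an intercept column. How RAINBOWR itself computes and scales the PCs is not shown here and may differ.

```r
set.seed(2)
# Hypothetical genotype scores: 10 lines x 50 markers
geno.mat <- matrix(sample(c(-1, 0, 1), 10 * 50, replace = TRUE), nrow = 10)
n.PC <- 4

# PCA on centered marker scores
pca <- prcomp(geno.mat, center = TRUE, scale. = FALSE)
PC.fixed <- pca$x[, seq_len(n.PC), drop = FALSE]

# Fixed-effect design: intercept plus the first n.PC principal components
X.fixed <- cbind(Intercept = 1, PC.fixed)
dim(X.fixed)
```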

min.MAF

Specifies the minimum minor allele frequency (MAF). If a marker has a MAF less than min.MAF, it is assigned a zero score.
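With the -1/0/1 coding, the frequency of allele A for a marker is the mean of (score + 1) / 2, and the MAF is the smaller of that frequency and its complement. A minimal sketch of the rule above, using a hypothetical threshold of 0.15 so that the effect is visible on a small matrix:

```r
# Hypothetical scores: 5 lines (rows) x 3 markers (columns)
geno.mat <- rbind(c(-1, 1,  0),
                  c(-1, 1,  1),
                  c(-1, 0, -1),
                  c(-1, 1,  0),
                  c( 0, 1,  1))
min.MAF <- 0.15

# Frequency of allele A for each marker, then the minor allele frequency
freq.A <- colMeans(geno.mat + 1) / 2
MAF <- pmin(freq.A, 1 - freq.A)

# Markers below min.MAF receive a zero score for every line
geno.filtered <- geno.mat
geno.filtered[, MAF < min.MAF] <- 0
MAF
```

Here markers 1 and 2 have a MAF of 0.1 and are zeroed, while marker 3 (MAF 0.4) is kept.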

n.core

Setting n.core > 1 will enable parallel execution on a machine with multiple cores. This argument is not valid when 'parallel.method = "furrr"'.

parallel.method

Method for parallel computation. We offer three methods, "mclapply", "furrr", and "foreach".

When 'parallel.method = "mclapply"', the pbmclapply function from the 'pbmcapply' package is used with 'count = TRUE', and the mclapply function from the 'parallel' package is used with 'count = FALSE'.

When 'parallel.method = "furrr"', the future_map function from the 'furrr' package is used. With 'count = TRUE', the progressor function from the 'progressr' package is also used to show the progress bar, so please install the 'progressr' package from GitHub (https://github.com/HenrikBengtsson/progressr). 'parallel.method = "furrr"' performs multi-thread parallelization with shared memory, which saves memory but is considerably slower than 'parallel.method = "mclapply"'.

When 'parallel.method = "foreach"', the foreach function from the 'foreach' package is used together with the makeCluster function from the 'parallel' package and the registerDoParallel function from the 'doParallel' package. With 'count = TRUE', the setTxtProgressBar and txtProgressBar functions from the 'utils' package are also used to show the progress bar.

We recommend 'parallel.method = "mclapply"', but this method is not supported on Windows. Windows users should use 'parallel.method = "foreach"' instead.
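The recommendation above can be applied programmatically by checking the operating system. The sketch below also runs a toy task (squaring numbers, standing in for per-marker tests) through parallel::mclapply; on Windows it falls back to one core, because mclapply relies on forking, which Windows does not support.

```r
library(parallel)

on.windows <- .Platform$OS.type == "windows"

# Follow the recommendation: "mclapply" except on Windows
parallel.method <- if (on.windows) "foreach" else "mclapply"

# mclapply forks processes, which is unavailable on Windows, so use one core there
n.core <- if (on.windows) 1L else 2L

# Toy task standing in for per-marker tests
res <- mclapply(1:4, function(i) i^2, mc.cores = n.core)
unlist(res)
```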

check.size.epi

This argument determines how many SNPs around each SNP detected by the normal GWAS are checked for epistasis.

epistasis.percent

This argument determines how many SNPs from the normal GWAS are selected as candidates. For example, when epistasis.percent = 0.1, SNPs whose -log10(p) values are in the top 0.1 percent are chosen as candidates for checking epistasis.

check.epi.max

Because checking epistasis is time-consuming, this argument sets the maximum number of SNPs checked for epistasis.
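A sketch of how epistasis.percent and check.epi.max could combine to pick candidate SNPs, following the "top percent" reading given above. The rounding rule and the minimum of one candidate are assumptions of this sketch, and the -log10(p) values are simulated.

```r
set.seed(3)
# Hypothetical -log10(p) values from a normal GWAS with 10,000 markers
scores <- -log10(runif(10000))
epistasis.percent <- 0.1   # top 0.1 percent
check.epi.max <- 200

# Number of candidates: the top percentage, at least 1, capped by check.epi.max
n.top <- min(max(1, round(length(scores) * epistasis.percent / 100)), check.epi.max)

# Indices of the markers with the largest -log10(p)
candidates <- order(scores, decreasing = TRUE)[seq_len(n.top)]
n.top
```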

your.check

Because fewer SNPs can be tested for epistasis than in kernel-based GWAS, you can select the SNPs you want to test. To do so, specify the indices of the SNPs to be tested in your data (their row numbers, not their map positions). By default (your.check = NULL), epistasis is tested between the SNPs detected by the normal GWAS.
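Because your.check expects indices rather than positions, marker names can be converted with match. The marker names and map below are hypothetical.

```r
# Hypothetical geno map (first three columns of the geno argument)
geno.map <- data.frame(marker = c("id1", "id2", "id3", "id4"),
                       chr = c(1, 1, 2, 2),
                       pos = c(100, 250, 80, 300),
                       stringsAsFactors = FALSE)

# Markers you want to test for epistasis
your.markers <- c("id2", "id4")

# Row indices of those markers in the data, suitable for your.check
your.check <- match(your.markers, geno.map$marker)
your.check
```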

GWAS.res.first

If you have already performed a regular GWAS, you can supply its result here to skip the first (normal GWAS) step.

P3D

When P3D = TRUE, variance components are estimated by REML only once, without any markers in the model. When P3D = FALSE, variance components are estimated by REML for each marker separately.

test.method

RGWAS supports two methods for testing the effect of each SNP-set.

"LR"

Likelihood-ratio test: relatively slow but accurate (default).

"score"

Score test: much faster than LR, but it sometimes overestimates -log10(p).

dominance.eff

If TRUE, a dominance effect is included in the model, and additive × dominance and dominance × dominance interactions are also tested as epistatic effects. For inbred lines, set this argument to FALSE.
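One common way to express these epistatic effects is through kernels: code dominance as a heterozygote indicator (1 - |score|) and form interaction kernels as element-wise (Hadamard) products of the per-SNP-set kernels. The sketch below shows that general formulation with simple linear kernels on random data; it is not claimed to reproduce RAINBOWR's exact kernels.

```r
set.seed(4)
n <- 6
# Two hypothetical SNP-sets (3 markers each), coded -1/0/1
set1 <- matrix(sample(c(-1, 0, 1), n * 3, replace = TRUE), nrow = n)
set2 <- matrix(sample(c(-1, 0, 1), n * 3, replace = TRUE), nrow = n)

# Dominance coding: 1 for heterozygotes (Aa), 0 for homozygotes (aa, AA)
dom <- function(x) 1 - abs(x)

# Simple linear kernel (illustrative choice)
lin.kernel <- function(x) tcrossprod(x) / ncol(x)

K.A1 <- lin.kernel(set1)
K.D1 <- lin.kernel(dom(set1))
K.D2 <- lin.kernel(dom(set2))

# Interaction kernels as element-wise products
K.AxD <- K.A1 * K.D2   # additive (set 1) x dominance (set 2)
K.DxD <- K.D1 * K.D2   # dominance x dominance
```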

skip.self.int

By default, the function also tests self-interactions within the same SNP-set. To skip them, set 'skip.self.int = TRUE'.

haplotype

If your data contain many lines (roughly more than 100), set haplotype = TRUE. In that case, a haplotype-based kernel is used to calculate -log10(p), so the Gram matrix has a smaller dimension. The results do not change, but the computation is faster.

num.hap

When haplotype = TRUE, you can set the number of haplotypes you expect. Similar marker arrays are then treated as the same haplotype, and a kernel (K.SNP) of dimension num.hap × num.hap is constructed. When num.hap = NULL (default), num.hap is set to the maximum number that still reflects the differences between lines.
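Why the haplotype-based kernel leaves the results unchanged can be seen in a small sketch: identical marker arrays are collapsed into haplotypes, a smaller kernel is built on the haplotypes, and the line-level kernel is recovered as Z K Z'. This sketch merges only identical arrays (it does not cluster "similar" arrays as num.hap does), and the linear kernel and data are assumptions.

```r
# Hypothetical SNP-set: 6 lines x 3 markers; lines 1, 3, 5 and lines 2, 6 share arrays
snp.set <- rbind(c( 1, -1, 0),
                 c(-1,  1, 1),
                 c( 1, -1, 0),
                 c( 0,  0, 1),
                 c( 1, -1, 0),
                 c(-1,  1, 1))

# Identify haplotypes by their marker array
hap.key <- apply(snp.set, 1, paste, collapse = "_")
hap.levels <- unique(hap.key)
hap.id <- match(hap.key, hap.levels)

# Design matrix mapping lines (rows) to haplotypes (columns)
Z.hap <- outer(hap.id, seq_along(hap.levels), "==") * 1

# Haplotype-level kernel (num.hap x num.hap, here 3 x 3)
hap.geno <- snp.set[!duplicated(hap.key), , drop = FALSE]
K.hap <- tcrossprod(hap.geno) / ncol(hap.geno)

# The line-level kernel is recovered exactly as Z K Z'
K.line <- Z.hap %*% K.hap %*% t(Z.hap)
```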

optimizer

The function used for optimization. "optim", "optimx", and "nlminb" are offered; the default is "nlminb".

window.size.half

This argument determines how many SNPs around the SNP being tested are used to calculate K.SNP. More precisely, the number of SNPs is 2 * window.size.half + 1.

window.slide

This argument determines the interval between tested markers. If window.slide = 1, every marker is tested. To perform SNP-set tests in non-overlapping bins, set window.slide = 2 * window.size.half + 1.
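The interplay of window.size.half and window.slide can be sketched for 30 hypothetical markers. With window.slide = 2 * window.size.half + 1, the windows tile the markers without overlap. Starting the first window center at window.size.half + 1 and truncating windows at the last marker are assumptions of this sketch, not documented behavior.

```r
n.mark <- 30
window.size.half <- 5
window.slide <- 2 * window.size.half + 1   # non-overlapping bins of 11 SNPs

# Centers of the tested windows
centers <- seq(window.size.half + 1, n.mark, by = window.slide)

# Markers used for K.SNP in each window, truncated at the ends
windows <- lapply(centers, function(i) {
  max(1, i - window.size.half):min(n.mark, i + window.size.half)
})
centers
```

Here the windows are markers 1 to 11, 12 to 22, and 23 to 30; with window.slide = 1 there would instead be one window per marker.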

chi0.mixture

RAINBOWR assumes that the deviance follows the mixture a × chisq(df = 0) + (1 - a) × chisq(df = r), where r is the degrees of freedom. The argument chi0.mixture is a (0 <= a < 1); the default is 0.5.
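Under this mixture, chisq(df = 0) is a point mass at zero, so for a positive deviance only the (1 - a) × chisq(df = r) part contributes to the upper-tail p-value. A minimal sketch; the function name mixture.pval is hypothetical and not part of RAINBOWR.

```r
# Hypothetical helper: upper-tail p-value of a deviance under
# a * chisq(df = 0) + (1 - a) * chisq(df = r), with a = chi0.mixture
mixture.pval <- function(deviance, df, chi0.mixture = 0.5) {
  # The df = 0 component is a point mass at zero, so a non-positive
  # deviance gives p = 1, and a positive one gets only the chi-square tail
  ifelse(deviance > 0,
         (1 - chi0.mixture) * pchisq(deviance, df = df, lower.tail = FALSE),
         1)
}

p <- mixture.pval(3.84, df = 1)
-log10(p)
```

With the default a = 0.5, the p-value is half of the ordinary chi-square p-value, so -log10(p) is larger by log10(2).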

gene.set

If you have gene (or haplotype block) information, you can use it to perform kernel-based GWAS. Supply this information to gene.set as a two-column "data.frame": the first column contains the gene names, and the second column contains the names of the markers in each gene, which must match the marker names in the "geno" argument.
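A hypothetical gene.set in the format described above, with a check that every listed marker exists in geno. Grouping the markers by gene with split shows the SNP-sets that the two columns define.

```r
# Hypothetical marker names from the first column of geno
marker.names <- c("id1", "id2", "id3", "id4", "id5")

# gene.set: gene name in column 1, marker name in column 2 (one row per marker)
gene.set <- data.frame(gene = c("geneA", "geneA", "geneB", "geneB"),
                       marker = c("id1", "id2", "id4", "id5"),
                       stringsAsFactors = FALSE)

# Every marker listed in gene.set should exist in geno
stopifnot(all(gene.set$marker %in% marker.names))

# Markers grouped by gene: each element is one SNP-set
snp.sets <- split(gene.set$marker, gene.set$gene)
snp.sets
```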

map.gene.set

Genotype map for 'gene.set' (a list of haplotype blocks). This is a data.frame with the haplotype block (SNP-set or gene-set) names in the first column. The second and third columns contain the chromosome and map position of each block. The fourth column contains the cumulative map position of each block, which can be computed with the cumsumPos function. If this argument is NULL, the map is constructed by the genesetmap function after the SNP-set GWAS. Because this takes some time, you can reduce the computation time by supplying this argument beforehand.
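The idea behind a cumulative map position can be sketched in base R: each chromosome is shifted by the total length of the preceding chromosomes. Approximating chromosome length by the last position on each chromosome is an assumption here; cumsumPos may compute offsets differently, so use that function for real analyses.

```r
# Hypothetical map for four blocks on two chromosomes
map <- data.frame(block = c("g1", "g2", "g3", "g4"),
                  chr = c(1, 1, 2, 2),
                  pos = c(100, 500, 200, 400),
                  stringsAsFactors = FALSE)

# Chromosome lengths approximated by the last position on each chromosome
chr.len <- tapply(map$pos, map$chr, max)

# Offset of each chromosome = total length of all preceding chromosomes
offset <- c(0, cumsum(chr.len))[seq_along(chr.len)]
names(offset) <- names(chr.len)

# Fourth column: cumulative position
map$cum.pos <- map$pos + offset[as.character(map$chr)]
map
```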

sig.level

Significance level for the threshold. The default is 0.05.

method.thres

Method for determining the significance threshold. "BH" and "Bonferroni" are offered.
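Both procedures can be expressed as a threshold on the -log10(p) scale. The sketch below uses base R's p.adjust on simulated p-values; it illustrates the general Bonferroni and Benjamini-Hochberg rules, and RAINBOWR's own threshold computation may differ in detail.

```r
set.seed(5)
# Hypothetical p-values: 995 null markers plus 5 strongly associated markers
p <- c(runif(995), runif(5, min = 0, max = 1e-6))
sig.level <- 0.05

# Bonferroni: significant when p < sig.level / (number of tests)
thres.bonf <- -log10(sig.level / length(p))

# BH: threshold at the largest p-value whose BH-adjusted value is <= sig.level
p.bh <- p.adjust(p, method = "BH")
thres.bh <- if (any(p.bh <= sig.level)) -log10(max(p[p.bh <= sig.level])) else Inf

c(Bonferroni = thres.bonf, BH = thres.bh)
```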

plot.qq.1

If TRUE, draw a Q-Q plot for the normal GWAS.

plot.Manhattan.1

If TRUE, draw a Manhattan plot for the normal GWAS.

plot.epi.3d

If TRUE, draw a 3D plot of the epistasis results.

plot.epi.2d

If TRUE, draw a 2D plot of the epistasis results.

plot.method

If plot.method = 1, the default Manhattan plot is drawn. If plot.method = 2, a Manhattan plot whose axis is based on position (bp) is drawn, and its color changes for each chromosome.

plot.col1

This argument determines the color of the Manhattan plot. Supply a color vector of length 2: plot.col1[1] is used for odd chromosomes and plot.col1[2] for even chromosomes.

plot.col2

Color of the Manhattan plot. The color changes with each chromosome, starting from plot.col2 + 1 (so plot.col2 = 1 means that the colors start from red).
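A sketch of the two coloring rules described above for a few hypothetical markers: plot.col1 alternates between odd and even chromosomes, and plot.col2 shifts a numeric palette index per chromosome (index 2 is red in R's default palette). Which plot.method uses which rule is inferred from the descriptions and is an assumption of this sketch.

```r
chr <- c(1, 1, 2, 3, 3, 4)   # hypothetical chromosome of each marker

# Two alternating colors: plot.col1[1] for odd, plot.col1[2] for even chromosomes
plot.col1 <- c("dark blue", "cornflowerblue")
col.alternating <- plot.col1[2 - chr %% 2]

# One palette index per chromosome, starting at plot.col2 + 1
plot.col2 <- 1
col.by.chr <- chr + plot.col2
col.by.chr
```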

plot.type

This argument determines the type of the Manhattan plot. See the help page of "plot".

plot.pch

This argument determines the shape of the points in the Manhattan plot. See the help page of "plot".

saveName

Plots can be saved in PNG format. Set saveName to the name under which you want to save the plots. When saveName = NULL, the plots are not saved.

main.qq.1

The title of the Q-Q plot for the normal GWAS. If NULL, the trait name is used as the title.

main.man.1

The title of the Manhattan plot for the normal GWAS. If NULL, the trait name is used as the title.

main.epi.3d

The title of the 3D plot. If NULL, the trait name is used as the title.

main.epi.2d

The title of the 2D plot. If NULL, the trait name is used as the title.

skip.check

By default, RAINBOWR checks the type of the input data and converts it into the correct format. This takes some time, so if your input data are already in the correct format, you can skip this step by setting 'skip.check = TRUE'.

verbose

If TRUE, messages describing the current step are shown.

verbose2

If TRUE, a welcome message is shown.

count

If TRUE, the progress of RGWAS is shown as a percentage.

time

If TRUE, the time taken to perform RGWAS is shown.

Value

$first

The results of the first (normal) GWAS.

$map.epi

Map information for the SNPs tested for epistatic effects.

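$epistasis

A list containing the results of the epistasis tests. Its element $scores is itself a list with the following elements:

$scores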

A matrix containing the -log10(p) values calculated by the tests of epistatic effects.

$x, $y

Positions of the SNPs detected by the regular GWAS, used when drawing plots. Together, these vectors give the row and column positions corresponding to each element of scores.

$z

The elements of $scores arranged as a vector. This vector is also used when drawing plots.
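How $x, $y, and $z might line up with the scores matrix can be sketched as follows: R stores matrices column by column, so flattening the matrix and replicating the SNP positions by row and by column gives one (x, y, z) triple per matrix element. The positions and scores are hypothetical, and the exact layout RAINBOWR uses is an assumption here.

```r
# Hypothetical positions of three tested SNPs and their -log10(p) matrix
pos <- c(1000, 2500, 4000)
scores <- matrix(c(NA, 1.2, 0.3,
                   1.2, NA, 2.1,
                   0.3, 2.1, NA), nrow = 3, byrow = TRUE)

# as.vector() walks down each column in turn
z <- as.vector(scores)
x <- rep(pos, times = ncol(scores))  # row position of each element of z
y <- rep(pos, each = nrow(scores))   # column position of each element of z
data.frame(x, y, z)
```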

Examples

  ### Import RAINBOWR
  library(RAINBOWR)

  ### Load example datasets
  data("Rice_Zhao_etal")
  Rice_geno_score <- Rice_Zhao_etal$genoScore
  Rice_geno_map <- Rice_Zhao_etal$genoMap
  Rice_pheno <- Rice_Zhao_etal$pheno

  ### View each dataset
  See(Rice_geno_score)
  See(Rice_geno_map)
  See(Rice_pheno)

  ### Select one trait for example
  trait.name <- "Flowering.time.at.Arkansas"
  y <- Rice_pheno[, trait.name, drop = FALSE]

  ### Remove SNPs whose MAF <= 0.05
  x.0 <- t(Rice_geno_score)
  MAF.cut.res <- MAF.cut(x.0 = x.0, map.0 = Rice_geno_map)
  x <- MAF.cut.res$x
  map <- MAF.cut.res$map

  ### Estimate genomic relationship matrix (GRM)
  K.A <- calcGRM(genoMat = x)

  ### Modify data
  modify.data.res <- modify.data(pheno.mat = y, geno.mat = x, map = map,
                                 return.ZETA = TRUE, return.GWAS.format = TRUE)
  pheno.GWAS <- modify.data.res$pheno.GWAS
  geno.GWAS <- modify.data.res$geno.GWAS
  ZETA <- modify.data.res$ZETA

  ### View each data for RAINBOWR
  See(pheno.GWAS)
  See(geno.GWAS)
  str(ZETA)

  ### Perform two-step epistasis GWAS (single-snp GWAS -> Check epistasis for significant markers)
  twostep.epi.res <- RGWAS.twostep.epi(pheno = pheno.GWAS, geno = geno.GWAS, ZETA = ZETA,
                                       n.PC = 4, test.method = "LR", gene.set = NULL,
                                       window.size.half = 10, window.slide = 21,
                                       package.MM = "gaston", parallel.method = "mclapply",
                                       skip.check = TRUE, n.core = 2)

  See(twostep.epi.res$epistasis$scores)

RAINBOWR documentation built on Sept. 12, 2023, 9:08 a.m.