donuts

R Documentation

DONUTS: Decomposing nature and nurture using GWAS summary statistics

Description

This function takes two or three sets of GWAS summary statistics as input and outputs summary statistics for the direct and indirect genetic effects.

Usage

donuts(
  ss.own,
  ss2,
  ss3 = NULL,
  l12 = 0,
  l13 = 0,
  l23 = 0,
  n1 = NULL,
  n2 = NULL,
  n3 = NULL,
  alpha = 0,
  mode = 2,
  OutDir = getwd()
)

Arguments

`ss.own`	a data.frame; GWAS-O summary statistics
`ss2`	a data.frame; 2nd input GWAS summary statistics. Depending on `mode`, this can be GWAS-M, GWAS-P, or GWAS-MP. See Details.
`ss3`	a data.frame; default is NULL. 3rd input GWAS summary statistics. This is GWAS-P when `mode` == 1. See Details.
`l12`	numeric; `ss.own` and `ss2` LDSC genetic covariance intercept; default is 0.
`l13`	numeric; `ss.own` and `ss3` LDSC genetic covariance intercept; default is 0.
`l23`	numeric; `ss2` and `ss3` LDSC genetic covariance intercept; default is 0.
`n1`	integer; Sample size for ss.own; default is NULL. Only needs to be specified when sample size is not included in ss.own summary statistics. If specified, this number will be used in the analysis.
`n2`	integer; Sample size for ss2; default is NULL. Only needs to be specified when sample size is not included in ss2 summary statistics. If specified, this number will be used in the analysis.
`n3`	integer; Sample size for ss3; default is NULL. Only needs to be specified when sample size is not included in ss3 summary statistics. If specified, this number will be used in the analysis.
`alpha`	numeric or a data.frame; correlation between spousal genotypes (i.e., Corr(Gm, Gp)) at each locus; default is 0. This value measures the degree of assortative mating. `alpha` can also be a data.frame in which the user specifies the spousal correlation at each locus. The data.frame must contain two columns, "SNP" and "alpha", where "SNP" holds the SNP ID (rs#) and "alpha" holds the SNP-level spousal correlation. When `alpha` is a data.frame, only the SNPs shared (by rs#) among the input GWAS and `alpha` are kept.
`mode`	integer 1, 2, or 3; default is 2; specify analysis scenario – see Details.
`OutDir`	Output directory to write the direct and indirect effect summary statistics files. Default is the current directory. If NULL, the output files are not written, but the results are still returned as a data.frame.

Details

This function first checks for duplicated SNPs using variant IDs and removes SNPs with duplicated IDs. It then takes the intersection of the SNPs across all inputs (and with the SNPs in alpha if it is a data.frame), and only the overlapping SNPs are kept in the output. The A1 and A2 of the first input summary statistics (ss.own) are used as the reference. That is, the BETA of another input is multiplied by -1 if its A1 and A2 are flipped, or re-coded as NA if the alleles cannot be matched.
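The preprocessing described above can be sketched in base R. This is only an illustration, not the package's internal code: the helpers `drop_duplicated_ids()` and `align_to_reference()` are hypothetical names, the sketch assumes that every copy of a duplicated ID is dropped, and it does not attempt strand (complement) matching.

```r
# Toy inputs; all values are made up for illustration.
ss.own <- data.frame(
  SNP  = c("rs1", "rs2", "rs3", "rs3", "rs4"),
  A1   = c("A", "C", "G", "G", "T"),
  A2   = c("G", "T", "A", "A", "C"),
  BETA = c(0.10, -0.20, 0.05, 0.05, 0.30),
  stringsAsFactors = FALSE
)
ss2 <- data.frame(
  SNP  = c("rs1", "rs2", "rs4", "rs5"),
  A1   = c("G", "C", "A", "A"),   # rs1 is flipped; rs4 cannot be matched
  A2   = c("A", "T", "G", "C"),
  BETA = c(0.04, -0.10, 0.20, 0.01),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop every SNP whose ID occurs more than once
# (assumption: all copies are removed, not just the extra ones).
drop_duplicated_ids <- function(ss) {
  dup_ids <- unique(ss$SNP[duplicated(ss$SNP)])
  ss[!ss$SNP %in% dup_ids, , drop = FALSE]
}

# Hypothetical helper: keep shared SNPs and align `other` to the
# A1/A2 coding of `ref`.
align_to_reference <- function(ref, other) {
  ref    <- drop_duplicated_ids(ref)
  other  <- drop_duplicated_ids(other)
  shared <- intersect(ref$SNP, other$SNP)
  ref    <- ref[match(shared, ref$SNP), , drop = FALSE]
  other  <- other[match(shared, other$SNP), , drop = FALSE]

  same    <- other$A1 == ref$A1 & other$A2 == ref$A2
  flipped <- other$A1 == ref$A2 & other$A2 == ref$A1

  # Same coding: keep BETA; flipped: negate; otherwise: NA.
  beta <- ifelse(same, other$BETA, ifelse(flipped, -other$BETA, NA_real_))

  data.frame(SNP = shared, A1 = ref$A1, A2 = ref$A2,
             beta.own = ref$BETA, beta.ss2 = beta,
             stringsAsFactors = FALSE)
}

aligned <- align_to_reference(ss.own, ss2)
print(aligned)
```

In this toy data, rs3 is dropped as a duplicate, rs5 is dropped because it appears in only one input, the sign of rs1 is reversed, and rs4 gets an NA effect size.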

GWAS-O: standard GWAS of own phenotype ~ own genotype

GWAS-M: offspring phenotype ~ mother's genotype

GWAS-P: offspring phenotype ~ father's genotype

GWAS-MP: offspring phenotype ~ parental genotype, where we pool together mothers and fathers from different families to run the GWAS

The input GWAS summary statistics must contain the following columns with exactly the following column names (they can contain additional columns, but those will not be used):

CHR: chromosome

BP: base-pair coordinate

SNP: variant IDs

A1: effect allele

A2: non-effect allele

BETA: effect size

SE: standard error

P: p-value

They can also contain an "N" column for the sample size at each locus. If the summary statistics do not contain "N", the sample size can be specified by n1, n2, or n3 for the three inputs, respectively. Note that if the sample size is specified by n1, n2, or n3, these values will be used even if the input summary statistics contain an "N" column.
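As a minimal sketch of these input requirements, the following base R code checks the required column names and mimics the stated precedence between n1/n2/n3 and the "N" column. `check_sumstats()` and `resolve_n()` are hypothetical helpers written for this illustration; they are not part of the package.

```r
required_cols <- c("CHR", "BP", "SNP", "A1", "A2", "BETA", "SE", "P")

# Hypothetical helper: fail early if a required column is missing.
check_sumstats <- function(ss) {
  missing_cols <- setdiff(required_cols, names(ss))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

# Hypothetical helper: a user-supplied sample size takes precedence
# over the "N" column, as described in the text above.
resolve_n <- function(ss, n = NULL) {
  if (!is.null(n)) return(rep(n, nrow(ss)))
  if ("N" %in% names(ss)) return(ss$N)
  stop("No sample size: add an N column or pass n1/n2/n3")
}

# Toy summary statistics; all values are made up.
ss <- data.frame(
  CHR = c(1L, 1L), BP = c(1000L, 2000L), SNP = c("rs1", "rs2"),
  A1 = c("A", "C"), A2 = c("G", "T"),
  BETA = c(0.01, -0.02), SE = c(0.01, 0.01), P = c(0.317, 0.0455),
  N = c(9800, 10000), stringsAsFactors = FALSE
)

check_sumstats(ss)
n_from_column   <- resolve_n(ss)             # uses the N column
n_from_argument <- resolve_n(ss, n = 12000)  # overrides the N column
```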

The default value for alpha is 0. You can also specify the spousal correlation at each SNP using a data.frame for alpha. If you want to do so, the data.frame alpha must contain 2 columns: "SNP" column for the variant ID and "alpha" column for the spousal correlation. When alpha is specified as a data.frame, only the overlapping SNPs with those in the input summary statistics will be kept in the output.
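A SNP-level alpha table might be prepared along these lines. The values are invented, and `check_alpha()` is a hypothetical sanity check (two required columns, correlations within [-1, 1]) rather than something the package is known to run.

```r
# SNP-level spousal correlations; made-up values for illustration.
alpha_df <- data.frame(
  SNP   = c("rs2", "rs3", "rs9"),
  alpha = c(0.02, 0.00, 0.05),
  stringsAsFactors = FALSE
)

# Hypothetical helper: accept either a single number or a data.frame
# with "SNP" and "alpha" columns, and check the correlation range.
check_alpha <- function(alpha) {
  if (is.data.frame(alpha)) {
    stopifnot(all(c("SNP", "alpha") %in% names(alpha)))
    values <- alpha$alpha
  } else {
    stopifnot(is.numeric(alpha), length(alpha) == 1)
    values <- alpha
  }
  stopifnot(all(values >= -1 & values <= 1))
  invisible(TRUE)
}

check_alpha(0)         # scalar default
check_alpha(alpha_df)  # per-SNP table

# Only SNPs present in both the GWAS inputs and alpha_df survive.
gwas_snps <- c("rs1", "rs2", "rs3", "rs4")
kept_snps <- intersect(gwas_snps, alpha_df$SNP)
print(kept_snps)
```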

When mode == 1, 3 inputs are expected: ss.own is GWAS-O, ss2 is GWAS-M, and ss3 is GWAS-P. The returned data.frame will contain the input summary statistics and the direct, indirect, indirect maternal, and indirect paternal effects. If OutDir is not NULL, the function writes summary statistics for the direct effect (direct_effect.sumstats.gz), indirect effect (indirect_effect.sumstats.gz), indirect maternal effect (indirect_maternal_effect.sumstats.gz), and indirect paternal effect (indirect_paternal_effect.sumstats.gz), plus a file containing everything (all_aligned.sumstats.gz).

When mode == 2, 2 inputs are expected: ss.own is GWAS-O, ss2 is GWAS-MP. The returned data.frame will contain the input summary statistics and the direct and indirect effects. If OutDir is not NULL, the function writes summary statistics for the direct effect (direct_effect.sumstats.gz) and indirect effect (indirect_effect.sumstats.gz), plus a file containing everything (all_aligned.sumstats.gz).
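A mode 2 call could look like the sketch below. The toy data are random numbers with no biological meaning, `make_toy_gwas()` is a hypothetical helper, and the call is guarded because it assumes the package is installed under the name DONUTS and exports `donuts()`. The argument names follow the Usage section.

```r
set.seed(1)

# Hypothetical helper: build a toy summary-statistics table with the
# required columns plus N. Effect sizes are random noise.
make_toy_gwas <- function(snp_ids, n) {
  m    <- length(snp_ids)
  beta <- rnorm(m, sd = 0.02)
  se   <- rep(1 / sqrt(n), m)
  data.frame(
    CHR  = 1L,
    BP   = seq_len(m) * 1000L,
    SNP  = snp_ids,
    A1   = "A",
    A2   = "G",
    BETA = beta,
    SE   = se,
    P    = 2 * pnorm(-abs(beta / se)),  # two-sided p-value from z = BETA / SE
    N    = n,
    stringsAsFactors = FALSE
  )
}

snps   <- paste0("rs", 1:5)
ss.own <- make_toy_gwas(snps, n = 10000)  # stands in for GWAS-O
ss.mp  <- make_toy_gwas(snps, n = 8000)   # stands in for GWAS-MP

if (requireNamespace("DONUTS", quietly = TRUE)) {
  # OutDir = NULL: return the data.frame without writing files.
  # l12 would normally be the LDSC genetic covariance intercept.
  res <- DONUTS::donuts(
    ss.own = ss.own, ss2 = ss.mp,
    l12 = 0, alpha = 0, mode = 2, OutDir = NULL
  )
  print(head(res))
} else {
  message("DONUTS is not installed; skipping the call.")
}
```

For a mode 1 analysis, a third table would be passed as `ss3` together with `l13`, `l23`, and `mode = 1`.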

When mode == 3, 2 inputs are expected: ss.own is GWAS-O, ss2 is GWAS-M or GWAS-P. If ss2 is GWAS-M, you are assuming the indirect paternal effect is 0. If ss2 is GWAS-P, you are assuming the indirect maternal effect is 0. If OutDir is not NULL, the function writes summary statistics for the direct effect (direct_effect.sumstats.gz), indirect effect (indirect_effect.sumstats.gz), and indirect maternal (if ss2 is GWAS-M) or indirect paternal (if ss2 is GWAS-P) effect (indirect_ss2_effect.sumstats.gz), plus a file containing everything (all_aligned.sumstats.gz).

In addition to the summary statistics, it is highly recommended to first run LDSC on each pair of inputs and use LDSC's genetic covariance intercepts (l12, l13, l23) to account for possible sample overlap.

Note that because the direct and indirect effects are linear combinations of the input GWAS, it is critical that all input GWAS were run on the same phenotype scale.

Value

Returns a data.frame containing both the input and output summary statistics. The basic information about the SNPs is copied from ss.own. The contents of this data.frame differ depending on the mode.

CHR: chromosome

BP: base-pair coordinate

SNP: variant IDs

A1: effect allele

A2: non-effect allele

alpha: Corr(Gm, Gp) at each locus

beta.{own, ss2, ss3, dir, ind, ind.mat, ind.pat, ind.ss2}: effect sizes in the input GWAS summary statistics and for the direct and indirect effects.

se.{own, ss2, ss3, dir, ind, ind.mat, ind.pat, ind.ss2}: standard errors in the input GWAS summary statistics and for the direct and indirect effects.

p.{own, ss2, ss3, dir, ind, ind.mat, ind.pat, ind.ss2}: p-values in the input GWAS summary statistics and for the direct and indirect effects.

n.{own, ss2, ss3, dir, ind, ind.mat, ind.pat, ind.ss2}: sample sizes in the input GWAS summary statistics and the effective sample sizes for the direct and indirect effects.
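To reuse one of the estimated effects in downstream tools, its columns can be pulled back into the input layout. The sketch below works on a mock result, since the real values depend on the data. It assumes the braces above expand to names such as `beta.dir` and `se.dir`, and `extract_effect()` is a hypothetical helper.

```r
# Mock result using the column naming described above (made-up values).
res <- data.frame(
  CHR = c(1L, 1L), BP = c(1000L, 2000L), SNP = c("rs1", "rs2"),
  A1 = c("A", "C"), A2 = c("G", "T"), alpha = 0,
  beta.dir = c(0.012, -0.015), se.dir = c(0.02, 0.02),
  n.dir = c(2500, 2500),
  stringsAsFactors = FALSE
)
res$p.dir <- 2 * pnorm(-abs(res$beta.dir / res$se.dir))

# Hypothetical helper: select one effect by its suffix ("dir", "ind", ...)
# and rename its columns to the BETA/SE/P/N layout of the inputs.
extract_effect <- function(res, suffix) {
  cols <- paste0(c("beta", "se", "p", "n"), ".", suffix)
  stopifnot(all(cols %in% names(res)))
  out <- res[, c("CHR", "BP", "SNP", "A1", "A2", cols)]
  names(out) <- c("CHR", "BP", "SNP", "A1", "A2", "BETA", "SE", "P", "N")
  out
}

direct <- extract_effect(res, "dir")
print(direct)
```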

Tutorials and examples can be found at: https://github.com/qlu-lab/DONUTS