gene2p: Aggregate P-Values for Genes with multiple SNPs
In merns/postgwas: GWAS Post-Processing Utilities

Calculates aggregate (combined) p-values for genes to which multiple SNPs are assigned, taking into account the dependency structure between SNPs (mainly based on LD). The wrapper function gene2p does this by accepting a specific algorithm implementation as a function argument. Currently, re-implementations of the GATES and SpD algorithms are available (see the references for the original publications and authors). The user calls gene2p, which manages the LD calculation (r2fast function of GenABEL) and parallelization, and then applies the specified algorithm function.

gene2p(
  gwas,
  gts.source, 
  method = GATES, 
  cores = 1
)
SpD(ldmatrix, snps, p)
GATES(ldmatrix, snps, p)

`gwas`	data.frame. Has to contain columns 'SNP', 'P' and either 'geneid' or 'genename' (when both are present, the one with the smaller index is used). A single SNP may occur with different genes, but always has to have the same p-value.
`gts.source`	vector(1). Can be a HapMap population identifier (numeric) to retrieve genotypes for, or an object of class `snp.data` holding genotypes, or a GenABEL (.gwaa) or LINKAGE / PLINK (.ped) genotype file, with existing corresponding .phe and .map files, respectively. See the `gts.source` argument in the `getGenotypes` function for the exact file format specifications. GenABEL format is fastest and recommended. The files may contain more SNPs than actually needed. The .map file has to be in --map3 format (columns CHR, SNP, BP).
`method`	function. Can currently be the SpD or GATES functions. See 'Details' for more information.
`cores`	integer. The number of parallel processes to use (cores = 1 uses no parallelization).
`ldmatrix`	numeric. A matrix of LD values whose dimensions match the lengths of the `p` and `snps` arguments.
`snps`	character. A vector of SNP identifiers.
`p`	numeric. A vector of p-values matching the length of the `snps` argument.
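The requirements on the `gwas` argument can be checked before calling gene2p. The following is a minimal sketch in base R, assuming a toy data frame; the p-values and gene identifiers are invented for illustration, and the check variables (`required.ok`, `consistent`) are hypothetical names, not part of the package.

```r
# Toy input with the columns gene2p expects: SNP, P and geneid.
# rs759704 is assigned to two genes and carries the same p-value in both rows.
gwas <- data.frame(
  SNP    = c("rs188090", "rs172154", "rs759704", "rs759704"),
  P      = c(0.004, 0.031, 0.20, 0.20),   # invented p-values
  geneid = c(101, 101, 101, 202),         # invented gene identifiers
  stringsAsFactors = FALSE
)

# Required columns: 'SNP', 'P' and at least one of 'geneid' / 'genename'.
required.ok <- all(c("SNP", "P") %in% names(gwas)) &&
  any(c("geneid", "genename") %in% names(gwas))

# Every SNP must have exactly one distinct p-value across all its rows.
p.per.snp  <- tapply(gwas$P, gwas$SNP, function(x) length(unique(x)))
consistent <- all(p.per.snp == 1)
```

Both flags should be TRUE for a well-formed input; a FALSE value for `consistent` points to a SNP listed with conflicting p-values.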

The SpD and GATES functions calculate an aggregate p-value for a single gene/locus from multiple tested SNPs (e.g. from an association test) that are assigned to that locus. Because the SNP-level tests may be statistically dependent, linkage disequilibrium information is required to decorrelate them. Further information about gene-based association tests, aggregate p-values and the specific GATES and SpD methods can be found in the references section. The gene2p function applies these methods to multiple genes (or to a genome-wide dataset, with certain runtime limitations) and automatically calculates the required LD matrix for each gene. Thus, the user will normally call gene2p to calculate a representative, gene-based p-value for a large set of genes that have been annotated to multiple SNPs (e.g. by the `snp2gene` function) as the result of an association study. Calculations may be time- and space-demanding, so depending on the number of SNPs, it may be a good idea to divide the dataset by chromosome, use only intragenic SNPs, or prune the dataset (e.g. drop SNPs evenly).
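To illustrate how LD information can enter an aggregate p-value, the sketch below follows the general idea of Nyholt's approach: estimate an effective number of independent tests from the eigenvalues of the LD matrix, then apply a Sidak correction to the smallest SNP p-value. This is not the package's SpD function and may differ from it in detail; `spd_sketch` is a hypothetical helper, the LD matrix and p-values are invented, and the sketch simply uses whatever symmetric LD matrix it is given.

```r
# Hypothetical helper, NOT the package's SpD(): Nyholt-style effective number
# of tests followed by a Sidak correction of the smallest p-value.
spd_sketch <- function(ldmatrix, p) {
  m <- length(p)
  stopifnot(is.matrix(ldmatrix), nrow(ldmatrix) == m, ncol(ldmatrix) == m)
  if (m == 1) return(p)                      # nothing to combine
  # Eigenvalues of the symmetric LD matrix.
  lambda <- eigen(ldmatrix, symmetric = TRUE, only.values = TRUE)$values
  # Effective number of tests: m for independent SNPs, 1 for complete LD.
  meff <- 1 + (m - 1) * (1 - var(lambda) / m)
  # Sidak-corrected minimum p-value.
  1 - (1 - min(p))^meff
}

# Invented example: three SNPs, the first two in strong LD.
ld <- matrix(c(1.0, 0.8, 0.1,
               0.8, 1.0, 0.2,
               0.1, 0.2, 1.0), nrow = 3, byrow = TRUE)
p  <- c(0.004, 0.031, 0.20)
gene.p <- spd_sketch(ld, p)
```

With independent SNPs (identity matrix) the result reduces to a Sidak correction over all SNPs, and with SNPs in complete LD it reduces to the smallest p-value, so the aggregate value for partially correlated SNPs lies between these two extremes.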

data frame: The 'gwas' argument with a column 'gene.p' added. Contains the original number of rows. Column 'gene.p' may contain NA values when the gene-wise p-value could not be calculated for that row. As a side effect, the retrieved genotype data is deposited in files gene2p.gwaa, gene2p.phe, gene2p.ped and gene2p.map.
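Because the result keeps one row per SNP-gene assignment, it is often convenient to collapse it to one row per gene and to inspect the rows for which no gene-wise p-value could be calculated. The following sketch uses a mock result; the data frame `res`, its gene names and its values are invented and only mimic the documented structure (input columns plus 'gene.p').

```r
# Mock of a gene2p result: the input columns plus 'gene.p' (values invented).
res <- data.frame(
  SNP      = c("rs188090", "rs172154", "rs759704", "rs759704"),
  P        = c(0.004, 0.031, 0.20, 0.20),
  genename = c("GENE_A", "GENE_A", "GENE_A", "GENE_B"),
  gene.p   = c(0.011, 0.011, 0.011, NA),
  stringsAsFactors = FALSE
)

# Rows for which no gene-wise p-value could be calculated.
failed <- res[is.na(res$gene.p), ]

# One row per gene, ordered by gene-wise p-value.
genes <- unique(res[!is.na(res$gene.p), c("genename", "gene.p")])
genes <- genes[order(genes$gene.p), ]
```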

The GATES algorithm was proposed by Miao-Xin Li et al. (http://dx.doi.org/doi:10.1016/j.ajhg.2011.01.019) and the SpD algorithm by Dale Nyholt (http://dx.doi.org/doi:10.1086/383251).

  snps <- data.frame(SNP = c("rs188090", "rs172154", "rs759704"))
  snps$P <- runif(nrow(snps))^2
  
  
  # offline LD annotation needs genotype files
  # hint: genotype data can also be preloaded using the getGenotypes function
  gwaafile <- system.file("extdata", "example.gwaa", package = "postgwas")
  
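  # annotate the SNPs with genes; the result provides the gene columns gene2p needs
  gwas <- snp2gene.prox(snps, level = 0, use.buffer = TRUE)
  
  # aggregate the SNP p-values per gene (method defaults to GATES)
  gene2p(gwas, gts.source = gwaafile)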