snp.ld.analysis: Implementation of the SNP and LD data analysis using IRanges


Description

This function wraps the workflow of a SNP analysis. SNPs are assigned to genes using findOverlaps. Gene-specific scores are then computed by combining the SNP p-values.
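The two steps can be illustrated with a minimal base-R sketch. It is not the package's implementation: the package uses findOverlaps, whereas this sketch mimics the same closed-interval overlap test with outer() so that it runs without extra packages. The toy data, the column names and the helper fisher.combine are hypothetical. Fisher's method is shown as one possible combination and assumes independent p-values, which SNPs in LD generally are not.

```r
# Hypothetical toy data: two genes and four SNPs on one chromosome.
genes <- data.frame(gene  = c("geneA", "geneB"),
                    start = c(50000, 200000),
                    end   = c(60000, 210000))
snps <- data.frame(rs  = c(101L, 102L, 103L, 104L),
                   pos = c(45000, 55000, 205000, 500000),
                   p   = c(0.01, 0.20, 0.03, 0.50))

# Expand gene regions by the flanks (cf. flank.genes.left / flank.genes.right).
flank.left  <- 10000
flank.right <- 10000
lo <- genes$start - flank.left
hi <- genes$end + flank.right

# hit[i, j] is TRUE when SNP i lies inside the flanked region of gene j.
hit <- outer(snps$pos, lo, ">=") & outer(snps$pos, hi, "<=")
idx <- which(hit, arr.ind = TRUE)
assigned <- data.frame(gene = genes$gene[idx[, "col"]],
                       rs   = snps$rs[idx[, "row"]],
                       p    = snps$p[idx[, "row"]])

# Combine the SNP p-values per gene: smallest p-value and Fisher's method.
fisher.combine <- function(p) {
  pchisq(-2 * sum(log(p)), df = 2 * length(p), lower.tail = FALSE)
}
score.min    <- tapply(assigned$p, assigned$gene, min)
score.fisher <- tapply(assigned$p, assigned$gene, fisher.combine)
```

In this toy data the SNP at position 500000 falls outside both flanked regions and therefore contributes to no gene score.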

Usage

snp.ld.analysis(snpdata.url = stop("missing snpdata file"),
  genome.url = stop("missing genome file"), ld.data.hdf.url = NULL,
  include.ld.data = !is.null(ld.data.hdf.url), population = "CEU",
  full.match = TRUE, ld.rho.cutoff = 0.8, comparator = c(">=", "<=", ">",
  "<", "==", "!="), flank.genes.left = 10000, flank.genes.right = 10000,
  multiple.hits = TRUE, scoring.function = c("max", "min", "mean", "median",
  "product", "sum", "count", "p.ratio", "snp.ratio.score", "fisher", "saccone",
  "sidak", "bonferroni", "slowscore", "get.snps", "get.scores"),
  correction.type = c("Moskvina", "none"), ld.structure = FALSE,
  use.position = (scoring.function != "get.snps"),
  genome.name = "unknown genome", outfile.path = "./LDsnpRout.txt",
  add.call.header = TRUE, generate.plink.set = FALSE, ...)
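Two defaults in this signature depend on other arguments: include.ld.data and use.position. R evaluates default arguments lazily, so they are computed from whatever the other arguments hold when first used. The sketch below reproduces the pattern in a hypothetical stand-in function, demo.defaults. It assumes that the vector of scoring function names is reduced to a single value with match.arg before use.position is read; whether snp.ld.analysis does exactly this is not stated here. The file name "ld.h5" is a placeholder and is never opened.

```r
# Hypothetical stand-in with the same default pattern as the signature above.
demo.defaults <- function(ld.data.hdf.url = NULL,
                          include.ld.data = !is.null(ld.data.hdf.url),
                          scoring.function = c("max", "min", "get.snps"),
                          use.position = (scoring.function != "get.snps")) {
  # Assumption: the choice vector is reduced to one value before
  # use.position is evaluated for the first time.
  scoring.function <- match.arg(scoring.function)
  list(include.ld.data  = include.ld.data,
       scoring.function = scoring.function,
       use.position     = use.position)
}

# No LD file given: LD data is off, the first scoring function is chosen.
d1 <- demo.defaults()

# LD file given and "get.snps" requested: LD data on, rs-id matching.
d2 <- demo.defaults(ld.data.hdf.url = "ld.h5", scoring.function = "get.snps")
```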

Arguments

snpdata.url

character: A URL or file to read the SNPs in tab separated format.

genome.url

character: A URL or file to read the Genome in tab separated format.

ld.data.hdf.url

character: A URL or file to read the optional LD data from, as an HDF5 file.

include.ld.data

logical: Should LD data be included in scoring? This takes more time and memory. (default: FALSE unless ld.data.hdf.url is given)

population

character: A population code, as a character vector of length one. It must correspond to one of the population codes in the LD data file. See Details. (default: "CEU")

full.match

logical: Do a full crossmatching of LD data. Only used if include.ld.data == TRUE. See Details. (default: TRUE)

ld.rho.cutoff

numeric: Cutoff value for the LD correlation read from HDF file (Default: 0.8)

comparator

character: Binary comparator (as a string) used to compare the LD value in the LD file with ld.rho.cutoff. One of ">=", "<=", ">", "<", "==", "!=" (default: ">=")

flank.genes.left

integer: Expand gene regions by this amount on the left (default: 10000)

flank.genes.right

integer: Expand gene regions by this amount on the right (default: 10000)

multiple.hits

logical: Allow multiple overlaps (Default: TRUE)

scoring.function

character: The name of the scoring function to use. One of the values listed in Usage: "max", "min", "mean", "median", "product", "sum", "count", "p.ratio", "snp.ratio.score", "fisher", "saccone", "sidak", "bonferroni", "slowscore", "get.snps", "get.scores"

correction.type

character: The name of the correction function to use. One of the values listed in Usage: "Moskvina", "none"

ld.structure

logical: If TRUE, the LD structure is loaded from the HDF5 file. If FALSE, the LD structure is estimated from the p-values. (Default: FALSE)

use.position

logical: Should markers be matched by genomic positions (TRUE) or by rs ids (FALSE)? (default: TRUE unless scoring.function is "get.snps")

genome.name

character: Optional name of the genome (default: "unknown genome")

outfile.path

character: A file path to save the output to (default: "./LDsnpRout.txt"). If NULL, no file is generated.

add.call.header

logical: if TRUE a comment header is added to the output file containing the function call parameters

generate.plink.set

logical: Generate the output file in PLINK set format (Default: FALSE). If TRUE, scoring.function is ignored.

...

Additional parameters passed to the scoring function. (e.g. pMax for the p.ratio)
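The interplay of the comparator and ld.rho.cutoff arguments can be sketched in base R. This is an illustration under assumptions, not the package's code: the helper filter.ld and the toy LD table are hypothetical, and only the dataset names snp.id.one, snp.id.two and value.set are taken from Details. The comparator string is turned into a function with match.fun and applied to the LD values and the cutoff.

```r
# Hypothetical LD pairs: rs numbers (integers, no "rs" prefix) and LD values.
ld <- data.frame(snp.id.one = c(101L, 101L, 103L),
                 snp.id.two = c(102L, 105L, 106L),
                 value.set  = c(0.95, 0.40, 0.80))

# Select pairs by applying the comparator string to the LD value and cutoff.
filter.ld <- function(ld, ld.rho.cutoff = 0.8, comparator = ">=") {
  stopifnot(comparator %in% c(">=", "<=", ">", "<", "==", "!="))
  keep <- match.fun(comparator)(ld$value.set, ld.rho.cutoff)
  ld[keep, , drop = FALSE]
}

high.ld <- filter.ld(ld)                    # pairs with value >= 0.8
low.ld  <- filter.ld(ld, comparator = "<")  # pairs with value < 0.8
```

With the default comparator ">=", a pair whose value equals the cutoff is kept. With ">" it would be dropped.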

Details

This function wraps the workflow of a SNP analysis. SNPs are assigned to genes using findOverlaps. Gene-specific scores are then computed by combining the SNP p-values. If LD data is provided (ld.data.hdf.url is not NULL), the scores of SNPs in high LD are added to the gene-specific list of SNPs used for scoring.

Due to the large data volume, the LD data must be stored in an HDF5 file in a defined format, grouped by chromosome, with 3 or 5 one-dimensional datasets per group: snp.id.one (the rs identifier, first column), snp.id.two (the rs identifier, second column) and value.set (the numeric score value). For full crossmatching, the additional datasets marker.pos.one and marker.pos.two are required.

Missing chromosomes are ignored. Group names in the HDF5 file must match the chromosome names exactly, otherwise the data will not be loaded. The rs identifiers must be stored as integers without the rs prefix. See the example data file for the structure.
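The per-chromosome layout can be pictured with an in-memory mock. The sketch below does not read or write HDF5; it only mirrors the group and dataset structure as a nested R list. The group name "chr1", the helpers rs.to.int and check.group, and the values are hypothetical; the real group names must match the chromosome names in the genome file.

```r
# Strip the "rs" prefix and store the identifiers as integers.
rs.to.int <- function(ids) as.integer(sub("^rs", "", ids))

# In-memory mock of one chromosome group; "chr1" is a hypothetical group name.
ld.mock <- list(
  chr1 = list(
    snp.id.one = rs.to.int(c("rs101", "rs101")),
    snp.id.two = rs.to.int(c("rs102", "rs105")),
    value.set  = c(0.95, 0.40)
  )
)

# Check a group: 3 datasets, or 5 when marker positions are included,
# all one-dimensional and of equal length.
check.group <- function(g) {
  base <- c("snp.id.one", "snp.id.two", "value.set")
  full <- c(base, "marker.pos.one", "marker.pos.two")
  names.ok <- setequal(names(g), base) || setequal(names(g), full)
  names.ok && length(unique(lengths(g))) == 1
}

ok <- vapply(ld.mock, check.group, logical(1))
```

A check of this kind before writing the file may help catch the silent failure mode described above, where a group that does not match is simply not loaded.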

Value

gene.score: A hidden object of type RangedData containing the genomic ranges and a score column with the gene scores resulting from the scoring function. The result is written to a tab-separated text file unless outfile.path is NULL.
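Reading such an output file back could look like the following sketch. It assumes that the call header (see add.call.header) consists of lines starting with "#". That marker is an assumption, as are the toy table and its column names, and the real file layout may differ. The sketch writes its own small file to a temporary path so that it runs on its own.

```r
# Hypothetical result table standing in for the gene scores.
gene.score <- data.frame(gene  = c("geneA", "geneB"),
                         score = c(0.01, 0.03))

out <- tempfile(fileext = ".txt")

# Assumption: header lines are marked with "#"; the real format may differ.
writeLines(c("# scoring.function = min", "# population = CEU"), out)

# Append the table below the comment header. write.table warns when column
# names are appended to an existing file; that is intended here.
suppressWarnings(
  write.table(gene.score, out, sep = "\t", quote = FALSE,
              row.names = FALSE, append = TRUE)
)

# read.table skips lines that start with the comment character.
back <- read.table(out, header = TRUE, sep = "\t", comment.char = "#",
                   stringsAsFactors = FALSE)
```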

References

Fisher R. (1925) Statistical methods for research workers. Edinburgh: Oliver and Boyd.

Peirce JL, Broman KW, Lu L, Williams RW. (2007) A simple method for combining genetic mapping data from multiple crosses and experimental designs. PLoS One. 2(10):e1036. PMID: 17940600

Saccone SF, Hinrichs AL, Saccone NL, Chase GA, Konvicka K, et al. (2007) Cholinergic nicotinic receptor genes implicated in a nicotine dependence association study targeting 348 candidate genes with 3713 SNPs. Hum Mol Genet 16(1). PMID: 17135278


tvpolushina/test documentation built on May 3, 2019, 1:50 p.m.