snp.ld.analysis: Implementation of the SNP and LD data analysis using IRanges
In tvpolushina/test: LDsnpR - SNP-based gene scoring using LD data


This function wraps the workflow of a SNP analysis. SNPs are assigned to genes using findOverlaps. Gene-specific scores are then computed by combining the p-values of the assigned SNPs.
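The two steps described above (assign SNPs to flanked gene regions, then combine their p-values per gene) can be sketched in base R. This is only an illustration under stated assumptions: the package itself uses findOverlaps, while the sketch below uses plain vector comparisons, and all gene names, positions, p-values and the helper `score.genes` are hypothetical.

```r
# Toy data; every name and value here is hypothetical.
genes <- data.frame(
  gene  = c("geneA", "geneB"),
  chr   = c("1", "1"),
  start = c(50000, 200000),
  end   = c(60000, 210000),
  stringsAsFactors = FALSE
)
snps <- data.frame(
  rs  = c(101L, 102L, 103L, 104L),
  chr = c("1", "1", "1", "1"),
  pos = c(45000, 55000, 150000, 205000),
  p   = c(0.04, 0.001, 0.5, 0.2),
  stringsAsFactors = FALSE
)

# For each gene, collect the p-values of SNPs that fall inside the gene
# region extended by the flanks, and combine them with `combine`.
# Base R stand-in for the overlap step; not the package's implementation.
score.genes <- function(genes, snps, combine = min,
                        flank.left = 10000, flank.right = 10000) {
  vapply(seq_len(nrow(genes)), function(i) {
    hit <- snps$chr == genes$chr[i] &
      snps$pos >= genes$start[i] - flank.left &
      snps$pos <= genes$end[i] + flank.right
    if (any(hit)) combine(snps$p[hit]) else NA_real_
  }, numeric(1))
}

genes$score.min  <- score.genes(genes, snps, combine = min)
genes$score.mean <- score.genes(genes, snps, combine = mean)
```

With these toy values, geneA picks up the SNPs at positions 45000 and 55000, geneB picks up the SNP at 205000, and the SNP at 150000 is not assigned to any gene.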

snp.ld.analysis(snpdata.url = stop("missing snpdata file"),
  genome.url = stop("missing genome file"), ld.data.hdf.url = NULL,
  include.ld.data = !is.null(ld.data.hdf.url), population = "CEU",
  full.match = TRUE, ld.rho.cutoff = 0.8, comparator = c(">=", "<=", ">",
  "<", "==", "!="), flank.genes.left = 10000, flank.genes.right = 10000,
  multiple.hits = TRUE, scoring.function = c("max", "min", "mean", "median",
  "product", "sum", "count", "p.ratio", "snp.ratio.score", "fisher", "saccone",
  "sidak", "bonferroni", "slowscore", "get.snps", "get.scores"),
  correction.type = c("Moskvina", "none"), ld.structure = FALSE,
  use.position = (scoring.function != "get.snps"),
  genome.name = "unknown genome", outfile.path = "./LDsnpRout.txt",
  add.call.header = TRUE, generate.plink.set = FALSE, ...)

`snpdata.url`	character: A URL or file to read the SNPs in tab separated format.
`genome.url`	character: A URL or file to read the Genome in tab separated format.
`ld.data.hdf.url`	character: A URL or file to read the optional LD-data as a HDF5 file.
`include.ld.data`	logical: Should LD data be included in scoring? Takes more time and memory. (default: TRUE if ld.data.hdf.url is given, otherwise FALSE)
`population`	character: A single population code (character vector of length one). It must correspond to one of the population codes in the LD data file; see Details. (default: "CEU")
`full.match`	logical: Perform full cross-matching of LD data. Only used if include.ld.data is TRUE. See Details. (default: TRUE)
`ld.rho.cutoff`	numeric: Cutoff value for the LD correlation read from HDF file (Default: 0.8)
`comparator`	character: Binary comparator (as a string) used to compare the LD value in the LD file against ld.rho.cutoff. One of ">", "<", ">=", "<=", "==", "!=" (default: ">=")
`flank.genes.left`	integer: Expand gene regions by this on the left (default: 10000)
`flank.genes.right`	integer: Expand gene regions by this on the right (default: 10000)
`multiple.hits`	logical: Allow multiple overlaps (Default: TRUE)
`scoring.function`	character: the name of the scoring function to use. One of the values accepted by the signature in Usage: "max", "min", "mean", "median", "product", "sum", "count", "p.ratio", "snp.ratio.score", "fisher", "saccone", "sidak", "bonferroni", "slowscore", "get.snps", "get.scores"
`correction.type`	character: the name of the correction function to use. One of the values accepted by the signature in Usage: "Moskvina", "none"
`ld.structure`	logical: If TRUE, the LD structure is loaded from the HDF5 file. If FALSE, the LD structure is estimated from the p-values (default: FALSE)
`use.position`	logical: Should markers be matched by genomic positions (TRUE) or by rs ids (FALSE)? (default: TRUE unless scoring.function is "get.snps")
`genome.name`	character: optional genome.name
`outfile.path`	character: A file path to save the output to. If NULL, no file is generated (default: "./LDsnpRout.txt")
`add.call.header`	logical: If TRUE, a comment header containing the function call parameters is added to the output file
`generate.plink.set`	logical: Generate the output file in PLINK set format. If TRUE, scoring.function is ignored (default: FALSE)
`...`	Additional parameters passed to the scoring function. (e.g. pMax for the p.ratio)
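A call using the arguments above might be assembled as follows. This is a sketch, not a tested recipe: the input file names are hypothetical placeholders, and it assumes the package is installed under the name LDsnpR and exports snp.ld.analysis. The call is guarded so that nothing runs unless both the package and the input files are present.

```r
# Hypothetical input paths; replace them with real tab-separated files.
args <- list(
  snpdata.url       = "snps.txt",
  genome.url        = "genes.txt",
  population        = "CEU",
  flank.genes.left  = 10000,
  flank.genes.right = 10000,
  scoring.function  = "min",
  correction.type   = "none",
  outfile.path      = "./LDsnpRout.txt"
)
# ld.data.hdf.url is left at its default (NULL), so LD data is not used.

inputs.exist <- all(file.exists(args$snpdata.url, args$genome.url))

# Assumption: the package is installed as "LDsnpR" and exports the function.
if (requireNamespace("LDsnpR", quietly = TRUE) && inputs.exist) {
  gene.score <- do.call(LDsnpR::snp.ld.analysis, args)
}
```

Keeping the arguments in a named list makes it easy to rerun the same analysis with a different scoring.function by changing a single element.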

If LD data is provided (ld.data.hdf.url is not NULL), the scores of SNPs in high LD are added to the gene-specific list of scores. Because of the large data volume, the LD data must be stored in an HDF5 file in a defined format: one group per chromosome, each containing 3 or 5 one-dimensional datasets. The three required datasets are snp.id.one (the rs identifier, first column), snp.id.two (second column) and value.set (numeric score value). Full cross-matching additionally requires the datasets marker.pos.one and marker.pos.two. The rs identifiers must be stored as integers without the rs prefix. Group names in the HDF5 file must match the chromosome names exactly, otherwise the data will not be loaded. Missing chromosomes are ignored. See the example data file for the structure.
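The per-chromosome layout and the LD cutoff can be illustrated with an in-memory stand-in for the HDF5 file. The dataset names come from the description above; the nested list, the values, and the helpers `rs.to.int` and `ld.partners` are hypothetical and only mimic the structure, they do not read or write HDF5.

```r
# One list element per chromosome group, each holding the five datasets.
# Values are made up for illustration.
ld.data <- list(
  "1" = list(
    snp.id.one     = c(101L, 101L, 102L),   # rs ids as integers, no "rs" prefix
    snp.id.two     = c(102L, 103L, 104L),
    value.set      = c(0.95, 0.40, 0.80),   # LD value per SNP pair
    marker.pos.one = c(45000L, 45000L, 55000L),
    marker.pos.two = c(55000L, 150000L, 205000L)
  )
)

# Convert identifiers such as "rs12345" to the integer form described above.
rs.to.int <- function(rs) as.integer(sub("^rs", "", rs))

# Keep only SNP pairs whose LD value passes the cutoff under the comparator.
ld.partners <- function(ld.chr, cutoff = 0.8, comparator = ">=") {
  keep <- match.fun(comparator)(ld.chr$value.set, cutoff)
  data.frame(one = ld.chr$snp.id.one[keep], two = ld.chr$snp.id.two[keep])
}

pairs <- ld.partners(ld.data[["1"]])

# A chromosome without a matching group name simply yields no LD data.
missing.chr <- ld.data[["2"]]
```

Passing the comparator as a string and resolving it with match.fun mirrors how the comparator argument is described, though the package may implement the comparison differently.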

gene.score RangedData: A hidden object of type RangedData containing the genomic ranges and a score column with the gene scores resulting from the scoring function. The result is written to a tab-separated text file unless outfile.path is NULL.
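The tab-separated output file can be read back with base R. The sketch below writes a small mock file first so that it is self-contained. The column names, the values, and the use of "#" as the comment character for the call header are assumptions, not the documented output layout.

```r
# Mock output file; the layout and the "#" header prefix are assumed.
out <- tempfile(fileext = ".txt")
writeLines(c(
  "# hypothetical call header",
  "space\tstart\tend\tname\tscore",
  "1\t40000\t70000\tgeneA\t0.001",
  "1\t190000\t220000\tgeneB\t0.2"
), out)

# Skip comment lines, read the table, and sort genes by score.
res <- read.delim(out, comment.char = "#", stringsAsFactors = FALSE)
res <- res[order(res$score), ]
```

If the real header uses a different prefix, adjust comment.char or use the skip argument of read.delim instead.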

Fisher RA. (1925) Statistical Methods for Research Workers. Edinburgh: Oliver and Boyd.

Peirce JL, Broman KW, Lu L, Williams RW. (2007) A simple method for combining genetic mapping data from multiple crosses and experimental designs. PLoS One. 2(10):e1036. PMID: 17940600

Saccone SF, Hinrichs AL, Saccone NL, Chase GA, Konvicka K, et al. (2007) Cholinergic nicotinic receptor genes implicated in a nicotine dependence association study targeting 348 candidate genes with 3713 SNPs. Hum Mol Genet 16(1). PMID: 17135278

tvpolushina/test documentation built on May 3, 2019, 1:50 p.m.