hybridenrich: Running Hybrid test, either from scratch or using two results...
In chipenrich: Gene Set Enrichment For ChIP-seq Peak Data

Description Usage Arguments Value Hybrid p-values Function inputs Joining two results files

The Hybrid test is designed for users who are unsure whether to use ChIP-Enrich or Poly-Enrich. It combines the information from both tests and reports adjusted p-values. For more about ChIP-Enrich and Poly-Enrich, consult their corresponding documentation.

hybridenrich(peaks, out_name = "hybridenrich", out_path = getwd(),
  genome = supported_genomes(), genesets = c("GOBP", "GOCC", "GOMF"),
  locusdef = "nearest_tss", methods = c("chipenrich", "polyenrich"),
  weighting = NULL, mappability = NULL, qc_plots = TRUE,
  min_geneset_size = 15, max_geneset_size = 2000,
  num_peak_threshold = 1, randomization = NULL, n_cores = 1)

`peaks`	Either a file path or a `data.frame` of peaks in BED-like format. If a file path, the following formats are fully supported via their file extensions: .bed, .broadPeak, .narrowPeak, .gff3, .gff2, .gff, and .bedGraph or .bdg. BED3 through BED6 files are supported under the .bed extension. Files without these extensions are supported provided that the first 3 columns correspond to 'chr', 'start', and 'end' and that there is either no header line or the header is commented out. If a `data.frame`, it should be in BEDX+Y style. See `GenomicRanges::makeGRangesFromDataFrame` for acceptable column names.
`out_name`	Prefix string to use for naming output files. This should not contain any characters that would be illegal for the system being used (Unix, Windows, etc.). The default value is "hybridenrich" (see Usage). Output files are named from this prefix: with the prefix "chipenrich", for example, a file "chipenrich_results.tab" is produced, and if `qc_plots` is set, a file "chipenrich_qcplots.pdf" containing a number of quality control plots is produced as well. If `out_name` is set to NULL, no files are written, and results must then be retrieved from the value returned by `hybridenrich`.
`out_path`	Directory to which results files will be written out. Defaults to the current working directory as returned by `getwd`.
`genome`	One of the `supported_genomes()`.
`genesets`	A character vector of geneset databases to be tested for enrichment. See `supported_genesets()`. Alternatively, a file path to a tab-delimited text file with a header, whose first column is the geneset ID or name and whose second column is Entrez Gene IDs. For an example custom gene set file, see the vignette.
`locusdef`	One of: 'nearest_tss', 'nearest_gene', 'exon', 'intron', '1kb', '1kb_outside', '1kb_outside_upstream', '5kb', '5kb_outside', '5kb_outside_upstream', '10kb', '10kb_outside', '10kb_outside_upstream'. For a description of each, see the vignette or `supported_locusdefs`. Alternatively, a file path for a custom locus definition. NOTE: It must be for one of the `supported_genomes()`, and must have columns 'chr', 'start', 'end', and 'gene_id' or 'geneid'. For an example custom locus definition file, see the vignette.
`methods`	A character vector specifying the methods to use for enrichment testing. Currently unused, because the methods are always forced to be one chipenrich and one polyenrich.
`weighting`	A character string specifying the weighting method. If a weighting option is given, the method name is automatically set to "polyenrich_weighted". Current options are: 'signalValue', 'logsignalValue', and 'multiAssign'.
`mappability`	One of `NULL`, a file path to a custom mappability file, or an `integer` for a valid read length given by `supported_read_lengths`. If a file, it should contain a header with two columns named 'gene_id' and 'mappa'. Gene IDs should be Entrez IDs, and mappability values should range from 0 to 1. For an example custom mappability file, see the vignette. Default value is NULL.
`qc_plots`	A logical variable that enables the automatic generation of plots for quality control.
`min_geneset_size`	Sets the minimum number of genes a gene set may have to be considered for enrichment testing.
`max_geneset_size`	Sets the maximum number of genes a gene set may have to be considered for enrichment testing.
`num_peak_threshold`	Sets the threshold for how many peaks a gene must have to be considered as having a peak. Defaults to 1. Only relevant for Fisher's exact test and ChIP-Enrich methods.
`randomization`	One of `NULL`, 'complete', 'bylength', or 'bylocation'. See the Randomizations section below.
`n_cores`	The number of cores to use for enrichment testing. We recommend using only up to the maximum number of physical cores present, as virtual cores do not significantly decrease runtime. Default number of cores is set to 1. NOTE: Windows does not support multicore enrichment.
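
As a rough illustration of how these arguments fit together, the sketch below builds a tiny BED3-style `data.frame`, checks that it has the three leading columns the `peaks` argument expects, and assembles an argument list that mirrors the Usage signature. The peak coordinates are invented, `check_peaks` is a hypothetical helper that is not part of chipenrich, and `genome = "hg19"` is an assumed choice (consult `supported_genomes()`). A real analysis needs a full peak set, so the call itself is guarded and only runs in an interactive session with the package installed.

```r
# Toy peaks in BED3 style; coordinates are made up for illustration only.
peaks <- data.frame(
  chr   = c("chr1", "chr1", "chr2"),
  start = c(10000L, 250000L, 48000L),
  end   = c(10500L, 250800L, 48600L),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of chipenrich): basic sanity checks on peaks.
check_peaks <- function(df) {
  stopifnot(is.data.frame(df), ncol(df) >= 3)
  ok_names  <- identical(names(df)[1:3], c("chr", "start", "end"))
  ok_coords <- all(df$start < df$end)
  ok_names && ok_coords
}

peaks_ok <- check_peaks(peaks)

# Arguments mirror the Usage signature; genome = "hg19" is an assumption.
# out_name = NULL asks for no files to be written.
call_args <- list(
  peaks    = peaks,
  out_name = NULL,
  genome   = "hg19",
  genesets = "GOBP",
  locusdef = "nearest_tss",
  qc_plots = FALSE,
  n_cores  = 1
)

# Guarded call: skipped in batch mode or when chipenrich is not installed.
if (interactive() && requireNamespace("chipenrich", quietly = TRUE)) {
  results <- do.call(chipenrich::hybridenrich, call_args)
}
```

Keeping the arguments in a list makes it easy to reuse the same settings for separate chipenrich and polyenrich runs when comparing against the hybrid result.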

A data.frame containing:

results

A data frame of the results from performing the gene set enrichment test on each requested geneset (all genesets are merged into one final data frame). The columns are:

Geneset.ID: the identifier for a given gene set from the selected database. For example, GO:0000003.
P.Value.x: the probability of observing the degree of enrichment of the gene set given the null hypothesis that peaks are not associated with any gene sets, for the first test.
P.Value.y: the same as above, but for the second test.
P.Value.Hybrid: the Hybrid p-value calculated from the two tests.
FDR.Hybrid: the false discovery rate for the Hybrid p-value, computed with the Benjamini & Hochberg procedure to adjust for multiple testing.

Other variables given will also be included, see the corresponding methods' documentation for their details.
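
The Benjamini & Hochberg adjustment behind an FDR column can be illustrated with base R. This is a minimal sketch on made-up p-values; it assumes, without confirmation from this page, that the package's adjustment matches `p.adjust(..., method = "BH")`. The hand-written `bh_manual` is a hypothetical helper included only to show what the procedure does.

```r
# Made-up hybrid p-values for five gene sets.
p_hybrid <- c(0.001, 0.02, 0.03, 0.2, 0.8)

# Base R adjustment.
fdr <- p.adjust(p_hybrid, method = "BH")

# Hypothetical helper: the same procedure written out step by step.
bh_manual <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)          # largest p-value first
  # Scale each p-value by m / rank, then enforce monotonicity from the top.
  adj <- pmin(1, cummin(p[o] * m / (m:1)))
  adj[order(o)]                             # restore the original order
}

fdr_manual <- bh_manual(p_hybrid)
```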

Given n tests of the same hypothesis, each with the same Type I error rate and each yielding a p-value, p_1, ..., p_n, the Hybrid p-value is computed as n*min(p_1, ..., p_n), which is a Bonferroni-style correction applied to the smallest p-value. The hybrid test has at most the same Type I error rate as any individual test, and if any of the tests has 100% power as the sample size goes to infinity, then so does the hybrid test.
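
The formula can be sketched in a few lines of R. `hybrid_pvalue` is a hypothetical helper, not a chipenrich function, and the p-values are invented. Capping the result at 1 is an assumption added here so the output stays a valid p-value; this page does not say whether the package does the same.

```r
# Hypothetical helper: rows are gene sets, columns are the n tests.
hybrid_pvalue <- function(p_matrix) {
  p_matrix <- as.matrix(p_matrix)
  n <- ncol(p_matrix)
  raw <- n * apply(p_matrix, 1, min)  # n * min(p_1, ..., p_n) per gene set
  pmin(raw, 1)                        # assumption: cap at 1
}

# Invented p-values from two tests for three gene sets.
p <- cbind(
  chipenrich = c(0.001, 0.30, 0.70),
  polyenrich = c(0.020, 0.04, 0.90)
)

p_hybrid <- hybrid_pvalue(p)
```

With two tests, the second gene set gets 2 * 0.04 = 0.08, and the third would be 2 * 0.70 = 1.4 before the cap.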

Every input to hybridenrich has the same meaning as in chipenrich and polyenrich. The only input specific to chipenrich is num_peak_threshold, and the only input specific to polyenrich is weighting. Currently the test only supports running chipenrich and polyenrich, but future versions are planned to allow any number of the supported tests.

Combines two existing results files and returns one results file with hybrid p-values and FDR included. Currently allowed inputs are objects from any of the supplied enrichment tests, or a data.frame with at least the columns P.value and Geneset.ID; Status is an optional column. Currently only two results files can be joined, but future versions are planned to allow joining any number of results files.
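
The joining step can be approximated with base R's `merge`. The sketch below is not the package's own implementation: `join_results` is a hypothetical helper, the p-values and all gene set IDs except GO:0000003 are invented, the output column names follow `merge`'s default `.x`/`.y` suffixes (so their capitalization is an assumption), and both the cap at 1 and the use of `p.adjust` with `method = "BH"` are assumptions.

```r
# Hypothetical helper: join two results tables on Geneset.ID.
join_results <- function(res1, res2) {
  required <- c("Geneset.ID", "P.value")
  stopifnot(all(required %in% names(res1)), all(required %in% names(res2)))

  # merge() keeps gene sets present in both tables and suffixes the
  # clashing P.value columns as P.value.x and P.value.y.
  merged <- merge(res1[, required], res2[, required], by = "Geneset.ID")

  # Two tests, so the hybrid p-value is 2 * min(p_x, p_y), capped at 1.
  merged$P.value.Hybrid <- pmin(2 * pmin(merged$P.value.x, merged$P.value.y), 1)
  merged$FDR.Hybrid <- p.adjust(merged$P.value.Hybrid, method = "BH")

  merged[order(merged$P.value.Hybrid), ]
}

res_chip <- data.frame(
  Geneset.ID = c("GO:0000003", "GO:0000002", "GO:0000001"),
  P.value    = c(0.001, 0.30, 0.70),
  stringsAsFactors = FALSE
)
res_poly <- data.frame(
  Geneset.ID = c("GO:0000001", "GO:0000002", "GO:0000003"),
  P.value    = c(0.90, 0.04, 0.02),
  stringsAsFactors = FALSE
)

joined <- join_results(res_chip, res_poly)
```

Merging on `Geneset.ID` rather than relying on row order matters because the two results tables are generally sorted differently.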

chipenrich documentation built on Nov. 8, 2020, 8:11 p.m.