This method is designed for a set of narrow genomic regions (e.g. TF peaks). It tests whether the genomic regions assigned to genes in a gene set are closer to regulatory locations (i.e. promoters or enhancers) than expected by chance.
peaks
Either a file path or a
out_name
Prefix string to use for naming output files. This should not contain any characters that would be illegal for the system being used (Unix, Windows, etc.) The default value is "proxReg", and a file "proxReg_results.tab" is produced. If
out_path
Directory to which results files will be written out. Defaults to the current working directory as returned by
genome
One of the
reglocation
One of: 'tss', 'enhancer'. Details in the "Regulatory locations" section
genesets
A character vector of geneset databases to be tested for enrichment. See
randomization
One of: 'shuffle', 'unif', 'bylength', 'byenh'. These were used to test for Type I error under the null hypothesis. A general user will never have to use these.
qc_plots
A logical variable that enables the automatic generation of plots for quality control.
min_geneset_size
Sets the minimum number of genes a gene set may have to be considered for testing.
max_geneset_size
Sets the maximum number of genes a gene set may have to be considered for testing.
n_cores
The number of cores to use for testing. We recommend using only up to the maximum number of physical cores present, as virtual cores do not significantly decrease runtime. Default number of cores is set to 1. NOTE: Windows does not support multicore testing.
A list, containing the following items:
opts
A data frame containing the arguments/values passed to
peaks
A data frame containing peak assignments to genes. Peaks which do not overlap a gene locus are not included. Each peak that was assigned to a gene is listed, along with the peak midpoint or peak interval coordinates (depending on which was used), the gene to which the peak was assigned, the locus start and end position of the gene, and the distance from the peak to the TSS. The columns are:
results
A data frame of the results from performing the proxReg test on each geneset that was requested (all genesets are merged into one final data frame). The columns are:
Currently supported regulatory locations are gene transcription start sites (tss) and enhancer locations (hg19 only).
ProxReg first calculates the distance, in base pairs, between each peak midpoint and the regulatory location. For gene transcription start sites, some parts of the chromosome are sparser in genes than others, so the distance to the TSS is associated with gene locus length, and this association needs to be adjusted for. When tss is the regulatory location, the peak distances are adjusted for this confounding variable using an empirical average over 90 ENCODE ChIP-seq experiments (details in citation pending). Similarly, distances to enhancers depend on the density of enhancers within a gene locus, so the distance to enhancer is adjusted using an empirical average over 90 ENCODE ChIP-seq experiments.
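The first step described above can be illustrated with a minimal base R sketch. It is not proxReg's internal code: the peak intervals and TSS positions are made-up toy values on a single chromosome, the choice of the nearest TSS as the reference point is an assumption, and the empirical adjustment for locus length is not reproduced here.

```r
# Hypothetical toy data: peak intervals and TSS positions on one chromosome.
peaks <- data.frame(start = c(100, 950, 4000), end = c(200, 1050, 4200))
tss <- c(500, 1000, 5000)

# Midpoint of each peak.
mid <- (peaks$start + peaks$end) / 2

# Absolute distance (bp) from every midpoint to every TSS:
# rows are peaks, columns are TSSs.
dist_matrix <- abs(outer(mid, tss, "-"))

# Index of the nearest TSS for each peak, and the corresponding distance.
nearest_idx <- apply(dist_matrix, 1, which.min)
dist_to_tss <- dist_matrix[cbind(seq_along(mid), nearest_idx)]
```

For real data with many peaks and genes, a pairwise matrix like this would be too large; it is used here only because it keeps the arithmetic visible.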
For each gene set of interest, the genomic regions are divided into two groups according to whether or not the gene with the nearest TSS is in the gene set. A Wilcoxon rank-sum test is then used to test for a difference in the adjusted distances (to either TSS or enhancer) between the two groups.
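The comparison itself can be sketched with base R's wilcox.test. The adjusted distances below are invented for illustration, and the two-sided default alternative is an assumption; the text above does not state which alternative proxReg uses.

```r
# Hypothetical adjusted distances for peaks whose nearest-TSS gene is
# in the gene set (in_set) or not in the gene set (out_set).
in_set  <- c(-1.2, -0.8, -0.5, -0.3, 0.1)
out_set <- c(-0.2, 0.0, 0.4, 0.7, 1.1, 1.5)

# Wilcoxon rank-sum (Mann-Whitney) test for a location shift between groups.
# The default alternative is two-sided.
res <- wilcox.test(in_set, out_set)

w_stat <- unname(res$statistic)  # number of (in_set, out_set) pairs with in_set > out_set
p_val  <- res$p.value
```

A small statistic here means the in-set values tend to be lower than the out-of-set values, which in this toy setup corresponds to peaks in the gene set lying closer to the regulatory location.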
# Run proxReg using an example dataset, assigning peaks to the nearest TSS,
# and on a small custom geneset
library(chipenrich)
data(peaks_E2F4, package = 'chipenrich.data')
# Restrict to chr1 to keep the example fast
peaks_E2F4 = subset(peaks_E2F4, peaks_E2F4$chrom == 'chr1')
gs_path = system.file('extdata', 'vignette_genesets.txt', package = 'chipenrich')
results = proxReg(peaks_E2F4, reglocation = 'tss',
    genome = 'hg19', genesets = gs_path, out_name = NULL)
# Get the list of peaks that were assigned to genes and their distances to
# regulatory regions.
assigned_peaks = results$peaks
# Get the results of enrichment testing.
enrich = results$results