ChIP-Enrich is designed for use with 1,000s or 10,000s of narrow peaks, which results in fewer gene loci containing a peak overall, as in ChIP-seq experiments for transcription factors. For more details, see the 'ChIP-Enrich Method' section below. For help choosing a method, see the 'Choosing A Method' section below, or see the vignette.
chipenrich(peaks, out_name = "chipenrich", out_path = getwd(),
  genome = supported_genomes(), genesets = c("GOBP", "GOCC", "GOMF"),
  locusdef = "nearest_tss", method = "chipenrich",
  mappability = NULL, fisher_alt = "two.sided", qc_plots = TRUE,
  min_geneset_size = 15, max_geneset_size = 2000,
  num_peak_threshold = 1, randomization = NULL, n_cores = 1)
peaks |
Either a file path or a |
out_name |
Prefix string to use for naming output files. This should not
contain any characters that would be illegal for the system being used (Unix,
Windows, etc.). The default value is "chipenrich", and a file "chipenrich_results.tab"
is produced. If |
out_path |
Directory to which results files will be written out. Defaults
to the current working directory as returned by |
genome |
One of the |
genesets |
A character vector of geneset databases to be tested for
enrichment. See |
locusdef |
One of: 'nearest_tss', 'nearest_gene', 'exon', 'intron', '1kb',
'1kb_outside', '1kb_outside_upstream', '5kb', '5kb_outside', '5kb_outside_upstream',
'10kb', '10kb_outside', '10kb_outside_upstream'. For a description of each,
see the vignette or |
method |
A character string specifying the method to use for enrichment testing. Must be one of ChIP-Enrich ('chipenrich') (default), or Fisher's exact test ('fet'). |
mappability |
One of |
fisher_alt |
If method is 'fet', this option indicates the alternative for Fisher's exact test, and must be one of 'two.sided' (default), 'greater', or 'less'. |
qc_plots |
A logical variable that enables the automatic generation of plots for quality control. |
min_geneset_size |
Sets the minimum number of genes a gene set may have to be considered for enrichment testing. |
max_geneset_size |
Sets the maximum number of genes a gene set may have to be considered for enrichment testing. |
num_peak_threshold |
Sets the threshold for how many peaks a gene must have to be considered as having a peak. Defaults to 1. Only relevant for Fisher's exact test and ChIP-Enrich methods. |
randomization |
One of |
n_cores |
The number of cores to use for enrichment testing. We recommend using only up to the maximum number of physical cores present, as virtual cores do not significantly decrease runtime. Default number of cores is set to 1. NOTE: Windows does not support multicore enrichment. |
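As an illustration of how num_peak_threshold turns per-gene peak counts into the binary peak indicator, here is a minimal base R sketch. The helper has_peak, the toy data frame, and its column names are hypothetical, and the use of a greater-than-or-equal comparison is an assumption based on the description above, not a quote of the package's code.

```r
# Toy per-gene peak counts (hypothetical data, not package output).
peaks_per_gene <- data.frame(
  gene_id   = c("g1", "g2", "g3", "g4"),
  num_peaks = c(0, 1, 2, 5)
)

# Assumption: a gene "has a peak" when its count meets or exceeds the threshold.
has_peak <- function(num_peaks, num_peak_threshold = 1) {
  as.integer(num_peaks >= num_peak_threshold)
}

# With the default threshold of 1, any gene with at least one peak is flagged.
peaks_per_gene$peak_t1 <- has_peak(peaks_per_gene$num_peaks, 1)
# With a threshold of 2, genes with a single peak are no longer flagged.
peaks_per_gene$peak_t2 <- has_peak(peaks_per_gene$num_peaks, 2)

print(peaks_per_gene)
```

Raising the threshold makes the indicator stricter, which can matter when many genes carry a single, possibly spurious, peak.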
A list, containing the following items:
opts |
A data frame containing the arguments/values passed to |
peaks |
A data frame containing peak assignments to genes. Peaks which do not overlap a gene locus are not included. Each peak that was assigned to a gene is listed, along with the peak midpoint or peak interval coordinates (depending on which was used), the gene to which the peak was assigned, the locus start and end position of the gene, and the distance from the peak to the TSS. The columns are:
|
peaks_per_gene |
A data frame of the count of peaks per gene. The columns are:
|
results |
A data frame of the results from performing the gene set enrichment test on each geneset that was requested (all genesets are merged into one final data frame.) The columns are:
|
The ChIP-Enrich method uses the presence of a peak in its model for enrichment: peak ~ GO + s(log10_length). Here, GO is a binary vector indicating whether a gene is in the gene set being tested, peak is a binary vector indicating the presence of a peak in a gene, and s(log10_length) is a binomial cubic smoothing spline which adjusts for the relationship between the presence of a peak and locus length.
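The model formula above can be tried on simulated data. The sketch below assumes the mgcv package and a cubic regression spline basis (bs = "cr"); the simulated genes, effect sizes, and variable names are made up for illustration and this is not the package's internal implementation.

```r
library(mgcv)

set.seed(1)
n_genes <- 2000

# Simulated gene-level data (hypothetical values).
log10_length <- rnorm(n_genes, mean = 4.5, sd = 0.6)  # log10 locus length
GO <- rbinom(n_genes, 1, 0.05)                        # gene set membership

# Peak presence depends on locus length and, here, on set membership.
eta <- -1 + 1.2 * (log10_length - 4.5) + 0.8 * GO
peak <- rbinom(n_genes, 1, plogis(eta))

genes <- data.frame(peak = peak, GO = GO, log10_length = log10_length)

# Logistic model with a smooth term adjusting for locus length.
fit <- gam(peak ~ GO + s(log10_length, bs = "cr"),
           data = genes, family = binomial)

# Estimate, standard error, z value and p-value for the gene set term.
go_coef <- summary(fit)$p.table["GO", ]
print(go_coef)
```

A positive, significant GO coefficient indicates that genes in the set are more likely to contain a peak than expected from locus length alone.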
The following guidelines are intended to help select an enrichment function:
broadenrich is designed for use with broad peaks that may intersect multiple gene loci, and cumulatively cover greater than 5% of the genome. For example, ChIP-seq experiments for histone modifications.
chipenrich is designed for use with 1,000s or 10,000s of narrow peaks, which results in fewer gene loci containing a peak overall. For example, ChIP-seq experiments for transcription factors.
polyenrich is also designed for narrow peaks, for experiments with 100,000s of peaks, or in cases where the number of binding sites per gene affects its regulation. If unsure whether to use chipenrich or polyenrich, then we recommend hybridenrich.
hybridenrich is a combination of chipenrich and polyenrich, to be used when one is unsure which is the optimal method.
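To check a peak set against the 5% genome coverage guideline, one can estimate the fraction of the genome covered by peaks. The sketch below is a rough illustration: peak_coverage_fraction is a hypothetical helper, the toy peaks and genome size are made up, and it assumes peaks do not overlap (overlapping peaks would be double counted).

```r
# Fraction of the genome covered by peaks, assuming non-overlapping peaks
# with 'start' and 'end' columns (hypothetical helper, not part of the package).
peak_coverage_fraction <- function(peaks, genome_size) {
  sum(peaks$end - peaks$start) / genome_size
}

# Toy peaks on a toy genome of 100,000 bp (made-up values).
toy_peaks <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(100, 5000, 200),
  end   = c(600, 9000, 1200)
)

frac <- peak_coverage_fraction(toy_peaks, genome_size = 1e5)
print(frac)

# TRUE suggests the broad-peak guideline (coverage above 5%) applies.
print(frac > 0.05)
```

For real data, genome_size would be the total length of the assembly in use, and the peak count should be considered alongside coverage when choosing among the narrow-peak methods.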
Randomization of locus definitions allows for the assessment of Type I Error under the null hypothesis. The randomization codes are:
NULL: No randomizations, the default.
Shuffle the gene_id and symbol columns of the locusdef together, without regard for the chromosome location, or locus length. The null hypothesis is that there is no true gene set enrichment.
Shuffle the gene_id and symbol columns of the locusdef together within bins of 100 genes sorted by locus length. The null hypothesis is that there is no true gene set enrichment, but with preserved locus length relationship.
Shuffle the gene_id and symbol columns of the locusdef together within bins of 50 genes sorted by genomic location. The null hypothesis is that there is no true gene set enrichment, but with preserved genomic location.
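The length-binned shuffle described above can be sketched in base R on a toy locus definition. This is only an illustration of the idea: shuffle_by_length and the toy data frame are hypothetical, and the package's own implementation may differ in detail.

```r
set.seed(7)

# Toy locus definition with one row per gene (made-up coordinates).
n <- 250
start <- sample(1e6, n)
locusdef <- data.frame(
  chr     = "chr1",
  start   = start,
  end     = start + sample(500:50000, n, replace = TRUE),
  gene_id = seq_len(n),
  symbol  = paste0("GENE", seq_len(n))
)

# Shuffle gene_id and symbol together within bins of genes sorted by length.
shuffle_by_length <- function(ldef, bin_size = 100) {
  ldef <- ldef[order(ldef$end - ldef$start), ]        # sort by locus length
  bin <- ceiling(seq_len(nrow(ldef)) / bin_size)      # consecutive bins
  # Permute row indices within each bin.
  idx <- ave(seq_len(nrow(ldef)), bin,
             FUN = function(i) i[sample.int(length(i))])
  # Move both identifier columns with the same permutation so they stay paired.
  ldef[, c("gene_id", "symbol")] <- ldef[idx, c("gene_id", "symbol")]
  rownames(ldef) <- NULL
  ldef
}

shuffled <- shuffle_by_length(locusdef, bin_size = 100)
head(shuffled)
```

Because identifiers only move between loci of similar length, the relationship between locus length and peak presence is preserved while any true gene set signal is broken.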
The return value with a selected randomization is the same list as without. To assess the Type I error, the alpha level for the particular data set can be calculated by dividing the total number of gene sets with p-value < alpha by the total number of tests. Users may want to perform multiple randomizations for a set of peaks and take the median of the alpha values.
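The calculation described above can be sketched as follows. The p-values here are simulated from a uniform distribution as a stand-in for the results of several randomized runs; empirical_alpha is a hypothetical helper, and in practice the p-values would come from the results data frame of each run.

```r
# Proportion of gene sets with p-value below alpha (hypothetical helper).
empirical_alpha <- function(p_values, alpha = 0.05) {
  p_values <- p_values[!is.na(p_values)]
  sum(p_values < alpha) / length(p_values)
}

set.seed(42)

# Stand-in for five randomized runs of 1000 gene set tests each (simulated).
randomized_runs <- replicate(5, runif(1000), simplify = FALSE)

# One empirical alpha per randomization, then the median across runs.
alphas <- vapply(randomized_runs, empirical_alpha, numeric(1), alpha = 0.05)
median_alpha <- median(alphas)

print(alphas)
print(median_alpha)
```

A median close to the nominal alpha suggests the test is well calibrated for that set of peaks; a much larger value points to inflated Type I error.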
Other enrichment functions: broadenrich
# Run ChipEnrich using an example dataset, assigning peaks to the nearest TSS,
# and on a small custom geneset
data(peaks_E2F4, package = 'chipenrich.data')
peaks_E2F4 = subset(peaks_E2F4, peaks_E2F4$chrom == 'chr1')
gs_path = system.file('extdata','vignette_genesets.txt', package='chipenrich')
results = chipenrich(peaks_E2F4, method='chipenrich', locusdef='nearest_tss',
  genome = 'hg19', genesets=gs_path, out_name=NULL)
# Get the list of peaks that were assigned to genes.
assigned_peaks = results$peaks
# Get the results of enrichment testing.
enrich = results$results