motif_peaks: Look for overrepresented motif position peaks in a set of...


motif_peaks    R Documentation

Look for overrepresented motif position peaks in a set of sequences.

Description

Using the motif position data from scan_sequences() (or elsewhere), test whether certain positions in the sequences have significantly higher motif density.

Usage

motif_peaks(hits, seq.length, seq.count, bandwidth, max.p = 1e-06,
  peak.width = 3, nrand = 100, plot = TRUE, BP = FALSE)

Arguments

hits

numeric A vector of sequence positions indicating motif sites.

seq.length

numeric(1) Length of sequences. Only one number is allowed, as all sequences must be of identical length. If missing, then the largest number from hits is used.

seq.count

numeric(1) Number of sequences with motif sites. If missing, then the number of unique values in hits is used.

bandwidth

numeric(1) Peak smoothing parameter. Smaller values give narrower peaks, larger values give wider peaks. If missing, motif_peaks() calculates a bandwidth from hits (see 'Details').

max.p

numeric(1) Maximum P-value allowed for finding significant motif site peaks.

peak.width

numeric(1) Minimum peak width. A peak is defined as the highest point within the value set by peak.width.

nrand

numeric(1) Number of random permutations for generating a null distribution. In order to calculate P-values, a set of random motif site positions is generated nrand times.

plot

logical(1) Whether to create a ggplot2 object displaying motif peaks.

BP

logical(1) Allows for the use of BiocParallel within motif_peaks(). See BiocParallel::register() to change the default backend. Setting BP = TRUE is only recommended for exceptionally large jobs. Keep in mind that this function will not attempt to limit its memory usage.
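The fallbacks described for seq.length and seq.count can be illustrated in base R. This is a minimal sketch based on the argument descriptions above, not the package's code, and the hits vector is made up for illustration.

```r
# Made-up motif site positions (illustration only).
hits <- c(12, 12, 40, 57, 57, 57, 103)

# Fallback described for seq.length: the largest value in hits.
seq.length <- max(hits)

# Fallback described for seq.count: the number of unique values in hits.
seq.count <- length(unique(hits))

seq.length  # 103
seq.count   # 4
```

Both fallbacks are derived from hits alone, so they may not match the real sequence length or the real number of sequences. Supplying both arguments explicitly, as the example at the end of this page does, avoids that ambiguity.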

Details

Kernel smoothing is used to calculate motif position density. The implementation for this process is based on code from the KernSmooth R package (Wand 2015). These density estimates are used to determine peak locations and heights. To calculate the P-values of these peaks, a null distribution is calculated from peak heights of randomly generated motif positions.
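The procedure described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: stats::density() stands in for the KernSmooth-derived code, the helper names peak_heights() and null_peak_heights() are hypothetical, random positions are assumed to be drawn uniformly, and the P-value is taken as a simple empirical proportion (the source does not say how the null distribution is converted to a P-value).

```r
# Hypothetical helper: smooth hit positions and return local maxima.
# A position counts as a peak if it is the highest point within
# +/- peak.width positions (one reading of the peak.width definition).
peak_heights <- function(hits, seq.length, bandwidth, peak.width = 3) {
  d <- density(hits, bw = bandwidth, from = 1, to = seq.length, n = seq.length)
  y <- d$y
  is_peak <- vapply(seq_along(y), function(i) {
    window <- y[max(1, i - peak.width):min(length(y), i + peak.width)]
    y[i] == max(window) && y[i] > min(window)
  }, logical(1))
  data.frame(position = round(d$x[is_peak]), height = y[is_peak])
}

# Hypothetical helper: pool the peak heights of nrand random hit sets.
null_peak_heights <- function(n.hits, seq.length, bandwidth,
                              nrand = 100, peak.width = 3) {
  unlist(lapply(seq_len(nrand), function(i) {
    rand_hits <- sample.int(seq.length, n.hits, replace = TRUE)
    peak_heights(rand_hits, seq.length, bandwidth, peak.width)$height
  }))
}

set.seed(42)
seq.length <- 1000
hits <- c(sample.int(seq.length, 200, replace = TRUE),  # background sites
          round(rnorm(100, mean = 500, sd = 5)))        # enriched near 500
bw <- 10  # fixed bandwidth, chosen for illustration

obs <- peak_heights(hits, seq.length, bw)
null <- null_peak_heights(length(hits), seq.length, bw, nrand = 100)

# Empirical P-value: share of null peak heights at least as large.
obs$p.value <- vapply(obs$height, function(h) {
  (1 + sum(null >= h)) / (1 + length(null))
}, numeric(1))

obs[obs$p.value < 0.01, ]
```

An empirical proportion cannot fall below 1 / (1 + length(null)), so this sketch uses a much looser cutoff than the max.p default of 1e-06.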

If the bandwidth option is not supplied, then the following code is used (from KernSmooth):

del0 <- (1 / (4 * pi))^(1 / 10)

bandwidth <- del0 * (243 / (35 * length(hits)))^(1 / 5) * sqrt(var(hits))
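To see how this default behaves, the two lines above can be wrapped in a small function and applied to simulated positions. The name default_bandwidth() is hypothetical and the hits are randomly generated; only the formula itself comes from the text above.

```r
# Hypothetical wrapper around the default-bandwidth formula quoted above.
default_bandwidth <- function(hits) {
  del0 <- (1 / (4 * pi))^(1 / 10)
  del0 * (243 / (35 * length(hits)))^(1 / 5) * sqrt(var(hits))
}

set.seed(1)
hits_few  <- sample(1000, 50, replace = TRUE)    # 50 simulated sites
hits_many <- sample(1000, 5000, replace = TRUE)  # 5000 simulated sites

# More hits give a smaller bandwidth (it scales with length(hits)^(-1/5)).
default_bandwidth(hits_few)
default_bandwidth(hits_many)

# The bandwidth is proportional to the standard deviation of the positions.
h <- c(10, 20, 30, 40, 50)
all.equal(default_bandwidth(2 * h), 2 * default_bandwidth(h))
```

Because the default depends on the spread of all hits, a tightly clustered set of positions yields a small bandwidth, while positions scattered along the whole sequence yield a larger one.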

Value

A DataFrame with peak positions and P-values. If plot = TRUE, then a list is returned with the DataFrame as the first item and the ggplot2 object as the second item.
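Since the return type depends on plot, downstream code may want to handle both cases. The helper below is a hypothetical convenience, not part of the package; it assumes that the plot = TRUE result is a plain list whose first item is the peak table, as described above. The mock result uses a base data.frame and a placeholder string in place of the real DataFrame and ggplot2 object.

```r
# Hypothetical helper: return the peak table whether or not a plot was made.
get_peak_table <- function(res) {
  # A plain list means (table, plot); anything else is taken as the table.
  if (inherits(res, "list")) res[[1]] else res
}

# Mock results standing in for motif_peaks() output (illustration only).
peaks <- data.frame(Peak = c(120, 480), Pval = c(1e-08, 3e-07))
with_plot    <- list(peaks, "plot placeholder")
without_plot <- peaks

identical(get_peak_table(with_plot), get_peak_table(without_plot))
```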

Author(s)

Benjamin Jean-Marie Tremblay, benjamin.tremblay@uwaterloo.ca

References

Wand M (2015). KernSmooth: Functions for Kernel Smoothing Supporting Wand and Jones (1995). R package version 2.23-15, <URL: https://CRAN.R-project.org/package=KernSmooth>.

See Also

scan_sequences()

Examples

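library(universalmotif)
data(ArabidopsisMotif)
data(ArabidopsisPromoters)
if (R.Version()$arch != "i386") {
  # Scan the forward strand only (RC = FALSE) to get motif hit positions.
  hits <- scan_sequences(ArabidopsisMotif, ArabidopsisPromoters, RC = FALSE)
  # Test the hit start positions for peaks, naming the arguments explicitly.
  res <- motif_peaks(as.vector(hits$start), seq.length = 1000, seq.count = 50)

  # View plot:
  res$Plot

  # The raw plot data can be found in:
  res$Plot$data
}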


bjmt/universalmotif documentation built on Nov. 16, 2024, 7:38 a.m.