motif_peaks: Look for overrepresented motif position peaks in a set of...

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/motif_peaks.R

Description

Using the motif position data from scan_sequences() (or elsewhere), test whether certain positions in the sequences have significantly higher motif density.

Usage

1
2
motif_peaks(hits, seq.length, seq.count, bandwidth, max.p = 1e-06,
  peak.width = 3, nrand = 100, plot = TRUE, BP = FALSE)

Arguments

hits

numeric A vector of sequence positions indicating motif sites.

seq.length

numeric(1) Length of sequences. Only one number is allowed, as all sequences must be of identical length. If missing, then the largest number from hits is used.

seq.count

numeric(1) Number of sequences with motif sites. If missing, then the number of unique values in hits is used.

bandwidth

numeric(1) Peak smoothing parameter. Smaller numbers will result in skinnier peaks, larger numbers will result in wider peaks. Leaving this empty will cause motif_peaks() to generate one by itself (see 'details').

max.p

numeric(1) Maximum P-value allowed for finding significant motif site peaks.

peak.width

numeric(1) Minimum peak width. A peak is defined as as the highest point within the value set by peak.width.

nrand

numeric(1) Number of random permutations for generating a null distribution. In order to calculate P-values, a set of random motif site positions are generated nrand times.

plot

logical(1) Will create a ggplot2 object displaying motif peaks.

BP

logical(1) Allows for the use of BiocParallel within motif_peaks(). See BiocParallel::register() to change the default backend. Setting BP = TRUE is only recommended for exceptionally large jobs. Keep in mind that this function will not attempt to limit its memory usage.

Details

Kernel smoothing is used to calculate motif position density. The implementation for this process is based on code from the KernSmooth R package \insertCitekernuniversalmotif. These density estimates are used to determine peak locations and heights. To calculate the P-values of these peaks, a null distribution is calculated from peak heights of randomly generated motif positions.

If the bandwidth option is not supplied, then the following code is used (from KernSmooth):

del0 <- (1 / (4 * pi))^(1 / 10)

bandwidth <- del0 * (243 / (35 * length(hits)))^(1 / 5) * sqrt(var(hits))

Value

A DataFrame with peak positions and P-values. If plot = TRUE, then a list is returned with the DataFrame as the first item and the ggplot2 object as the second item.

Author(s)

Benjamin Jean-Marie Tremblay, b2tremblay@uwaterloo.ca

References

\insertRef

kernuniversalmotif

See Also

scan_sequences()

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
data(ArabidopsisMotif)
data(ArabidopsisPromoters)
if (R.Version()$arch != "i386") {
hits <- scan_sequences(ArabidopsisMotif, ArabidopsisPromoters, RC = FALSE)
res <- motif_peaks(as.vector(hits$start), 1000, 50)
# View plot:
res$Plot

# The raw plot data can be found in:
res$Plot$data
}

universalmotif documentation built on April 8, 2021, 6 p.m.