motifScanHits: Occurrence of motifs in a set of ordered sequences


Description

Finds the positions of sequence motif hits above a specified threshold in a set of sequences of the same length, ordered by a provided index. The motif is specified by a position weight matrix (PWM) that contains the estimated probability of base b at position i and is usually constructed via a call to the PWM function. The position of each motif hit is given as a pair of coordinates: the first is the ordinal number of the sequence in the ordered set, and the second is the position within that sequence where the motif occurs.

Usage

motifScanHits(regionsSeq, motifPWM, minScore = "80%",
    seqOrder = c(1:length(regionsSeq)))

Arguments

regionsSeq

A DNAStringSet object. Set of sequences of the same length in which to search for the motif hits.

motifPWM

A numeric matrix representing the Position Weight Matrix (PWM), such as the one returned by the PWM function. Can contain either probabilities or log2 probability ratios of base b at position i.

minScore

The minimum score for counting a motif hit. Can be given either as a character string containing a percentage of the PWM score (e.g. "85%") or as a single number specifying the score threshold. If a percentage is given, it is converted to a score using both the minimal and the maximal possible PWM score:

    minPWMscore + percThreshold/100 * (maxPWMscore - minPWMscore)

This differs from the matchPWM function in the Biostrings package, which uses only the maximal possible PWM score and treats the given percentage as a percentage of that maximum:

    percThreshold/100 * maxPWMscore
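
The two percentage conversions can be compared with a small base R sketch. It assumes that the minimal and maximal possible PWM scores are the sums of the per-column minima and maxima. The helper percentToScore and the toy matrix are hypothetical and are not part of seqPattern or Biostrings.

```r
# Toy 4 x 3 PWM of base probabilities (values are made up for illustration).
pwm <- matrix(c(0.7, 0.1, 0.1, 0.1,
                0.1, 0.1, 0.1, 0.7,
                0.6, 0.1, 0.2, 0.1),
              nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical helper: converts a percentage threshold to a score in two ways.
# Assumes min/max PWM scores are the sums of the column minima/maxima.
percentToScore <- function(pwm, percThreshold) {
  minPWMscore <- sum(apply(pwm, 2, min))
  maxPWMscore <- sum(apply(pwm, 2, max))
  c(rangeBased = minPWMscore + percThreshold / 100 * (maxPWMscore - minPWMscore),
    maxBased   = percThreshold / 100 * maxPWMscore)
}

percentToScore(pwm, 80)
```

For this toy matrix the minimal score is 0.3 and the maximal score is 2.0, so an 80% threshold corresponds to about 1.66 under the range-based formula and 1.6 under the maximum-based one. The two conventions coincide only when the minimal possible score is zero.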

seqOrder

Integer vector specifying the order of the provided input sequences. Must have the same length as the number of sequences in regionsSeq. The default value keeps the sequences in the order in which they appear in the input regionsSeq object.

Details

This function uses the matchPWM function to find matches to the given motif in a set of input sequences. Only matches scoring above the specified minScore are considered hits. The input sequences must all be of the same length and are ordered according to the index provided in the seqOrder argument, forming an n * m matrix, where n is the number of sequences and m is the length of the sequences. The positions of motif hits in this matrix are returned as two-dimensional coordinates.
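
The coordinate system described above can be illustrated with a base R sketch. It is only one reading of the text, not the package's internal code: hits recorded by original sequence index are mapped to rows of the ordered n * m matrix. All names (rawHits, widths, hitMatrix) and values are hypothetical.

```r
n <- 4   # number of sequences
m <- 10  # length of every sequence

# Made-up hits, recorded by the original (input) index of each sequence.
rawHits <- data.frame(origIndex = c(1, 3, 3, 4),
                      position  = c(2, 5, 9, 1))

# Hypothetical per-sequence variable used for ordering, as in order(promoterWidth).
widths   <- c(30, 10, 20, 40)
seqOrder <- order(widths)  # original indices listed in their new order

# Row k of the ordered matrix holds the sequence with original index seqOrder[k],
# so the row of original sequence i is match(i, seqOrder).
hits <- data.frame(sequence = match(rawHits$origIndex, seqOrder),
                   position = rawHits$position)
hits <- hits[order(hits$sequence, hits$position), ]

# Mark the hit start positions in an n x m indicator matrix.
hitMatrix <- matrix(0L, nrow = n, ncol = m)
hitMatrix[cbind(hits$sequence, hits$position)] <- 1L
```

Here the sequence with original index 3 becomes row 2 of the ordered matrix, because it has the second-smallest value in widths.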

Value

The function returns a data.frame with the positions of the motif hits in the set of input sequences. The input sequences, all of the same length, are sorted according to the index in the seqOrder argument, and the positions of motif hits in the resulting n * m matrix (where n is the number of sequences and m is the length of the sequences) are reported. The sequence column of the data.frame gives the ordinal number of the sequence in the ordered set of sequences, and the position column gives the start position of the motif hit within that sequence.
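
A result of this shape can be summarised with base R alone. The sketch below uses a mock data.frame with the sequence and position columns described above, plus the value column that appears in the example output. The values and the sequence count n are made up.

```r
# Mock result with the same columns as the example output (values are made up).
hits <- data.frame(sequence = c(1, 1, 2, 4, 4, 4),
                   position = c(76, 227, 15, 8, 290, 643),
                   value    = 1)
n <- 5  # hypothetical number of input sequences

# Number of hits in each ordered sequence, including sequences without hits.
hitsPerSequence <- tabulate(hits$sequence, nbins = n)

# Start position of the first hit in each sequence that has at least one hit.
firstHit <- tapply(hits$position, hits$sequence, min)
```

Sequences without any hit do not appear in the data.frame, which is why tabulate is given nbins explicitly rather than relying on the largest observed sequence number.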

Author(s)

Vanja Haberle

See Also

plotMotifDensityMap
getPatternOccurrenceList

Examples

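library(seqPattern)
library(GenomicRanges)

# Zebrafish promoter sequences shipped with seqPattern
load(system.file("data", "zebrafishPromoters.RData", package="seqPattern"))
# Promoter width, used below to order the sequences
promoterWidth <- elementMetadata(zebrafishPromoters)$interquantileWidth

# Position weight matrix for the TBP motif
load(system.file("data", "TBPpwm.RData", package="seqPattern"))

# Scan for TBP motif hits with an 85% threshold, ordering sequences by promoter width
motifOccurrence <- motifScanHits(regionsSeq = zebrafishPromoters,
                                 motifPWM = TBPpwm, minScore = "85%",
                                 seqOrder = order(promoterWidth))
head(motifOccurrence)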

Example output

  sequence position value
1        1       76     1
2        1      227     1
3        1      288     1
4        1      290     1
5        1      298     1
6        1      643     1

seqPattern documentation built on Nov. 8, 2020, 7:52 p.m.