runMotifDiscovery | R Documentation |
This function builds a random forest classifier to find the top most discriminative motifs in the query regions compared to the background. The background sequences are automatically generated based on the query regions. First, k-mers of a fixed length are generated. The query and control sequences are searched for k-mers allowing for mismatches. A random forest model is trained to find the most discriminative motifs.
runMotifDiscovery(
queryRegions,
resizeN = 0,
motifWidth = 6,
sampleN = 0,
genomeVersion,
maxMismatch = 1,
motifN = 5,
nCores = 1
)
queryRegions |
GRanges object containing coordinates of input query
regions imported by the |
resizeN |
Integer value (default: 0) to resize query regions if they are
shorter than the value of |
motifWidth |
A Positive integer (default: 6) for the generated k-mers. Warning: we recommend using values below 10 as the computation gets exponentially difficult as the motif width is increased. |
sampleN |
A positive integer value. The queryRegions are randomly
downsampled to include intervals as many as |
genomeVersion |
A character string to denote the BS genome library required to extract sequences. Example: 'hg19' |
maxMismatch |
A positive integer (default: 1) - maximum number of mismatches to allow when searching for k-mer matches in sequences. |
motifN |
A positive integer (default:5) denoting the maximum number of
motifs that should be returned by the |
nCores |
A positive integer (default:1) number of cores used for parallel execution. |
A list of four objects: k-mer count matrices for query and background and lists of string matches for the top discriminating motifs (motifN).
data(queryRegions)
motifResults <- runMotifDiscovery(queryRegions = queryRegions[1:1000],
genomeVersion = 'hg19',
motifWidth = 6,
resize = 15,
motifN = 1,
maxMismatch = 1,
nCores = 1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.