runMotifDiscovery: runMotifDiscovery
In RCAS: RNA Centric Annotation System

Description Usage Arguments Value Examples

This function builds a random forest classifier to find the top most discriminative motifs in the query regions compared to the background. The background sequences are automatically generated based on the query regions. First, k-mers of a fixed length are generated. The query and control sequences are searched for k-mers allowing for mismatches. A random forest model is trained to find the most discriminative motifs.

runMotifDiscovery(
  queryRegions,
  resizeN = 0,
  motifWidth = 6,
  sampleN = 0,
  genomeVersion,
  maxMismatch = 1,
  motifN = 5,
  nCores = 1
)

`queryRegions`	GRanges object containing coordinates of input query regions imported by the `importBed` function
`resizeN`	Integer value (default: 0) to resize query regions if they are shorter than the value of `resize`. Set to 0 to disable resize.
`motifWidth`	A Positive integer (default: 6) for the generated k-mers. Warning: we recommend using values below 10 as the computation gets exponentially difficult as the motif width is increased.
`sampleN`	A positive integer value. The queryRegions are randomly downsampled to include intervals as many as `sampleN`. The input will be downsampled only if this value is larger than zero and less than the total number of input intervals.
`genomeVersion`	A character string to denote the BS genome library required to extract sequences. Example: 'hg19'
`maxMismatch`	A positive integer (default: 1) - maximum number of mismatches to allow when searching for k-mer matches in sequences.
`motifN`	A positive integer (default:5) denoting the maximum number of motifs that should be returned by the `findDifferentialMotifs` function
`nCores`	A positive integer (default:1) number of cores used for parallel execution.

A list of four objects: k-mer count matrices for query and background and lists of string matches for the top discriminating motifs (motifN).

data(queryRegions)
motifResults <- runMotifDiscovery(queryRegions = queryRegions[1:1000],
                                  genomeVersion = 'hg19',
                                  motifWidth = 6, 
                                  resize = 15,
                                  motifN = 1,
                                  maxMismatch = 1,
                                  nCores = 1)