runMotifDiscovery: runMotifDiscovery

Description Usage Arguments Value Examples

Description

This function builds a random forest classifier to find the top most discriminative motifs in the query regions compared to the background. The background sequences are automatically generated based on the query regions. First, k-mers of a fixed length are generated. The query and control sequences are searched for k-mers allowing for mismatches. A random forest model is trained to find the most discriminative motifs.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
runMotifDiscovery(
  queryRegions,
  resizeN = 0,
  motifWidth = 6,
  sampleN = 0,
  genomeVersion,
  maxMismatch = 1,
  motifN = 5,
  nCores = 1
)

Arguments

queryRegions

GRanges object containing coordinates of input query regions imported by the importBed function

resizeN

Integer value (default: 0) to resize query regions if they are shorter than the value of resize. Set to 0 to disable resize.

motifWidth

A Positive integer (default: 6) for the generated k-mers. Warning: we recommend using values below 10 as the computation gets exponentially difficult as the motif width is increased.

sampleN

A positive integer value. The queryRegions are randomly downsampled to include intervals as many as sampleN. The input will be downsampled only if this value is larger than zero and less than the total number of input intervals.

genomeVersion

A character string to denote the BS genome library required to extract sequences. Example: 'hg19'

maxMismatch

A positive integer (default: 1) - maximum number of mismatches to allow when searching for k-mer matches in sequences.

motifN

A positive integer (default:5) denoting the maximum number of motifs that should be returned by the findDifferentialMotifs function

nCores

A positive integer (default:1) number of cores used for parallel execution.

Value

A list of four objects: k-mer count matrices for query and background and lists of string matches for the top discriminating motifs (motifN).

Examples

1
2
3
4
5
6
7
8
data(queryRegions)
motifResults <- runMotifDiscovery(queryRegions = queryRegions[1:1000],
                                  genomeVersion = 'hg19',
                                  motifWidth = 6, 
                                  resize = 15,
                                  motifN = 1,
                                  maxMismatch = 1,
                                  nCores = 1)

RCAS documentation built on Nov. 8, 2020, 8:03 p.m.