getMotifs: Screen target sequences for recurrent motifs

View source: R/getMotifs.R

Description

The function getMotifs() scans the target sequences for recurrent motifs of a user-defined length (width). If rbp = TRUE, the identified motifs are matched against motifs of known RNA Binding Proteins (RBPs) deposited in the ATtRACT (http://attract.cnic.es) or MEME (http://meme-suite.org/) database, and against motifs specified by the user. User motifs must be provided in a file named motifs.txt; if this file is absent or empty, only motifs from the ATtRACT or MEME database are considered in the analysis. If rbp = FALSE, only motifs that do not match any database or user motif are reported in the final output. The location of each selected motif is also reported as its start position within the sequence (1-based).
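As an illustration of the scanning step, the base R sketch below lists every k-mer of a given width with its 1-based start position and keeps those that occur more than once. It is a minimal sketch under stated assumptions, not the package's implementation: the helpers extractKmers() and recurrentKmers() are hypothetical, and "recurrent" is assumed here to mean "occurring at least twice in one sequence", which may differ from the criterion getMotifs() applies.

```r
# Hypothetical helper (not part of circRNAprofiler): list every k-mer of
# length `width` in sequence `s` with its 1-based start position.
extractKmers <- function(s, width = 6) {
  n <- nchar(s)
  if (n < width) {
    return(data.frame(motif = character(0), start = integer(0),
                      stringsAsFactors = FALSE))
  }
  starts <- seq_len(n - width + 1)
  data.frame(
    motif = substring(s, starts, starts + width - 1),
    start = starts,
    stringsAsFactors = FALSE
  )
}

# Hypothetical helper: keep k-mers seen more than once in `s` and
# return their start positions, one list element per k-mer.
recurrentKmers <- function(s, width = 6) {
  kmers <- extractKmers(s, width)
  counts <- table(kmers$motif)
  recurrent <- names(counts)[counts > 1]
  lapply(split(kmers$start, kmers$motif)[recurrent], sort)
}

res <- recurrentKmers("AUGGACAUGGACU", width = 5)
```

For "AUGGACAUGGACU" and width = 5, res holds AUGGA at positions 1 and 7 and UGGAC at positions 2 and 8.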

Usage

getMotifs(
  targets,
  width = 6,
  database = "ATtRACT",
  species = "Hsapiens",
  memeIndexFilePath = 18,
  rbp = TRUE,
  reverse = FALSE,
  pathToMotifs = NULL
)

Arguments

targets

A list containing the target sequences to analyze. It can be generated with getCircSeqs, getSeqsAcrossBSJs or getSeqsFromGRs.

width

An integer specifying the length of all possible motifs to extract from the target sequences. Default value is 6.
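The width argument fixes the motif length k, so the search space grows quickly: over the RNA alphabet {A, C, G, U} there are 4^width possible motifs (4096 for the default width = 6), while a sequence of length L contains at most L - width + 1 motif start positions. The two hypothetical one-line helpers below simply encode this arithmetic; they are not package functions.

```r
# Hypothetical helpers (not part of circRNAprofiler).
# Distinct k-mers over the RNA alphabet {A, C, G, U}.
possibleKmers <- function(width) 4^width

# k-mer start positions in a sequence of length seqLength
# (zero when the sequence is shorter than the motif).
kmerPositions <- function(seqLength, width) max(seqLength - width + 1, 0)

possibleKmers(6)       # 4096
kmerPositions(100, 6)  # 95
```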

database

A string specifying the RBP database to use. Possible options are "ATtRACT" or "MEME". Default value is "ATtRACT".

species

A string specifying the species of the ATtRACT RBP motifs to use. Type data(attractSpecies) to see the possible options. Default value is "Hsapiens".

memeIndexFilePath

An integer specifying the index of the MEME file path to use. Type data(memeDB) to see the possible options. Default value is 18, corresponding to the file motif_databases/RNA/Ray2013_rbp_Homo_sapiens.meme.

rbp

A logical specifying whether to report only motifs matching known RBP motifs from the selected database (ATtRACT or MEME) or user motifs specified in motifs.txt. If FALSE, only motifs that do not match any of these motifs are reported. Default value is TRUE.
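The effect of rbp can be pictured as splitting the extracted motifs into those that match a known pattern and those that do not. The sketch below assumes that a k-mer counts as matched when any known motif, treated as a regular expression, occurs within it; the k-mers, motif ids and patterns are made up for illustration, and the matching rule actually used by getMotifs() may differ.

```r
# Hypothetical k-mers extracted from the targets.
foundKmers <- c("UGCAUG", "GGACUA", "AUUUAU", "CCCCCC")

# Hypothetical known motifs (database and/or motifs.txt), as regular expressions.
knownMotifs <- c(motif1 = "UGCAUG", motif2 = "AUUUA", motif3 = "GG[AC]CU")

# Assumed rule: a k-mer is "known" if any known pattern occurs inside it.
isKnown <- vapply(foundKmers, function(k) {
  any(vapply(knownMotifs, function(p) grepl(p, k), logical(1)))
}, logical(1))

rbpTrue  <- foundKmers[isKnown]   # kept when rbp = TRUE
rbpFalse <- foundKmers[!isKnown]  # kept when rbp = FALSE
```

Here rbpTrue is c("UGCAUG", "GGACUA", "AUUUAU") and rbpFalse is "CCCCCC".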

reverse

A logical specifying whether to reverse the motifs collected from the RBP database and from motifs.txt. If TRUE, all motifs are reversed and analyzed together with the direct motifs as they are reported in the database and in motifs.txt. Default value is FALSE.
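This sketch assumes that "reversed" means reversing the order of the characters in each motif (for example UGCAUG becomes GUACGU), not taking the reverse complement. The hypothetical helper reverseMotif() below shows that operation in base R on plain sequences; motifs written as regular expressions with character classes (such as [AC]) would not survive a naive character reversal, and how getMotifs() handles that case is not described here.

```r
# Hypothetical helper (not part of circRNAprofiler): reverse the characters
# of each motif string. This is a plain reversal, not a reverse complement.
reverseMotif <- function(motif) {
  vapply(strsplit(motif, ""),
         function(chars) paste(rev(chars), collapse = ""),
         character(1))
}

direct    <- c("UGCAUG", "AUUUA")
reversed  <- reverseMotif(direct)
# Analyze direct and reversed motifs together, dropping duplicates
# (AUUUA is a palindrome, so its reverse adds nothing new).
allMotifs <- unique(c(direct, reversed))
```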

pathToMotifs

A string containing the path to the motifs.txt file. The file motifs.txt contains motifs/regular expressions specified by the user. It must have 3 columns with headers:

id:

(1st column) - name of the motif (e.g. RBM20 or motif1).

motif:

(2nd column) - motif/pattern to search.

length:

(3rd column) - length of the motif.

By default pathToMotifs is set to NULL and the file is searched for in the working directory. If motifs.txt is located in a different directory, the path needs to be specified. If this file is absent or empty, only the motifs of RNA Binding Proteins in the ATtRACT or MEME database are considered in the motif analysis.
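A minimal way to create motifs.txt from R is sketched below. The three column names follow the description above; the tab delimiter, the example rows and the temporary location are assumptions for illustration. The length column is filled in explicitly because, for a regular expression such as GG[AC]CU, nchar() of the pattern (8) differs from the length of the matched motif (5).

```r
# Hypothetical user motifs with the three required columns.
userMotifs <- data.frame(
  id     = c("RBM20", "motif1"),
  motif  = c("UCUU", "GG[AC]CU"),
  length = c(4L, 5L),
  stringsAsFactors = FALSE
)

# Write a tab-delimited file with a header row (delimiter is an assumption).
motifFile <- file.path(tempdir(), "motifs.txt")
write.table(userMotifs, motifFile, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back to check the layout.
checked <- read.table(motifFile, header = TRUE, sep = "\t",
                      stringsAsFactors = FALSE)
```

The resulting motifFile would then be supplied as pathToMotifs, or the file could be placed in the working directory with pathToMotifs left as NULL.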

Value

A list.

Examples

library(circRNAprofiler)

# Load data frame containing detected back-spliced junctions
data("mergedBSJunctions")

# Load short version of the GENCODE v19 annotation file
data("gtf")

# Example with the first back-spliced junction
# Multiple back-spliced junctions can also be analyzed at the same time

# Annotate the first back-spliced junction
annotatedBSJs <- annotateBSJs(mergedBSJunctions[1, ], gtf)

# Get genome (requires the BSgenome.Hsapiens.UCSC.hg19 package)
if (requireNamespace("BSgenome.Hsapiens.UCSC.hg19", quietly = TRUE)) {

    genome <- BSgenome::getBSgenome("BSgenome.Hsapiens.UCSC.hg19")

    # Retrieve target sequences
    targets <- getSeqsFromGRs(
        annotatedBSJs,
        genome,
        lIntron = 200,
        lExon = 10,
        type = "ie"
    )

    # Get motifs
    motifs <- getMotifs(
        targets,
        width = 6,
        database = "ATtRACT",
        species = "Hsapiens",
        rbp = TRUE,
        reverse = FALSE
    )
}



Aufiero/circRNAprofiler documentation built on Nov. 3, 2024, 10:12 a.m.