runFimo: Find instances of motifs using FIMO

View source: R/fimo.R

runFimoR Documentation

Find instances of motifs using FIMO

Description

FIMO scans input sequences to identify the positions of matches to each input motif. FIMO has no sequence length or motif number restrictions.

Usage

runFimo(
  sequences,
  motifs,
  bfile = "motif",
  outdir = "auto",
  parse_genomic_coord = TRUE,
  skip_matched_sequence = FALSE,
  max_strand = TRUE,
  text = TRUE,
  meme_path = NULL,
  silent = TRUE,
  ...
)

Arguments

sequences

path to fasta file, or stringset input.

motifs

path to .meme format file, or universalmotif/universalmotif list input.

bfile

path to background file, or special values: "motif" to use 0-order frequencies contained in the motif, or "uniform" to use a uniform letter distribution. (default: "motif")

outdir

output directory location. Only used if text = FALSE. Default: "auto" to autogenerate directory name. Note: if not using a fasta path as input, this will be a temporary location unless explicity set.

parse_genomic_coord

logical(1) whether to parse genomic position from fasta headers. Fasta headers must be UCSC format positions (ie "chr:start-end"), but base 1 indexed (GRanges format). If names of fasta entries are genomic coordinates and parse_genomic_coord == TRUE, results will contain genomic coordinates of motif matches, otherwise FIMO will return relative coordinates (i.e. positions from 1 to length of the fasta entry).

skip_matched_sequence

logical(1) whether or not to include the DNA sequence of the match. Default: FALSE. Note: jobs will complete faster if set to TRUE. add_sequence() can be used to lookup the sequence after data import if parse_genomic_coord is TRUE, so setting this flag is not strictly needed.

max_strand

if match is found on both strands, only report strand with best match (default: TRUE).

text

logical(1) (default: TRUE). No output files will be created on the filesystem. The results are unsorted and no q-values are computed. This setting allows fast searches on very large inputs. When set to FALSE FIMO will discard 50% of the lower significance matches if >100,000 matches are detected. text = FALSE will also incur a performance penalty because it must first read a file to disk, then read it into memory. For these reasons, I suggest keeping text = TRUE.

meme_path

path to ⁠meme/bin/⁠ (optional). Defaut: NULL, searches "MEME_PATH" environment variable or "meme_path" option for path to "meme/bin/".

silent

logical(1) whether to suppress stdout/stderr printing to console (default: TRUE). If the command is failing or giving unexpected output, setting silent = FALSE can aid troubleshooting.

...

additional commandline arguments to fimo. See the FIMO Flag table below.

Details

Additional arguments passed to .... See: Fimo web manual for a complete description of FIMO flags.

FIMO Flag allowed values default description
alpha numeric(1) 1 alpha for calculating position-specific priors. Represents fraction of sites that are binding sites of TF of interest. Used in conjunction with psp
bfile "motif", "motif-file", "uniform", file path, "motif" If "motif" or "motif-file", use 0-order letter frequencies from motif. "uniform" sets uniform letter frequencies.
max_stored_scores integer(1) NULL maximum number of scores to be stored for computing q-values. used when text = FALSE, see FIMO webpage for details
motif_pseudo numeric(1) 0.1 pseudocount added to motif matrix
no_qvalue logical(1) FALSE only needed when text = FALSE, do not compute q-value for each p-value
norc logical(1) FALSE Do not score reverse complement strand
prior_dist file path NULL file containing binned distribution of priors
psp file path NULL file containing position specific priors. Requires prior_dist
qv_thresh logical(1) FALSE use q-values for the output threshold
thresh numeric(1) 1e-4 output threshold for returning a match, only matches with values less than thresh are returned.

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Value

GRanges object containing positions of each match. Note: if parse_genomic_coords = FALSE, each seqnames entry will be the full fasta header, and start/end will be the relative position within that sequence of the match. The GRanges object has the following additional mcols: * motif_id = primary name of matched motif * motif_alt_id = alternate name of matched motif * score = score of match (higher score is a better match) * pvalue = pvalue of the match * qvalue = qvalue of the match * matched_sequence = sequence that matches the motif

Citation

If you use runFimo() in your analysis, please cite:

Charles E. Grant, Timothy L. Bailey, and William Stafford Noble, "FIMO: Scanning for occurrences of a given motif", Bioinformatics, 27(7):1017-1018, 2011. full text

Examples

if (meme_is_installed()){
# Generate some example input sequences
seq <- universalmotif::create_sequences()
# sequences must have values in their fasta headers
names(seq) <- seq_along(seq)
# Create random example motif to search for
motif <- universalmotif::create_motif()

# Search for motif in sequences
# parse_genomic_coord set to FALSE since fasta headers aren't in "chr:start-end" format.
runFimo(seq, motif, parse_genomic_coord = FALSE)
}

snystrom/dremeR documentation built on Oct. 13, 2024, 10:48 p.m.