runAme: Motif enrichment using AME

View source: R/generics.R

runAme.listR Documentation

Motif enrichment using AME

Description

AME identifies known motifs (provided by the user) that are enriched in your input sequences.

Usage

## S3 method for class 'list'
runAme(
  input,
  control = "shuffle",
  outdir = "auto",
  method = "fisher",
  database = NULL,
  meme_path = NULL,
  sequences = FALSE,
  silent = TRUE,
  ...
)

## S3 method for class 'BStringSetList'
runAme(
  input,
  control = "shuffle",
  outdir = "auto",
  method = "fisher",
  database = NULL,
  meme_path = NULL,
  sequences = FALSE,
  silent = TRUE,
  ...
)

## Default S3 method:
runAme(
  input,
  control = "shuffle",
  outdir = "auto",
  method = "fisher",
  database = NULL,
  meme_path = NULL,
  sequences = FALSE,
  silent = TRUE,
  ...
)

runAme(
  input,
  control = "shuffle",
  outdir = "auto",
  method = "fisher",
  database = NULL,
  meme_path = NULL,
  sequences = FALSE,
  silent = TRUE,
  ...
)

Arguments

input

path to fasta, or DNAstringset (optional: DNAStringSet object names contain fasta score, required for partitioning mode)

control

default: "shuffle", or set to DNAstringset or path to fasta file to use those sequences in discriminative mode. If input is a list of DNAStringSet objects, and control is set to a character vector of names in input, those regions will be used as background in discriminitive mode and AME will skip running on any control entries (NOTE: if input contains an entry named "shuffle" and control is set to "shuffle", it will use the input entry, not the AME shuffle algorithm). If control is a Biostrings::BStringSetList (generated using get_sequence), all sequences in the list will be combined as the control set. Set to NA for partitioning based on input fasta score (see get_sequence() for assigning fasta score). If input sequences are not assigned fasta scores but AME is run in partitioning mode, the sequence order will be used as the score.

outdir

Path to output directory location to save data files. If set to "auto", will use location of input files if passing file paths, otherwise will write to a temporary directory. default: "auto"

method

default: fisher (allowed values: fisher, ranksum, pearson, spearman, 3dmhg, 4dmhg)

database

path to .meme format file, universalmotif list object, dreme results data.frame, or list() of multiple of these. If objects are assigned names in the list, that name will be used as the database id in the output. It is highly recommended you set a name if not using a file path so the database name will be informative, otherwise the position in the list will be used as the database id. For example, if the input is: list("motifs.meme", list_of_motifs), the database id's will be: "motifs.meme" and "2". If the input is list("motifs.meme", "customMotifs" = list_of_motifs), the database id's will be "motifs.meme" and "customMotifs".

meme_path

path to "meme/bin/" (default: NULL). Will use default search behavior as described in check_meme_install() if unset.

sequences

logical(1) add results from sequences.tsv to sequences list column to returned data.frame. Valid only if method = "fisher". See AME outputs webpage for more information (Default: FALSE).

silent

whether to suppress stdout (default: TRUE), useful for debugging.

...

additional arguments passed to AME (see AME Flag table below)

Details

AME can be run using several modes:

  1. Discriminative mode: to discover motifs enriched relative to shuffled input, or a set of provided control sequences

  2. Partitioning mode: in which AME uses some biological signal to identify the association between the signal and motif presence.

To read more about how AME works, see the AME Tutorial

Additional AME arguments

memes allows passing any valid flag to it's target programs via .... For additional details for all valid AME arguments, see the AME Manual webpage. For convenience, a table of valid parameters, and brief descriptions of their function are provided below:

AME Flag allowed values default description
kmer integer(1) 2 kmer frequency to preserve when shuffling control sequences
seed integer(1) 1 seed for random number generator when shuffling control sequences
scoring "avg", "max", "sum", "totalhits" "avg" Method for scoring a sequence for matches to a PWM (avg, max, sum, totalhits)
hit_lo_fraction numeric(1) 0.25 fraction of hit log odds score to exceed to be considered a "hit"
evalue_report_threshold numeric(1) 10 E-value threshold for reporting a motif as significantly enriched
fasta_threshold numeric(1) 0.001 AME will classify sequences with FASTA scores below this value as positives. Only valid when method = "fisher", poslist = "pwm", control = NA, fix_partition = NULL.
fix_partition numeric(1) NULL AME evaluates only the partition of the first N sequences. Only works when control = NA and poslist = "fasta"
poslist "pwm", "fasta" "fasta" When using paritioning mode (control = NA), test thresholds on either PWM or Fasta score
log_fscores logical(1) FALSE Convert FASTA scores into log-space (only used when method = "pearson")
log_pwmscores logical(1) FALSE Convert PWM scores into log-space (only used for method = "pearson" or method = "spearman)
lingreg_switchxy logical(1) FALSE Make the x-points FASTA scores and y-points PWM scores (only used for method = "pearson" or method = "spearman)
xalph file path NULL(1) alphabet file to use if input motifs are in different alphabet than input sequences
bfile "motif", "motif-file", "uniform", path to file NULL source of 0-order background model. If "motif" or "motif-file" 0-order letter frequencies in the first motif file are used. If "uniform" uses uniform letter frequencies.
motif_pseudo numeric(1) 0.1 Addd this pseudocount when converting from frequency matrix to log-odds matrix
inc character(1) NULL use only motifs with names matching this regex
exc character(1) NULL exclude motifs with names matching this regex

Value

a data.frame of AME enrichment results. If run using a BStringsSetList input, returns a list of data.frames.

Citation

If you use runAme() in your analysis, please cite:

Robert McLeay and Timothy L. Bailey, "Motif Enrichment Analysis: A unified framework and method evaluation", BMC Bioinformatics, 11:165, 2010, doi:10.1186/1471-2105-11-165. full text

Licensing

The MEME Suite is free for non-profit use, but for-profit users should purchase a license. See the MEME Suite Copyright Page for details.

Examples

if (meme_is_installed()) {
# Create random named sequences as input for example
seqs <- universalmotif::create_sequences(rng.seed = 123)
names(seqs) <- seq_along(seqs)

# An example path to a motif database file in .meme format
motif_file <- system.file("extdata", "flyFactorSurvey_cleaned.meme", package = "memes")

runAme(seqs, database = motif_file)

# Dreme results dataset for example
dreme_xml <- system.file("extdata", "dreme.xml", package = "memes")
dreme_results <- importDremeXML(dreme_xml)

# database can be set to multiple values like so: 
runAme(seqs, database = list(motif_file, "my_dreme_motifs" = dreme_results))
}

snystrom/memes documentation built on Feb. 6, 2023, 2:57 a.m.