findamotif: Find the single most enriched motif in a set of DNA sequences


Description

Find the single most enriched motif in a set of DNA sequences

Usage

findamotif(seqs, len, scores = NULL, nits = 50, scoring_its = 5,
  n_for_refine = 1000, prior = NULL, updateprior = 1, plen = 0.9,
  seed = NULL, verbosity = 1, motif_rank = 1,
  motif_blacklist = NULL, range = 50, stranded_prior = F,
  motif_seed = "central", conv_t = 0, conv_n = 200)

Arguments

seqs

a vector of strings giving the DNA sequences in which to find a motif

len

length of motif to find (min=4)

scores

a set of regional scores giving weights; e.g. ChIP-Seq enrichment values

nits

number of iterations used for motif refinement

n_for_refine

only the top n_for_refine scoring regions are used for motif refinement

prior

a vector of 10 probabilities giving the initial probability of a motif being found in different parts of the sequence, ordered from start to end. If left unspecified, the initial prior is uniform and the algorithm tries to learn where motifs occur, e.g. whether they are centrally enriched.

updateprior

a flag; should the algorithm update (learn) the prior on where the motifs occur within the DNA sequences? (default is 1)

plen

a parameter setting the geometric prior on how long each motif found should be. For example, plen=0.05 corresponds to a mean length of 20bp. Larger values of plen penalise longer motifs more strongly. (This entry originally described 0.05 as the default, but the usage above lists plen = 0.9.)

seed

integer; seed for random number generation, set this for exactly reproducible results.

verbosity

integer; how verbose should this function be? 0=silent, 3=everything.

motif_rank

integer; which rank of seed motif to use (1st seed motif, 2nd etc.)

motif_blacklist

character vector; motifs not to use as seed motif

range

integer; range around center to check for central enrichment

motif_seed

string; "central", "modal", "random", or a string e.g. "ACGTGAC"
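The correspondence quoted for plen (0.05 giving a mean length of 20bp) matches the mean of a geometric distribution, 1/plen. The sketch below assumes the length prior has the form P(L = k) = plen * (1 - plen)^(k - 1) for k = 1, 2, ...; that parametrisation is an assumption chosen because it reproduces the documented figure, and the helper names are hypothetical rather than part of the package.

```r
# Hypothetical helpers (not part of MotifFinder): a geometric prior on
# motif length with support k = 1, 2, 3, ... (assumed parametrisation).
geom_length_pmf <- function(k, plen) {
  plen * (1 - plen)^(k - 1)
}

# Closed-form mean of that distribution
geom_mean_length <- function(plen) {
  1 / plen
}

# Numerical check of the mean for plen = 0.05
k <- 1:5000
mean_numeric <- sum(k * geom_length_pmf(k, 0.05))
print(geom_mean_length(0.05))
print(round(mean_numeric, 6))

# Prior mass of a 20bp motif relative to an 8bp motif:
# the ratio shrinks sharply as plen grows, i.e. long motifs are penalised more.
ratio_small <- geom_length_pmf(20, 0.05) / geom_length_pmf(8, 0.05)
ratio_large <- geom_length_pmf(20, 0.9) / geom_length_pmf(8, 0.9)
print(c(plen_0.05 = ratio_small, plen_0.9 = ratio_large))
```

Under this assumption, plen = 0.9 implies a prior mean length of about 1.1bp, so how far the fitted motif departs from the seed length depends heavily on this setting.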

Details

This function identifies a single position weight matrix (PWM) using the iterative Gibbs sampler described in Altemose et al., eLife 2017. Function 2 can refine multiple motifs further, jointly.

The user supplies a set of DNA sequences, a score for each sequence (e.g. an enrichment value or any other score; per the usage above, scores defaults to NULL), and a length for an initial motif (e.g. 8 bp) used to seed the algorithm.
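A minimal sketch of these inputs, using simulated data. The planted motif, sequence sizes and score distribution are arbitrary assumptions made for illustration. The call uses only arguments listed in the usage above and runs only if the MotifFinder package is installed. The structure of the returned list is not assumed; it is just printed with str().

```r
set.seed(1)

# Simulate 300 random 200bp sequences (sizes chosen arbitrarily)
n_seqs <- 300
seq_length <- 200
bases <- c("A", "C", "G", "T")
random_seq <- function(n) paste(sample(bases, n, replace = TRUE), collapse = "")
seqs <- vapply(seq_len(n_seqs), function(i) random_seq(seq_length), character(1))

# Plant a hypothetical 8bp motif at the centre of the first half of the sequences
motif <- "ACGTGACC"
has_motif <- seq_len(n_seqs) <= n_seqs / 2
centre <- seq_length / 2 - nchar(motif) / 2 + 1
substr(seqs[has_motif], centre, centre + nchar(motif) - 1) <- motif

# One score per sequence; motif-containing sequences score higher on average
scores <- ifelse(has_motif, rexp(n_seqs, 1) + 1, rexp(n_seqs, 1))

# Run the motif finder only when the package is available
if (requireNamespace("MotifFinder", quietly = TRUE)) {
  result <- MotifFinder::findamotif(seqs, len = 8, scores = scores,
                                    nits = 50, seed = 42, verbosity = 0)
  str(result, max.level = 1)  # inspect the top-level items of the result
}
```

Setting seed here follows the argument description, which recommends it for exactly reproducible results.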

There are additional optional parameters.
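As an illustration of the optional parameters, the sketch below builds a centrally weighted prior of length 10 and collects a few optional arguments in a list. The helper central_prior is hypothetical, and two points are assumptions rather than documented behaviour: that the prior should sum to 1, and that updateprior = 0 disables learning of the prior.

```r
# Hypothetical helper (not part of MotifFinder): a length-10 prior over
# sequence positions, peaked at the centre and normalised to sum to 1.
central_prior <- function(n_bins = 10, sd_bins = 1.5) {
  bin_mid <- seq_len(n_bins) - (n_bins + 1) / 2  # bin offsets, symmetric around 0
  w <- dnorm(bin_mid, mean = 0, sd = sd_bins)    # Gaussian-shaped weights
  w / sum(w)                                     # normalise (assumed requirement)
}

prior <- central_prior()
print(round(prior, 3))

# Optional arguments taken from the usage signature; values are illustrative
optional_args <- list(
  prior = prior,
  updateprior = 0,         # assumed to keep the supplied prior fixed
  motif_seed = "central",
  range = 50,
  seed = 42,
  verbosity = 0
)

# These could then be combined with the required inputs, e.g.
# do.call(MotifFinder::findamotif, c(list(seqs = seqs, len = 8), optional_args))
# where seqs is a character vector of DNA sequences.
```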

The program outputs a list of results, including information on the inferred PWM (i.e. the motif found), a probabilistic assignment of which regions contain this motif, and posterior distributions of the other parameters.

Value

A list with the following items:
Details of input data given:

Details of overall fitted model:

Details of output for given data:


MyersGroup/MotifFinder documentation built on June 7, 2019, 3:42 p.m.