getmotifs: Jointly call and refine a set of seed motifs provided by the...


Description

Jointly call and refine a set of seed motifs provided by the findamotif function

Usage

getmotifs(scorematset, dimvec, seqs, maxwidth = 800, alpha = 0.5,
  incprob = 0.99999, maxits = 30, plen = 0.05, updatemot = 1,
  updatealpha = 1, ourprior = NULL, updateprior = 1, bg = -1,
  dt = T, allowinf = FALSE, seed = NULL, verbosity = 1,
  stranded_prior = F, conv_t = 0.05, conv_n = 200)

Arguments

scorematset

a set of matrices, row-concatenated, giving PWMs (log-scale) used to initialise the algorithm. scorematset has sum(dimvec) rows and 4 columns: the first dimvec[1] rows give the PWM for the first motif, the next dimvec[2] rows give the PWM for the second motif, and so on.

dimvec

gives the length of each initial motif: if dimvec has length n_motifs, motif k has length dimvec[k].
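The relationship between scorematset and dimvec can be illustrated with two toy motifs. This is a minimal sketch, assuming that "log-scale" means natural-log base probabilities with columns ordered A, C, G, T; the example motifs and the helper get_motif() are hypothetical and not part of the package.

```r
# Two toy PWMs on the probability scale (rows = positions, columns = A, C, G, T).
# Hypothetical example motifs; column order A, C, G, T is an assumption.
pwm1 <- matrix(c(0.70, 0.10, 0.10, 0.10,
                 0.10, 0.70, 0.10, 0.10,
                 0.10, 0.10, 0.70, 0.10),
               ncol = 4, byrow = TRUE)
pwm2 <- matrix(c(0.25, 0.25, 0.25, 0.25,
                 0.05, 0.05, 0.05, 0.85,
                 0.85, 0.05, 0.05, 0.05,
                 0.05, 0.85, 0.05, 0.05),
               ncol = 4, byrow = TRUE)

# Convert to log scale and stack row-wise: sum(dimvec) rows by 4 columns.
scorematset <- rbind(log(pwm1), log(pwm2))
dimvec <- c(nrow(pwm1), nrow(pwm2))

# One initial presence probability per motif (the default alpha is 0.5).
alpha <- rep(0.5, length(dimvec))

# Hypothetical helper: recover motif k from a row-concatenated matrix.
get_motif <- function(mat, dimvec, k) {
  ends <- cumsum(dimvec)
  starts <- ends - dimvec + 1
  mat[starts[k]:ends[k], , drop = FALSE]
}

dim(scorematset)   # 7 4
```

The same helper can be used to split any row-concatenated matrix of this shape back into its individual motifs.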

seqs

a vector of input sequences in which to search for motifs. Lower-case bases are ignored (masked), which is useful e.g. if repeats are an issue. In some cases it may be helpful NOT to mask repeats that may contain motif matches.
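Because masking is controlled purely by case, it is easy to check how much of each sequence will be ignored, or to switch masking off. A minimal sketch in base R; the example sequences and the helper masked_fraction() are hypothetical, and treating lower-case "n" as masked is an assumption.

```r
seqs <- c("ACGTacgtacgtTTGACA", "ttttCCGGAAGTgggg")

# Hypothetical helper: fraction of bases in a sequence that are lower case,
# i.e. masked under the rule described above.
masked_fraction <- function(s) {
  chars <- strsplit(s, "")[[1]]
  mean(chars %in% c("a", "c", "g", "t", "n"))
}
vapply(seqs, masked_fraction, numeric(1), USE.NAMES = FALSE)

# To keep soft-masked repeats visible to the motif search, upper-case everything.
seqs_unmasked <- toupper(seqs)
```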

maxwidth

the length to which elements of "seqs" are trimmed (around their centre). Run time depends roughly linearly on this parameter.
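Centre-trimming can be pictured as keeping a window of maxwidth bases around the midpoint of each sequence. This is a sketch only: trim_centre() is a hypothetical helper, and how getmotifs rounds the window when the excess length is odd is an assumption.

```r
# Hypothetical helper: keep at most `maxwidth` characters centred on the
# middle of each sequence. The rounding rule (floor) is an assumption.
trim_centre <- function(seqs, maxwidth) {
  vapply(seqs, function(s) {
    n <- nchar(s)
    if (n <= maxwidth) return(s)           # short sequences are left unchanged
    start <- floor((n - maxwidth) / 2) + 1
    substr(s, start, start + maxwidth - 1)
  }, character(1), USE.NAMES = FALSE)
}

# A 1000bp sequence whose central 800bp are upper case, plus a short one.
x <- paste0(strrep("a", 100), strrep("C", 800), strrep("a", 100))
trimmed <- trim_centre(c(x, "ACGT"), maxwidth = 800)
nchar(trimmed)   # 800 4
```

Since run time scales roughly linearly with maxwidth, trimming long peaks to their central region is also a cheap way to speed up a run.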

alpha

a vector of initial assumed probabilities each motif is present in a sequence

incprob

can usually be left at its default value

maxits

the maximum number of iterations (the algorithm may terminate early if no motif is found)

plen

a parameter setting the geometric prior on the length of each motif found. The default, plen=0.05, corresponds to a mean length of 20bp. Larger values of plen penalise longer motifs more strongly.
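The stated mean of 20bp for plen=0.05 is consistent with a geometric prior on lengths 1, 2, ... with success probability plen, for which P(L) = plen (1 - plen)^(L - 1) and E[L] = 1/plen. The sketch below assumes exactly that parameterisation; the helper names are hypothetical.

```r
# Assumed prior: P(L) = plen * (1 - plen)^(L - 1) for L = 1, 2, ...
prior_mean_length <- function(plen) 1 / plen

# Each extra base pair lowers the log-prior by -log(1 - plen).
extra_bp_penalty <- function(plen) -log(1 - plen)

prior_mean_length(0.05)          # 20
extra_bp_penalty(c(0.05, 0.2))   # larger plen gives a larger per-bp penalty

# Numerical check with dgeom(), which counts failures before the first
# success, i.e. L - 1 in this parameterisation.
L <- 1:5000
sum(L * dgeom(L - 1, prob = 0.05))   # approximately 20
```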

updatemot

a flag - should the algorithm update (learn) the initial motifs (default is 1)

updatealpha

a flag - should the algorithm update (learn) alpha, the probabilities that each motif is present in a sequence (default is 1)

ourprior

a vector of 10 probabilities giving the initial probability of a motif being found in different parts of the sequence, ordered from start to end. If left unspecified, the initial prior is uniform and the algorithm tries to learn where motifs occur, e.g. whether they are centrally enriched.
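A user-supplied ourprior might look like one of the vectors below. This is a sketch under two assumptions not stated above: that the 10 entries correspond to 10 equal-width bins along the sequence, and that they should sum to 1. The weights and the helper position_bin() are hypothetical.

```r
n_bins <- 10

# Uniform prior, equivalent in spirit to leaving ourprior unspecified.
uniform_prior <- rep(1 / n_bins, n_bins)

# Hypothetical centrally enriched prior: weights peak in the middle bins.
weights <- c(1, 1, 2, 3, 5, 5, 3, 2, 1, 1)
central_prior <- weights / sum(weights)

# Hypothetical mapping from a 1-based position in a sequence of length n
# to one of n_bins equal-width bins (assumed binning scheme).
position_bin <- function(pos, n, n_bins = 10) ceiling(pos * n_bins / n)
position_bin(c(1, 400, 401, 800), n = 800)   # 1 5 6 10
```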

updateprior

a flag - should the algorithm update (learn) the prior on where the motifs occur within the DNA sequences (default is 1)

bg

should normally be left at its default value (technical parameter setting the background model)

dt

logical; should a data table of the results be returned

allowinf

a flag - should infinite values be allowed in scorematset (not recommended, default is FALSE).

seed

integer; seed for random number generation. Set this for exactly reproducible results.

verbosity

integer; how verbose should this function be? 0 = silent, 3 = everything.

Details

Given a user-supplied set of initial PWMs and a set of input sequences in which to identify motifs, getmotifs runs a Gibbs sampler to update these motifs and outputs the results.
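The Gibbs sampler itself is not reproduced here. As a generic illustration of the scoring that PWM-based motif calling relies on (not the package's internal code), the sketch below computes the log-likelihood of every window of a sequence under a log-scale PWM. The helper scan_pwm() is hypothetical, the A, C, G, T column order is assumed, and scoring windows that touch lower-case (masked) bases as -Inf is a choice made for this sketch.

```r
# Hypothetical helper: log-likelihood of each window of `s` under a
# log-scale PWM (rows = positions, columns = A, C, G, T).
scan_pwm <- function(s, logpwm) {
  # Lower-case or ambiguous bases do not match and become NA.
  idx <- match(strsplit(s, "")[[1]], c("A", "C", "G", "T"))
  w <- nrow(logpwm)
  n_windows <- length(idx) - w + 1
  if (n_windows < 1) return(numeric(0))
  vapply(seq_len(n_windows), function(start) {
    cols <- idx[start:(start + w - 1)]
    # Windows touching masked or ambiguous bases get -Inf.
    if (anyNA(cols)) return(-Inf)
    # Pick entry (position i, base cols[i]) for each row and sum the logs.
    sum(logpwm[cbind(seq_len(w), cols)])
  }, numeric(1))
}

logpwm <- log(matrix(c(0.7, 0.1, 0.1, 0.1,
                       0.1, 0.7, 0.1, 0.1,
                       0.1, 0.1, 0.7, 0.1),
                     ncol = 4, byrow = TRUE))
scores <- scan_pwm("TTACGTT", logpwm)
which.max(scores)   # 3: the window "ACG"
```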

The user can also optionally supply priors on the fraction of sequences containing motifs, the likely length of motifs, and the positional distribution of motifs within the sequences.

User-supplied information can either be updated by the algorithm (the default) or fixed at the input values.

The program outputs a list of results, including information on the inferred PWMs (i.e. the motifs found), a probabilistic assignment of which regions contain which motifs, and posterior distributions of the other parameters.

If you use this program, please cite Altemose et al. eLife 2017

Value

The code returns detailed output as a list, whose elements are as follows (access these using commands like outputlist$scoremat)

Details of input data given:

Details of overall fitted model:

Details of output for given data:


MyersGroup/MotifFinder documentation built on June 7, 2019, 3:42 p.m.