Find the single most enriched motif in a set of DNA sequences
seqs: a vector of strings giving the DNA sequences in which to find a motif
len: length of the motif to find (min=4)
scores: a set of regional scores giving weights, e.g. ChIP-Seq enrichment values
nits: number of iterations used for motif refinement
n_for_refine: only the top n_for_refine scoring regions are used for motif refinement
prior: a vector of 10 probabilities giving the initial probability of a motif being found across different parts of the sequence, from start to end. If left unspecified, the initial prior is uniform and the algorithm tries to learn where motifs are, e.g. whether they are centrally enriched.
updateprior: a flag; should the algorithm update (learn) the prior on where the motifs occur within the DNA sequences? (default is 1)
plen: a parameter setting the geometric prior on how long each motif found should be. plen=0.05 corresponds to a mean length of 20bp and is the default. Larger values of plen penalise longer motifs more.
seed: integer; seed for random number generation. Set this for exactly reproducible results.
verbosity: integer; how verbose should this function be? 0=silent, 3=everything.
motif_rank: integer; which rank of seed motif to use (1st seed motif, 2nd etc.)
motif_blacklist: character vector; motifs not to use as the seed motif
range: integer; range around the center to check for central enrichment
motif_seed: string; "central", "modal", "random", or a motif string, e.g. "ACGTGAC"
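The relationship between plen and the expected motif length can be checked with a few lines of R. This is a minimal sketch that assumes the standard geometric distribution on lengths 1, 2, ..., with P(L = k) = plen * (1 - plen)^(k - 1); the exact parameterisation used inside the function is not stated here, and the helper names `geom_mean_length` and `geom_log_prior` are hypothetical.

```r
# Assumed form of the geometric prior on motif length L = 1, 2, ...:
#   P(L = k) = plen * (1 - plen)^(k - 1), which has mean 1 / plen.
geom_mean_length <- function(plen) 1 / plen

# Log prior probability of a motif of length k under that assumption.
geom_log_prior <- function(k, plen) log(plen) + (k - 1) * log(1 - plen)

# The default plen = 0.05 gives a mean length of 20 bp.
mean_default <- geom_mean_length(0.05)

# Log-prior cost of a 12 bp motif relative to an 8 bp motif.
# A larger plen makes this difference more negative,
# i.e. longer motifs are penalised more strongly.
penalty_small_plen <- geom_log_prior(12, 0.05) - geom_log_prior(8, 0.05)
penalty_large_plen <- geom_log_prior(12, 0.20) - geom_log_prior(8, 0.20)
```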
This function identifies a single position weight matrix (PWM) using the iterative Gibbs sampler described in Altemose et al. eLife 2017. Function 2 can refine multiple motifs further, jointly.
The user must input a set of DNA sequences, a score for each sequence (e.g. an enrichment value or any other score), and a length for an initial motif (e.g. 8 bp) used to seed the algorithm.
There are additional optional parameters.
The program outputs a list of results, including information on the inferred PWM (i.e. the motif found), a probabilistic output of which regions contain this motif, and posterior distributions of the other parameters.
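The required inputs described above can be illustrated with simulated data. The sketch below builds a vector of random DNA sequences, plants the motif "ACGTGAC" in the centre of half of them, and gives those sequences higher scores. The sequence count, sequence length, score values and nits = 50 are arbitrary choices for illustration, not recommended settings.

```r
set.seed(42)
bases <- c("A", "C", "G", "T")

# Draw one random DNA string of length n.
random_seq <- function(n) paste(sample(bases, n, replace = TRUE), collapse = "")

n_seqs <- 50
seqs <- vapply(rep(200, n_seqs), random_seq, character(1))

# Plant a 7 bp motif at positions 97..103 of the first 25 sequences.
planted <- "ACGTGAC"
has_motif <- seq_len(n_seqs) <= 25
substr(seqs[has_motif], 97, 103) <- planted

# One score per sequence; motif-containing sequences score higher.
scores <- ifelse(has_motif, 5, 1) + runif(n_seqs)

# Named arguments matching the argument list documented on this page.
args <- list(seqs = seqs, len = 7, scores = scores,
             nits = 50, seed = 42, motif_seed = "central")
```

A list like `args` can be passed to the function with `do.call()`; setting `seed` makes the run exactly reproducible.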
A list with the following items:
Details of input data given:
seqs: the vector of input sequences in which motifs were searched for
trimmedseqs: the same input sequences after trimming to shorten long inputs
Details of overall fitted model:
scoremat: a matrix giving the PWM (log-scale) for the identified motif after iteration
scorematdim: the length of the identified motif; scoremat has dimension scorematdim x 4
prior: a vector of 10 probabilities giving the inferred probability of a motif being found across different parts of the sequence, from start to end
alpha: a vector of probabilities giving the inferred probability of the motif being found within each input region
bindmat: a version of scoremat accounting for the background sequence composition
background: the inferred background model
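As a rough illustration of how the fitted-model fields might be read, the sketch below uses a mock result list rather than real output. It assumes that scoremat holds natural-log values with columns ordered A, C, G, T, and that the 10 entries of prior correspond to equal tenths of the sequence; none of these conventions is confirmed on this page, and all values are invented.

```r
bases <- c("A", "C", "G", "T")  # assumed column order of scoremat

# Mock PWM for the motif ACGTGAC: 0.7 on the motif base, 0.1 elsewhere.
idx <- c(1, 2, 3, 4, 3, 1, 2)
probs <- matrix(0.1, nrow = 7, ncol = 4)
probs[cbind(seq_along(idx), idx)] <- 0.7

# Mock result carrying only the fields used below (values are invented).
res <- list(
  scoremat = log(probs),
  scorematdim = 7,
  prior = c(0.02, 0.03, 0.05, 0.10, 0.30, 0.30, 0.10, 0.05, 0.03, 0.02)
)

# Back-transform the log-scale matrix to per-position base probabilities.
pwm <- exp(res$scoremat)
pwm <- pwm / rowSums(pwm)

# Consensus sequence: the highest-scoring base at each position.
consensus <- paste(bases[max.col(res$scoremat, ties.method = "first")],
                   collapse = "")

# Share of the positional prior in the two central bins,
# a simple check for central enrichment.
central_mass <- sum(res$prior[5:6])
```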
Details of output for given data:
regprobs, regprob: in this case identical vectors giving the probability of the motif occurring in each given input sequence
bestpos: a vector giving the best match to the motif in each given input sequence
whichregs: a vector showing which input sequences had motifs identified in the final round of sampling of the Gibbs sampler
whichpos: for motifs identified in the regions listed in whichregs, the start positions of the motifs identified in the final round of sampling of the Gibbs sampler
whichmot: not needed in this case
whichstrand: for motifs identified in the regions listed in whichregs, the strand of the motifs identified in the final round of sampling of the Gibbs sampler, relative to the input sequence
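The per-region fields can be combined to pull out the motif instances from the final sampling round. The sketch below works on a mock result list and makes two assumptions that are not confirmed on this page: whichpos is a 1-based start position on the input sequence, and whichstrand uses 1 for the forward strand and 0 for the reverse strand. The helper `revcomp` is hypothetical.

```r
# Mock result with only the fields used below (values are invented).
res <- list(
  seqs = c("TTACGTGACTT", "GGGTCACGTGG", "AAAAAAAAAAA"),
  scorematdim = 7,
  regprobs = c(0.98, 0.91, 0.02),
  whichregs = c(1, 2),
  whichpos = c(3, 3),
  whichstrand = c(1, 0)  # assumed encoding: 1 = forward, 0 = reverse
)

# Reverse complement of each DNA string in a character vector.
revcomp <- function(s) {
  comp <- chartr("ACGT", "TGCA", s)
  vapply(strsplit(comp, ""),
         function(ch) paste(rev(ch), collapse = ""),
         character(1))
}

# Cut out the motif-length window at each sampled start position.
hits <- substr(res$seqs[res$whichregs],
               res$whichpos,
               res$whichpos + res$scorematdim - 1)

# Put reverse-strand hits into the motif's own orientation.
on_reverse <- res$whichstrand == 0
hits[on_reverse] <- revcomp(hits[on_reverse])

# Regions with a high posterior probability of containing the motif.
confident <- which(res$regprobs > 0.9)
```

In this mock example both sampled instances reduce to the same 7 bp string once the reverse-strand hit is reverse-complemented. The 0.9 cutoff on regprobs is an arbitrary threshold.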