motifbreakR: Predict The Disruptiveness Of Single Nucleotide Polymorphisms...
In motifbreakR: A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

Description Usage Arguments Details Value See Also Examples

Predict The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites.

motifbreakR(
  snpList,
  pwmList,
  threshold = 0.85,
  filterp = FALSE,
  method = "default",
  show.neutral = FALSE,
  verbose = FALSE,
  bkg = c(A = 0.25, C = 0.25, G = 0.25, T = 0.25),
  legacy.score = TRUE,
  BPPARAM = bpparam()
)

`snpList`	The output of `snps.from.rsid` or `snps.from.file`
`pwmList`	An object of class `MotifList` containing the motifs that you wish to interrogate
`threshold`	Numeric; the maximum p-value for a match to be called or a minimum score threshold
`filterp`	Logical; filter by p-value instead of by pct score.
`method`	Character; one of `default`, `log`, `ic`, or `notrans`; see details.
`show.neutral`	Logical; include neutral changes in the output
`verbose`	Logical; if running serially, show verbose messages
`bkg`	Numeric Vector; the background probabilites of the nucleotides used with method=`log` method=`ic`
`BPPARAM`	a BiocParallel object see `register` and see `getClass("BiocParallelParam")` for additional parameter classes. Try `BiocParallel::registered()` to see what's availible and for example `BiocParallel::bpparam("SerialParam")` would allow serial evaluation.

motifbreakR works with position probability matrices (PPM). PPM are derived as the fractional occurrence of nucleotides A,C,G, and T at each position of a position frequency matrix (PFM). PFM are simply the tally of each nucleotide at each position across a set of aligned sequences. With a PPM, one can generate probabilities based on the genome, or more practically, create any number of position specific scoring matrices (PSSM) based on the principle that the PPM contains information about the likelihood of observing a particular nucleotide at a particular position of a true transcription factor binding site. What follows is a discussion of the three different algorithms that may be employed in calls to the motifbreakR function via the method argument.

Suppose we have a frequency matrix M of width n (i.e. a PPM as described above). Furthermore, we have a sequence s also of length n, such that s_i in {A,T,G,C}, i = 1 … n. Each column of M contains the frequencies of each letter in each position.

Commonly in the literature sequences are scored as the sum of log probabilities:

Equation 1

F( s,M ) = ∑_(i = 1)^n log ((M_s_i,_i)/b_s_i)

where b_s_i is the background frequency of letter s_i in the genome of interest. This method can be specified by the user as method='log'.

As an alternative to this method, we introduced a scoring method to directly weight the score by the importance of the position within the match sequence. This method of weighting is accessed by specifying method='ic' (information content). A general representation of this scoring method is given by:

Equation 2

F( s,M ) = p_s . ω_M

where p_s is the scoring vector derived from sequence s and matrix M, and w_M is a weight vector derived from M. First, we compute the scoring vector of position scores p

Equation 3

p_s = ( M_s_i,_i ) where (i = 1 … n)/(s_i in {A,C,G,T})

and second, for each M a constant vector of weights ω_M = ( ω_1, ω_2, …, ω_n).

There are two methods for producing ω_M. The first, which we call weighted sum, is the difference in the probabilities for the two letters of the polymorphism (or variant), i.e. Δ p_s_i, or the difference of the maximum and minimum values for each column of M:

Equation 4.1

ω_i = max{M_i} - min{M_i} where i = 1 … n

The second variation of this theme is to weight by relative entropy. Thus the relative entropy weight for each column i of the matrix is given by:

Equation 4.2

ω_i = ∑_{j in {A,C,G,T}} {M_(j,i)} log2(M_(j,i)/b_i) where i = 1 … n

where b_i is again the background frequency of the letter i.

Thus, there are 3 possible algorithms to apply via the method argument. The first is the standard summation of log probabilities (method='log'). The second and third are the weighted sum and information content methods (method='default' and method='ic') specified by equations 4.1 and 4.2, respectively. motifbreakR assumes a uniform background nucleotide distribution (b) in equations 1 and 4.2 unless otherwise specified by the user. Since we are primarily interested in the difference between alleles, background frequency is not a major factor, although it can change the results. Additionally, inclusion of background frequency introduces potential bias when collections of motifs are employed, since motifs are themselves unbalanced with respect to nucleotide composition. With these cautions in mind, users may override the uniform distribution if so desired. For all three methods, motifbreakR scores and reports the reference and alternate alleles of the sequence (F( s_ref,M ) and F( s_alt,M )), and provides the matrix scores p_s_ref and p_s_alt of the SNP (or variant). The scores are scaled as a fraction of scoring range 0-1 of the motif matrix, M. If either of F( s_ref,M ) and F( s_alt,M ) is greater than a user-specified threshold (default value of 0.85) the SNP is reported. By default motifbreakR does not display neutral effects, (Δ p_i < 0.4) but this behaviour can be overridden.

Additionally, now, with the use of TFMPvalue-package, we may filter by p-value of the match. This is unfortunately a two step process. First, by invoking filterp=TRUE and setting a threshold at a desired p-value e.g 1e-4, we perform a rough filter on the results by rounding all values in the PWM to two decimal place, and calculating a scoring threshold based upon that. The second step is to use the function calculatePvalue() on a selection of results which will change the Refpvalue and Altpvalue columns in the output from NA to the p-value calculated by TFMsc2pv. This can be (although not always) a very memory and time intensive process if the algorithm doesn't converge rapidly.

a GRanges object containing:

`REF`	the reference allele for the SNP
`ALT`	the alternate allele for the SNP
`snpPos`	the coordinates of the SNP
`motifPos`	the coordinates of the SNP within the TF binding motif
`geneSymbol`	the geneSymbol corresponding to the TF of the TF binding motif
`dataSource`	the source of the TF binding motif
`providerName, providerId`	the name and id provided by the source
`seqMatch`	the sequence on the 5' -> 3' direction of the "+" strand that corresponds to DNA at the position that the TF binding motif was found.
`pctRef`	The score as determined by the scoring method, when the sequence contains the reference SNP allele, normalized to a scale from 0 - 1. If `filterp = FALSE`, this is the value that is thresholded.
`pctAlt`	The score as determined by the scoring method, when the sequence contains the alternate SNP allele, normalized to a scale from 0 - 1. If `filterp = FALSE`, this is the value that is thresholded.
`scoreRef`	The score as determined by the scoring method, when the sequence contains the reference SNP allele
`scoreAlt`	The score as determined by the scoring method, when the sequence contains the alternate SNP allele
`Refpvalue`	p-value for the match for the pctRef score, initially set to `NA`. see `calculatePvalue` for more information
`Altpvalue`	p-value for the match for the pctAlt score, initially set to `NA`. see `calculatePvalue` for more information
`alleleRef`	The proportional frequency of the reference allele at position `motifPos` in the motif
`alleleAlt`	The proportional frequency of the alternate allele at position `motifPos` in the motif
`effect`	one of weak, strong, or neutral indicating the strength of the effect.

each SNP in this object may be plotted with plotMB

See snps.from.rsid and snps.from.file for information about how to generate the input to this function and plotMB for information on how to visualize it's output

 library(BSgenome.Hsapiens.UCSC.hg19)
 # prepare variants
 load(system.file("extdata",
                  "pca.enhancer.snps.rda",
                  package = "motifbreakR")) # loads snps.mb
 pca.enhancer.snps <- sample(snps.mb, 20)
 # Get motifs to interrogate
 data(hocomoco)
 motifs <- sample(hocomoco, 50)
 # run motifbreakR
 results <- motifbreakR(pca.enhancer.snps,
                        motifs, threshold = 0.85,
                        method = "ic",
                        BPPARAM=BiocParallel::SerialParam())

motifbreakR documentation built on Nov. 8, 2020, 5:31 p.m.

motifbreakR index

README.md motifbreakR: an Introduction

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

motifbreakR
A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

motifbreakR: Predict The Disruptiveness Of Single Nucleotide Polymorphisms...
In motifbreakR: A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to motifbreakR in motifbreakR...

R Package Documentation

Browse R Packages

We want your feedback!

motifbreakR A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

motifbreakR: Predict The Disruptiveness Of Single Nucleotide Polymorphisms... In motifbreakR: A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

Description

Usage

Arguments

Details

Value

See Also

Examples

Related to motifbreakR in motifbreakR...

R Package Documentation

Browse R Packages

We want your feedback!

motifbreakR
A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites

motifbreakR: Predict The Disruptiveness Of Single Nucleotide Polymorphisms...
In motifbreakR: A Package For Predicting The Disruptiveness Of Single Nucleotide Polymorphisms On Transcription Factor Binding Sites