toPWM-methods: toPWM method


Description

Converts a raw position frequency matrix (PFMatrix) to a position weight matrix (PWMatrix). The PWM type, the background base frequencies and the pseudocounts are passed as parameters.

Usage

  toPWM(x, type=c("log2probratio", "prob"), pseudocounts=0.8, 
        bg=c(A=0.25, C=0.25, G=0.25, T=0.25))

Arguments

x

For toPWM, one of the following: a PFMatrix; a rectangular DNAStringSet object ("rectangular" means that all elements have the same number of characters) with no IUPAC ambiguity letters; a rectangular character vector; a matrix with rownames containing at least A, C, G and T; or a PFMatrixList object.

type

The type of PWM to generate. It should be one of "log2probratio" or "prob". "log2probratio" generates the PWM in log scale, while "prob" gives the PWM in probability scale, from 0 to 1.

pseudocounts

pseudocounts is a non-negative numeric vector, which means you can specify a different pseudocount for each site. The values are recycled if the vector is shorter than the number of sites. A value of 0.8 is recommended; see the references below for more details. In the TFBS perl module, the square root of the column sum of the matrix is used, where the column sum is the number of motifs used to construct the PFM.

bg

bg is a named vector of background frequencies for the four bases A, C, G and T. When toPWM is applied to a PFMatrix and bg is not specified, the bg information stored in the PFMatrix is used.
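The pseudocounts options described above can be sketched in base R. This is only an illustration: the matrix `counts` and the variable names are hypothetical, and using `rep(..., length.out=)` to mimic the recycling is an assumption about the behaviour, not the package's own code.

```r
# Hypothetical 4 x 6 count matrix (rows A, C, G, T), 20 sequences per site.
counts <- matrix(c(4,  19, 0,  0,  0,  0,
                   16, 0,  20, 0,  0,  0,
                   0,  1,  0,  20, 0,  20,
                   0,  0,  0,  0,  20, 0),
                 byrow = TRUE, nrow = 4,
                 dimnames = list(c("A", "C", "G", "T"), NULL))
n_sites <- ncol(counts)

# A single value is recycled to every site.
pc_constant <- rep(0.8, length.out = n_sites)

# A vector shorter than the number of sites is recycled along the sites.
pc_recycled <- rep(c(0.5, 1), length.out = n_sites)

# Square root of the column sums, as described for the TFBS perl module.
pc_sqrt <- sqrt(colSums(counts))
```

Any of these vectors has one non-negative value per site, which is the shape the pseudocounts argument is documented to accept.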

Details

The raw position frequency matrix (PFM) is usually converted into a position weight matrix (PWM), also known as a position specific scoring matrix (PSSM). The PWM provides the probability, or the log-scale probability ratio, of each base at each position and is used for scanning genomic sequences. The implementation here differs slightly from the PWM in the Biostrings package in the choice of pseudocounts. Pseudocounts are necessary to correct for small numbers of counts and to eliminate zero values before the log transformation.

postProbs = (PFM + bg * pseudocounts) / (colSums(PFM) + sum(bg) * pseudocounts)

priorProbs = bg / sum(bg)

PWM_log2probratio = log2(postProbs / priorProbs)

PWM_prob = postProbs
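As a rough illustration, the formulas above can be worked through in base R without TFBSTools. This is a minimal sketch, not the package's implementation: the variable names `pc`, `num` and `den`, and the use of `outer` and `sweep` to make the per-column arithmetic explicit, are choices made here for clarity.

```r
# Counts for a 6-site motif (rows A, C, G, T), 20 sequences per site.
PFM <- matrix(c(4,  19, 0,  0,  0,  0,
                16, 0,  20, 0,  0,  0,
                0,  1,  0,  20, 0,  20,
                0,  0,  0,  0,  20, 0),
              byrow = TRUE, nrow = 4,
              dimnames = list(c("A", "C", "G", "T"), NULL))
bg <- c(A = 0.25, C = 0.25, G = 0.25, T = 0.25)
pseudocounts <- 0.8

# One pseudocount per site (assumed recycling, as for any R vector).
pc <- rep(pseudocounts, length.out = ncol(PFM))

# Numerator: counts plus background-weighted pseudocounts (4 x 6 matrix).
num <- PFM + outer(bg, pc)
# Denominator: one total per column.
den <- colSums(PFM) + sum(bg) * pc
# Divide each column of the numerator by its own total.
postProbs <- sweep(num, 2, den, "/")

priorProbs <- bg / sum(bg)

# priorProbs has one value per row, so it recycles down each column.
PWM_log2probratio <- log2(postProbs / priorProbs)
PWM_prob <- postProbs
```

Each column of `PWM_prob` sums to 1, and because of the pseudocounts no entry is zero, so every value in `PWM_log2probratio` is finite.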

Value

A PWMatrix object that contains the background frequency and pseudocounts used.

Author(s)

Ge Tan

References

Wasserman, W. W., & Sandelin, A. (2004). Applied bioinformatics for the identification of regulatory elements. Nature Reviews Genetics, 5(4), 276-287. doi:10.1038/nrg1315

Nishida, K., Frith, M. C., & Nakai, K. (2009). Pseudocounts for transcription factor binding sites. Nucleic Acids Research, 37(3), 939-944. doi:10.1093/nar/gkn1019

See Also

toICM, XMatrix

Examples

  library(TFBSTools)

  ## Construct a PFMatrix
  pfm <- PFMatrix(ID="MA0004.1", name="Arnt", matrixClass="Zipper-Type", 
                  strand="+", bg=c(A=0.25, C=0.25, G=0.25, T=0.25), 
                  tags=list(family="Helix-Loop-Helix", species="10090", 
                            tax_group="vertebrates",
                            medline="7592839", type="SELEX", ACC="P53762", 
                            pazar_tf_id="TF0000003",
                            TFBSshape_ID="11", TFencyclopedia_ID="580"),
                  profileMatrix=matrix(c(4L,  19L, 0L,  0L,  0L,  0L,
                                         16L, 0L,  20L, 0L,  0L,  0L,
                                         0L,  1L,  0L,  20L, 0L,  20L,
                                         0L,  0L,  0L,  0L,  20L, 0L), 
                                       byrow=TRUE, nrow=4, 
                                       dimnames=list(c("A", "C", "G", "T")))
                  )
  ## Convert it into a PWMatrix
  pwm <- toPWM(pfm, type="log2probratio", pseudocounts=0.8)
  
  ## Convert a PFMatrixList into a PWMatrixList
  data(MA0003.2)
  data(MA0004.1)
  pfmList <- PFMatrixList(pfm1=MA0003.2, pfm2=MA0004.1, use.names=TRUE)
  pwmList <- toPWM(pfmList, pseudocounts=0.8)

ge11232002/TFBSTools documentation built on Sept. 12, 2021, 12:07 p.m.