seqToPSSM: Build a position-specific scoring matrix (PSSM) from a set of...
In jvanheld/stats4bioinfo: Utilities for the book "Statistics for bioinformatics"


Given a vector of sequences, build a position-specific scoring matrix (PSSM) with different derived statistics (counts, frequencies, probabilities, weights, information content).

seqToPSSM(sequences, prior = NULL, pseudo.count = 2, IC.log.base = 2, case.sensitive = FALSE)

`sequences`	vector of strings corresponding to biological sequences (DNA, RNA, proteins)
`prior=NULL`	vector of residue prior probabilities (names must correspond to residues)
`pseudo.count=2`	pseudo-count
`IC.log.base=2`	logarithmic base for the information content
`case.sensitive=FALSE`	if FALSE (the default), residues are treated as case-insensitive and converted to uppercase.
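
To illustrate how `prior` and `pseudo.count` typically enter a PSSM, here is a minimal base-R sketch. It is not the package's implementation: the helper name `sketch.pssm`, the element names of the returned list, and the formulas are assumptions. It uses one common convention, with corrected frequencies f' = (n + k*p) / (N + k), weights ln(f'/p), and information content f' * log(f'/p) in base `IC.log.base`. It also assumes aligned sequences of equal length.

```r
## Hypothetical helper (NOT part of stats4bioinfo): a sketch of one common
## way to derive PSSM statistics from aligned sequences of equal length.
sketch.pssm <- function(sequences, prior, pseudo.count = 2, IC.log.base = 2) {
  ## Case-insensitive handling: convert all residues to uppercase,
  ## then split each sequence into one column per position
  residues <- do.call(rbind, strsplit(toupper(sequences), split = ""))

  ## Count matrix: one row per residue, one column per position
  counts <- sapply(seq_len(ncol(residues)), function(j) {
    table(factor(residues[, j], levels = names(prior)))
  })
  rownames(counts) <- names(prior)

  ## Assumed convention: the pseudo-count is distributed across residues
  ## proportionally to their prior probabilities
  n.seq <- length(sequences)
  frequencies <- (counts + pseudo.count * prior) / (n.seq + pseudo.count)

  ## Assumed convention: log-likelihood ratio weights (natural logarithm)
  weights <- log(frequencies / prior)

  ## Information content per cell, in the requested logarithmic base
  information <- frequencies * log(frequencies / prior, base = IC.log.base)

  list(counts = counts, frequencies = frequencies,
       weights = weights, information = information)
}

## Toy usage with three aligned 4-residue sequences and a uniform prior
toy.sequences <- c("ACGT", "acgt", "AGGT")
toy.prior <- c("A" = 0.25, "C" = 0.25, "G" = 0.25, "T" = 0.25)
toy.pssm <- sketch.pssm(toy.sequences, prior = toy.prior)
print(toy.pssm$counts)
```

With this convention every column of the corrected frequency matrix sums to 1, and no cell is zero as long as the pseudo-count and all priors are positive, so the logarithms stay finite.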

First version: 2016-12-23 Last modification: 2016-12

Jacques van Helden (Jacques.van-Helden@univ-amu.fr)

## Define the sequences of yeast Met31p binding sites
sequences <- c(
  "MET28"="cgcccAAAACTGTGGtgttag",
  "MET3"="gttgtAAAACTGTGGCTTTGT",
  "MUP3"="cggaaAAAACTGTGGcgtcgc",
  "SAM1"="acaggAAAACTGTGGtggcgc",
  "SAM2"="gcttgAAAACTGTGGcgtttt",
  "MET6"="gtcgcAAAACTGTGGtagtca",
  "MET30"="ccgcgCAAACTGTGGcttccc",
  "ZWF1"="ataagCAAACTGTGGgttcat",
  "MET14"="cctcaAAAAATGTGGcaatgg",
  "MET17"="tcatgAAAACTGTGTaacata",
  "MET2"="tgcaaAAAATTGTGGatgcac",
  "MET8"="ggaaaAAAAATGTGAaaatcg",
  "MET1"="cataaTAAACTGTGAacggac")

## Choose priors based on yeast non-coding sequences
prior <- c("A"=0.32, "C"=0.18, "G"=0.18, "T"=0.32)

## Build the PSSM
pssm <- seqToPSSM(sequences = sequences, prior = prior)

## Print count table
print(pssm$counts)

## Print weight matrix
signif(pssm$weights, digits=2)

## Plot a heatmap of the counts
heatmap.simple(pssm$counts, auto.margins=FALSE, xlab="Position", 
     ylab="Residues", main="Yeast Met31p count matrix", las=1)