fit.gev: Fit a GEV distribution for a PWM.
In matthuska/tRap: A biophysical model for transcription factor binding affinities


Fit a generalized extreme value (GEV) distribution for a position weight matrix (PWM).

fit.gev(pwm, sequences, gc.content = 0.5, both.strands = TRUE)

`pwm`	position-specific count matrix with 4 rows: A, C, G, T
`sequences`	the promoter sequences used to fit the model, given as a list of character vectors (see Details)
`gc.content`	GC content to be passed to the `affinity` function
`both.strands`	compute affinity for both strands (default: TRUE)

`sequences` is a list of character vectors. Each character vector contains promoter sequences of the same length, because the parameters of the generalized extreme value (GEV) distribution for the PWM depend on the sequence length. The GEV parameters are fit separately for each set of equal-length sequences. Finally, each parameter is modeled as a linear function of the base-10 logarithm of the sequence length.
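The final regression step described above might look roughly like the following base R sketch. It does not call tRap and is not the package's actual implementation: the per-length estimates in `per.length` are made-up illustrative numbers, and the names `per.length`, `coef.for` and `fits` are hypothetical.

```r
# Hypothetical per-length GEV estimates (illustrative numbers only,
# not output of fit.gev)
per.length <- data.frame(
  length = c(100, 200, 300),
  shape  = c(0.10, 0.12, 0.13),
  scale  = c(1.00, 1.10, 1.15),
  loc    = c(-5.0, -4.6, -4.4)
)

# Regress one GEV parameter on log10(sequence length) and return
# the intercept and slope of the fitted line
coef.for <- function(param) {
  fit <- lm(per.length[[param]] ~ log10(per.length$length))
  unname(coef(fit))
}

# 2 x 3 matrix: one column per GEV parameter
fits <- sapply(c("shape", "scale", "loc"), coef.for)
rownames(fits) <- c("intercept", "slope")
fits
```

In this sketch the intercept row plays the role of shape0, scale0 and loc0, and the slope row plays the role of shape1, scale1 and loc1.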

An object of class GevFit. It contains two elements:

params

the length-dependent parameters, given as the regression coefficients shape0, shape1, scale0, scale1, loc0 and loc1. These can be used to compute the GEV parameters for a sequence of length l as follows: shape = shape0 + shape1 * log10(l), and likewise for scale and loc.
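As a minimal sketch of that formula, the coefficients can be evaluated for a given sequence length as shown here. The coefficient values are invented for illustration, and `gev.params.for.length` is a hypothetical helper, not a function provided by tRap; how the coefficients are stored inside a GevFit object is also not assumed here.

```r
# Hypothetical regression coefficients (illustrative values only)
params <- c(shape0 = 0.05, shape1 = 0.03,
            scale0 = 0.70, scale1 = 0.15,
            loc0 = -7.5, loc1 = 1.25)

# Evaluate intercept + slope * log10(l) for each GEV parameter
gev.params.for.length <- function(params, l) {
  c(shape = unname(params["shape0"] + params["shape1"] * log10(l)),
    scale = unname(params["scale0"] + params["scale1"] * log10(l)),
    loc   = unname(params["loc0"]   + params["loc1"]   * log10(l)))
}

# GEV parameters for a sequence of length 100
gev.params.for.length(params, 100)
```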

Matthias Heinig <heinig@molgen.mpg.de>

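library(tRap)

# Count matrix with 4 rows (A, C, G, T) and 3 positions
pwm = matrix(c(5, 4, 3, 1, 10, 12, 5, 3, 3, 5, 3, 10), nrow=4)

# 100 random sequences for each of the lengths 100, 200 and 300
sequences = lapply(c(100, 200, 300), function(l) sapply(1:100,
  function(x) paste(c("A", "C", "G", "T")[sample(4, l, replace=TRUE)],
    collapse="")))

fit.gev(pwm, sequences)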