karlinMonteCarlo | R Documentation |
Estimates p-value of the local score based on a Monte Carlo estimation of Gumble parameters from simulations of smaller sequences with same distribution. Appropriate for great sequences with length > 10^3, for i.i.d and markovian sequence models.
karlinMonteCarlo(
local_score,
sequence_length,
simulated_sequence_length,
FUN,
...,
numSim = 1000,
plot = TRUE,
keepSimu = FALSE
)
karlinMonteCarlo_double(
local_score,
sequence_length,
simulated_sequence_length,
FUN,
...,
numSim = 1000,
plot = TRUE,
keepSimu = FALSE
)
local_score |
local score observed in a sequence. |
sequence_length |
length of the sequence |
simulated_sequence_length |
length of simulated sequences produced by |
FUN |
function to simulate similar sequences with. |
... |
parameters for FUN |
numSim |
number of sequences to create for estimation FUN |
plot |
boolean value if to display plots for cumulated function, density and linearization of cumulative density function |
keepSimu |
Boolean, default to FALSE. If TRUE, the simulated local scores are returned as the localScores element of the output list. |
The length of the simulated sequences is an argument specific to the
function provided for simulation. Thus, it has to be provided also in the
parameter simulated_sequence_length
in the arguments of the "Monte
Carlo - Karlin" function. It is a crucial detail as it influences precision
and computation time of the result. Note that to get an appropriate
estimation, the given average score must be non-positive. Be careful that
the parameters names of the function FUN
should differ from those of
karlinMonteCarlo
function.
Methods - Parameters K^\star
and \lambda
of Karlin and Dembo (1990) are estimated by a linear
regression on the log(-log(cumulative distribution function of the local
scores)) on shorter sequences (size simulated_sequence_length
). The
formula used are : \hat{\lambda} = -\hat{b}
and \hat{K^\star} =
exp(\hat{a})/simulated\_sequence\_length
where \hat{a}
is the
intercept of the regression and \hat{b}
is the slope of the
regression. Then p-value is given by p = exp(-K^\star * exp(-\lambda*x
))
where x = local\_score - \log(sequence\_length)/\lambda
.
The
density plot produced by plot == TRUE
depends on the type of the
simulated local scores: if they are integer, a barplot of relative
frequency is used, else plot(density(...))
is used.
This
function calls localScoreC
which type of the output depends
on the type of the input. To be efficient, be aware to use a simulating
function FUN
that return a vector of adequate type ("integer" or
"numeric"). Warning: in R, typeof(c(1,3,4,10)) == "double"
. You can
set a type of a vector with mode()
or as.integer()
functions
for example.
karlinMonteCarlo_double()
is deprecated. At this
point, it is just a call to karlinMonteCarlo()
function.
If keepSimu
is FALSE, returns a list containing:
p_value | Probability to obtain a local score with a value
greater or equal to the parameter local_score |
K*
| Parameter K^* defined in Karlin and Dembo (1990) |
lambda | Parameter \lambda defined in Karlin and
Dembo (1990) |
If keepSimu
is TRUE, returns a list containing:
p_value | Probability to obtain a local score with a value greater or equal to the parameter local_score |
K* | Parameter K^* defined in Karlin and Dembo (1990) |
lambda | Parameter \lambda defined in Karlin and Dembo (1990) |
localScores | Vector of size numSim containing the simulated local scores for sequence size of simulated_sequence_length
|
monteCarlo
localScoreC
mySeq <- sample(-7:6, replace = TRUE, size = 100000)
#MonteCarlo taking random sample from the input sequence itself
karlinMonteCarlo(local_score = 160, sequence_length = 100000,
simulated_sequence_length = 1000,
FUN = function(x, sim_length) {
return(sample(x = x,
size = sim_length,
replace = TRUE))
},
x = mySeq,
sim_length = 1000,
numSim = 1000)
#Markovian example (longer computation)
MyTransMat_reels <- matrix(c(0.3, 0.1, 0.1, 0.1, 0.4,
0.2, 0.2, 0.1, 0.2, 0.3,
0.3, 0.4, 0.1, 0.1, 0.1,
0.3, 0.3, 0.1, 0.2, 0.1,
0.2, 0.1, 0.2, 0.4, 0.1),
ncol = 5, byrow=TRUE)
karlinMonteCarlo(local_score = 18.5, sequence_length = 200000,
simulated_sequence_length = 1500,
FUN = transmatrix2sequence,
matrix = MyTransMat_reels,
score =c(-1.5,-0.5,0,0.5,1), length = 1500,
plot=TRUE, numSim = 1500)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.