karlinMonteCarlo: Monte Carlo - Karlin [p-value]

View source: R/rMethods.R

karlinMonteCarloR Documentation

Monte Carlo - Karlin [p-value]

Description

Estimates p-value of the local score based on a Monte Carlo estimation of Gumble parameters from simulations of smaller sequences with same distribution. Appropriate for great sequences with length > 10^3, for i.i.d and markovian sequence models.

Usage

karlinMonteCarlo(
  local_score,
  sequence_length,
  simulated_sequence_length,
  FUN,
  ...,
  numSim = 1000,
  plot = TRUE,
  keepSimu = FALSE
)

karlinMonteCarlo_double(
  local_score,
  sequence_length,
  simulated_sequence_length,
  FUN,
  ...,
  numSim = 1000,
  plot = TRUE,
  keepSimu = FALSE
)

Arguments

local_score

local score observed in a sequence.

sequence_length

length of the sequence

simulated_sequence_length

length of simulated sequences produced by

FUN

function to simulate similar sequences with.

...

parameters for FUN

numSim

number of sequences to create for estimation FUN

plot

boolean value if to display plots for cumulated function, density and linearization of cumulative density function

keepSimu

Boolean, default to FALSE. If TRUE, the simulated local scores are returned as the localScores element of the output list.

Details

The length of the simulated sequences is an argument specific to the function provided for simulation. Thus, it has to be provided also in the parameter simulated_sequence_length in the arguments of the "Monte Carlo - Karlin" function. It is a crucial detail as it influences precision and computation time of the result. Note that to get an appropriate estimation, the given average score must be non-positive. Be careful that the parameters names of the function FUN should differ from those of karlinMonteCarlo function.
Methods - Parameters K^\star and \lambda of Karlin and Dembo (1990) are estimated by a linear regression on the log(-log(cumulative distribution function of the local scores)) on shorter sequences (size simulated_sequence_length). The formula used are : \hat{\lambda} = -\hat{b} and \hat{K^\star} = exp(\hat{a})/simulated\_sequence\_length where \hat{a} is the intercept of the regression and \hat{b} is the slope of the regression. Then p-value is given by p = exp(-K^\star * exp(-\lambda*x )) where x = local\_score - \log(sequence\_length)/\lambda.
The density plot produced by plot == TRUE depends on the type of the simulated local scores: if they are integer, a barplot of relative frequency is used, else plot(density(...)) is used.
This function calls localScoreC which type of the output depends on the type of the input. To be efficient, be aware to use a simulating function FUN that return a vector of adequate type ("integer" or "numeric"). Warning: in R, typeof(c(1,3,4,10)) == "double". You can set a type of a vector with mode() or as.integer() functions for example.
karlinMonteCarlo_double() is deprecated. At this point, it is just a call to karlinMonteCarlo() function.

Value

If keepSimu is FALSE, returns a list containing:

p_value Probability to obtain a local score with a value greater or equal to the parameter local_score
K* Parameter K^* defined in Karlin and Dembo (1990)
lambda Parameter \lambda defined in Karlin and Dembo (1990)

If keepSimu is TRUE, returns a list containing:

p_value Probability to obtain a local score with a value greater or equal to the parameter local_score
K* Parameter K^* defined in Karlin and Dembo (1990)
lambda Parameter \lambda defined in Karlin and Dembo (1990)
localScores Vector of size numSim containing the simulated local scores for sequence size of simulated_sequence_length

See Also

monteCarlo localScoreC

Examples


mySeq <- sample(-7:6, replace = TRUE, size = 100000)
#MonteCarlo taking random sample from the input sequence itself
karlinMonteCarlo(local_score = 160, sequence_length = 100000,
               simulated_sequence_length = 1000,
               FUN = function(x, sim_length) {
                        return(sample(x = x,
                               size = sim_length,
                               replace = TRUE))
                     },
               x = mySeq,
               sim_length = 1000,
               numSim = 1000)


#Markovian example (longer computation)
MyTransMat_reels <-  matrix(c(0.3, 0.1, 0.1, 0.1, 0.4,
                              0.2, 0.2, 0.1, 0.2, 0.3,
                              0.3, 0.4, 0.1, 0.1, 0.1,
                              0.3, 0.3, 0.1, 0.2, 0.1,
                              0.2, 0.1, 0.2, 0.4, 0.1),
                              ncol = 5, byrow=TRUE)
karlinMonteCarlo(local_score = 18.5, sequence_length = 200000,
                 simulated_sequence_length = 1500,
                 FUN = transmatrix2sequence,
                 matrix = MyTransMat_reels,
                 score =c(-1.5,-0.5,0,0.5,1), length = 1500,
                 plot=TRUE, numSim = 1500)


localScore documentation built on April 3, 2025, 5:26 p.m.