R/RcppExports.R

Defines functions localScoreC_double localScoreC exact_mc stationary_distribution maxPartialSumd mcc karlin daudin

Documented in daudin exact_mc karlin localScoreC localScoreC_double maxPartialSumd mcc stationary_distribution

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#' @description Calculates the exact p-value of a given local score in the independent and identically distributed (i.i.d.) model, for a sequence length that 'must not be too large' and a given score distribution
#' @details What counts as too large depends heavily on your machine. On a 3.7 GHz machine, daudin(1000, 5000, c(0.2, 0.2, 0.2, 0.1, 0.2, 0.1), -2, 3)
#' takes about 2 seconds. The method relies on matrix exponentiation, which becomes slow very quickly. The size of the matrix is a+1, where a is the local score value, and the matrix must be raised to the power n, where n is the sequence length.
#' Moreover, it is known that the local score value is of order log(n) on average.
#' @title Daudin [p-value] [iid]
#' @return A double representing the probability of a local score as high as the one given as argument
#' @param localScore the observed local score
#' @param sequence_length length of the sequence
#' @param score_probabilities the probabilities for each score from lowest to greatest
#' @param sequence_min minimum score
#' @param sequence_max maximum score
#' @examples 
#' daudin(localScore = 4, sequence_length = 50, 
#' score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), sequence_min = -3, sequence_max = 2)
#' @export
daudin <- function(localScore, sequence_length, score_probabilities, sequence_min, sequence_max) {
    .Call('_localScore_daudin', PACKAGE = 'localScore', localScore, sequence_length, score_probabilities, sequence_min, sequence_max)
}
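
# Illustrative sketch only (not the package's C++ implementation): the matrix
# exponentiation described in the details of daudin() can be outlined in plain R.
# The helper name daudin_sketch is hypothetical. It assumes integer scores
# sequence_min:sequence_max and an integer localScore >= 1. It models the
# Lindley process W_k = max(0, W_{k-1} + X_k) as a Markov chain on the states
# 0..a, where a = localScore is made absorbing, so the matrix has size a+1 and
# is raised to the power n = sequence_length.
daudin_sketch <- function(localScore, sequence_length, score_probabilities,
                          sequence_min, sequence_max) {
  a <- as.integer(localScore)
  scores <- sequence_min:sequence_max
  stopifnot(a >= 1, length(scores) == length(score_probabilities))
  # Transition matrix of the truncated Lindley process (state i is row i + 1)
  P <- matrix(0, nrow = a + 1, ncol = a + 1)
  for (i in 0:(a - 1)) {
    for (j in seq_along(scores)) {
      target <- min(a, max(0, i + scores[j]))
      P[i + 1, target + 1] <- P[i + 1, target + 1] + score_probabilities[j]
    }
  }
  P[a + 1, a + 1] <- 1  # once the score a is reached, the chain stays there
  # Exponentiation by squaring: result ends up equal to P^sequence_length
  result <- diag(a + 1)
  base <- P
  n <- as.integer(sequence_length)
  while (n > 0) {
    if (n %% 2 == 1) result <- result %*% base
    base <- base %*% base
    n <- n %/% 2
  }
  # Probability of having reached a within n steps, starting from state 0
  result[1, a + 1]
}
# Possible usage, with scores -1 and +1 of probability 0.5 each and n = 2:
# daudin_sketch(2, 2, c(0.5, 0, 0.5), -1, 1)  # 0.25, only the path (+1, +1) reaches 2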

#' @description Calculates an approximate p-value for a given local score value and a long sequence length in the independent and identically distributed (i.i.d.) model. See also the mcc() function for another approximation method in the i.i.d. model
#' @details The longer the sequence, the better this method works. Important note: computing the parameter of the distribution requires
#' solving a polynomial that depends on the score distribution, of order max(score)-min(score). Only empirical methods exist to solve a polynomial of order greater than 5,
#' with no guarantee of a reliable solution.
#' The roots found are checked inside the function and an error is thrown if they are inconsistent. In that case, you could try changing your score scheme (in case of discretization)
#' or use the function \code{\link{karlinMonteCarlo}}.
#' @title Karlin [p-value] [iid]
#' @return A double representing the probability of a localScore as high as the one given as argument
#' @param localScore the observed local score
#' @param sequence_length length of the sequence (at least several hundreds)
#' @param score_probabilities the probabilities for each unique score from lowest to greatest
#' @param sequence_min minimum score
#' @param sequence_max maximum score
#' @examples 
#' karlin(150, 10000, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -5, 5)
#' @export
karlin <- function(localScore, sequence_length, score_probabilities, sequence_min, sequence_max) {
    .Call('_localScore_karlin', PACKAGE = 'localScore', localScore, sequence_length, score_probabilities, sequence_min, sequence_max)
}
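
# Illustrative sketch only: the details of karlin() mention a distribution
# parameter obtained by solving an equation in the score distribution. In the
# standard Karlin-type setting, one such parameter is the unique positive
# solution lambda of sum(p_i * exp(lambda * s_i)) = 1, which exists when the
# expected score is negative and a positive score has non-zero probability.
# The helper name karlin_lambda_sketch is hypothetical, and it uses uniroot()
# rather than polynomial root finding, so it is not necessarily what the
# package does internally. It assumes the root is larger than 1e-6.
karlin_lambda_sketch <- function(score_probabilities, sequence_min, sequence_max) {
  scores <- sequence_min:sequence_max
  stopifnot(length(scores) == length(score_probabilities))
  stopifnot(sum(scores * score_probabilities) < 0,
            any(score_probabilities[scores > 0] > 0))
  # f is convex, f(0) = 0 and f'(0) < 0, so f is negative just above 0
  f <- function(lambda) sum(score_probabilities * exp(lambda * scores)) - 1
  # Double the upper bound until the sign of f changes
  upper <- 1
  while (f(upper) <= 0) upper <- 2 * upper
  uniroot(f, lower = 1e-6, upper = upper, tol = 1e-12)$root
}
# Possible usage, with P(-1) = 0.75 and P(+1) = 0.25:
# karlin_lambda_sketch(c(0.75, 0, 0.25), -1, 1)  # close to log(3)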

#' @description Calculates an approximate p-value for a given local score value and a medium to long sequence length in the independent and identically distributed (i.i.d.) model
#' @details This method is an improvement on Karlin's method and produces more precise results. It should be preferred whenever possible. \cr
#' As with karlin, the longer the sequence, the better the method works. Important note: computing the parameter of the distribution requires
#' solving a polynomial that depends on the score distribution, of order max(score)-min(score). Only empirical methods exist to solve a polynomial of order greater than 5,
#' with no guarantee of a reliable solution.
#' The roots found are checked inside the function and an error is thrown if they are inconsistent. In that case, you could try changing your score scheme (in case of discretization)
#' or use the function \code{\link{karlinMonteCarlo}}.
#' @title MCC [p-value] [iid]
#' @return A double representing the probability of a local score as high as the one given as argument
#' @param localScore the observed local score
#' @param sequence_length length of the sequence (up to one hundred)
#' @param score_probabilities the probabilities for each unique score from lowest to greatest
#' @param sequence_min minimum score
#' @param sequence_max maximum score
#' @examples 
#' mcc(40, 100, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -6, 4)
#' mcc(40, 10000, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -6, 4)
#' @export
mcc <- function(localScore, sequence_length, score_probabilities, sequence_min, sequence_max) {
    .Call('_localScore_mcc', PACKAGE = 'localScore', localScore, sequence_length, score_probabilities, sequence_min, sequence_max)
}

#' @description Calculates the distribution of the maximum of the partial sum process for a given value in the independent and identically distributed (i.i.d.) model
#' @details Implements formula (4) of Mercier, S., Cellier, D., & Charlot, D. (2003). An improved approximation for assessing the statistical significance of molecular sequence features. Journal of Applied Probability, 40(2), 427-441. doi:10.1239/jap/1053003554 \cr
#' Important note: computing the parameter of the distribution requires
#' solving a polynomial that depends on the score distribution, of order max(score)-min(score). Only empirical methods exist to solve a polynomial of order greater than 5,
#' with no guarantee of a reliable solution.
#' The roots found are checked inside the function and an error is thrown if they are inconsistent.
#' @title Maximum of the partial sum [probability] [iid]
#' @return A double representing the probability of the maximum of the partial sum process equal to k
#' @param k value at which the probability is calculated
#' @param score_probabilities the probabilities for each unique score from lowest to greatest
#' @param sequence_min minimum score
#' @param sequence_max maximum score
#' @examples 
#' maxPartialSumd(10, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -6, 4)
#' @export
maxPartialSumd <- function(k, score_probabilities, sequence_min, sequence_max) {
    .Call('_localScore_maxPartialSumd', PACKAGE = 'localScore', k, score_probabilities, sequence_min, sequence_max)
}

#' @description Calculates the stationary distribution of a Markov transition matrix using eigenvectors of length 1
#' @title Stationary distribution [Markov chains]
#' @return A vector with the probabilities
#' @param m Transition Matrix [matrix object]
#' @examples 
#' B = t(matrix (c(0.2, 0.8, 0.4, 0.6), nrow = 2))
#' stationary_distribution(B)
#' @export
stationary_distribution <- function(m) {
    .Call('_localScore_stationary_distribution', PACKAGE = 'localScore', m)
}
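
# Illustrative sketch only (not the package's C++ implementation): one common
# way to obtain a stationary distribution in plain R is to take the left
# eigenvector of the transition matrix for the eigenvalue 1 and rescale it to
# sum to 1. The helper name stationary_distribution_sketch is hypothetical,
# and m is assumed to be a square matrix whose rows sum to 1.
stationary_distribution_sketch <- function(m) {
  stopifnot(is.matrix(m), nrow(m) == ncol(m))
  # Left eigenvectors of m are the (right) eigenvectors of t(m)
  e <- eigen(t(m))
  idx <- which.min(abs(e$values - 1))  # eigenvalue closest to 1
  v <- Re(e$vectors[, idx])
  v / sum(v)                           # rescale so the probabilities sum to 1
}
# Possible usage with the matrix B from the example above:
# B <- t(matrix(c(0.2, 0.8, 0.4, 0.6), nrow = 2))
# stationary_distribution_sketch(B)  # approximately c(1/3, 2/3)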

#' @description Calculates the exact p-value for short numerical Markov chains. Memory usage and computation time can be too large for a high local score value and a wide score range (see details).
#' @title Exact method for p-value [Markov chains]
#' @return A double representing the probability of a localScore as high as the one given as argument
#' @param localScore Integer local score for which the p-value should be calculated
#' @param m Transition matrix [matrix object]. Optionally, the rownames can be the corresponding score values. m should be the transition matrix of an ergodic Markov chain.
#' @param sequence_length Length of the sequence
#' @param score_values An integer vector of sequence score values (optional). If not set, the rownames of m are used if they are set and numeric.
#' @param prob0 Vector giving the probability distribution of the first score of the sequence (optional). If not set, the stationary distribution of m is used.
#' @details This method needs to allocate a square matrix of size localScore^(range(score_values)). This matrix is then raised to the power sequence_length.
#' @examples 
#' mTransition <- t(matrix(c(0.2, 0.3, 0.5, 0.3, 0.4, 0.3, 0.2, 0.4, 0.4), nrow = 3))
#' scoreValues <- -1:1
#' initialProb <- stationary_distribution(mTransition)
#' exact_mc(localScore = 12, m = mTransition, sequence_length = 100, 
#'         score_values = scoreValues, prob0 = initialProb)
#' exact_mc(localScore = 150, m = mTransition, sequence_length = 1000, 
#'          score_values = scoreValues, prob0 = initialProb)
#' rownames(mTransition) <- scoreValues
#' exact_mc(localScore = 12, m = mTransition, sequence_length = 100, prob0 = initialProb)
#' # Minimal specification
#' exact_mc(localScore = 12, m = mTransition, sequence_length = 100)
#' @export
exact_mc <- function(localScore, m, sequence_length, score_values = NULL, prob0 = NULL) {
    .Call('_localScore_exact_mc', PACKAGE = 'localScore', localScore, m, sequence_length, score_values, prob0)
}

#' @description Calculates the local score for a sequence of integer scores. Only provides the
#' first occurrence of the local score. Use function suboptimalSegment() or lindley() to obtain the other locations where the local score is realized.
#' @title Local score
#' @return A structure containing: the local score value and the begin and end index of the segment realizing this optimal score; all the local maxima of the Lindley process (non-negative excursions) and their begin and end index; the record times of the Lindley process, but only those corresponding to the begin index of non-negative excursions
#' @param v a sequence of integer values as a vector.
#' @param supressWarnings if warnings should not be displayed
#' @examples 
#' seq.OneSegment=c(1,-2,3,1,-1,2)
#' # one segment realizing the local score value
#' localScoreC(seq.OneSegment) 
#' seq.TwoSegments=c(1,-2,3,1,2,-2,-2,-1,1,-2,3,1,2,-1,-2,-2,-1,1)
#' # two segments realizing the local score value
#' localScoreC(seq.TwoSegments) 
#' # only the first realization
#' localScoreC(seq.TwoSegments)$localScore 
#' # all the realizations of the local score together with the suboptimal ones
#' localScoreC(seq.TwoSegments)$suboptimalSegmentScores 
#' # for small sequences, you can also use the lindley() function to check if 
#' # several segments achieve the local score
#' lindley(seq.TwoSegments) 
#' plot(1:length(seq.TwoSegments),lindley(seq.TwoSegments),type='b')
#' seq.TwoSegments.InSameExcursion=c(1,-2,3,2,-1,0,1,-2,-2,-4,1)
#' localScoreC(seq.TwoSegments.InSameExcursion)
#' # lindley() shows two realizations in the same excursion (no 0 value between the two LS values)
#' lindley(seq.TwoSegments.InSameExcursion) 
#' # same beginning index but two possible ending indexes
#' # only one excursion realizes the local score even if there are two possible segment lengths
#' localScoreC(seq.TwoSegments.InSameExcursion)$suboptimalSegmentScores 
#' plot(1:length(seq.TwoSegments.InSameExcursion),lindley(seq.TwoSegments.InSameExcursion),type='b')
#' @export
localScoreC <- function(v, supressWarnings = FALSE) {
    .Call('_localScore_localScoreC', PACKAGE = 'localScore', v, supressWarnings)
}
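
# Illustrative sketch only (not the package's C++ implementation): the Lindley
# process and the first segment realizing the local score can be written in
# plain R as below. The helper names lindley_sketch and local_score_sketch are
# hypothetical, and the returned named vector is a simplification of the
# structure returned by localScoreC().
lindley_sketch <- function(v) {
  w <- numeric(length(v))
  current <- 0
  for (i in seq_along(v)) {
    # W_i = max(0, W_{i-1} + v_i): the process restarts at 0 when it goes negative
    current <- max(0, current + v[i])
    w[i] <- current
  }
  w
}
local_score_sketch <- function(v) {
  stopifnot(length(v) > 0)
  w <- lindley_sketch(v)
  end <- which.max(w)  # first index at which the maximum is reached
  if (w[end] == 0) return(c(localScore = 0, begin = NA, end = NA))
  # The segment begins just after the last return to 0 before 'end'
  zeros <- which(w[seq_len(end)] == 0)
  begin <- if (length(zeros) > 0) max(zeros) + 1 else 1
  c(localScore = w[end], begin = begin, end = end)
}
# Possible usage with the first sequence of the example above:
# lindley_sketch(c(1, -2, 3, 1, -1, 2))      # 1 0 3 4 3 5
# local_score_sketch(c(1, -2, 3, 1, -1, 2))  # localScore 5, begin 3, end 6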

#' @description Calculates the local score for a sequence of doubles. Only provides the
#' first occurrence. Use function suboptimalSegment() or lindley() to obtain the other locations where the local score is realized.
#' @title Local score for sequences of floating values
#' @return A structure containing: the local score value and the begin and end index of the segment realizing this optimal score; all the local maxima of the Lindley process (non-negative excursions) and their begin and end index; the record times of the Lindley process, but only those corresponding to the begin index of non-negative excursions
#' @param v A sequence of values as a vector.
#' @param supressWarnings if warnings should not be displayed
#' @examples 
#' localScoreC_double(c(1.2,-2.1,3.5,1.7,-1.1,2.3))
#' seq.TwoSegments=c(1.2,-2.1,3.5,1.7,2,-2,-2,-3.5,1,3.5,1.7,1,-2,-2)
#' # two segments realizing the local score value
#' localScoreC_double(seq.TwoSegments) 
#' # only the first realization
#' localScoreC_double(seq.TwoSegments)$localScore 
#' # all the realizations of the local score together with the suboptimal ones
#' localScoreC_double(seq.TwoSegments)$suboptimalSegmentScores 
#' # for small sequences, you can also use the lindley() function to check if 
#' # several segments achieve the local score
#' lindley(seq.TwoSegments) 
#' plot(1:length(seq.TwoSegments),lindley(seq.TwoSegments),type='b')
#' seq.TwoSegments.InSameExcursion=c(1,-2,3,2,-1,0,1,-2,-2)
#' localScoreC_double(seq.TwoSegments.InSameExcursion)
#' # lindley() shows two realizations in the same excursion (no 0 value between the two LS values)
#' lindley(seq.TwoSegments.InSameExcursion) 
#' plot(1:length(seq.TwoSegments.InSameExcursion),lindley(seq.TwoSegments.InSameExcursion),type='b')
#' # same beginning index but two possible ending indexes
#' # only one excursion realizes the local score even if there are two possible segment lengths
#' localScoreC_double(seq.TwoSegments.InSameExcursion)$suboptimalSegmentScores 
#' @export
localScoreC_double <- function(v, supressWarnings = FALSE) {
    .Call('_localScore_localScoreC_double', PACKAGE = 'localScore', v, supressWarnings)
}


localScore documentation built on Nov. 3, 2023, 1:08 a.m.