R/RcppExports.R
In localScore: Package for Sequence Analysis by Local Score

Documented in daudin exact_mc karlin karlin_parameters localScoreC maxPartialSumd mcc stationary_distribution

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#' @description Calculates the exact p-value in the identically and independently distributed of a given local score, a sequence length that 'must not be too large' and for a given score distribution
#' @title Daudin [p-value] [iid]
#' @return A double representing the probability of a local score as high as the one given as argument. 
#' @param local_score the observed local score
#' @param sequence_length length of the sequence
#' @param score_probabilities the probabilities for each score from lowest to greatest (Optionnaly with scores as names)
#' @param sequence_min minimum score (optional if \code{score_values} OR names(score_probabilities) is defined)
#' @param sequence_max maximum score (optional if \code{score_values} OR names(score_probabilities) is defined)
#' @param score_values vector of integer score values, associated to score_probabilities  (optional if
#' \code{sequence_min} and \code{sequence_max} OR names(score_probabilities) are defined)
#' @details 
#' Either \code{sequence_min} and \code{sequence_max} are specified as input, OR all possible score values in 
#' \code{score_values} vector ; one of this choice is required. <cr>
#' Small in this context depends heavily on your machine. 
#' On a 3,7GHZ machine this means for daudin(1000, 5000, c(0.2, 0.2, 0.2, 0.1, 0.2, 0.1), -2, 3)
#' an execution time of ~2 seconds. This is due to the calculation method using matrix exponentiation
#' which takes times. The size of the matrix of the exponentiation is equal to a+1 with a the
#' local score value. The matrix must be put at the power n, with n the sequence length.
#' Moreover, it is known that the local score value is expected to be in mean of order log(n).
#' @examples 
#' p1 <- daudin(local_score = 4, sequence_length = 50, 
#'        score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), 
#'        sequence_min = -3, sequence_max = 2)
#' p2 <- daudin(local_score = 4, sequence_length = 50, 
#'        score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), 
#'        score_values = as.integer(-3:2))
#' p1 == p2 # TRUE
#' 
#' prob <- c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02)
#' score_values <- which(prob != 0) - 6 # keep only non null probability scores
#' prob0 <- prob[prob != 0]             # and associated probability
#' p <- daudin(150, 10000, prob, sequence_min = -5, sequence_max =  5)
#' p0 <- daudin(150, 10000, prob0, score_values = score_values)
#' names(prob0) <- score_values
#' p1 <- daudin(150, 10000, prob0)
#' p == p0 # TRUE
#' p == p1 # TRUE
#' @export
daudin <- function(local_score, sequence_length, score_probabilities, sequence_min = NULL, sequence_max = NULL, score_values = NULL) {
    .Call('_localScore_daudin', PACKAGE = 'localScore', local_score, sequence_length, score_probabilities, sequence_min, sequence_max, score_values)
}

#' @description \code{karlin} Calculates an approximated p-value of a given local score value and a long sequence length in the identically and independently distributed model for the sequence. See also \code{\link{mcc}} function for another approximated method in the i.i.d. model that improved the one given by \code{\link{karlin}} or \code{\link{daudin}} for exact calculation. \cr
#' \code{karlin_parameters} is a annex function returning the parameters \eqn{\lambda}, \eqn{K^+} and \eqn{K^*} defined in Karlin and Dembo (1992).
#' @details This method works the better the longer the sequence is. Important note : the calculus of the parameter of the distribution uses
#' the resolution of a polynome which is a function of the score distribution, of order max(score)-min(score). There exists only empirical methods to solve a polynome of order greater that 5
#' with no warranty of reliable solution.
#' The found roots are checked internally to the function and an error message is throw in case of inconsistent. In such case, you could try to change your score scheme (in case of discretization)
#' or use the function \code{\link{karlinMonteCarlo}} .
#' This function implements the formulae given in Karlin and Dembo (1992), page 115-6. As the score is discrete here (lattice score function),
#' there is no limit distribution of the local score with the size of the sequence, but an inferior and a superior bound
#' are given. The output of this function is conservative as it gives the upper bound for the p-value.
#' Notice the lower bound can easily be found as it is the same call of function with parameter value local_score+1.
#' @title Karlin [p-value] [iid]
#' @return A double representing the probability of a local score as high as the one given as argument
#' @inheritParams daudin
#' @seealso \code{\link{mcc}}, \code{\link{daudin}}, \code{\link{karlinMonteCarlo}}, \code{\link{monteCarlo}}
#' @examples 
#' karlin(150, 10000, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -5, 5)
#' p1 <- karlin(local_score = 15, sequence_length = 5000, 
#'        score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), 
#'        sequence_min = -3, sequence_max = 2)
#' p2 <- karlin(local_score = 15, sequence_length = 5000, 
#'        score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), 
#'        score_values = -3:2)
#' p1 == p2 # TRUE
#' 
#' prob <- c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02)
#' score_values <- which(prob != 0) - 6 # keep only non null probability scores
#' prob0 <- prob[prob != 0]             # and associated probability
#' p <- karlin(150, 10000, prob, sequence_min = -5, sequence_max =  5)
#' p0 <- karlin(150, 10000, prob0, score_values = score_values) 
#' names(prob0) <- score_values
#' p1 <- karlin(150, 10000, prob0)
#' p == p0 # TRUE
#' p == p1 # TRUE
#' @export
karlin <- function(local_score, sequence_length, score_probabilities, sequence_min = NULL, sequence_max = NULL, score_values = NULL) {
    .Call('_localScore_karlin', PACKAGE = 'localScore', local_score, sequence_length, score_probabilities, sequence_min, sequence_max, score_values)
}

#' @rdname karlin
#' @export
karlin_parameters <- function(score_probabilities, sequence_min = NULL, sequence_max = NULL, score_values = NULL) {
    .Call('_localScore_karlin_parameters', PACKAGE = 'localScore', score_probabilities, sequence_min, sequence_max, score_values)
}

#' @description Calculates an approximated p-value for a given local score value and a medium to long sequence length in the identically and independently distributed model
#' @details This methods is actually an improved method of Karlin and produces more precise results. It should be privileged whenever possible. \cr
#' As with karlin, the method works the better the longer the sequence. Important note : the calculus of the parameter of the distribution uses
#' the resolution of a polynome which is a function of the score distribution, of order max(score)-min(score). There exists only empirical methods to solve a polynome of order greater that 5
#' with no warranty of reliable solution.
#' The found roots are checked internally to the function and an error message is throw in case of inconsistency. In such case, you could try to change your score scheme (in case of discretization)
#' or use the function \code{\link{karlinMonteCarlo}} .
#' @title MCC [p-value] [iid]
#' @return A double representing the probability of a local score as high as the one given as argument
#' @inheritParams daudin
#' @seealso \code{\link{karlin}}, \code{\link{daudin}}, \code{\link{karlinMonteCarlo}}, \code{\link{monteCarlo}}
#' @examples 
#' mcc(40, 100, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -6, 4)
#' mcc(40, 10000, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -6, 4)
#' mcc(150, 10000, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -5, 5)
#' p1 <- mcc(local_score = 15, sequence_length = 5000, 
#'        score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), 
#'        sequence_min = -3, sequence_max = 2)
#' p2 <- mcc(local_score = 15, sequence_length = 5000, 
#'        score_probabilities = c(0.2, 0.3, 0.1, 0.2, 0.1, 0.1), 
#'        score_values = -3:2)
#' p1 == p2 # TRUE
#' 
#' prob <- c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02)
#' score_values <- which(prob != 0) - 6 # keep only non null probability scores
#' prob0 <- prob[prob != 0]             # and associated probability
#' p <- mcc(150, 10000, prob, sequence_min = -5, sequence_max =  5)
#' p0 <- mcc(150, 10000, prob0, score_values = score_values)
#' names(prob0) <- score_values
#' p1 <- mcc(150, 10000, prob0)
#' p == p0 # TRUE
#' p == p1 # TRUE
#' @export
mcc <- function(local_score, sequence_length, score_probabilities, sequence_min = NULL, sequence_max = NULL, score_values = NULL) {
    .Call('_localScore_mcc', PACKAGE = 'localScore', local_score, sequence_length, score_probabilities, sequence_min, sequence_max, score_values)
}

#' @description Calculates the distribution of the maximum of the partial sum process for a given value in the identically and independently distributed model
#' @details Implement the formula (4) of the article Mercier, S., Cellier, D., & Charlot, D. (2003). An improved approximation for assessing the statistical significance of molecular sequence features. Journal of Applied Probability, 40(2), 427-441. doi:10.1239/jap/1053003554 \cr
#' Important note : the calculus of the parameter of the distribution uses
#' the resolution of a polynome which is a function of the score distribution, of order max(score)-min(score). There exists only empirical methods to solve a polynome of order greater that 5
#' with no warranty of reliable solution.
#' The found roots are checked internally to the function and an error message is throw in case of inconsistency. 
#' @title Maximum of the partial sum [probability] [iid]
#' @return A double representing the probability of the maximum of the partial sum process equal to k
#' @param k value at which calculates the probability
#' @inheritParams daudin
#' @examples 
#' maxPartialSumd(10, c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02), -6, 4)
#' prob <- c(0.08, 0.32, 0.08, 0.00, 0.08, 0.00, 0.00, 0.08, 0.02, 0.32, 0.02)
#' score_values <- which(prob != 0) - 7 # keep only non null probability scores
#' prob0 <- prob[prob != 0]             # and associated probability
#' p <- maxPartialSumd(10, prob, sequence_min = -6, sequence_max =  4)
#' p0 <- maxPartialSumd(10,prob0, score_values = score_values)
#' p == p0 # TRUE
#' @export
maxPartialSumd <- function(k, score_probabilities, sequence_min = NULL, sequence_max = NULL, score_values = NULL) {
    .Call('_localScore_maxPartialSumd', PACKAGE = 'localScore', k, score_probabilities, sequence_min, sequence_max, score_values)
}

#' @description Calculates stationary distribution of markov transition matrix by use of eigenvectors of length 1
#' @title Stationary distribution [Markov chains]
#' @return A vector with the probabilities
#' @param m Transition Matrix [matrix object]
#' @examples 
#' B = t(matrix (c(0.2, 0.8, 0.4, 0.6), nrow = 2))
#' stationary_distribution(B)
#' @export
stationary_distribution <- function(m) {
    .Call('_localScore_stationary_distribution', PACKAGE = 'localScore', m)
}

#' @description Calculates the exact p-value for short numerical Markov chains. Memory usage and time computation can be too large for a high local score value and high score range (see details).
#' @title Exact method for p-value [Markov chains]
#' @return A double representing the probability of a local score as high as the one given as argument
#' @param local_score Integer local score for which the p-value should be calculated
#' @param m Transition matrix [matrix object]. Optionality, rownames can be corresponding score values. m should be a transition matrix of an ergodic Markov chain.
#' @param sequence_length Length of the sequence
#' @param score_values A integer vector of sequence score values (optional). If not set, the rownames of m are used if they are numeric and set.
#' @param prob0 Vector of probability distribution of the first score of the sequence (optional). If not set, the stationnary distribution of m is used.
#' @details This method computation needs to allocate a square matrix of size local_score^(range(score_values)). This matrix is then exponentiated to sequence_length.
#' @seealso \code{\link{monteCarlo}}
#' @examples 
#' mTransition <- matrix(c(0.2, 0.3, 0.5, 0.3, 0.4, 0.3, 0.2, 0.4, 0.4), byrow = TRUE, ncol = 3)
#' scoreValues <- -1:1
#' initialProb <- stationary_distribution(mTransition)
#' exact_mc(local_score = 12, m = mTransition, sequence_length = 100, 
#'         score_values = scoreValues, prob0 = initialProb)
#' exact_mc(local_score = 150, m = mTransition, sequence_length = 1000, 
#'          score_values = scoreValues, prob0 = initialProb)
#' rownames(mTransition) <- scoreValues
#' exact_mc(local_score = 12, m = mTransition, sequence_length = 100, prob0 = initialProb)
#' # Minimal specification
#' exact_mc(local_score = 12, m = mTransition, sequence_length = 100)
#' # Score valeus with "holes"
#' scoreValues <- c(-2, -1, 2)
#' mTransition <- matrix(c(0.2, 0.3, 0.5, 0.3, 0.4, 0.3, 0.2, 0.4, 0.4), byrow = TRUE, ncol = 3)
#' initialProb <- stationary_distribution(mTransition)
#' exact_mc(local_score = 50, m = mTransition, sequence_length = 100, 
#'         score_values = scoreValues, prob0 = initialProb)
#' @export
exact_mc <- function(local_score, m, sequence_length, score_values = NULL, prob0 = NULL) {
    .Call('_localScore_exact_mc', PACKAGE = 'localScore', local_score, m, sequence_length, score_values, prob0)
}

#' @title Local score
#' @description Calculates the local score for a sequence of scores, the sub-optimal segments, and the associated record times. The local score is the maximal sum of values contained in a segment among all possible segments of the sequence. In other word, it generalizes a sliding window approach, considering of all possible windows size.
#' @param v a sequence of numerical values as vector (integer or double).
#' @param suppressWarnings (optional) if warnings should not be displayed
#' @return A list containing:
#' \item{localScore}{the local score value and the begin and end index of the segment realizing this optimal score;}
#' \item{suboptimalSegmentScores}{An array containing sub-optimal local scores, that is all the local maxima of the Lindley rocess (non negative excursion) and their begin and end index;}
#' \item{RecordTime}{The record times of the Lindley process as defined in Karlin and Dembo (1990).}
#' @details The \code{localScoreC} function is implemented in a templated C function. Be aware that the type of the output (\code{integer} or \code{double}) depends on the type of the input. The function \code{localScoreC_double} \code{localScoreC_int} explicitly use the corresponding type (with an eventual conversion in case of integer). Warning: in R, \code{typeof(c(1,3,4,10)) == "double"}. You can set a type of a vector with \code{mode()} or \code{as.integer()} functions for example. \cr
#' \code{localScoreC_int} is just a call to \code{as.integer()} before calling \code{localScoreC}. \code{localScore_double} is just a call to \code{localScoreC}, and as such is deprecated.
#' @seealso \code{\link{lindley}}
#' @examples 
#' localScoreC(c(1.2,-2.1,3.5,1.7,-1.1,2.3))
#' # one segment realizing the local score value
#' seq.OneSegment <- c(1,-2,3,1,-1,2)
#' localScoreC(seq.OneSegment) 
#' seq.TwoSegments <- c(1,-2,3,1,2,-2,-2,-1,1,-2,3,1,2,-1,-2,-2,-1,1)
#' # two segments realizing the local score value
#' localScoreC(seq.TwoSegments) 
#' # only the first realization
#' localScoreC(seq.TwoSegments)$localScore 
#' # all the realization of the local together with the suboptimal ones
#' localScoreC(seq.TwoSegments)$suboptimalSegmentScores 
#' # for small sequences, you can also use lindley() function to check if 
#' # several segments achieve the local Score
#' lindley(seq.TwoSegments) 
#' plot(1:length(seq.TwoSegments),lindley(seq.TwoSegments),type='b')
#' seq.TwoSegments.InSameExcursion <- c(1,-2,3,2,-1,0,1,-2,-2,-4,1)
#' localScoreC(seq.TwoSegments.InSameExcursion)
#' # lindley() shows two realizations in the same excursion (no 0 value between the two LS values)
#' lindley(seq.TwoSegments.InSameExcursion) 
#' # same beginning index but two possible ending indexes
#' # only one excursion realizes the local score even in there is two possible lengths of segment
#' localScoreC(seq.TwoSegments.InSameExcursion)$suboptimalSegmentScores 
#' plot(1:length(seq.TwoSegments.InSameExcursion),lindley(seq.TwoSegments.InSameExcursion),type='b')
#' # Technical note about type correspondance
#' seq.OneSegment <- c(1,-2,3,1,-1,2)
#' seq.OneSegmentI <- as.integer(seq.OneSegment)
#' typeof(seq.OneSegment)  # "double" (beware)
#' typeof(seq.OneSegmentI) # "integer"
#' LS1 <- localScoreC(seq.OneSegment, suppressWarnings = TRUE)
#' LS1I <- localScoreC(seq.OneSegmentI, suppressWarnings = TRUE)
#' typeof(LS1$localScore)  # "double"
#' typeof(LS1I$localScore) # "integer"
#' typeof(LS1$suboptimalSegmentScores$value)  # "double"
#' typeof(LS1I$suboptimalSegmentScores$value) # "integer"
#' # Force to use integer values (trunk values if necessary)
#' seq2 <- seq.OneSegment + 0.5
#' localScoreC(seq2, suppressWarnings = TRUE)
#' localScoreC_int(seq.OneSegment, suppressWarnings = TRUE)
#' @export
localScoreC <- function(v, suppressWarnings = FALSE) {
    .Call('_localScore_localScoreC', PACKAGE = 'localScore', v, suppressWarnings)
}
Any scripts or data that you put into this service are public.
localScore documentation built on June 8, 2025, 10:34 a.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
localScore
Package for Sequence Analysis by Local Score

R/RcppExports.R
In localScore: Package for Sequence Analysis by Local Score

Defines functions localScoreC exact_mc stationary_distribution maxPartialSumd mcc karlin_parameters karlin daudin

Documented in daudin exact_mc karlin karlin_parameters localScoreC maxPartialSumd mcc stationary_distribution

Try the localScore package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

localScore Package for Sequence Analysis by Local Score

R/RcppExports.R In localScore: Package for Sequence Analysis by Local Score

Defines functions localScoreC exact_mc stationary_distribution maxPartialSumd mcc karlin_parameters karlin daudin

Documented in daudin exact_mc karlin karlin_parameters localScoreC maxPartialSumd mcc stationary_distribution

Try the localScore package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

localScore
Package for Sequence Analysis by Local Score

R/RcppExports.R
In localScore: Package for Sequence Analysis by Local Score