R/RRAalpha.R

Defines functions ct.RRAalphaBatch ct.makeRhoNull ct.RRAaPvals ct.RRAalpha ct.alphaBeta ct.numcores

Documented in ct.alphaBeta ct.makeRhoNull ct.numcores ct.RRAalpha ct.RRAalphaBatch ct.RRAaPvals

## This document contains functions for the RRAa algorithm.

##' @title Checks and Possibly Sets the Number of Cores to be Used in Parallel Processing
##' @description This function determines the number of cores that the user is expecting to 
##' use during parallel processing operations, and if absent, sets the \code{mc.cores} option
##' to the maximum value. Users who do not wish to use all available cores during parallel 
##' processing should do so by invoking \code{options()} from the command line prior to analysis. 
##' 
##' @return Nothing, but invisibly sets \code{options(mc.cores)} if currently \code{NULL}. 
##' @keywords internal
##' @author Russell Bainer, Pete Haverty
ct.numcores <- function() {
    .Deprecated()
    if (.Platform$OS.type == "windows") {
        options(mc.cores = 1)
    }
    if (is.null(getOption("mc.cores"))) {
        numcores = detectCores()
        options(mc.cores = numcores)
    }
    invisible()
}

##' @title Aggregation of P-value Ranks using a Beta Distribution and Alpha Cutoff  
##' @description This function calculates an alpha-modified rho statistic from a set of normalized ranks by comparing them to a uniform distribution. 
##' Specifically, the ranks are ordered and p-values calculated at each position in the ordered vector by comparison to a Beta distribution. The rho value 
##' returned is the smallest p-value identified in this way across all scores. Should not be used by end users.
##' @param p.in A single column matrix of rank scores, with row.names indicating the gRNA labels.
##' @return A numeric rho value corresponding to the minimum rank order P. 
##' @author Russell Bainer, modified from code from the \code{RobustRankAggreg} package. 
##' 
##' Citation: 
##' Kolde, R. et al, Bioinformatics. 2012 Feb 15;28(4):573-80. doi: 10.1093/bioinformatics/btr709.
##' @keywords internal
##' @examples
##' testp <- runif(20)
##'
##' ct.alphaBeta(testp)  
##' @export
ct.alphaBeta <- function(p.in) {
    n <- length(p.in)
    return(min(pbeta(p.in, seq_len(n), n - seq_len(n) + 1)))
}


##' @title Aggregation of P-value Ranks using a Beta Distribution and Alpha Cutoff  
##' @description This function is called internally as a single instance of the beta aggregation step in RRAa. Users should not interact with it directly. 
##' The expected input is a set of rank statistics, and a paired \code{alpha} argument defining which values to consider in downstream analyses (see below). 
##' 
##' As of gCrisprTools 2.0, this function does not consider `row.names` associated with the provided values, and the p-values are expected to be provided 
##' in register with the provided `g.key` object.
##' @param p A single column matrix of rank statistics. 
##' @param g.key data.frame with guide and gene names
##' @param shuffle Logical indicating whether to shuffle the rank statistics prior to calculating the rho statistics (useful for permutation).
##' @return Nothing, or a named list of target-level P-values, which are treated as a rho statistic in the permutation step. 
##' @author Russell Bainer
##' @keywords internal
##' @examples 
##' data('fit')
##' data('ann')
##' geneScores <- ct.RRAalpha(fit$p.value, ann, shuffle = FALSE)
##' @export
ct.RRAalpha <- function(p, g.key, shuffle = FALSE) {
    stopifnot(identical(nrow(p), nrow(g.key)))

    if (shuffle) {
        p <- sample(p, nrow(p))
    }

    symbol = g.key$geneSymbol
    ord = order(symbol, p)
    p.collect <- split(p[ord], symbol[ord])
    rhoscores <- vapply(p.collect, ct.alphaBeta, numeric(1))
    return(rhoscores)
}

##' @title gRNA signal aggregation via RRAa
##' @description This is a wrapper function implementing the RRAalpha p-value aggregation algorithm. Takes in a set of gRNA rank scores (formatted as a single-column 
##' numeric matrix with row.names indicating the guide names) and a list object of gRNA annotations (names are the gene targets, and each element of the list contains 
##' a vector of the corresponding guide names). The rank scores are converted to gene-level statistics that are thenm transformed into empirical p-values by permutation. 
##' @param p A single column matrix of ranking scores, with row.names indicating the gRNA labels
##' @param g.key An annotation data frame of gRNAs, minimally containing a factorized 'geneSymbol' column indicating the target names. This is typically generated by 
##' calling the \code{ct.buildKeyFromAnnotation()} function.  
##' @param permute Number of permutations to be used during empirical p-value estimation.
##' @param permutation.seed numeric seed for permutation reproducibility.
##'   Default: \code{NULL} means to not set any seed.
##' @param multi.core Deprecated, does nothing
##' @return A named list of target-level empirical P-values. 
##' @author Russell Bainer
##' @keywords internal
##' @examples 
##' data('fit')
##' data('ann')
##' genePvals <- ct.RRAaPvals(fit$p.value, ann, permute = 100)
##' @export
ct.RRAaPvals <- function(p, g.key, permute, permutation.seed = NULL, multi.core = NULL) {

    ## Input checks
    if (!is.null(multi.core)) {
        warning("The multi.core argument of ct.RRAaPvals is deprecated. This function is now sufficiently fast on one core.")
    }
    if (!(is.matrix(p) && ncol(p) == 1)) {
        stop("P-values should be input as a single-column matrix with row names contained in the gs.list")
    }
    if (ncol(p) > 1) {
        p = p[, 1, drop = FALSE]
        warning("Multiple columns are present in the p-value matrix. Using the first column: ", colnames(p))
    }
    if (!is.data.frame(g.key)) {
        stop("The annotation provided must be a data frame.")
    }
    if (!("geneSymbol" %in% names(g.key))) {
        stop("The provided annotation does not contain a geneSymbol column.")
    }
    if (!identical(nrow(g.key), nrow(p))) {
        stop("P-values and gene keys do not have the same number of rows!")
    }

    if (anyNA(p)) {
        notna = !is.na(p)
        p = p[notna]
        g.key = g.key[notna, ]
    }

    is.null(permutation.seed) || is.numeric(permutation.seed) || stop("'permutation.seed' must be numeric or NULL.")
    set.seed(permutation.seed)  # default NULL will have no effect

    ## Everything apparently ok, generate P-values.
    obs <- ct.RRAalpha(p, g.key)

    message("Permuting ", permute, " times, this may take a minute ...")

    n.guides = table(g.key$geneSymbol)
    n.guides.table = sort.int(unique(n.guides))
    rho.nulls = lapply(n.guides.table, ct.makeRhoNull, p, permute)
    names(rho.nulls) = as.character(n.guides.table)
    stopifnot(identical(names(n.guides), names(obs)))
    out = unlist(mapply(obs, as.character(n.guides), FUN = function(x, y) {
        mean(rho.nulls[[y]] <= x)
    }, SIMPLIFY = FALSE), recursive = FALSE)

    return(out)
}

##' Make null distribution for RRAalpha tests
##'
##' Makes random distribution of Rho value by taking nperm random samples of n rank stats, p.
##' @param n single integer, number of guides per gene
##' @param p numeric vector of rank statistics
##' @param nperm single integer, how many random samples to take.
##' @return numeric vector of Rho values
##' @examples 
##' ct.makeRhoNull(3, 1:9, 5)
##' @export 
ct.makeRhoNull <- function(n, p, nperm) {
    message("Making Rho null distribution for", n, "guides per gene.")
    vapply(seq_len(nperm), function(i) {
        p.in = sort.int(sample(p, n, replace = FALSE))
        ct.alphaBeta(p.in)
    }, numeric(1))
}

##' @title Create Batches of Null Permutations for a Crispr Screen
##' @description This is a wrapper function to partition batches of calls to \code{ct.RRAalpha()} for multicore processing. 
##' It is called internally as a single instance of the beta aggregation step in RRAa. Users should not interact with it directly. 
##' @param p A single column matrix of rank statistics, with row.names indicating the gRNA labels. 
##' @param g.key data.frame with guide and gene names
##' @param result.environment The target environment containing the quasi-global variables incremented during the permutations in the child functions. 
##' @param batch.size Number of iterations to deploy to each daughter process. 
##' @param permutation.seed numeric seed for permutation reproducibility. Default is \code{NULL}, in which case no seed is set.
##' @return An integer vector indicating the number of iterations in which each gene's score was better than those indicated in \code{result.environment$obs}. 
##' @keywords internal
##' @author Russell Bainer
ct.RRAalphaBatch <- function(p, g.key, result.environment, batch.size = 100, permutation.seed = NULL) {

    .Deprecated("ct.makeRhoNull", msg = "Constructing null distributions is now fast, so ct.RRAalphaBatch is unnecessary.")
    ## make a batch environment
    batch.env <- new.env()
    batch.env$obs <- result.environment$obs
    batch.env$target.positive.iterations <- rep.int(0, result.environment$ngenes)
    ## run permutations and increment as needed
    set.seed(permutation.seed)  # default NULL will have no effect

    invisible(replicate(batch.size, ct.RRAalpha(p, g.key, shuffle = TRUE)))

    return(batch.env$target.positive.iterations)
}
RussBainer/gCrisprTools documentation built on Nov. 5, 2022, 2:35 p.m.