R/RRAalpha.R

Defines functions ct.RRAalphaBatch ct.makeRhoNull ct.RRAaPvals ct.RRAalpha ct.alphaBeta ct.numcores

Documented in ct.alphaBeta ct.makeRhoNull ct.numcores ct.RRAalpha ct.RRAalphaBatch ct.RRAaPvals

## This document contains functions for the RRAa algorithm. 

##' @title Checks and Possibly Sets the Number of Cores to be Used in Parallel Processing
##' @description This function determines the number of cores that the user is expecting to 
##' use during parallel processing operations, and if absent, sets the \code{mc.cores} option
##' to the maximum value. Users who do not wish to use all available cores during parallel 
##' processing should do so by invoking \code{options()} from the command line prior to analysis. 
##' 
##' @return Nothing, but invisibly sets \code{options(mc.cores)} if currently \code{NULL}. 
##' @keywords internal
##' @author Russell Bainer, Pete Haverty
ct.numcores <- function()  {
    .Deprecated()
    if( .Platform$OS.type == "windows" ){
        options(mc.cores = 1)
    }
    if(is.null(getOption('mc.cores'))){
        numcores = detectCores()
        options(mc.cores = numcores)
    }
    invisible()
}

##' @title Aggregation of P-value Ranks using a Beta Distribution and Alpha Cutoff  
##' @description This function calculates an alpha-modified rho statistic from a set of normalized ranks by comparing them to a uniform distribution. 
##' Specifically, the ranks are ordered and p-values calculated at each position in the ordered vector by comparison to a Beta distribution. The rho value 
##' returned is the smallest p-value identified in this way across all scores. Should not be used by end users.
##' @param p.in A single column matrix of rank scores, with row.names indicating the gRNA labels.
##' @return A numeric rho value corresponding to the minimum rank order P. 
##' @author Russell Bainer, modified from code from the \code{RobustRankAggreg} package. 
##' 
##' Citation: 
##' Kolde, R. et al, Bioinformatics. 2012 Feb 15;28(4):573-80. doi: 10.1093/bioinformatics/btr709.
##' @import RobustRankAggreg
##' @keywords internal
##' @examples
##' library(RobustRankAggreg)
##' testp <- runif(20)
##'
##' min(betaScores(testp))
##' ct.alphaBeta(testp)  
##' @export
ct.alphaBeta <- function(p.in){ 
    n <- length(p.in)  
    return(min(pbeta(p.in, 1:n, n - (1:n) + 1)))
}


##' @title Aggregation of P-value Ranks using a Beta Distribution and Alpha Cutoff  
##' @description This function is called internally as a single instance of the beta aggregation step in RRAa. Users should not interact with it directly. 
##' The expected input is a list of rank statistics, and a paired \code{alpha} argument defining which values to consider in downstream analyses (see below). 
##' @param p A single column matrix of rank statistics, with \code{row.names} indicating the gRNA labels. 
##' @param g.key data.frame with guide and gene names
##' @param shuffle Logical indicating whether to shuffle the rank statistics prior to calculating the rho statistics (useful for permutation).
##' @return Nothing, or a named list of target-level P-values, which are treated as a rho statistic in the permutation step. 
##' @author Russell Bainer
##' @keywords internal
##' @examples 
##' data('fit')
##' data('ann')
##' geneScores <- ct.RRAalpha(fit$p.value, ann, shuffle = FALSE)
##' @export
ct.RRAalpha <- function(p, g.key, shuffle = FALSE) {
    stopifnot(identical(rownames(p),rownames(g.key)))

    if(shuffle){
        p <- sample(p, nrow(p))
    }

    symbol = g.key$geneSymbol
    ord = order( symbol, p )
    p.collect <- split(p[ord], symbol[ord])
    rhoscores <- vapply(p.collect, ct.alphaBeta, numeric(1))
    return(rhoscores)
}  

##' @title gRNA signal aggregation via RRAa
##' @description This is a wrapper function implementing the RRAalpha p-value aggregation algorithm. Takes in a set of gRNA rank scores (formatted as a single-column 
##' numeric matrix with row.names indicating the guide names) and a list object of gRNA annotations (names are the gene targets, and each element of the list contains 
##' a vector of the corresponding guide names). The rank scores are converted to gene-level statistics that are thenm transformed into empirical p-values by permutation. 
##' @param p A single column matrix of ranking scores, with row.names indicating the gRNA labels
##' @param g.key An annotation data frame of gRNAs, minimally containing a factorized "geneSymbol" column indicating the target names. This is typically generated by calling the \code{ct.buildKeyFromAnnotation()} function.  
##' @param permute Number of permutations to be used during empirical p-value estimation.
##' @param permutation.seed numeric seed for permutation reproducibility.
##'   Default: \code{NULL} means to not set any seed.
##' @param multi.core Deprecated, does nothing
##' @return A named list of target-level empirical P-values. 
##' @author Russell Bainer
##' @keywords internal
##' @examples 
##' data('fit')
##' data('ann')
##' genePvals <- ct.RRAaPvals(fit$p.value, ann, permute = 100)
##' @export
ct.RRAaPvals <- function(p, g.key, permute, permutation.seed = NULL, multi.core=NULL) {
    
    ##Input checks
    if (!is.null(multi.core)) {
        warning("The multi.core argument of ct.RRAaPvals is deprecated. This function is now sufficiently fast on one core.")
    }
    if( ! (is.matrix(p) && ncol(p) == 1) ) {
        stop("P-values should be input as a single-column matrix with row names contained in the gs.list")
    }
    if(ncol(p) > 1) {
        p = p[,1,drop=FALSE]
        warning(paste('Multiple columns are present in the p-value matrix. Using the first column:',colnames(p)))
    }
    if(!is.data.frame(g.key)){stop("The annotation provided must be a data frame.")}
    if(!("geneSymbol" %in% names(g.key))){stop("The provided annotation does not contain a geneSymbol column.")}
    if(!identical(row.names(g.key), row.names(p))){stop("Provided p-value list have mismatched names.")}

    if (anyNA(p)) {
        notna = ! is.na(p)
        p = p[ notna ]
        g.key = g.key[ notna, ]
    }
    
    is.null(permutation.seed) || is.numeric(permutation.seed) ||
        stop("'permutation.seed' must be numeric or NULL.")
    set.seed(permutation.seed) # default NULL will have no effect

    ##Everything apparently ok, generate P-values. 
    obs <- ct.RRAalpha(p, g.key)

    message(paste("Permuting", permute, 'times, this may take a minute ...'))

    n.guides = table(g.key$geneSymbol)
    n.guides.table = sort.int(unique(n.guides))
    rho.nulls = lapply(n.guides.table,ct.makeRhoNull,p,permute)
    names(rho.nulls) = as.character(n.guides.table)
    stopifnot(identical( names(n.guides), names(obs) ))
    out = unlist(mapply(
        obs, as.character(n.guides),
        FUN=function(x,y) {
            mean( rho.nulls[[y]] <= x )
        }, SIMPLIFY=FALSE),recursive=FALSE)

    return(out)
}

##' Make null distribution for RRAalpha tests
##'
##' Makes random distribution of Rho value by taking nperm random samples of n rank stats, p.
##' @param n single integer, number of guides per gene
##' @param p numeric vector of rank statistics
##' @param nperm single integer, how many random samples to take.
##' @return numeric vector of Rho values
##' @export 
ct.makeRhoNull <- function(n,p,nperm) {
    message(paste("Making Rho null distribution for",n,"guides per gene."))
    vapply(1:nperm,
           function(i) {
               p.in = sort.int(sample(p,n,replace=FALSE))
               ct.alphaBeta(p.in)
           },
           numeric(1)
           )
}

##' @title Create Batches of Null Permutations for a Crispr Screen
##' @description This is a wrapper function to partition batches of calls to \code{ct.RRAalpha()} for multicore processing. 
##' It is called internally as a single instance of the beta aggregation step in RRAa. Users should not interact with it directly. 
##' @param p A single column matrix of rank statistics, with row.names indicating the gRNA labels. 
##' @param g.key data.frame with guide and gene names
##' @param result.environment The target environment containing the quasi-global variables incremented during the permutations in the child functions. 
##' @param batch.size Number of iterations to deploy to each daughter process. 
##' @param permutation.seed numeric seed for permutation reproducibility. Default is \code{NULL}, in which case no seed is set.
##' @return An integer vector indicating the number of iterations in which each gene's score was better than those indicated in \code{result.environment$obs}. 
##' @keywords internal
##' @author Russell Bainer
##' @export
ct.RRAalphaBatch <- function(p, g.key, result.environment, batch.size = 100, 
                             permutation.seed = NULL){

    .Deprecated("ct.makeRhoNull",msg="Constructing null distributions is now fast, so ct.RRAalphaBatch is unnecessary.")
    ##make a batch environment
    batch.env <- new.env()
    batch.env$obs <- result.environment$obs
    batch.env$target.positive.iterations <- rep.int(0, result.environment$ngenes)
    ## run permutations and increment as needed
    set.seed(permutation.seed) # default NULL will have no effect

    invisible(replicate(batch.size, ct.RRAalpha(p, g.key, shuffle = TRUE)))
    
    return(batch.env$target.positive.iterations)
}

Try the gCrisprTools package in your browser

Any scripts or data that you put into this service are public.

gCrisprTools documentation built on Nov. 17, 2017, 1:37 p.m.