R/NormalizeBySlope.R
In gCrisprTools: Suite of Functions for Pooled Crispr Screen QC and Analysis

Documented in ct.normalizeBySlope ct.normalizeFactoredQuantiles ct.normalizeFQ ct.normalizeGuides ct.normalizeMedians

##' @title Normalize sample abundance estimates by the slope of the values in the central range
##' @description This function normalizes Crispr gRNA abundance estimates by equalizing the slopes of the middle (logged) values of the
##' distribution across samples. Specifically, the algorithm ranks the gRNA abundance estimates within each sample and determines a relationship between
##' rank change and gRNA within a trimmed region of the distribution via a linear fit. It then adjusts each sample such that the center of the logged
##' abundance distribution is strictly horizontal and returns these values as median-scaled counts in the appropriate slot of the input ExpressionObject.
##' @param ExpressionObject An ExpressionSet containing, at minimum, count data accessible by \code{exprs}, or an EList object with count data in the $E 
##' slot (usually returned by \link[limma]{voom}).
##' @param trim The proportion to be trimmed from each end of the distributionbefore performing the linear fit; algorithm defaults to 25% such that the
##' fit is performed on the interquartile range.
##' @param ... Other arguments to be passed to \code{ct.normalizeMedians()}, if desired.
##' @param lib.size An optional vector of size factor adjusted library size. 
##'   Default: \code{NULL} means to use sum of column counts as a lib.size.
##' @return A renormalized object of the same type as the provided object.
##' @author Russell Bainer
##' @import limma
##' @examples data('es')
##' data('ann')
##' 
##' #Build the sample key and library sizes for visualization
##' library(Biobase)
##' sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), "ControlReference"))
##' names(sk) <- row.names(pData(es))
##' ls <- colSums(exprs(es))
##' 
##' es.norm <- ct.normalizeBySlope(es, lib.size= ls)
##' ct.gRNARankByReplicate(es, sk, lib.size= ls)
##' ct.gRNARankByReplicate(es.norm, sk, lib.size= ls)
##' @export
ct.normalizeBySlope <-
  function(ExpressionObject,
           trim = 0.25,
           lib.size = NULL,
           ...) {
    
  if(!(class(ExpressionObject) %in% c("ExpressionSet", "EList"))){stop(paste(deparse(substitute(ExpressionObject)), "is not an ExpressionSet or Elist."))}

  #log them
  if(methods::is(ExpressionObject, "EList")){
    e.dat <- ExpressionObject$E
  }else{
    counts <- exprs(ExpressionObject)
    if (is.null(lib.size)){
      lib.size <- colSums(counts)
     }else if(!is.numeric(lib.size) | length(lib.size) != ncol(counts)){
        stop('If specified, lib.size must be a numeric vector of the same length as the number of samples in the eset.')
      }  
    e.dat <- t(log2(t(counts + 0.5)/(lib.size + 1) * 1e+06))
    }

  #extract the inner 50% and fit a lm
  shuffle <- apply(e.dat, 2, sort, decreasing = TRUE, index.return = TRUE)
  locs <- round((nrow(e.dat)*trim)): round(nrow(e.dat)*(1-trim))
  slopes <- lapply(shuffle, function(x){lm(unlist(x[[1]][locs]) ~ locs)$coefficients[2]})

  
  #correct the values
  corrected <- vapply(names(slopes), function(x){
    outlist <- shuffle[[x]][[1]] + ((length(shuffle[[x]][[1]]):1) * slopes[[x]])
    return(outlist[row.names(e.dat)])}, 
    numeric(nrow(e.dat)))
  
  #Back to counts: 
  correctedCounts <- 2^corrected
  correctedCounts <- round(t(t(correctedCounts) * ((lib.size + 1) / 1e+06)) - 0.5)
  
  #update and return the object
  if(is(ExpressionObject, "ExpressionSet")){
    exprs(ExpressionObject) <- correctedCounts
    ExpressionObject <- ct.normalizeMedians(ExpressionObject, lib.size)
    return(ExpressionObject)
    }else{
      co <- 2^corrected
      new.v <- voom(correctedCounts, design = ExpressionObject$design, normalize.method="scale")
      return(new.v)
      }
}


##' @title Normalize an ExpressionSet Containing a Crispr Screen
##' @description This function normalizes Crispr gRNA abundance estimates contained in an \code{ExpressionSet} object.
##' Currently four normalization methods are implemented: median scaling (via \code{normalizeMedianValues}), slope-based
##' normalization (via \code{ct.normalizeBySlope()}), scaling to the median of the nontargeting control values (via 
##' \code{ct.normalizeNTC()}), and spline fitting to the distribution of the nontargeting gRNAs (via \code{ct.normalizeSpline()}). 
##' Because of the peculiarities of pooled Crispr screening data, these implementations may be more stable than the endogenous methods 
##' used downstream by \link[limma]{voom}. See the respective man pages for further details about specific normalization approaches.
##' @param eset An ExpressionSet object with integer count data extractable with \code{exprs()}.
##' @param method The normalization method to use.
##' @param annotation The annotation object for the library, required for the methods employing nontargeting controls.
##' @param sampleKey An (optional) sample key, supplied as an ordered factor linking the samples to experimental
##' variables. The \code{names} attribute should exactly match those present in \code{eset}, and the control set is assumed to be
##' the first \code{level}. If `method` = `FQ`, the sampleKey is taken as the `sets` argument (and its format requirements are similarly 
##' relaxed; see `?ct.normalizeFC`).
##' @param lib.size An optional vector of voom-appropriate library size adjustment factors, usually calculated with \code{\link[edgeR]{calcNormFactors}} 
##' and transformed to reflect the appropriate library size. These adjustment factors are interpreted as the total library sizes for each sample, 
##' and if absent will be extrapolated from the columnwise count sums of the \code{exprs} slot of the \code{eset}.
##' @param plot.it Logical indicating whether to plot the ranked log2 gRNA count distributions before and after normalization. 
##' @param ... Other parameters to be passed to the individual normalization methods.
##' @return A renormalized ExpressionSet. If specified, the sample level counts will be scaled so as to maintain the validity 
##' of the specified \code{lib.size} values. 
##' @author Russell Bainer
##' @import limma
##' @seealso \code{\link{ct.normalizeMedians}}, \code{\link{ct.normalizeBySlope}}, \code{\link{ct.normalizeNTC}}, \code{\link{ct.normalizeSpline}}
##' @examples data('es')
##' data('ann')
##' 
##' #Build the sample key as needed
##' library(Biobase)
##' sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), "ControlReference"))
##' names(sk) <- row.names(pData(es))
##' 
##' es.norm <- ct.normalizeGuides(es, 'scale', annotation = ann, sampleKey = sk, plot.it = TRUE)
##' es.norm <- ct.normalizeGuides(es, 'slope', annotation = ann, sampleKey = sk, plot.it = TRUE)
##' es.norm <- ct.normalizeGuides(es, 'controlScale', annotation = ann, sampleKey = sk, plot.it = TRUE, geneSymb = 'NoTarget')
##' es.norm <- ct.normalizeGuides(es, 'controlSpline', annotation = ann, sampleKey = sk, plot.it = TRUE, geneSymb = 'NoTarget')
##' @export
ct.normalizeGuides <- function(eset, method = c("scale", 'FQ', "slope", "controlScale", "controlSpline"), annotation = NULL, sampleKey = NULL, lib.size = NULL, plot.it = FALSE, ...){
  if(!methods::is(eset, "ExpressionSet")){
    stop(paste(deparse(substitute(eset)), "is not an ExpressionSet."))
  }

  choices <- c("scale", 'FQ', "slope", "controlScale", "controlSpline")
  method <- match.arg(method, choices)

  if(method %in% c("controlScale", "controlSpline")){
    if(is.null(annotation)){
      stop("An annotation object must be provided to perform", deparse(substitute(method)), "normalization.")
    }
    annotation <- ct.prepareAnnotation(
      ann = annotation,
      object =  eset,
      controls = TRUE,
      throw.error = FALSE
    )
  }
  
  if(is.null(lib.size)){
    lib.size <- colSums(exprs(eset))
  } else if(!is.numeric(lib.size) | (length(lib.size) != ncol(eset))){
    stop('If specified, lib.size must be a numeric vector of length equal to the number of samples.')
  }
  
  
  new.eset <- switch(method, 
                     scale = ct.normalizeMedians(eset, lib.size = lib.size), 
                     FQ = ct.normalizeFQ(eset, sets = sampleKey, lib.size = lib.size),
                     slope = ct.normalizeBySlope(eset, lib.size = lib.size, ...), 
                     controlScale = ct.normalizeNTC(eset, annotation, lib.size = lib.size, ...), 
                     controlSpline = ct.normalizeSpline(eset, annotation, lib.size = lib.size, ...)
                    )
  # set negative counts to 0's if they happen to be present after normalization
  exprs(new.eset) <- apply(
    X = exprs(new.eset), 
    MARGIN = 2, 
    FUN = function(col) {
      col[col < 0] <- 0
      col
    })
    
  if(plot.it){
    par(mfrow = c(2, 1))
    ct.gRNARankByReplicate(eset, sampleKey, lib.size = lib.size) 
    ct.gRNARankByReplicate(new.eset, sampleKey, lib.size = lib.size)
   }
   return(new.eset)  
  }


##' @title Normalize sample abundance estimates by median gRNA counts
##' @description This function normalizes Crispr gRNA abundance estimates by equalizing the median gRNA abundance values after
##' correcting for library size. It does this by converting raw count values to log2 counts per million and optionally adjusting further in 
##' the usual way by dividing these values by user-specified library size factors. THis method should be more stable than the endogenous 
##' scaling functions used in \code{voom} in th especific case of Crispr screens or other cases where the median number of observed counts may be low. 
##' @param eset An \code{ExpressionSet} containing, at minimum, count data accessible by \code{exprs}.
##' @param lib.size An optional vector of voom-appropriate library size adjustment factors, usually calculated with \code{\link[edgeR]{calcNormFactors}} 
##' and transformed to reflect the appropriate library size. These adjustment factors are interpreted as the total library sizes for each sample, 
##' and if absent will be extrapolated from the columnwise count sums of the \code{exprs} slot of the \code{eset}.
##' @return A renormalized ExpressionSet object of the same type as the provided object.
##' @author Russell Bainer
##' @import limma
##' @examples data('es')
##' 
##' #Build the sample key and library sizes for visualization
##' library(Biobase)
##' sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), "ControlReference"))
##' names(sk) <- row.names(pData(es))
##' ls <- colSums(exprs(es))
##' 
##' es.norm <- ct.normalizeMedians(es, lib.size= ls)
##' ct.gRNARankByReplicate(es, sampleKey = sk, lib.size= ls)
##' ct.gRNARankByReplicate(es.norm, sampleKey = sk, lib.size= ls)
##' @export
ct.normalizeMedians <- function(eset, lib.size = NULL){
  if(!methods::is(eset, 'ExpressionSet')){
    stop("Please provide an ExpressionSet object for normalization.")
  }
  
  counts <- exprs(eset)
  
  if (is.null(lib.size)){
    lib.size <- colSums(counts)
  }else if(!is.numeric(lib.size) | length(lib.size) != ncol(counts)){
      stop('If specified, lib.size must be a numeric vector of the same length as the number of samples in the eset.')
    } 
  
  y <- t(log2(t(counts + 0.5)/(lib.size + 1) * 1e+06))
  cmed <- apply(y, 2, median, na.rm = TRUE)
  cmed <- cmed - mean(cmed)
  correctedCounts <- 2^t(t(y) - cmed)
  correctedCounts <- (t(t(correctedCounts) * ((lib.size + 1) / 1e+06)) - 0.5)
  
  exprs(eset) <- round(correctedCounts)
  return(eset)
  }

##' @title Factored Quantile Normalization
##' @description This function applies quantile normalization to subsets of samples defined by a provided factor, correcting for library size. 
##' It does this by converting raw count values to log2 counts per million and optionally adjusting further in 
##' the usual way by dividing these values by user-specified library size factors; then this matrix is split into groups according to the 
##' provided factor that are quantile normalized, and then the groups are median scaled to each other before conversion back into 
##' raw counts. This method is best used in comparisons for long timecourse screens, where groupwise differences in growth rate 
##' cause uneven intrinsic dialation of construct distributions.
##' 
##' Note that this normalization strategy is not appropriate for experiments where significant distortion of the libraries is expected as a 
##' consequence of the screening strategy (e.g., strong selection screens). 
##' @param eset An \code{ExpressionSet} containing, at minimum, count data accessible by \code{exprs}.
##' @param sets A character or factor object delineating which samples shoudl be grouped together during the normalization step. Must 
##' be the same length as the number of columns in the provided eset, and cannot contain `NA` or `NULL` values. 
##' @param lib.size An optional vector of voom-appropriate library size adjustment factors, usually calculated with \code{\link[edgeR]{calcNormFactors}} 
##' and transformed to reflect the appropriate library size. These adjustment factors are interpreted as the total library sizes for each sample, 
##' and if absent will be extrapolated from the columnwise count sums of the \code{exprs} slot of the \code{eset}.
##' @return A renormalized ExpressionSet object of the same type as the provided object.
##' @author Russell Bainer
##' @import limma
##' @examples data('es')
##' 
##' #Build the sample key and library sizes for visualization
##' library(Biobase)
##' sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), "ControlReference"))
##' names(sk) <- row.names(pData(es))
##' ls <- colSums(exprs(es))
##' 
##' es.norm <- ct.normalizeFQ(es, sets = gsub('(Death|Control)', '', pData(es)$TREATMENT_NAME), lib.size= ls)
##' ct.gRNARankByReplicate(es, sampleKey = sk, lib.size= ls)
##' ct.gRNARankByReplicate(es.norm, sampleKey = sk, lib.size= ls)
##' @export
ct.normalizeFQ <- function(eset, sets, lib.size = NULL){
  if(!methods::is(eset, 'ExpressionSet')){
    stop("Please provide an ExpressionSet object for normalization.")
  }
  
  stopifnot(length(na.omit(sets)) == ncol(eset), !any(vapply(sets, is.null, logical(1))))
  
  counts <- exprs(eset)
  
  if (is.null(lib.size)){
    lib.size <- colSums(counts)
  }else if(!is.numeric(lib.size) | length(lib.size) != ncol(counts)){
    stop('If specified, lib.size must be a numeric vector of the same length as the number of samples in the eset.')
  } 
  
  y <- t(log2(t(counts + 0.5)/(lib.size + 1) * 1e+06))
  
  #Apply Factored Quantiles
  quant <- lapply(unique(sets), function(x){limma::normalizeQuantiles(y[,(sets == x)])})
  y <- do.call('cbind', quant)
  
  cmed <- apply(y, 2, median, na.rm = TRUE)
  cmed <- cmed - mean(cmed)
  correctedCounts <- 2^t(t(y) - cmed)
  correctedCounts <- (t(t(correctedCounts) * ((lib.size + 1) / 1e+06)) - 0.5)
  
  exprs(eset) <- round(correctedCounts)
  return(eset)
}

##' @title Apply Factored Quantile Normalization to gRNA counts
##' @description This function normalizes Crispr gRNA abundance estimates by equalizing the median gRNA abundance values after
##' correcting for library size. It does this by converting raw count values to log2 counts per million and optionally adjusting further in 
##' the usual way by dividing these values by user-specified library size factors. THis method should be more stable than the endogenous 
##' scaling functions used in \code{voom} in th especific case of Crispr screens or other cases where the median number of observed counts may be low. 
##' @param eset An \code{ExpressionSet} containing, at minimum, count data accessible by \code{exprs}.
##' @param lib.size An optional vector of voom-appropriate library size adjustment factors, usually calculated with \code{\link[edgeR]{calcNormFactors}} 
##' and transformed to reflect the appropriate library size. These adjustment factors are interpreted as the total library sizes for each sample, 
##' and if absent will be extrapolated from the columnwise count sums of the \code{exprs} slot of the \code{eset}.
##' @return A renormalized ExpressionSet object of the same type as the provided object.
##' @author Russell Bainer
##' @import limma
##' @examples data('es')
##' 
##' #Build the sample key and library sizes for visualization
##' library(Biobase)
##' sk <- ordered(relevel(as.factor(pData(es)$TREATMENT_NAME), "ControlReference"))
##' names(sk) <- row.names(pData(es))
##' ls <- colSums(exprs(es))
##' 
##' es.norm <- ct.normalizeMedians(es, lib.size= ls)
##' ct.gRNARankByReplicate(es, sampleKey = sk, lib.size= ls)
##' ct.gRNARankByReplicate(es.norm, sampleKey = sk, lib.size= ls)
##' @export
ct.normalizeFactoredQuantiles <- function(eset, lib.size = NULL){
  if(!methods::is(eset, 'ExpressionSet')){
    stop("Please provide an ExpressionSet object for normalization.")
  }
  
  counts <- exprs(eset)
  
  if (is.null(lib.size)){
    lib.size <- colSums(counts)
  }else if(!is.numeric(lib.size) | length(lib.size) != ncol(counts)){
    stop('If specified, lib.size must be a numeric vector of the same length as the number of samples in the eset.')
  } 
  
  y <- t(log2(t(counts + 0.5)/(lib.size + 1) * 1e+06))
  cmed <- apply(y, 2, median, na.rm = TRUE)
  cmed <- cmed - mean(cmed)
  correctedCounts <- 2^t(t(y) - cmed)
  correctedCounts <- (t(t(correctedCounts) * ((lib.size + 1) / 1e+06)) - 0.5)
  
  exprs(eset) <- round(correctedCounts)
  return(eset)
}
Any scripts or data that you put into this service are public.
gCrisprTools documentation built on Nov. 8, 2020, 8:17 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
gCrisprTools
Suite of Functions for Pooled Crispr Screen QC and Analysis

R/NormalizeBySlope.R
In gCrisprTools: Suite of Functions for Pooled Crispr Screen QC and Analysis

Defines functions ct.normalizeFactoredQuantiles ct.normalizeFQ ct.normalizeMedians ct.normalizeGuides ct.normalizeBySlope

Documented in ct.normalizeBySlope ct.normalizeFactoredQuantiles ct.normalizeFQ ct.normalizeGuides ct.normalizeMedians

Try the gCrisprTools package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

gCrisprTools Suite of Functions for Pooled Crispr Screen QC and Analysis

R/NormalizeBySlope.R In gCrisprTools: Suite of Functions for Pooled Crispr Screen QC and Analysis

Defines functions ct.normalizeFactoredQuantiles ct.normalizeFQ ct.normalizeMedians ct.normalizeGuides ct.normalizeBySlope

Documented in ct.normalizeBySlope ct.normalizeFactoredQuantiles ct.normalizeFQ ct.normalizeGuides ct.normalizeMedians

Try the gCrisprTools package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

gCrisprTools
Suite of Functions for Pooled Crispr Screen QC and Analysis

R/NormalizeBySlope.R
In gCrisprTools: Suite of Functions for Pooled Crispr Screen QC and Analysis