R/peeling.R

Defines functions peeling

Documented in peeling

#' Apply the peeling procedure at a given marker
#'
#' @param x An n by m numeric matrix containing DNA copy number data from n subjects at m markers.
#'
#' @param marker.data marker.data A two-column numeric matrix of marker position data for markers in the 
#'   autosomes.  Column 1 contains the chromosome number for each marker, and column 2 contains the position 
#'   (in base pairs) for each markers.  This is a submatrix of the marker position matrix used by 
#'   \code{\link{quickLook}} and \code{\link{detailedLook}}.
#'
#' @param cytoband A character vector of length m that contains the chromosome arm (p or q) for each 
#'   marker.  This is produced by the \code{\link{makeCytoband}} function.
#'
#' @param k A positive integer between 1 and m that represents the most aberrant marker.
#'
#' @return A list containing two components:  (1) the n by m matrix produced by applying the peeling algorithm 
#'   to the matrix \code{x} at marker \code{k}, and (2) the peak interval around marker \code{k}, as described 
#'   in Bioinformatics (2011) 27(5) 678 - 685.
#'
#' @details The peeling procedure is detailed in Algorithm 2 of Bioinformatics (2011) 27(5) 678 - 685, but here 
#'   we provide a brief overview.  By construction, marker \code{k} represents the most aberrant gain (loss).  
#'   The peeling procedure rescales all copy number values in \code{x} that contribute to making marker \code{k} 
#'   aberrant, so that after applying the peeling procedure marker \code{k} is "null."  By construction, the 
#'   rescaling procedure is restricted to entries in \code{x} that correspond to markers in the same chromosome 
#'   arm as \code{k}.  This allows users to assess the statistical significance of multiple gains (losses) throughout 
#'   the genome.
#'
#' @export

peeling = function(x, marker.data, cytoband, k) 
	{
    n = dim(x)[1]
    m = dim(x)[2]
    rowmeans = rowMeans(x)
    grandmean = mean(rowmeans)
    chr = marker.data[k, 1]
    chr.start = min(which(marker.data[, 1] == chr))
    chr.end = max(which(marker.data[, 1] == chr))
    chr.m = chr.end - chr.start + 1
    chr.k = k - chr.start + 1
    chr.data = x[, chr.start:chr.end]
    col.means = colMeans(chr.data)
    col.indicator = as.numeric(col.means > grandmean)
    col.indicator = recodeBinary(col.indicator, chr.k)
    exceed = rep(1, n) %o% col.indicator
    which.band = cytoband[k]
    band.vec = cytoband[chr.start:chr.end]
    band.ind.vec = as.numeric(band.vec == which.band)
    band.matrix = rep(1, n) %o% band.ind.vec
    exceed = exceed * band.matrix
    peak.interval = as.numeric(colSums(exceed) > 0)
    chr.left.side = min(which(peak.interval == 1))
    chr.right.side = max(which(peak.interval == 1))
    left.side = chr.left.side + chr.start - 1
    right.side = chr.right.side + chr.start - 1
    columnmean = mean(chr.data[, chr.k])
    exceed.running = matrix(0, n, chr.m)
    exceed.running[, chr.k] = as.numeric(chr.data[, chr.k] > 
        rowmeans)
    if (chr.k > 1) {
        for (j in (chr.k - 1):1) {
            exceed.running[, j] = exceed.running[, (j + 1)] * 
                exceed[, j] * as.numeric(chr.data[, j] > rowmeans)
        }
    }
    if (chr.k < chr.m) {
        for (j in (chr.k + 1):chr.m) {
            exceed.running[, j] = exceed.running[, (j - 1)] * 
                exceed[, j] * as.numeric(chr.data[, j] > rowmeans)
        }
    }
    not.imputed = chr.data * (1 - exceed.running)
    to.be.imputed = chr.data * exceed.running
    num.term1 = n * grandmean
    num.term2 = sum(not.imputed[, chr.k])
    num.term3 = sum(exceed.running[, chr.k] * rowmeans)
    den.term = sum(to.be.imputed[, chr.k]) - num.term3
    tau = (num.term1 - num.term2 - num.term3)/den.term
    imputed = tau * (to.be.imputed - (matrix(rowmeans, n, chr.m) * 
        exceed.running)) + (matrix(rowmeans, n, chr.m) * exceed.running)
    final.matrix = imputed + not.imputed
    x[, chr.start:chr.end] = final.matrix
    output.list = list(x, c(left.side, right.side))
    return(output.list)
	}

Try the dinamic package in your browser

Any scripts or data that you put into this service are public.

dinamic documentation built on May 29, 2024, 8:45 a.m.