#' @title Calculate the median of the sampled copy number values from bins
#' associated with selected segments.
#'
#' @description This function calculates the median of the sampled copy number
#' values from bins associated with selected segments. The median of the sampled
#' copy number values can be calculated multiple times for the same segment
#' (sampling with replacement, bootstrap).
#' There are two ways to select the number of times the median is calculated
#' for a segment. The first way is to specify a minimum number of bins per
#' segment; the integer division of the segment's number of bins by the
#' specified minimum number of bins gives the number of times the median is
#' calculated. The second way is to pass an integer that directly specifies
#' the number of times the median is calculated for each segment.
#'
#' @param mysegs a \code{data.frame} containing information about the
#' segments in 5 columns: \itemize{
#' \item{\code{StartProbe}}{ a \code{numeric} that tabulates the (integer)
#' start position of each segment in internal units such as probe numbers.}
#' \item{\code{EndProbe}}{ a \code{numeric} that tabulates the (integer)
#' end position of each segment in internal units such as probe numbers.}
#' \item{\code{chrom}}{ a \code{numeric} representing the chromosome.}
#' \item{\code{segmedian}}{ a \code{numeric} representing the median for the
#' group of bins associated with one segment.}
#' \item{\code{segmad}}{ a \code{numeric} representing the median absolute
#' deviation for the group of bins associated with one segment.}
#' }
#'
#' @param ratcol a \code{vector} containing the copy number values (usually in
#' log2) for each bin associated with a segment present in \code{mysegs}. The
#' length of the \code{vector} should correspond to the total number of bins
#' present in \code{mysegs}.
#'
#' @param startcol a \code{character} string specifying the name of the column
#' in \code{mysegs} that tabulates the (integer) start position of each segment
#' in internal units such as probe numbers for data of CGH microarray origin.
#' Default: "StartProbe".
#'
#' @param endcol a \code{character} string specifying the name of the column
#' in \code{mysegs} that tabulates the (integer) end position of each segment
#' in internal units such as probe numbers for data of CGH microarray origin.
#' Default: "EndProbe".
#'
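#' @param blocksize an \code{integer} specifying the minimum number of bins
#' required for each sampling of a segment. Each segment is sampled as many
#' times as the integer part of its number of bins divided by
#' \code{blocksize}; segments with fewer than \code{blocksize} bins are not
#' sampled. Either \code{blocksize} or \code{times} must be specified by the
#' user, but not both.
#' Default: \code{0}.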
#'
#' @param times an \code{integer} specifying the number of times
#' each segment must be sampled.
#' Either \code{blocksize} or \code{times} must be specified by the user, but
#' not both.
#' Default: \code{0}.
#'
#' @return a \code{data.frame} containing the information about the selected
#' segments and the median of the copy number values sampled with replacement
#' from the associated bins. There is one row per sampling, so a segment can
#' appear in several rows. It contains 3 columns:
#' \itemize{
#' \item{\code{StartProbe}}{ a \code{numeric} that tabulates the (integer)
#' start position of each segment in internal units such as probe numbers.}
#' \item{\code{EndProbe}}{ a \code{numeric} that tabulates the (integer)
#' end position of each segment in internal units such as probe numbers.}
#' \item{\code{NoName}}{ a \code{numeric} representing the median value of the
#' sampled bins.}
#' }
#'
#' @examples
#'
#' ## Create a data.frame with 3 segments on chromosome 1
#' segData <- data.frame(StartProbe=c(1, 9, 13), EndProbe=c(8, 12, 15),
#' chrom=c(1,1,1), segmedian=c(0.06662475, 0.06719237, 0.07111544),
#' segmad=c(0.06213208, 0.04722233, 0.07633202))
#'
#' ## Copy number ratio (in log2) for each bin
#' ## Multiple bins are associated with each segment
#' ratcol <- c(0.062073840, 0.10913919, 0.143459489, 0.033994620,
#' -0.072243732, 0.082252725, 0.151908930, 0.101589490, 0.08554752,
#' -0.011155011, -0.122291649, 0.063634112, 0.110149474, 0.043328961,
#' 0.1632174529)
#'
#' ## Use an integer division to determine the number of times each
#' ## segment is sampled
#' CNprep:::segsample(mysegs=segData, ratcol=ratcol, blocksize=4)
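#'
#' ## With blocksize=4, each segment is sampled (number of bins / 4) times,
#' ## rounded down: the three segments contain 8, 4 and 3 bins, so they are
#' ## sampled 2, 1 and 0 times respectively (the third segment is dropped)
#' floor((segData$EndProbe - segData$StartProbe + 1) / 4)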
#'
#' ## Each segment is sampled the same number of times
#' CNprep:::segsample(mysegs=segData, ratcol=ratcol, times=2)
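#'
#' ## The result contains one row per sampling:
#' ## 3 segments sampled 2 times each gives 6 rows
#' res <- CNprep:::segsample(mysegs=segData, ratcol=ratcol, times=2)
#' nrow(res)
#'
#' ## Setting both blocksize and times (or neither) raises an error
#' try(CNprep:::segsample(mysegs=segData, ratcol=ratcol, blocksize=4, times=2))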
#'
#' @author Alexander Krasnitz, Guoli Sun
#' @keywords internal
segsample <- function(mysegs, ratcol, startcol="StartProbe",
                        endcol="EndProbe", blocksize=0, times=0)
{
    ## At least one parameter (blocksize or times) must be set
    if (blocksize == 0 && times == 0) {
        stop("One of blocksize or times must be set")
    }
    ## Only one parameter (blocksize or times) can be set
    if (blocksize != 0 && times != 0) {
        stop("Only one of blocksize or times can be set")
    }
    ## The number of bootstraps done on each segment depends on the
    ## segment length if the blocksize parameter is used.
    ## Otherwise, the number of bootstraps is the same for all segments
    segtable <- mysegs[, c(startcol, endcol), drop=FALSE]
    ## Comment Pascal: at least one result should be different from zero
    if (blocksize != 0) {
        segtable <- segtable[rep(seq_len(nrow(segtable)),
                        times=(segtable[, endcol] - segtable[, startcol] + 1) %/%
                            blocksize), ]
    }
    if (times != 0) {
        segtable <- segtable[rep(seq_len(nrow(segtable)), each=times), ]
    }
    ## Calculate the median of the sampled bins for each segment
    ## Each segment may be sampled more than once depending on the
    ## blocksize and times parameters
    return(cbind(segtable, apply(segtable, 1, smedian.sample, v=ratcol)))
}