segsample: Calculate the median of the sampled copy number values from...

segsample    R Documentation

Calculate the median of the sampled copy number values from bins associated with selected segments.

Description

This function calculates the median of copy number values sampled, with replacement, from the bins associated with each selected segment (bootstrap). The median can be calculated multiple times for the same segment. There are two ways to set the number of times the median is calculated for a segment. The first is to give a minimum number of bins per segment (blocksize); the integer division of the segment's number of bins by this minimum gives the number of times the median is calculated. The second is to pass an integer (times) that directly specifies the number of times the median is calculated for each segment.
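The two ways of choosing the number of medians per segment can be sketched in a few lines of base R. This is a minimal illustration, not the package's source: the segment boundaries mirror the Examples section, the variable names (n_bins, reps_blocksize, reps_times) are hypothetical, and both segment ends are assumed to be inclusive.

```r
# Segment boundaries in bin (probe) indices; values mirror the Examples section.
start_probe <- c(1, 9, 13)
end_probe   <- c(8, 12, 15)

# Number of bins in each segment (assumption: both ends are inclusive).
n_bins <- end_probe - start_probe + 1

# First way: integer division of the bin count by blocksize.
blocksize <- 4
reps_blocksize <- n_bins %/% blocksize

# Second way: the same fixed number of medians for every segment.
times <- 2
reps_times <- rep(times, length(n_bins))

print(data.frame(n_bins, reps_blocksize, reps_times))
```

Under these assumptions, a segment with fewer bins than blocksize gets zero medians from the integer division, so it would contribute no rows to the result.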

Usage

segsample(
  mysegs,
  ratcol,
  startcol = "StartProbe",
  endcol = "EndProbe",
  blocksize = 0,
  times = 0
)

Arguments

mysegs

a data.frame containing information about the segments in 5 columns:

  • StartProbe a numeric that tabulates the (integer) start position of each segment in internal units such as probe numbers.

  • EndProbe a numeric that tabulates the (integer) end position of each segment in internal units such as probe numbers.

  • chrom a numeric representing the chromosome.

  • segmedian a numeric representing the median for the group of bins associated with one segment.

  • segmad a numeric representing the median absolute deviation for the group of bins associated with one segment.

ratcol

a vector containing the copy number values (usually in log2) for each bin associated with a segment present in mysegs. The length of the vector should correspond to the total number of bins present in mysegs.

startcol

a character string specifying the name of the column in mysegs that tabulates the (integer) start position of each segment in internal units such as probe numbers for data of CGH microarray origin. Default: "StartProbe".

endcol

a character string specifying the name of the column in mysegs that tabulates the (integer) end position of each segment in internal units such as probe numbers for data of CGH microarray origin. Default: "EndProbe".

blocksize

an integer specifying how many bins must be present in a segment for the segment to be selected for sampling. Either blocksize or times must be specified by the user. Default: 0.

times

an integer specifying the number of times each segment must be sampled. Either blocksize or times must be specified by the user. Default: 0.
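The constraints stated for these arguments can be checked before calling the function. The helper below, check_segsample_input, is hypothetical and not part of CNprep; it is a sketch that assumes exactly one of blocksize and times should be non-zero and that segment positions index directly into ratcol.

```r
# Hypothetical helper (not part of CNprep): basic consistency checks on the inputs.
check_segsample_input <- function(mysegs, ratcol, startcol = "StartProbe",
                                  endcol = "EndProbe", blocksize = 0, times = 0) {
    # Assumption: exactly one of blocksize and times is non-zero.
    if ((blocksize == 0) == (times == 0)) {
        stop("specify exactly one of 'blocksize' or 'times'")
    }
    # Assumption: every segment position is a valid index into ratcol.
    if (any(mysegs[[startcol]] < 1) || any(mysegs[[endcol]] > length(ratcol))) {
        stop("segment positions fall outside 'ratcol'")
    }
    if (any(mysegs[[startcol]] > mysegs[[endcol]])) {
        stop("a segment starts after it ends")
    }
    invisible(TRUE)
}

segs <- data.frame(StartProbe = c(1, 9, 13), EndProbe = c(8, 12, 15))
values <- rnorm(15)   # placeholder copy number values, one per bin
check_segsample_input(segs, values, times = 2)
```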

Value

a data.frame containing information about the selected segments and the median of the copy number values sampled with replacement from the associated bins. It contains 3 columns:

  • StartProbe a numeric that tabulates the (integer) start position of each segment in internal units such as probe numbers.

  • EndProbe a numeric that tabulates the (integer) end position of each segment in internal units such as probe numbers.

  • NoName a numeric representing the median value of the sampled bins.
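The shape of the returned table can be illustrated with a small base R sketch. It is an assumption-laden illustration rather than the package's implementation: boot_median is a hypothetical helper, the copy number values are rounded placeholders, and the third column is given the illustrative name sampled_median.

```r
set.seed(42)  # bootstrap sampling is random; fix the seed for reproducibility

# Hypothetical helper: median of one bootstrap resample of a segment's bins.
boot_median <- function(start, end, values) {
    bins <- values[start:end]
    bins <- bins[!is.na(bins)]
    # Resample the bins with replacement, keeping the original sample size.
    median(bins[sample.int(length(bins), replace = TRUE)])
}

# Placeholder copy number values (log2), one per bin.
values <- c(0.062, 0.109, 0.143, 0.034, -0.072, 0.082, 0.152, 0.102,
            0.086, -0.011, -0.122, 0.064, 0.110, 0.043, 0.163)
segs <- data.frame(StartProbe = c(1, 9, 13), EndProbe = c(8, 12, 15))

# Repeat each segment row twice (as with times = 2), then draw one median per row.
rows <- segs[rep(seq_len(nrow(segs)), each = 2), ]
rows$sampled_median <- mapply(boot_median, rows$StartProbe, rows$EndProbe,
                              MoreArgs = list(values = values))
print(rows)
```

Each row of the result corresponds to one bootstrap draw, so a segment sampled twice appears on two rows with the same start and end positions.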

Author(s)

Alexander Krasnitz, Guoli Sun

Examples


## Create a data.frame with 3 segments on chromosome 1
segData <- data.frame(StartProbe=c(1, 9, 13), EndProbe=c(8, 12, 15),
    chrom=c(1,1,1), segmedian=c(0.06662475, 0.06719237, 0.07111544),
    segmad=c(0.06213208, 0.04722233, 0.07633202))
    
## Copy number ratio (in log2) for each bin
## Multiple bins are associated with one segment
ratcol <- c(0.062073840, 0.10913919,  0.143459489,  0.033994620, 
    -0.072243732, 0.082252725,  0.151908930,  0.101589490,  0.08554752, 
    -0.011155011, -0.122291649, 0.063634112,  0.110149474,  0.043328961,  
    0.1632174529)
    
## Use an integer division to determine the number of times each
## segment is sampled
CNprep:::segsample(mysegs=segData, ratcol=ratcol, blocksize=4)

## Each segment is sampled the same number of times
CNprep:::segsample(mysegs=segData, ratcol=ratcol, times=2)


KrasnitzLab/CNprep documentation built on May 28, 2022, 8:32 p.m.