runVarbin | R Documentation |
runVarbin applies the variable binning (VarBin) algorithm to a series of BAM files resulting from short-read sequencing.
runVarbin(
dir,
genome = c("hg38", "hg19"),
resolution = c("220kb", "55kb", "110kb", "195kb", "280kb", "500kb", "1Mb", "2.8Mb"),
remove_Y = FALSE,
is_paired_end = FALSE,
method = c("CBS", "multipcf"),
vst = c("ft", "log"),
seed = 17,
min_bincount = 10,
alpha = 1e-05,
merge_levels_alpha = 1e-05,
gamma = 40,
name = "segment_ratios",
BPPARAM = bpparam()
)
dir |
A path containing .BAM files from short-read sequencing. |
genome |
A character indicating the choice of genome assembly, either 'hg38' or 'hg19'. |
resolution |
A character indicating the resolution of the scaffold used by the VarBin method, i.e. the resulting bin size. |
remove_Y |
A logical. If TRUE, removes chromosome Y (chrY) from the dataset. |
is_paired_end |
A logical indicating whether the BAM files come from paired-end (TRUE) or single-end (FALSE) sequencing. |
method |
A character indicating the segmentation method. |
vst |
A character indicating the variance stabilization transformation to be performed. See runVst details. |
seed |
A numeric scalar that sets the seed for CBS segmentation permutation reproducibility. |
min_bincount |
A numeric indicating the minimum mean bin count a cell must have to remain in the dataset. |
alpha |
A numeric with the significance level for the test to accept change-points in CBS segmentation. See |
merge_levels_alpha |
A numeric with the significance level for the merge levels test to accept two adjacent segments as different. |
gamma |
A numeric passed on to 'multipcf' segmentation: the penalty for each discontinuity in the curve (default 40). See |
name |
A character with the name for the slot returned by |
BPPARAM |
A BiocParallelParam specifying how the function should be parallelized. |
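As an illustration of how min_bincount acts as a per-cell quality filter, here is a minimal base-R sketch. It assumes a plain numeric matrix of bin counts with bins in rows and cells in columns, and it assumes that a cell whose mean bin count equals the threshold is kept; the helper filter_low_count_cells is hypothetical and not part of CopyKit.

```r
# Hypothetical helper (not a CopyKit function): drop cells whose mean bin count
# falls below min_bincount. Assumes rows = bins, columns = cells.
filter_low_count_cells <- function(counts, min_bincount = 10) {
  keep <- colMeans(counts) >= min_bincount   # assumption: the threshold itself passes
  counts[, keep, drop = FALSE]               # drop = FALSE keeps a matrix even for one cell
}

counts <- cbind(cell_a = rep(5, 4), cell_b = rep(10, 4), cell_c = rep(20, 4))
filtered <- filter_low_count_cells(counts, min_bincount = 10)
colnames(filtered)   # "cell_b" "cell_c"
```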
runVarbin is a convenience wrapper for CopyKit's pre-processing module. It runs runCountReads, runVst and runSegmentation.
runCountReads takes as input duplicate-marked BAM files from whole-genome sequencing and runs the variable binning (VarBin) algorithm. Briefly, the genome is split into pre-determined bins whose size is controlled by the argument resolution. VarBin bins are defined so that, for a diploid cell, each bin is expected to receive an equal number of reads, which controls for mappability.
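The idea behind variable-width bins can be sketched in a few lines of base R: place bin boundaries so that every bin holds the same number of uniquely mappable positions, which makes bins wider in poorly mappable regions. This is a simplified illustration, not CopyKit's scaffold construction; it assumes mappability is given as one logical value per genomic position, and varbin_boundaries is a hypothetical name.

```r
# Sketch of the VarBin idea (assumption: mappability is one logical value per
# genomic position; real pipelines use precomputed bin scaffolds).
varbin_boundaries <- function(mappable, n_bins) {
  mappable_pos <- which(mappable)               # positions that can be uniquely mapped
  per_bin <- length(mappable_pos) %/% n_bins    # mappable positions per bin
  stopifnot(per_bin >= 1)
  ends <- mappable_pos[seq_len(n_bins) * per_bin]
  ends[n_bins] <- length(mappable)              # last bin absorbs the remainder
  starts <- c(1L, head(ends, -1) + 1L)
  data.frame(start = starts, end = ends, width = ends - starts + 1L)
}

# 40 fully mappable positions, then 80 positions where every other one is mappable
mappable <- c(rep(TRUE, 40), rep(c(TRUE, FALSE), 40))
bins <- varbin_boundaries(mappable, n_bins = 4)
bins$width   # 20 20 39 41: bins widen where mappability drops
```

Each bin ends up with 20 mappable positions, so a uniformly covered diploid genome would place roughly the same number of reads in every bin even though the bin widths differ.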
A lowess fit is used to correct the bin counts for GC content. The argument genome can be set to 'hg38' or 'hg19' to select the genome assembly of the bin scaffold.
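A lowess GC correction of this kind can be sketched as follows: fit the trend of log count ratios against bin GC content, then divide it out. This is an assumed, commonly used VarBin-style approach and not necessarily CopyKit's exact implementation; gc_correct, the pseudocount and the span f = 0.05 are illustrative choices.

```r
# Sketch of lowess-based GC correction (assumption: a common VarBin-style
# approach, not necessarily CopyKit's exact implementation).
gc_correct <- function(counts, gc, f = 0.05) {
  ratio <- (counts + 1) / mean(counts + 1)      # pseudocount of 1 avoids log(0)
  fit <- lowess(gc, log(ratio), f = f)          # GC trend of the log ratios
  trend <- approx(fit$x, fit$y, xout = gc, ties = mean)$y  # map trend back to each bin
  exp(log(ratio) - trend)                       # GC-corrected ratios
}

set.seed(17)
gc <- runif(2000, 0.35, 0.60)                               # simulated bin GC fractions
counts <- rpois(2000, lambda = 50 * exp(3 * (gc - 0.45)))   # counts with a GC bias
corrected <- gc_correct(counts, gc)
c(raw = cor(counts, gc), corrected = cor(corrected, gc))
```

The raw counts are strongly correlated with GC content, while the corrected ratios are close to uncorrelated.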
Information about the alignment of the reads to the bins and from the BAM files is stored in the colData.
runVst performs a variance stabilizing transformation to reduce the overdispersion arising from the negative binomial nature of the bin counts and to reduce technical bias. The argument vst controls the choice of transformation: the Freeman-Tukey transformation with the option 'ft' (recommended) or a logarithmic transformation with the option 'log'. Using the 'log' transformation may result in long segmentation times for a few cells with large breakpoint counts.
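To see why a variance stabilizing transformation helps, the sketch below applies the Freeman-Tukey transform, sqrt(x) + sqrt(x + 1), to Poisson counts with very different means: the raw variance grows with the mean, while the transformed variance stays near 1. The log variant shown uses a +1 offset, which is an assumption for handling zero counts and may differ from what runVst does; both helper names are hypothetical.

```r
# The two kinds of transformation offered by the vst argument (sketch; the +1
# offset in the log variant is an assumption to handle zero counts).
ft_transform  <- function(x) sqrt(x) + sqrt(x + 1)   # Freeman-Tukey
log_transform <- function(x) log(x + 1)

set.seed(17)
low  <- rpois(1e5, lambda = 20)    # shallowly covered bins
high <- rpois(1e5, lambda = 200)   # deeply covered bins
rbind(
  raw = c(var(low), var(high)),                              # variance grows with the mean
  ft  = c(var(ft_transform(low)), var(ft_transform(high)))   # both close to 1
)
```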
runSegmentation fits a piecewise constant function to the transformed, smoothed bin counts. Bin counts are smoothed with smooth.CNA. The segmentation method can be one of the following:
CBS: Fits a piecewise constant function to the transformed, smoothed bin counts using the Circular Binary Segmentation (CBS) algorithm from segment. By default, it applies undo.prune with a value of 0.05.
multipcf: Performs joint segmentation with the multipcf function from the copynumber package, fitting piecewise constant curves with common breakpoints for all samples.
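The objective behind joint segmentation with common breakpoints can be illustrated with a small exact dynamic program: choose one set of breakpoints for all cells by minimizing the total squared error across cells plus a penalty gamma per segment. This is a simplified O(n^2) sketch, not the copynumber implementation; it assumes the data are already scaled so that gamma is in units of squared error, whereas multipcf uses its own scaling and a faster algorithm. joint_pcf is a hypothetical name, and with a single column it reduces to single-track penalized least-squares segmentation (a different method from CBS).

```r
# Hypothetical sketch (not copynumber::multipcf): exact O(n^2) dynamic program
# for penalized least-squares segmentation with breakpoints shared by all cells.
joint_pcf <- function(y, gamma) {
  y <- as.matrix(y)                           # rows = bins, columns = cells
  n <- nrow(y)
  cs  <- rbind(0, apply(y, 2, cumsum))        # prefix sums, one column per cell
  cs2 <- rbind(0, apply(y^2, 2, cumsum))      # prefix sums of squares
  seg_cost <- function(i, j) {                # squared error of bins i..j, all cells
    s  <- cs[j + 1, ] - cs[i, ]
    s2 <- cs2[j + 1, ] - cs2[i, ]
    sum(s2 - s^2 / (j - i + 1))
  }
  best <- c(0, rep(Inf, n))                   # best[j + 1]: optimal cost of bins 1..j
  last <- integer(n)                          # start of the final segment in that optimum
  for (j in seq_len(n)) {
    for (i in seq_len(j)) {
      cost <- best[i] + seg_cost(i, j) + gamma
      if (cost < best[j + 1]) {
        best[j + 1] <- cost
        last[j] <- i
      }
    }
  }
  breaks <- integer(0)                        # backtrack; a break is the last bin of a segment
  j <- n
  while (j > 0) {
    i <- last[j]
    if (i > 1) breaks <- c(i - 1L, breaks)
    j <- i - 1L
  }
  breaks
}

set.seed(17)
truth <- cbind(cell1 = rep(c(0, 3, 0), each = 50),   # simulated segment levels per cell
               cell2 = rep(c(1, 1, 4), each = 50),
               cell3 = rep(c(0, 2, 2), each = 50))
y <- truth + matrix(rnorm(length(truth), sd = 0.2), nrow = nrow(truth))
joint_pcf(y, gamma = 40)   # shared breakpoints at or next to bins 50 and 100
```

A cell without a level change at a shared breakpoint (cell2 at bin 50, cell3 at bin 100) simply gets equal means on both sides, which is what "common breakpoints for all samples" permits.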
The resulting segment means are further refined with MergeLevels, which joins adjacent segments whose means do not differ significantly.
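The merging idea can be sketched as repeatedly testing each pair of neighbouring segments and merging the most similar pair until every remaining pair differs at the chosen significance level (compare merge_levels_alpha). This is a simplified stand-in using wilcox.test from base R's stats package on adjacent segments only; the actual MergeLevels procedure may differ, and merge_adjacent_segments is a hypothetical name.

```r
# Hypothetical, simplified merge step: breaks[i] is the last bin of segment i.
merge_adjacent_segments <- function(x, breaks, alpha = 1e-5) {
  while (length(breaks) > 0) {
    ends   <- c(breaks, length(x))
    starts <- c(1L, breaks + 1L)
    # p-value of a rank-sum test between each pair of neighbouring segments
    pvals <- vapply(seq_along(breaks), function(i) {
      wilcox.test(x[starts[i]:ends[i]], x[starts[i + 1]:ends[i + 1]])$p.value
    }, numeric(1))
    most_similar <- which.max(pvals)
    if (pvals[most_similar] < alpha) break   # all neighbours differ significantly
    breaks <- breaks[-most_similar]          # merge the most similar pair
  }
  breaks
}

set.seed(17)
x <- rep(c(0, 0, 2), each = 50) + rnorm(150, sd = 0.1)
merge_adjacent_segments(x, breaks = c(50L, 100L))   # 100: the 0 | 0 boundary is removed
```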
An scCNA object containing the bin counts, the ratios and the segment ratios.
Darlan Conterno Minussi
Navin, N., Kendall, J., Troge, J. et al. Tumour evolution inferred by single-cell sequencing. Nature 472, 90–94 (2011). https://doi.org/10.1038/nature09807
Baslan, T., Kendall, J., Ward, B., et al. (2015). Optimizing sparse sequencing of single cells for highly multiplex copy number profiling. Genome Research, 25(5), 714–724. https://doi.org/10.1101/gr.188060.114
Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics. 2004 Oct;5(4):557-72. doi: 10.1093/biostatistics/kxh008. PMID: 15475419.
Freeman, M. F.; Tukey, J. W. (1950), "Transformations related to the angular and the square root", The Annals of Mathematical Statistics, 21 (4), pp. 607–611, doi:10.1214/aoms/1177729756, JSTOR 2236611
Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. The Annals of Statistics, 42(6), 2243-2281. Retrieved July 30, 2021, from http://www.jstor.org/stable/43556493
Nilsen G, Liestol K, Van Loo P, Vollan H, Eide M, Rueda O, Chin S, Russell R, Baumbusch L, Caldas C, Borresen-Dale A, Lingjaerde O (2012). “Copynumber: Efficient algorithms for single- and multi-track copy number segmentation.” BMC Genomics, 13(1), 591.
## Not run:
copykit_obj <- runVarbin("~/path/to/bam/files/", remove_Y = TRUE)
## End(Not run)