View source: R/facets-wrapper.R
preProcSample | R Documentation |
Processes a SNP read count matrix and generates a segmentation tree.
preProcSample( rcmat, ndepth = 35, het.thresh = 0.25, snp.nbhd = 250, cval = 25, deltaCN = 0, gbuild = c("hg19", "hg38", "hg18", "mm9", "mm10"), hetscale = TRUE, unmatched = FALSE, MandUnormal = FALSE, ndepthmax = 5000, spanT = 0.2, spanA = 0.2, spanX = 0.2, donorCounts = NULL )
rcmat |
data frame with 6 required columns: Chrom, Pos, NOR.DP, NOR.RD, TUM.DP and TUM.RD. Additional variables are ignored. Ref and Alt columns are also required for transplant cases that use the donorCounts option. |
ndepth |
minimum normal sample depth to keep |
het.thresh |
VAF (variant allele frequency) threshold used to call a SNP heterozygous |
snp.nbhd |
length of the interval within which only a single locus is sampled |
cval |
critical value for segmentation |
deltaCN |
minimum detectable difference in CN from diploid state |
gbuild |
genome build used for the alignment of the genome. Default value is human genome build hg19. Other possibilities are hg38 and hg18 for human, and mm9 and mm10 for mouse. Chromosomes used for analysis are 1-22 and X for human, and 1-19 for mouse. Option udef can be used to analyze other genomes. |
hetscale |
(logical) variable to indicate whether logOR should get more weight in the test statistics for segmentation and clustering. Usually only 10% of SNPs are hets, and hetscale gives the logOR contribution to T-square as 0.25/proportion of hets. |
unmatched |
(logical) indicator of whether the normal sample is unmatched. When this is TRUE, hets are called using tumor reads only and the logOR calculations are different. Use het.thresh = 0.1 or lower when TRUE. |
MandUnormal |
(logical) whether both matched and unmatched normals are analyzed for log-ratio normalization |
ndepthmax |
loci for which normal coverage exceeds this number (default is 5000) will be discarded as PCR duplicates. For high-coverage samples, increase this and ndepth commensurately. |
spanT |
span value tumor |
spanA |
span value autosomes |
spanX |
span value X |
donorCounts |
SNP read count matrix for donor sample(s). Required columns: Chromosome, Position, Ref, Alt and, for each donor sample i: RefDonoriR, RefDonoriA, RefDonoriE, RefDonoriD, RefDonoriDP |
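As a rough illustration of the rcmat, ndepth, ndepthmax and het.thresh arguments, the base R sketch below builds a toy read count matrix with the six required columns and applies the filters as the descriptions above suggest. The data values are made up, and the het rule (minor allele fraction in the normal above het.thresh) is an assumed reading of the het.thresh description, not code taken from the package.

```r
# Toy read count matrix with the six required columns (values are invented).
rcmat <- data.frame(
  Chrom  = c("1", "1", "1", "2", "X"),
  Pos    = c(1000, 1100, 5000, 2000, 3000),
  NOR.DP = c(40, 20, 6000, 100, 50),
  NOR.RD = c(20, 10, 3000, 100, 30),
  TUM.DP = c(80, 35, 7000, 150, 60),
  TUM.RD = c(30, 20, 3500, 150, 25),
  stringsAsFactors = FALSE
)

# Check that every required column is present.
required <- c("Chrom", "Pos", "NOR.DP", "NOR.RD", "TUM.DP", "TUM.RD")
missing_cols <- setdiff(required, names(rcmat))

# Depth filter as described: keep loci with normal depth of at least ndepth
# and discard loci whose normal depth exceeds ndepthmax.
ndepth <- 35
ndepthmax <- 5000
kept <- rcmat[rcmat$NOR.DP >= ndepth & rcmat$NOR.DP <= ndepthmax, ]

# Assumed het rule: minor allele fraction in the normal above het.thresh.
het.thresh <- 0.25
frac <- kept$NOR.RD / kept$NOR.DP
kept$het <- pmin(frac, 1 - frac) > het.thresh

kept
```

In this toy input the locus with normal depth 20 falls below ndepth and the locus with normal depth 6000 exceeds ndepthmax, so three loci remain.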
The SNPs in a genome are not evenly spaced, and some regions have multiple SNPs in a small neighborhood. Using all loci would therefore induce serial correlation in the data. To avoid this, loci are sampled so that only a single locus is used in any interval of length snp.nbhd. Because this sampling is random, call set.seed beforehand to fix the random number generator seed and obtain reproducible results.
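The sampling idea can be sketched in base R as follows. This is a minimal illustration only: thin_loci is a hypothetical helper that uses fixed windows of length nbhd, which may differ from how the package actually selects loci. It shows why fixing the seed makes the selection reproducible.

```r
# Hypothetical helper: keep one randomly chosen locus per window of length nbhd.
# Illustration of the idea only; not the package's implementation.
thin_loci <- function(pos, nbhd = 250) {
  bin <- floor(pos / nbhd)  # window index of each locus
  keep <- tapply(seq_along(pos), bin, function(i) {
    # Guard the single-locus case, since sample(n, 1) would draw from 1:n.
    if (length(i) == 1) i else sample(i, 1)
  })
  sort(as.integer(keep))
}

pos <- c(100, 120, 180, 400, 450, 900, 1300, 1310, 1320)

set.seed(42)
first <- thin_loci(pos)
set.seed(42)
second <- thin_loci(pos)

identical(first, second)  # TRUE: same seed, same sampled loci
length(first)             # 4: one locus per occupied window
```

Without the set.seed calls, repeated runs can pick different loci inside the crowded windows, so downstream segmentation results may vary from run to run.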
pmat |
Read counts and other elements of all the loci |
seg.tree |
A list of matrices, one for each chromosome. Each matrix gives the tree structure of the splits. Each row corresponds to a segment and contains the parent row as the first element, followed by the start-1 and end index of the segment and the maximal T^2 statistic. The first row is the whole chromosome, and its parent row is by definition 0. |
jointseg |
The data that were segmented. Only the loci that were sampled within a snp.nbhd are present. Segmentation results are included. |
hscl |
scaling factor for logOR data. |
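The hetscale description gives the logOR contribution to T-square as 0.25/proportion of hets. A quick arithmetic check using the 10% figure quoted there is shown below. The variable names are illustrative, and whether the returned hscl equals this exact quantity is not stated here.

```r
# Proportion of heterozygous SNPs, using the typical figure from the hetscale description.
prop_het <- 0.10

# logOR contribution to T-square as given in the hetscale description.
contribution <- 0.25 / prop_het
contribution  # 2.5
```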