preProcSample: Pre-process a sample
In rptashkin/facets2n: Cellular Faction and Copy Numbers from Tumor Sequencing

preProcSample

R Documentation

Pre-process a sample

Description

Processes a SNP read count matrix and generates a segmentation tree.

Usage

preProcSample(
  rcmat,
  ndepth = 35,
  het.thresh = 0.25,
  snp.nbhd = 250,
  cval = 25,
  deltaCN = 0,
  gbuild = c("hg19", "hg38", "hg18", "mm9", "mm10"),
  hetscale = TRUE,
  unmatched = FALSE,
  MandUnormal = FALSE,
  ndepthmax = 5000,
  spanT = 0.2,
  spanA = 0.2,
  spanX = 0.2,
  donorCounts = NULL
)

Arguments

`rcmat`	data frame with 6 required columns: Chrom, Pos, NOR.DP, NOR.RD, TUM.DP and TUM.RD. Additional variables are ignored. Ref and Alt columns are also required for transplant cases that use the donorCounts option.
`ndepth`	minimum normal sample depth to keep
`het.thresh`	vaf threshold to call a SNP heterozygous
`snp.nbhd`	window size: the length of the interval within which only a single locus is sampled (see Details)
`cval`	critical value for segmentation
`deltaCN`	minimum detectable difference in CN from diploid state
`gbuild`	genome build used for the alignment of the genome. Default value is human genome build hg19. Other possibilities are hg38 & hg18 for human and mm9 & mm10 for mouse. Chromosomes used for analysis are 1-22, X for humans and 1-19 for mouse. Option udef can be used to analyze other genomes.
`hetscale`	(logical) indicates whether logOR should get more weight in the test statistics for segmentation and clustering. Usually only 10% of SNPs are hets, and hetscale gives the logOR contribution to T-square as 0.25/proportion of hets.
`unmatched`	indicator of whether the normal sample is unmatched. When this is TRUE hets are called using tumor reads only and logOR calculations are different. Use het.thresh = 0.1 or lower when TRUE.
`MandUnormal`	(logical) whether both matched and unmatched normals are used for log ratio normalization
`ndepthmax`	loci for which normal coverage exceeds this number (default is 5000, as shown in Usage) will be discarded as PCR duplicates. For high-coverage samples, increase this and ndepth commensurately.
`spanT`	span value tumor
`spanA`	span value autosomes
`spanX`	span value X
`donorCounts`	SNP read count matrix for donor sample(s). Required columns: Chromosome, Position, Ref, Alt and, for each donor sample i: RefDonoriR, RefDonoriA, RefDonoriE, RefDonoriD, RefDonoriDP
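
The depth filter, het call, and hetscale weight described in the arguments above can be illustrated with a small base R sketch. This is not the package's internal code: the toy values are made up, and the assumptions that NOR.RD holds reference read counts and that a het is called with a symmetric VAF rule are illustrative only.

```r
# Toy read count matrix with the six required columns (values are made up).
rcmat <- data.frame(
  Chrom  = c("1", "1", "1", "1", "2", "2"),
  Pos    = c(1000, 1100, 5000, 9000, 2000, 7000),
  NOR.DP = c(40, 20, 100, 6000, 50, 80),
  NOR.RD = c(20, 10, 100, 3000, 24, 45),
  TUM.DP = c(60, 30, 120, 7000, 70, 90),
  TUM.RD = c(45, 15, 120, 3500, 20, 60)
)

ndepth     <- 35
ndepthmax  <- 5000
het.thresh <- 0.25

# Keep loci with normal depth of at least ndepth and not exceeding ndepthmax.
keep <- rcmat$NOR.DP >= ndepth & rcmat$NOR.DP <= ndepthmax
kept <- rcmat[keep, ]

# Assumption: NOR.RD counts reference reads, so VAF = 1 - NOR.RD / NOR.DP.
vafN <- 1 - kept$NOR.RD / kept$NOR.DP

# Assumption: a locus is a het when the VAF is more than het.thresh
# away from both 0 and 1 (matched-normal case).
kept$het <- as.integer(pmin(vafN, 1 - vafN) > het.thresh)

# Weight given to logOR as described for hetscale: 0.25 / proportion of hets.
prop_het        <- mean(kept$het)
hetscale_weight <- 0.25 / prop_het
```

With real data, where roughly 10% of SNPs are hets, the same formula gives a weight of about 2.5.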

Details

The SNPs in a genome are not evenly spaced, and some regions have multiple SNPs in a small neighborhood. Using all loci would therefore induce serial correlation in the data. To avoid this, loci are sampled such that only a single locus is used in an interval of length snp.nbhd. Because this sampling is random, call set.seed before preProcSample to fix the random number generator seed and get reproducible results.
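
The effect of the sampling and of set.seed can be sketched in base R. The helper thin_loci below is hypothetical and uses fixed, non-overlapping windows as a simplification; the package's actual sampling procedure may differ.

```r
# Hypothetical helper: keep one randomly chosen locus per window of
# length snp.nbhd (fixed windows are a simplifying assumption).
thin_loci <- function(pos, snp.nbhd = 250) {
  window <- floor(pos / snp.nbhd)
  idx_by_window <- split(seq_along(pos), window)
  picked <- vapply(idx_by_window, function(i) {
    # Guard the length-1 case: sample(i, 1) would draw from 1:i.
    if (length(i) == 1L) i else sample(i, 1L)
  }, integer(1))
  sort(unname(picked))
}

pos <- c(100, 120, 240, 600, 610, 1300)

# Same seed, same sampled loci.
set.seed(42)
first <- thin_loci(pos)
set.seed(42)
second <- thin_loci(pos)
identical(first, second)
```

Without the set.seed calls, repeated runs may pick different loci within a window, which is why downstream segmentation results can vary from run to run.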

Value

`pmat`	Read counts and other elements of all the loci
`seg.tree`	A list of matrices, one for each chromosome. Each matrix gives the tree structure of the splits. Each row corresponds to a segment, with the parent row as the first element, followed by the start-1 and end index of the segment and the maximal T^2 statistic. The first row is the whole chromosome and its parent row is by definition 0.
`jointseg`	The data that were segmented, together with the segmentation results. Only the loci that were sampled within a snp.nbhd are present.
`hscl`	scaling factor for logOR data.
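
As a rough illustration of how a seg.tree matrix can be read, the base R sketch below builds a toy tree for one chromosome and extracts its leaf segments. The column names, the numeric values, and the column order (taken from the description above) are assumptions, not output of preProcSample.

```r
# Toy tree for one chromosome. Columns: parent row, start - 1, end index,
# maximal T^2 statistic (all values are made up).
tree <- rbind(
  c(0,   0, 1000, 85.2),  # row 1: whole chromosome, parent row 0
  c(1,   0,  400, 30.1),  # row 2: left child of row 1
  c(1, 400, 1000, 12.7),  # row 3: right child of row 1
  c(2,   0,  150,  0.0),  # row 4: left child of row 2
  c(2, 150,  400,  0.0)   # row 5: right child of row 2
)
colnames(tree) <- c("parent", "start_minus_1", "end", "maxT2")

# A leaf is a row that is never listed as the parent of another row.
is_leaf <- !(seq_len(nrow(tree)) %in% tree[, "parent"])
leaves  <- tree[is_leaf, , drop = FALSE]

# Number of loci in each leaf segment.
seg_len <- leaves[, "end"] - leaves[, "start_minus_1"]
```

In this toy tree the leaf segments partition the chromosome, so their lengths add up to the end index of the first row.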