preProcSample: Pre-process a sample

View source: R/facets-wrapper.R

preProcSample R Documentation

Pre-process a sample

Description

Processes a SNP read count matrix and generates a segmentation tree.

Usage

preProcSample(
  rcmat,
  ndepth = 35,
  het.thresh = 0.25,
  snp.nbhd = 250,
  cval = 25,
  deltaCN = 0,
  gbuild = c("hg19", "hg38", "hg18", "mm9", "mm10"),
  hetscale = TRUE,
  unmatched = FALSE,
  MandUnormal = FALSE,
  ndepthmax = 5000,
  spanT = 0.2,
  spanA = 0.2,
  spanX = 0.2,
  donorCounts = NULL
)

Arguments

rcmat

data frame with 6 required columns: Chrom, Pos, NOR.DP, NOR.RD, TUM.DP and TUM.RD. Additional variables are ignored. Ref and Alt columns are also required for transplant cases that use the donorCounts option.

ndepth

minimum normal sample depth to keep

het.thresh

VAF (variant allele frequency) threshold to call a SNP heterozygous

snp.nbhd

window size; only a single locus is sampled in each interval of this length (see Details)

cval

critical value for segmentation

deltaCN

minimum detectable difference in CN from diploid state

gbuild

genome build used for the alignment of the genome. Default value is human genome build hg19. Other possibilities are hg38 & hg18 for human and mm9 & mm10 for mouse. Chromosomes used for analysis are 1-22, X for humans and 1-19 for mouse. Option udef can be used to analyze other genomes.

hetscale

(logical) variable to indicate if logOR should get more weight in the test statistics for segmentation and clustering. Usually only 10% of SNPs are hets, and hetscale gives the logOR contribution to T-square as 0.25/proportion of hets.

unmatched

(logical) indicator of whether the normal sample is unmatched. When TRUE, hets are called using tumor reads only and the logOR calculations differ. Use het.thresh = 0.1 or lower when TRUE.

MandUnormal

(logical) indicator of whether both matched and unmatched normals are analyzed for log ratio normalization

ndepthmax

loci for which normal coverage exceeds this number (default is 5000) will be discarded as PCR duplicates. For high-coverage samples, increase this and ndepth commensurately.

spanT

span value tumor

spanA

span value autosomes

spanX

span value X

donorCounts

SNP read count matrix for donor sample(s). Required columns: Chromosome, Position, Ref, Alt and, for each donor sample i: RefDonoriR, RefDonoriA, RefDonoriE, RefDonoriD, RefDonoriDP
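
The depth and heterozygosity arguments above can be illustrated with a small base R sketch. It is not the package's internal code: the read counts are made up, NOR.RD is assumed to be the reference-allele count, the inclusive depth boundaries and the pmin-based het rule are assumptions, and het_weight is a hypothetical helper that only restates the 0.25/proportion-of-hets formula given for hetscale.

```r
# Toy read-count matrix with the six required columns (all values invented).
rcmat <- data.frame(
  Chrom  = c("1", "1", "1", "2", "2"),
  Pos    = c(1000, 1100, 5000, 2000, 9000),
  NOR.DP = c(40, 20, 60, 6000, 50),  # total depth in the normal
  NOR.RD = c(21, 10, 60, 3000, 0),   # assumed: reference-allele reads in the normal
  TUM.DP = c(80, 45, 90, 7000, 70),
  TUM.RD = c(60, 20, 90, 3500, 0)
)

ndepth     <- 35    # minimum normal depth to keep
ndepthmax  <- 5000  # loci above this normal depth are discarded
het.thresh <- 0.25  # VAF threshold for calling a het

# Keep loci whose normal depth lies in [ndepth, ndepthmax] (boundaries assumed inclusive).
keep <- rcmat$NOR.DP >= ndepth & rcmat$NOR.DP <= ndepthmax
kept <- rcmat[keep, ]

# Assumed het rule for a matched normal: both alleles exceed het.thresh.
vafN <- 1 - kept$NOR.RD / kept$NOR.DP
het  <- pmin(vafN, 1 - vafN) > het.thresh

# Hypothetical helper: logOR weight as described for hetscale.
het_weight <- function(p_het) 0.25 / p_het

nrow(kept)        # 3 loci pass the depth filter
het_weight(0.10)  # weight when 10% of SNPs are hets
```

With the typical 10% het fraction quoted above, the formula gives a weight of 2.5, which is why hetscale increases the influence of logOR when hets are sparse.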

Details

The SNPs in a genome are not evenly spaced. Some regions have multiple SNPs in a small neighborhood, so using all loci would induce serial correlation in the data. To avoid this, loci are sampled such that only a single locus is used in an interval of length snp.nbhd. Because this sampling is random, call set.seed beforehand to fix the random number generator seed and get reproducible results.
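
The effect of the sampling and of set.seed can be sketched in base R. This is a simplified illustration, not the routine the package uses: sample_loci is a hypothetical helper that bins positions into fixed windows of length snp.nbhd and draws one locus per window, so neighboring picks in adjacent windows may still be closer than snp.nbhd.

```r
# Hypothetical helper: keep one randomly chosen position per window.
sample_loci <- function(pos, snp.nbhd) {
  window <- pos %/% snp.nbhd  # index of the fixed window each locus falls in
  # sample(x, 1) on a length-one numeric would sample from 1:x, so guard it.
  pick_one <- function(x) if (length(x) == 1) x else sample(x, 1)
  unname(vapply(split(pos, window), pick_one, numeric(1)))
}

# Made-up positions: three clusters of nearby SNPs plus one isolated SNP.
pos <- c(100, 120, 180, 400, 410, 900, 1300, 1320, 1330)

set.seed(1)
a <- sample_loci(pos, snp.nbhd = 250)
set.seed(1)
b <- sample_loci(pos, snp.nbhd = 250)

length(a)        # one locus per occupied window
identical(a, b)  # the same seed reproduces the same sample
```

Without the set.seed calls, repeated runs can select different loci from the same cluster, which is why downstream segmentation may vary from run to run.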

Value

pmat

Read counts and other elements of all the loci

seg.tree

A list of matrices, one for each chromosome. Each matrix gives the tree structure of the splits. Each row corresponds to a segment, with the parent row as the first element, followed by the start-1 and end index of the segment and the maximal T^2 statistic. The first row is the whole chromosome, and its parent row is by definition 0.

jointseg

The data that were segmented. Only the loci that were sampled within a snp.nbhd are present. Segmentation results are included.

hscl

scaling factor for logOR data.
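
To make the seg.tree layout described above concrete, the following base R sketch builds a made-up matrix for one chromosome and reads it. The column names and all numbers are illustrative assumptions; only the column order (parent row, start-1, end, maximal T^2) follows the description.

```r
# Toy tree for a chromosome with 100 sampled loci, split once at locus 40.
tree <- matrix(
  c(0,  0, 100, 80.0,   # row 1: whole chromosome, parent is 0 by definition
    1,  0,  40, 12.5,   # row 2: left child of row 1
    1, 40, 100,  9.1),  # row 3: right child of row 1
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("parent", "start_minus_1", "end", "maxT2"))
)

# Because the start is stored as start-1, the locus count is a plain difference.
n_loci <- tree[, "end"] - tree[, "start_minus_1"]

# Leaves are rows that never appear as the parent of another row.
is_leaf <- !(seq_len(nrow(tree)) %in% tree[, "parent"])
leaves  <- tree[is_leaf, , drop = FALSE]

n_loci        # loci covered by each row
nrow(leaves)  # number of final segments in this toy tree
```

In this toy tree the two leaves together cover the same 100 loci as the root row, which is a useful sanity check when inspecting a real seg.tree element.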


rptashkin/facets2n documentation built on May 11, 2022, 1:34 p.m.