Model to detect and quantify mosaic aneuploidy

Share:

Description

Take in the heterozygous sites and coverage information, use different models (normal, monosomy, mitotic trisomy, meiotic trisomy, loss of heterozygosity) to fit the data, and select the model fit the data best according to BIC value and return estimation of the fraction of aneuploid cells.

Usage

1
2
3
runMadSeq(hetero, coverage, target_chr, adapt = 10000, burnin = 10000,
  nChain = 2, nStep = 10000, thinSteps = 2, checkConvergence = FALSE,
  plot = TRUE)

Arguments

hetero

A character specify the location of processed heterozygous table returned by prepareHetero function, or A GRanges object returned by prepareHetero function

coverage

A character specify the location of normalized coverage table returned by normalizeCoverage function, or A GRanges object from the GRangesList returned by normalizeCoverage function. Look up your sample by names(GRangesList), and subset your the normalized coverage for your sample by GRangesList["sample_name"]. For more details, please check the example.

target_chr

A character specify the chromosome number you want to detect. Note: Please check your assembly, use contig name "chr1" or "1" accordingly.

adapt

A integer indicate the adaption steps for the MCMC sampling. Default: 10000

burnin

A integer indicate burnin steps for the MCMC sampling. Default: 10000. If the posterior distribution is not converged, increasing burnin steps can be helpful.

nChain

A integer indicate the number of chains for the MCMC sampling. Default: 2. Note: More than 1 chain is required if checkConvergence is set to TRUE.

nStep

A integer indicate the number of steps to be recorded for the MCMC sampling. Default: 10000. Generally, the more steps you record, the more accurate the estimation is.

thinSteps

A integer indicate the number of steps to "thin" (thinSteps=1) means save everystep. Default: 2.

checkConvergence

A Boolean indicate whether to check the convergence of independent MCMC chains. If your data is not converged, you may increase adaption step and burnin step. Default: FALSE

plot

A Boolean. If TRUE, the alternative allele frequency (AAF) for each heterozygous site along the target chromosome will be plotted.

Value

An S4 object of class MadSeq containing the posterior distribution for the selected model, and deltaBIC between five models.

Note

1.If you didn't write normalized coverage into file, please subset the normalized coverage GRanges object from the GRangesList object returned from the normalizeCoverage funciton.
2. When specify target_chr, please make sure it consist with the contig names in your sequencing data, example: "chr1" and "1".
3. If checkConvergence set to TRUE, the nChain has to be >2
4. If it shows that your chains are not converged, helpful options are increasing the adapt and burnin steps.
5. Because the model is an MCMC sampling process, it can take a very long time to finish. Running in the background or HPC is recommended.

Author(s)

Yu Kong

References

Martyn Plummer (2016). rjags: Bayesian Graphical Models using MCMC. R package version 4-6.
https://CRAN.R-project.org/package=rjags

See Also

MadSeq, plotMadSeq, plotFraction, plotMixture

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
## ------------------------------------------------------------------------
## The following example is for the case that normalized coverage and 
## processed heterozygous sites have not been written to files. For more 
## examples, please check the documentation.
## ------------------------------------------------------------------------
##------Prepare Heterozygous Sites
## specify the path to the vcf.gz file for the aneuploidy sample
aneuploidy_vcf = system.file("extdata","aneuploidy.vcf.gz",package="MADSEQ")
## specify the path to the bed file containing targeted region
target = system.file("extdata","target.bed",package="MADSEQ")
## prepare heterozygous sites
aneuploidy_hetero = prepareHetero(aneuploidy_vcf,target, writeToFile=FALSE)

##------Prepare Normalized Coverage
## specify the path to the bam file
aneuploidy_bam = system.file("extdata","aneuploidy.bam",package="MADSEQ")
normal_bam = system.file("extdata","normal.bam",package="MADSEQ")

## prepare coverage data for the samples
aneuploidy_cov_gc = prepareCoverageGC(target,aneuploidy_bam,"hg19")
normal_cov_gc = prepareCoverageGC(target,normal_bam,"hg19")

## normalize the coverage
normed = normalizeCoverage(aneuploidy_cov_gc,
                           control=normal_cov_gc,writeToFile=FALSE)
                           
##------subset normalized coverage GRanges object
aneuploidy_normed_cov = normed[["aneuploidy_cov_gc"]]
## check chromosome18
## (to speed up the example, we only run one chain and less steps here, 
##  but default settings are recommended in real case)
aneuploidy_chr18 = runMadSeq(aneuploidy_hetero, aneuploidy_normed_cov,
                             target_chr="chr18", adapt=100, burnin=200,
                             nChain =1, nStep = 1000, thinSteps=1)