README.md

Detect cross-contamination and source

The following scripts are used to run the conta toolset:

First, install the conta library. From outside the conta folder, run:

R CMD INSTALL --preclean --no-multiarch --with-keep.source conta

0) dbSNP file:

The full dbSNP file may be downloaded from: ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606_b151_GRCh37p13/VCF/common_all_20180423.vcf.gz

1) Run contamination analysis (scripts/conta_run.R):

A TSV or pileup file (containing allele counts for each SNP) is used as input, along with a dbSNP reference VCF file, to call contamination events. The analysis reports contamination calls, levels, and plots. It will also display CNV metrics and bin counts for the Y chromosome if those files are provided.

Input files:

- The dbSNP file must contain the CAF info field and rsID.
- TSV files must contain chr, pos, and counts for each allele. Two pileup formats are supported; see the example inputs under the test folders.
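Before running the analysis, it can help to confirm that the dbSNP records actually carry an rsID and a CAF entry. The following is a minimal sketch in base R, not part of conta: the function name `check_dbsnp_lines` is hypothetical, and the records are toy lines written only to illustrate the VCF column layout (ID in column 3, INFO in column 8).

```r
# Illustrative check, not part of conta: report whether each dbSNP VCF record
# has an rsID in the ID column and a CAF entry in the INFO column.
check_dbsnp_lines <- function(vcf_lines) {
  # Drop header lines, which start with '#'
  records <- vcf_lines[!startsWith(vcf_lines, "#")]
  fields <- strsplit(records, "\t", fixed = TRUE)
  ids <- vapply(fields, function(f) f[3], character(1))   # ID column
  info <- vapply(fields, function(f) f[8], character(1))  # INFO column
  data.frame(
    id = ids,
    has_rsid = grepl("^rs[0-9]+$", ids),
    has_caf = grepl("(^|;)CAF=", info),
    stringsAsFactors = FALSE
  )
}

# Toy input: one record with rsID and CAF, one record missing both
example <- c(
  "##fileformat=VCFv4.0",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "1\t10177\trs367896724\tA\tAC\t.\t.\tRS=367896724;CAF=0.5747,0.4253;COMMON=1",
  "1\t10352\t.\tT\tTA\t.\t.\tRS=555500075;COMMON=1"
)
result <- check_dbsnp_lines(example)
print(result)
```

For a real file, the lines could be read with `readLines(gzfile(path))`, although for the full dbSNP file a streaming approach would likely be preferable.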

2) Run source detection (scripts/conta_find_source.R):

This mode requires a set of samples that have already been run through the conta analysis. It uses the genotypes that conta calculated for each sample to find candidate source samples whose likelihood is higher than the general likelihood calculated from the population allele frequencies.
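The comparison described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions and is not conta's implementation: it considers SNPs where the host is homozygous for the reference allele, models alt reads as binomial, and uses the population allele frequency directly as the expected alt fraction of an unknown contaminant. The function `contam_loglik`, the error rate, and all counts are hypothetical.

```r
# Toy model, not conta's implementation.
# Log-likelihood of alt-read counts at SNPs where the host is homozygous
# reference, given the expected alt-allele fraction of the contaminant.
contam_loglik <- function(alt, depth, contam_alt_frac, level, error = 1e-3) {
  # Expected alt-read probability: background error plus contaminant signal
  p <- error + level * contam_alt_frac
  sum(dbinom(alt, depth, p, log = TRUE))
}

depth <- rep(1000, 5)               # hypothetical read depth per SNP
alt <- c(10, 0, 5, 11, 1)           # hypothetical alt-read counts
candidate_gt <- c(2, 0, 1, 2, 0)    # alt-allele count of a candidate source
pop_af <- c(0.3, 0.2, 0.4, 0.5, 0.1)  # population alt-allele frequencies
level <- 0.01                       # assumed contamination level

# Likelihood if the candidate sample is the source vs. an unknown source
ll_candidate <- contam_loglik(alt, depth, candidate_gt / 2, level)
ll_population <- contam_loglik(alt, depth, pop_af, level)

# TRUE when the candidate explains the reads better than the population model
candidate_is_better <- ll_candidate > ll_population
print(candidate_is_better)
```

In this toy data the alt reads track the candidate's genotypes closely, so the candidate likelihood exceeds the population likelihood.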

3) Genotype concordance (sample swap) analyses:

Samples that are sequenced from the same genetic donor should have the same genotypes across SNPs. Conta provides a genotype concordance function to assist in sample swap analyses. The output of the concordance function is a value between 0 and 1. Sample pairs with concordance values close to 1 (above 0.7 in cases where one of the samples may be contaminated) are considered to come from the same genetic donor.

Expand upon the following code to perform pairwise genotype concordance analyses:

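library(conta)

# Load the genotype files written by two earlier conta runs
conta_gt1 <- load_conta_file("s3:/conta_runs/conta_1/conta_1.gt.tsv")
conta_gt2 <- load_conta_file("s3:/conta_runs/conta_2/conta_2.gt.tsv")
# Concordance is a value between 0 and 1
concordance <- genotype_concordance(conta_gt1, conta_gt2)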
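To make the pairwise extension concrete, here is a self-contained sketch that runs without conta. It uses a hypothetical stand-in, `toy_concordance`, defined as the fraction of shared SNPs with identical genotype calls; conta's own `genotype_concordance` may compute its value differently. The sample names and genotype calls are made up.

```r
# Hypothetical stand-in for a concordance measure: the fraction of shared
# SNPs with identical genotype calls. Not conta's genotype_concordance().
toy_concordance <- function(gt_a, gt_b) {
  shared <- intersect(names(gt_a), names(gt_b))
  if (length(shared) == 0) return(NA_real_)
  mean(gt_a[shared] == gt_b[shared])
}

# Made-up genotype calls keyed by SNP identifier
samples <- list(
  s1 = c(rs1 = "0/0", rs2 = "0/1", rs3 = "1/1", rs4 = "0/1"),
  s2 = c(rs1 = "0/0", rs2 = "0/1", rs3 = "1/1", rs4 = "0/0"),
  s3 = c(rs1 = "1/1", rs2 = "0/0", rs3 = "0/1", rs4 = "0/0")
)

# Enumerate every unordered pair of samples, one pair per row
pairs <- t(combn(names(samples), 2))

# Compute the concordance for each pair
pairwise <- data.frame(
  sample_a = pairs[, 1],
  sample_b = pairs[, 2],
  concordance = apply(pairs, 1, function(p) {
    toy_concordance(samples[[p[1]]], samples[[p[2]]])
  }),
  stringsAsFactors = FALSE
)
print(pairwise)
```

With real data, the list entries would be the objects returned by `load_conta_file`, and `toy_concordance` would be replaced by `genotype_concordance`. Pairs above the chosen threshold can then be flagged as likely sharing a donor.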

Output files:

Output format:

General Guidelines



grailbio/conta documentation built on March 9, 2020, 9:38 p.m.