
facetsSuite


See the release notes for information on the new facets-suite version. Backwards compatibility is currently limited, as documented in the release notes.

facetsSuite is an R package with functions to run FACETS, an allele-specific copy-number caller for paired tumor-normal DNA-sequencing data from genome-wide and targeted assays. facetsSuite both wraps the code that executes the FACETS algorithm itself and performs post-hoc analyses on the resulting data. This package was developed by members of the Taylor lab and the Computational Sciences group within the Center for Molecular Oncology at Memorial Sloan Kettering Cancer Center.

Installation

You can install facetsSuite in R from this repository with:

devtools::install_github("mskcc/facets-suite")

Also follow the instructions for installing FACETS.

Note: For the wrapper script snp-pileup-wrapper.R, you need to either set the variable snp_pileup_path in the script to the installation path of snp-pileup or set the environment variable SNP_PILEUP. Alternatively, use the Docker image, which contains the executable.
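
A minimal bash sketch of the SNP_PILEUP option: the hypothetical function resolve_snp_pileup below returns the path stored in SNP_PILEUP when that variable is set, and otherwise looks for an executable named snp-pileup on PATH. It is only a convenience for the calling shell and assumes nothing about how snp-pileup-wrapper.R resolves the path internally.

```shell
# Hypothetical helper (bash): resolve the snp-pileup executable.
# Uses SNP_PILEUP if it is set and non-empty; falls back to a PATH
# lookup only when SNP_PILEUP is unset or empty.
resolve_snp_pileup() {
    local candidate="${SNP_PILEUP:-}"
    if [ -z "$candidate" ]; then
        # Not set: look for snp-pileup on PATH.
        candidate="$(command -v snp-pileup 2>/dev/null)" || candidate=""
    fi
    # Accept the candidate only if it exists and is executable.
    if [ -n "$candidate" ] && [ -x "$candidate" ]; then
        printf '%s\n' "$candidate"
        return 0
    fi
    echo "snp-pileup not found; set SNP_PILEUP or add it to PATH" >&2
    return 1
}
```

For example, after export SNP_PILEUP=<path to snp-pileup executable>, the result can be passed to the wrapper as --snp-pileup-path "$(resolve_snp_pileup)".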

Usage

R functions

The R functions in this package are documented, and their descriptions and usage are available in R via:

?facetsSuite::function_name

Central to most functionality in the package is the output from run_facets, which runs the FACETS algorithm on a provided tumor-normal SNP pileup (i.e. genotyping). The output is a list object with the following named elements:
- snps: SNPs used for copy-number segmentation, where het==1 indicates heterozygous loci.
- segs: Inferred copy-number segmentation.
- purity: Inferred sample purity, i.e. the fraction of tumor cells in the total cellular population.
- ploidy: Inferred sample ploidy.
- diplogr: Inferred dipLogR, the sample-specific baseline corresponding to the diploid state.
- alballogr: Alternative dipLogR value(s) at which a balanced solution was found.
- flags: Warning flags from the naïve segmentation algorithm.
- em_flags: Warning flags from the expectation-maximization segmentation algorithm.
- loglik: Log-likelihood value of the fitted model.

Note that FACETS performs segmentation with two algorithms: the "naïve" base method and an expectation-maximization algorithm. The latter (columns suffixed .em) is used by default in most of the functions in this package.

Wrapper scripts

Most use of this package can be done from the command line using three wrapper scripts:
- snp-pileup-wrapper.R: This wraps the snp-pileup C++ program, which genotypes sites across the genome in both the normal and tumor samples. Its output is the input to FACETS. Most default input arguments are appropriate regardless of usage, but --max-depth may need adjustment depending on the overall depth of the samples used.
Example command:

snp-pileup-wrapper.R \
    --snp-pileup-path <path to snp-pileup executable> \
    --vcf-file <path to SNP VCF> \
    --normal-bam normal.bam \
    --tumor-bam tumor.bam \
    --output-prefix <prefix for output file, preferably tumorSample__normalSample>

The input VCF file should contain polymorphic SNPs, so that FACETS can infer changes in allelic configuration at genomic loci from changes in allele ratios. dbSNP is a good source for this. By default, snp-pileup also estimates the read depth in the input BAM files every 50th base.
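
The suggested tumorSample__normalSample prefix can be derived from the BAM file names. The sketch below assumes, hypothetically, that each BAM file is named after its sample (as in TumorA.bam); facets_output_prefix is an illustrative bash helper, not part of facetsSuite.

```shell
# Hypothetical helper (bash): build an output prefix of the form
# tumorSample__normalSample from the tumor and normal BAM file names,
# assuming each BAM is named <sample>.bam.
facets_output_prefix() {
    local tumor_bam="$1" normal_bam="$2"
    local tumor normal
    # Strip the directory and the .bam suffix to get the sample names.
    tumor="$(basename "$tumor_bam" .bam)"
    normal="$(basename "$normal_bam" .bam)"
    printf '%s__%s\n' "$tumor" "$normal"
}
```

With this helper, --output-prefix "$(facets_output_prefix TumorA.bam NormalA.bam)" expands to --output-prefix TumorA__NormalA.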

All three wrappers use argparse for argument handling and can therefore be run with --help to list all input arguments.

Run wrappers from container

To run the containerized versions of the wrapper scripts, first pull the Docker image:

## Docker
docker pull philipjonsson/facets-suite:dev

## Singularity
singularity pull --name facets-suite-dev.img docker://philipjonsson/facets-suite:dev

Then run any of the scripts, for example:

## Docker
docker run -it -v $PWD:/work philipjonsson/facets-suite:dev run-facets-wrapper.R \
    --counts-file work/SampleA.snp_pileup.gz \
    --sample-id SampleA \
    --directory work

## Singularity
singularity run facets-suite-dev.img run-facets-wrapper.R \
    --counts-file SampleA.snp_pileup.gz \
    --sample-id SampleA

For Docker, note the binding (-v) of the current directory on the host to the directory /work inside the container. This makes the input file, which is in the current directory, accessible inside the container. In turn, the output must be written to work inside the container (hence --directory work) so that it is available on the host once the script has executed. Singularity, by default, mounts the directory from which it is executed, so no explicit binding is needed for files in that directory.
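
Because only the current directory is bound, input paths have to be expressed relative to the /work mount point inside the container. The hypothetical bash helper below translates a host path under the current directory into its container path and refuses paths outside it. It compares paths as strings only, so it does not normalize .. components or resolve symlinks.

```shell
# Hypothetical helper (bash): translate a host path inside the current
# directory into its path in a container started with -v "$PWD":/work.
# Paths are compared as strings; ".." and symlinks are not resolved.
to_container_path() {
    local host_path="$1"
    case "$host_path" in
        /*) ;;                               # already absolute
        *) host_path="$PWD/$host_path" ;;    # make relative paths absolute
    esac
    case "$host_path" in
        "$PWD"/*)
            # Replace the host directory prefix with the mount point.
            printf '/work/%s\n' "${host_path#"$PWD"/}"
            ;;
        *)
            echo "not under $PWD, so not visible in the container: $1" >&2
            return 1
            ;;
    esac
}
```

For example, to_container_path SampleA.snp_pileup.gz prints /work/SampleA.snp_pileup.gz, an absolute path that refers directly to the bound directory.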

The image contains the snp-pileup executable used by snp-pileup-wrapper.R, so the wrapper can be run without specifying the path to snp-pileup. Example for Singularity:

singularity run -B <path to BAMs> -B <path to VCF> facets-suite-dev.img snp-pileup-wrapper.R \
    --vcf-file <path to VCF>/dbsnp.vcf \
    --normal-bam <path to BAMs>/NormalA.bam \
    --tumor-bam <path to BAMs>/TumorA.bam \
    --output-prefix TumorA__NormalA

Note: Any input files outside the run directory are only accessible inside the container if their full paths (or the directories containing them) are bound with -B.
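
When the BAMs and the VCF live in several directories, the -B flags can be assembled from the file paths. The bash sketch below uses a hypothetical function, collect_bind_args, that stores one -B <dir> pair per unique parent directory in the array BIND_ARGS. It assumes those directories exist and that binding whole directories, rather than individual files, is acceptable.

```shell
# Hypothetical helper (bash): fill the array BIND_ARGS with one "-B <dir>"
# pair per unique parent directory of the given files.
collect_bind_args() {
    BIND_ARGS=()
    local f dir existing dup
    for f in "$@"; do
        # Resolve the parent directory to an absolute path (it must exist).
        dir="$(CDPATH= cd -- "$(dirname -- "$f")" && pwd)" || return 1
        # Skip directories that were already added.
        dup=0
        for existing in "${BIND_ARGS[@]}"; do
            if [ "$existing" = "$dir" ]; then dup=1; fi
        done
        if [ "$dup" -eq 0 ]; then
            BIND_ARGS+=(-B "$dir")
        fi
    done
    return 0
}
```

After collect_bind_args <path to BAMs>/TumorA.bam <path to BAMs>/NormalA.bam <path to VCF>/dbsnp.vcf, the flags can be passed as singularity run "${BIND_ARGS[@]}" facets-suite-dev.img snp-pileup-wrapper.R followed by the same arguments as in the example above.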



mskcc/facets-suite documentation built on Sept. 13, 2022, 4:14 a.m.