R/somaticWgsAnalysis.R

#' somaticWgsAnalysis
#'
#' @description
#' The somaticWgsAnalysis pipeline identifies somatic variants in whole-genome sequencing (WGS) data.
#' It starts with a reference alignment step, followed by co-cleaning to improve the alignment quality.
#' The following variant callers are then run separately to identify somatic mutations:
#' \itemize{
#'   \item \href{https://bioinformatics.mdanderson.org/public-software/muse/}{MuSE}
#'   \item \href{https://gatk.broadinstitute.org/hc/en-us/articles/360035531132--How-to-Call-somatic-mutations-using-GATK4-Mutect2}{MuTect2}
#'   \item \href{http://varscan.sourceforge.net/}{VarScan2}
#'   \item \href{https://github.com/genome/somatic-sniper}{SomaticSniper}
#'   \item \href{http://gmt.genome.wustl.edu/packages/pindel/}{Pindel}
#'   \item \href{https://github.com/Illumina/strelka}{Strelka2}
#'   \item \href{https://github.com/Illumina/manta}{Manta}
#' }
#'
#' The variants identified by each somatic caller are then annotated, and the annotated VCF files are finally converted into MAF files.
#'
#' @param tumor_file Tumor BAM file on which to perform the variant calling.
#' @param normal_file Normal BAM file on which to perform the variant calling.
#' @param threads Number of threads to use in the analysis.
#' @param ref Path to the reference genome (FASTA format) used for the alignment. The corresponding indexes
#' generated with bwa index and a sequence dictionary generated with the GATK CreateSequenceDictionary tool are also required.
#' @param out_path Path where the output of the analysis will be saved.
#' @param muse Path of MuSE binary.
#' @param gatk4 Path of GATK4 binary.
#' @param samtools_mpileup Path of samtools mpileup binary.
#' @param af_only_gnomad Genome Aggregation Database (gnomAD) allele-frequency-only VCF used as a germline resource. It must be based
#' on the same reference genome as `ref`. \href{https://gnomad.broadinstitute.org/downloads}{gnomAD}
#' @param somatic_sniper Path of SomaticSniper binary.
#' @param sambamba Path of sambamba binary.
#' @param bwa Path of bwa binary.
#' @param samblaster Path of samblaster binary.
#' @param samtools Path of samtools binary.
#' @param indel_candidates Candidate indels from Manta. For the somatic workflow, the best-practice recommendation is to run the
#'  Manta SV and indel caller on the same set of samples first, then supply Manta's candidate indels as input to Strelka2.
#'  The sample names must be identical in Strelka2 and Manta for the indel candidates file to be detected correctly.
#' @param centromeres_telomeres BED file with the centromeres and/or telomeres, based on the same reference genome as `ref`.
#' @param varscan Path of VarScan2 binary.
#' @param manta Path of Manta binary.
#' @param strelka2 Path of Strelka2 binary.
#' @param pindel Path of Pindel binary.
#' @param perl Path of perl executable.
#' @param fastq FASTQ file on which to carry out the analysis. For paired-end data, `input_file` must contain the mate 1 reads,
#'  and the two mates of each pair should be named with "_R1" and "_R2". Allowed formats: fastq.gz, fq.gz, fastq, fq or bam.
#' @param python_radia Path to the python binary with all the RADIA \href{https://github.com/aradenbaugh/radia}{prerequisites}.
#' @param tumor_vcf_id ID of the tumor sample in the VCF. Defaults to 'TUMOR'.
#' @param bam BAM file on which to carry out the analysis.
#'
#' @export
somaticWgsAnalysis <- function(){}
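The function above is an empty stub, so the commands behind the listed callers are not shown. As a rough sketch, assuming `muse` and `gatk4` point to the MuSE and GATK4 executables described in the parameters, the R code below only assembles argument vectors for two of the listed callers on a tumor/normal pair and runs nothing. The helper names `buildMuseArgs()` and `buildMutect2Args()` and the `out_prefix`, `normal_name` and optional `dbsnp` arguments are hypothetical, and the flags follow the MuSE 1.x and GATK4 Mutect2 usage, so they should be checked against the installed versions.

```r
# Hypothetical helpers (not part of ergWgsTools): they only assemble
# command-line argument vectors; nothing is executed.

# MuSE runs in two steps: "call" computes per-site statistics for the
# tumor/normal pair, and "sump" turns them into a VCF.
buildMuseArgs <- function(tumor_file, normal_file, ref, out_prefix,
                          dbsnp = NULL) {
  call_args <- c("call", "-f", ref, "-O", out_prefix, tumor_file, normal_file)
  # -G selects the whole-genome model (-E is the exome alternative).
  sump_args <- c("sump", "-I", paste0(out_prefix, ".MuSE.txt"), "-G",
                 "-O", paste0(out_prefix, ".vcf"))
  # Assumption: a dbSNP VCF is passed with -D only when one is supplied.
  if (!is.null(dbsnp)) sump_args <- c(sump_args, "-D", dbsnp)
  list(call = call_args, sump = sump_args)
}

# GATK4 Mutect2 in tumor/normal mode with gnomAD as the germline resource.
# normal_name must match the read-group sample name (SM) of the normal BAM.
buildMutect2Args <- function(tumor_file, normal_file, normal_name, ref,
                             af_only_gnomad, out_vcf) {
  c("Mutect2",
    "-R", ref,
    "-I", tumor_file,
    "-I", normal_file,
    "-normal", normal_name,
    "--germline-resource", af_only_gnomad,
    "-O", out_vcf)
}

# Example with illustrative file names:
muse_args <- buildMuseArgs("NET-10_TI.bam", "NET-10_BL.bam", "hg38.fa", "vcf/NET-10")
mutect_args <- buildMutect2Args("NET-10_TI.bam", "NET-10_BL.bam", "NET-10_BL",
                                "hg38.fa", "af-only-gnomad.hg38.vcf.gz",
                                "vcf/NET-10.mutect2.vcf.gz")
```

The vectors could then be passed to `system2()`, for example `system2(gatk4, mutect_args)`; since `system2()` does not quote its arguments, paths that may contain spaces would need `shQuote()`.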
# library(devtools)
# devtools::load_all('/imppc/labs/lplab/share/marc/repos/ergWgsTools')
# input_file <- '/media/msubirana/IGTP20228/insulinomas/processed/hg38/bam/bwa/NET-10_TI_mkdup_sub_0005.bam'
# ref <- '/imppc/labs/lplab/share/marc/refgen/hg38/hg38.fa'
# threads <- 3
# out_path <- '/imppc/labs/lplab/share/marc/repos/ergWgsTools/proves/processed/bam'
#
# bwaAlignment(input_file,
#              ref,
#              out_path,
#              threads)
#
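The `fastq` parameter expects the mates of a paired-end sample to be distinguished by "_R1" and "_R2" in their file names. A minimal sketch of that convention, assuming the mate 2 file sits next to mate 1 and differs only in that tag, is shown below; `mate2Path()` and `isAllowedInput()` are hypothetical helpers, not functions of ergWgsTools.

```r
# Hypothetical helpers illustrating the FASTQ naming rule; not part of ergWgsTools.

# Derive the mate 2 path from a mate 1 path by swapping the last "_R1" tag
# in the file name for "_R2"; the directory part is left untouched.
mate2Path <- function(mate1) {
  fname <- basename(mate1)
  if (!grepl("_R1", fname, fixed = TRUE)) {
    stop("mate 1 file name must contain '_R1': ", mate1)
  }
  file.path(dirname(mate1), sub("(.*)_R1", "\\1_R2", fname))
}

# TRUE when the path ends in one of the documented input extensions.
isAllowedInput <- function(path) {
  grepl("\\.(fastq\\.gz|fq\\.gz|fastq|fq|bam)$", path)
}

# Example (file names are illustrative only):
mate2Path("fastq/NET-10_TI_R1.fastq.gz")    # returns "fastq/NET-10_TI_R2.fastq.gz"
isAllowedInput(c("NET-10_TI_R1.fq.gz", "NET-10_TI.bam", "NET-10_TI.sam"))
# returns TRUE TRUE FALSE
```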
# tumor_file <- '/imppc/labs/lplab/share/marc/repos/ergWgsTools/proves/processed/bam/NET-10_TI_mkdup_sub_0005.bam'
# normal_file <- '/imppc/labs/lplab/share/marc/repos/ergWgsTools/proves/processed/bam/NET-10_BL_mkdup_sub_0005.bam'
# sample_name <- 'NET-10'
# ref <- '/imppc/labs/lplab/share/marc/refgen/hg38/hg38.fa'
# threads <- 6
# out_path <- '/imppc/labs/lplab/share/marc/repos/ergWgsTools/proves/processed/vcf'
#
# variantCalling(tumor_file = tumor_file,
#                normal_file = normal_file,
#                sample_name = sample_name,
#                ref = ref,
#                out_path = out_path,
#                threads = threads)
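The `indel_candidates` parameter follows the Strelka2 recommendation of running Manta first and passing its candidate indels to Strelka2. The sketch below builds, without running, the commands for that hand-off, under the assumptions that `manta` points to Manta's `configManta.py`, `strelka2` points to Strelka2's `configureStrelkaSomaticWorkflow.py`, and Manta writes its candidates to `results/variants/candidateSmallIndels.vcf.gz` inside its run directory. `buildMantaStrelkaCommands()` and the `manta`/`strelka2` subdirectory names are hypothetical.

```r
# Hypothetical helper (not part of ergWgsTools): builds, without running,
# the four commands of a Manta -> Strelka2 somatic run.
# Assumes `manta` is Manta's configManta.py and `strelka2` is Strelka2's
# configureStrelkaSomaticWorkflow.py.
buildMantaStrelkaCommands <- function(manta, strelka2, tumor_file, normal_file,
                                      ref, out_path, threads) {
  manta_dir   <- file.path(out_path, "manta")
  strelka_dir <- file.path(out_path, "strelka2")
  # Candidate small indels written by Manta: the kind of file the
  # indel_candidates parameter expects.
  indel_candidates <- file.path(manta_dir, "results", "variants",
                                "candidateSmallIndels.vcf.gz")
  list(
    # 1. Configure Manta for the tumor/normal pair.
    manta_config = c(manta, "--normalBam", normal_file, "--tumorBam", tumor_file,
                     "--referenceFasta", ref, "--runDir", manta_dir),
    # 2. Run the generated Manta workflow locally with `threads` jobs.
    manta_run = c(file.path(manta_dir, "runWorkflow.py"),
                  "-m", "local", "-j", threads),
    # 3. Configure Strelka2, feeding it Manta's candidate indels.
    strelka_config = c(strelka2, "--normalBam", normal_file, "--tumorBam", tumor_file,
                       "--referenceFasta", ref,
                       "--indelCandidates", indel_candidates,
                       "--runDir", strelka_dir),
    # 4. Run the generated Strelka2 workflow locally with `threads` jobs.
    strelka_run = c(file.path(strelka_dir, "runWorkflow.py"),
                    "-m", "local", "-j", threads)
  )
}

# Example with illustrative paths:
cmds <- buildMantaStrelkaCommands("configManta.py",
                                  "configureStrelkaSomaticWorkflow.py",
                                  "NET-10_TI.bam", "NET-10_BL.bam",
                                  "hg38.fa", "vcf/NET-10", 6)
```

Both tools share the same two-step pattern: a configuration script creates a run directory, and the generated `runWorkflow.py` then performs the analysis, so Strelka2 can only be configured with `--indelCandidates` once the Manta run has finished.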
msubirana/ergWgsTools documentation built on June 8, 2020, 8:07 a.m.