run_gatk: Run GATK

View source: R/run_gatk.R

run_gatk R Documentation

Run GATK

Description

Runs the GATK suite of programs.

Usage

run_gatk(
  command = NULL,
  input = NULL,
  output = NULL,
  reference = NULL,
  normal.sample = NULL,
  intervals = NULL,
  known.sites = NULL,
  sample.map = NULL,
  erc = NULL,
  temp = NULL,
  batch = NULL,
  threads = NULL,
  database = NULL,
  bqsr = NULL,
  vqsr = NULL,
  tranches = NULL,
  resources = NULL,
  annotations = NULL,
  mode = NULL,
  tranches.file = NULL,
  sensitivity.filter = NULL,
  variant.index = TRUE,
  variant.type = NULL,
  select.method = NULL,
  filter.expression = NULL,
  filter.name = NULL,
  parallel = FALSE,
  cores = 4,
  execute = TRUE,
  gatk = NULL
)

Arguments

command

GATK command to run, required

input

List of sorted BAM files, required

output

List of output files, or an empty/non-existent directory

reference

Path to the FASTA-formatted reference

normal.sample

Sample name of the normal sample

intervals

Path to the intervals list file, usually the exome coordinates file

known.sites

List of paths to the files containing known polymorphic sites

sample.map

Path to a tab-separated file mapping sample names to gVCF files

erc

Mode for emitting reference confidence scores, can be "NONE", "BP_RESOLUTION" or "GVCF"

temp

Path to temporary directory

batch

Batch size for the number of readers open at once, GenomicsDBImport only

threads

Number of threads for opening VCFs in batches

database

Name of the database directory

bqsr

List of base quality recalibration files

vqsr

Variant recalibration file

tranches

List of levels of truth sensitivity at which to slice the data

resources

Predefined list of sites for which to apply a prior probability of being correct

annotations

List of names of the annotations to be used for calculations

mode

Recalibration mode to employ, SNP or INDEL

tranches.file

The input tranches file describing where to cut the data, from VariantRecalibrator

sensitivity.filter

The truth sensitivity level at which to start filtering

variant.index

Create a VCF index when writing a coordinate-sorted VCF file, boolean, default set to TRUE

variant.type

Variant type to include in output, SNP or INDEL.

select.method

Method to select filtered variants; to select only variants that pass all filters use 'vc.isNotFiltered()'

filter.expression

String of filters and values

filter.name

Name to identify the filtered variants

parallel

Run in parallel, default set to FALSE

cores

Number of cores/threads to use for parallel processing, default set to 4

execute

Whether to execute the commands or not, default set to TRUE

gatk

Path to the GATK suite of programs, required
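
The sample.map argument expects a tab-separated file mapping sample names to gVCF files. A minimal sketch of building such a file in base R follows. The file names, the .g.vcf.gz suffix and the rule for deriving sample names are illustrative assumptions, not part of the package. The two-column layout without a header is the one GATK's GenomicsDBImport reads for its sample name map.

```r
# Hypothetical gVCF paths; replace with your own files.
gvcf.files <- c("gvcfs/sampleA.g.vcf.gz", "gvcfs/sampleB.g.vcf.gz")

# Assumption: the sample name is the file name without the .g.vcf.gz suffix.
sample.names <- sub("\\.g\\.vcf\\.gz$", "", basename(gvcf.files))

# Two columns, sample name then path, tab separated, no header or quotes.
sample.map.table <- data.frame(sample = sample.names, path = gvcf.files)
sample.map.file <- file.path(tempdir(), "sample_map.txt")
write.table(sample.map.table, file = sample.map.file, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The resulting path (sample.map.file) could then be passed as the sample.map argument.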

Value

List of GATK commands

Examples


## Not run: 
known.site.list <- c("Mills_and_1000G_gold_standard.indels.GRCh38.vcf.gz",
                      "1000G_phase1.snps.high_confidence.GRCh38.vcf.gz")

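# rg.bam.files, intervals.file and gatk are assumed to be defined elsewhere
recalibration.files <- gsub("\\.bam$", "_recal_data.table", rg.bam.files)
command <- "BaseRecalibrator"
fasta <- "path/to/genome.fasta"  # placeholder: path to the reference genome FASTA file
# BaseRecalibration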
base.recalibration.cmds <- run_gatk(command = command,
                                    input = rg.bam.files,
                                    output = recalibration.files,
                                    reference = fasta,
                                    intervals = intervals.file,
                                    known.sites = known.site.list,
                                    execute = TRUE,
                                    gatk = gatk)

## End(Not run)
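
Output names such as the recalibration tables in the example can be derived from the BAM file names. The sketch below uses hypothetical file names and anchors the pattern to the .bam extension, since an unescaped "." in a regular expression matches any character and could alter other parts of a file name.

```r
# Hypothetical read-group BAM files; replace with your own.
rg.bam.files <- c("sample1.rg.bam", "my_bam_sample2.rg.bam")

# "\\.bam$" matches only a literal ".bam" at the end of each name,
# so the "_bam" inside the second file name is left untouched.
recalibration.files <- sub("\\.bam$", "_recal_data.table", rg.bam.files)
```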


GrahamHamilton/pipelineTools documentation built on March 5, 2024, 12:23 p.m.