run_gatk | R Documentation |
Builds and, optionally, runs commands from the GATK suite of programs.
run_gatk(
command = NULL,
input = NULL,
output = NULL,
reference = NULL,
normal.sample = NULL,
intervals = NULL,
known.sites = NULL,
sample.map = NULL,
erc = NULL,
temp = NULL,
batch = NULL,
threads = NULL,
database = NULL,
bqsr = NULL,
vqsr = NULL,
tranches = NULL,
resources = NULL,
annotations = NULL,
mode = NULL,
tranches.file = NULL,
sensitivity.filter = NULL,
variant.index = TRUE,
variant.type = NULL,
select.method = NULL,
filter.expression = NULL,
filter.name = NULL,
parallel = FALSE,
cores = 4,
execute = TRUE,
gatk = NULL
)
command | GATK command to run; required. |
input | List of sorted BAM files; required. |
output | List of output files, or an empty or non-existent directory. |
reference | Path to the FASTA-formatted reference genome. |
normal.sample | Sample name of the normal sample. |
intervals | Path to the intervals list file, usually the exome coordinates file. |
known.sites | List of paths to files containing known polymorphic sites. |
sample.map | Path to a tab-separated file mapping sample names to their GVCF files. |
erc | Mode for emitting reference confidence scores: "NONE", "BP_RESOLUTION" or "GVCF". |
temp | Path to a temporary directory. |
batch | Batch size, i.e. the number of readers open at once; GenomicsDBImport only. |
threads | Number of threads for opening VCFs in batches. |
database | Name of the database directory. |
bqsr | List of base quality recalibration files. |
vqsr | Variant recalibration file. |
tranches | List of truth-sensitivity levels at which to slice the data. |
resources | Predefined list of sites to which a prior probability of being correct is applied. |
annotations | List of names of the annotations to use in the calculations. |
mode | Recalibration mode to employ: SNP or INDEL. |
tranches.file | Input tranches file, produced by VariantRecalibrator, describing where to cut the data. |
sensitivity.filter | Truth-sensitivity level at which to start filtering. |
variant.index | Logical; create a VCF index when writing a coordinate-sorted VCF file. Default TRUE. |
variant.type | Variant type to include in the output: SNP or INDEL. |
select.method | Method for selecting filtered variants; to keep only variants that pass all filters, use 'vc.isNotFiltered()'. |
filter.expression | String of filter expressions and their threshold values. |
filter.name | Name used to label the filtered variants. |
parallel | Logical; run in parallel. Default FALSE. |
cores | Number of cores/threads to use for parallel processing. Default 4. |
execute | Logical; whether to execute the commands. Default TRUE. |
gatk | Path to the GATK suite of programs; required. |
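The sample.map argument expects a tab-separated file mapping sample names to GVCF files. The sketch below shows one way to write such a file in base R, assuming the GenomicsDBImport convention of one "sample<TAB>path" line per sample with no header row. The helper write_sample_map() is hypothetical and not part of run_gatk, and deriving sample names from file names is only an assumption for illustration; in practice the names should normally match the sample names in the GVCF headers, so check the GATK documentation for your version.

```r
# Hypothetical helper (not part of run_gatk): writes a sample map, i.e. a
# tab-separated file with one "sample<TAB>path" line per sample, no header.
write_sample_map <- function(gvcf.files, path) {
  # Assumption: sample name = file name without the ".g.vcf[.gz]" suffix,
  # e.g. "NA12878.g.vcf.gz" -> "NA12878"
  samples <- sub("\\.g\\.vcf(\\.gz)?$", "", basename(gvcf.files))
  map <- data.frame(sample = samples, file = gvcf.files,
                    stringsAsFactors = FALSE)
  write.table(map, file = path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(path)
}

gvcfs <- c("gvcf/NA12878.g.vcf.gz", "gvcf/NA12891.g.vcf.gz")
map.file <- write_sample_map(gvcfs, file.path(tempdir(), "cohort.sample_map"))
readLines(map.file)
```

The resulting path (map.file here) is what would be passed as sample.map.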
Returns a list of the generated GATK commands.
## Not run:
known.site.list <- c("Mills_and_1000G_gold_standard.indels.GRCh38.vcf.gz",
"1000G_phase1.snps.high_confidence.GRCh38.vcf.gz")
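rg.bam.files <- c("sample1.rg.bam", "sample2.rg.bam")  # placeholder: read-group-tagged BAM files
# escape the dot and anchor at the end so only the ".bam" extension is replaced
recalibration.files <- gsub("\\.bam$", "_recal_data.table", rg.bam.files)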
command <- "BaseRecalibrator"
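fasta <- "path/to/genome.fasta"             # placeholder: path to the genome FASTA
intervals.file <- "path/to/intervals.list"  # placeholder: exome intervals list
gatk <- "path/to/gatk"                      # placeholder: path to the GATK executable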
# BaseRecalibration
base.recalibration.cmds <- run_gatk(command = command,
input = rg.bam.files,
output = recalibration.files,
reference = fasta,
intervals = intervals.file,
known.sites = known.site.list,
execute = TRUE,
gatk = gatk)
## End(Not run)
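The tranches, annotations and mode arguments correspond to options of GATK4's VariantRecalibrator. How run_gatk assembles its command lines is not documented here, so the sketch below is only an illustration, assuming the GATK4 flag names -mode, -tranche (one per sensitivity level) and -an (one per annotation); verify them against the GATK documentation for your version. The helper vqsr_flags() is hypothetical.

```r
# Hypothetical helper (not part of run_gatk): turns tranches, annotations
# and mode into a character vector of GATK4 VariantRecalibrator flags.
vqsr_flags <- function(tranches, annotations, mode = c("SNP", "INDEL")) {
  mode <- match.arg(mode)
  stopifnot(length(tranches) > 0, length(annotations) > 0)
  c(
    "-mode", mode,
    # interleave: "-tranche", level1, "-tranche", level2, ...
    as.vector(rbind("-tranche", as.character(tranches))),
    # interleave: "-an", annotation1, "-an", annotation2, ...
    as.vector(rbind("-an", annotations))
  )
}

flags <- vqsr_flags(tranches = c(100.0, 99.9, 99.0, 90.0),
                    annotations = c("QD", "FS", "MQ"),
                    mode = "SNP")
cat(paste(flags, collapse = " "), "\n")
```

Building a character vector rather than one pasted string keeps each flag and value separate, which suits calls such as system2() that take arguments as a vector.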