GATK Best Practices: recommended workflows for variant discovery analysis.
run_GATK(inputdf, runbwa = TRUE, markDup = TRUE, addRG = FALSE,
  rungatk = FALSE,
  ref.fa = "~/dbcenter/Ecoli/reference/Ecoli_k12_MG1655.fasta",
  gatkpwd = "$HOME/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar",
  picardpwd = "$HOME/bin/picard-tools-2.1.1/picard.jar", minscore = 5,
  realignInDels = FALSE, indels.vcf = "indels.vcf",
  recalBases = FALSE, dbsnp.vcf = "dbsnp.vcf", shbase = NULL,
  jobid = "runarray", email = NULL, runinfo = c(FALSE, "batch", 1,
  "1.5", "10:00:00"))
inputdf | An input data.frame describing the fastq files. Must contain fq1, fq2, out (and/or bam). If inputdf contains bam, the bwa alignment will be skipped. Additional columns: group (group id), sample (sample id), PL (platform, e.g. illumina), LB (library id), PU (unit, e.g. unit1). These strings are passed to bwa mem through -R.
runbwa | Set up BWA-mem, default=TRUE.
markDup | Mark duplicates, default=TRUE.
addRG | Add or replace read groups using Picard AddOrReplaceReadGroups, default=FALSE.
rungatk | Set up GATK, default=FALSE.
ref.fa | The full path of the bwa-indexed reference genome fasta file.
gatkpwd | The absolute path of GenomeAnalysisTK.jar.
picardpwd | The absolute path of picard.jar.
minscore | Minimum score to output, default=5 (bwa's own default is 30). Passed to bwa mem as -T INT.
realignInDels | Realign indels, default=FALSE. If TRUE, a gold-standard indel.vcf file should be provided.
indels.vcf | The full path of indels.vcf.
recalBases | Recalibrate bases, default=FALSE. If TRUE, a gold-standard snps.vcf file should be provided.
dbsnp.vcf | The full path of dbsnp.vcf.
shbase | Base for the shell id, e.g. "slurm-script/run_gatk_". [chr]
jobid | Job ID, default="runarray". [chr]
email | Your email address; farm will send a message to it once the jobs are done or have failed.
runinfo | Parameters specifying the array job partition information. A vector such as c(FALSE, "bigmemh", "1"): 1) run or not, default=FALSE; 2) -p partition name, default=bigmemh; 3) --cpus, default=1; 4) mem, default=1.5, in Gb. It will pass to
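The read-group columns described for inputdf can be illustrated with a small sketch. The helper make_rg below is hypothetical and not part of this package; it assumes that group maps to the ID tag and sample to the SM tag, which the documentation does not confirm. It only shows the general shape of an @RG string that bwa mem accepts through -R.

```r
# Hypothetical helper (not part of the package): build one @RG header
# string per row of inputdf, in the form accepted by `bwa mem -R`.
# Assumption: group -> ID and sample -> SM.
make_rg <- function(inputdf) {
  needed <- c("group", "sample", "PL", "LB", "PU")
  missing_cols <- setdiff(needed, names(inputdf))
  if (length(missing_cols) > 0) {
    stop("inputdf is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # "\\t" keeps a literal backslash-t, so bwa (not R) expands it to a tab.
  paste0("@RG\\tID:", inputdf$group,
         "\\tSM:", inputdf$sample,
         "\\tPL:", inputdf$PL,
         "\\tLB:", inputdf$LB,
         "\\tPU:", inputdf$PU)
}

inputdf <- data.frame(fq1 = "fq_1.fq", fq2 = "fq_2.fq", out = "mysample",
                      group = "g1", sample = "s1", PL = "illumina",
                      LB = "lib1", PU = "unit1", stringsAsFactors = FALSE)
rg <- make_rg(inputdf)
```

Checking for the columns up front gives a clear error before any shell script is written, rather than a malformed -R argument at alignment time.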
See the GATK Best Practices guide for more detail: https://www.broadinstitute.org/gatk/guide/bp_step.php?p=1
Indexing the reference: bwa index Zea_mays.AGPv2.14.dna.toplevel.fa
Modules to load: module load java/1.8; module load bwa/0.7.9a
Local programs: bwa (version 0.7.5a-r405), picard-tools-2.1.1, GenomeAnalysisTK-3.5/
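Because ref.fa must already be indexed, it can help to check for the index files before generating any scripts. The sketch below uses a hypothetical helper, has_bwa_index, which is not part of this package. It assumes the default output of bwa index, which writes five files (.amb, .ann, .bwt, .pac, .sa) next to the fasta file.

```r
# Hypothetical helper: TRUE if all five files written by a default
# `bwa index` run sit next to the reference fasta.
has_bwa_index <- function(ref.fa) {
  ref.fa <- path.expand(ref.fa)  # expand a leading "~"
  all(file.exists(paste0(ref.fa, c(".amb", ".ann", ".bwt", ".pac", ".sa"))))
}

# Self-contained demo using empty placeholder files in a temporary location.
demo_ref <- tempfile(fileext = ".fa")
no_index <- has_bwa_index(demo_ref)    # nothing has been created yet
invisible(file.create(paste0(demo_ref, c(".amb", ".ann", ".bwt", ".pac", ".sa"))))
with_index <- has_bwa_index(demo_ref)  # all five placeholders now exist
```

In practice the check would be run on the real reference path, and bwa index would be run once if it returns FALSE.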
Returns a batch of shell scripts.
inputdf <- data.frame(fq1="fq_1.fq", fq2="f1_2.fq", out="mysample",
                      group="g1", sample="s1", PL="illumina", LB="lib1", PU="unit1")
run_GATK(inputdf, runbwa=TRUE, markDup=TRUE, addRG=FALSE, rungatk=FALSE,
         ref.fa="~/dbcenter/Ecoli/reference/Ecoli_k12_MG1655.fasta",
         gatkpwd="$HOME/bin/GenomeAnalysisTK-3.5/GenomeAnalysisTK.jar",
         picardpwd="$HOME/bin/picard-tools-2.1.1/picard.jar",
         minscore=5,
         realignInDels=FALSE, indels.vcf="indels.vcf",
         recalBases=FALSE, dbsnp.vcf="dbsnp.vcf",
         shbase=NULL, jobid="runarray",
         email=NULL, runinfo = c(FALSE, "batch", 1, "1.5", "10:00:00"))
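For more than one sample, inputdf can be built from a vector of sample ids. This is only a sketch: the sample ids, file-name pattern, and group labels are hypothetical, and it assumes each row of inputdf describes one sample, which the array-job wording suggests but the documentation does not state outright.

```r
# Hypothetical sample ids; fastq names follow an assumed <sample>_1.fq /
# <sample>_2.fq pattern.
samples <- c("s1", "s2", "s3")

inputdf <- data.frame(
  fq1    = paste0(samples, "_1.fq"),
  fq2    = paste0(samples, "_2.fq"),
  out    = samples,
  group  = paste0("g", seq_along(samples)),
  sample = samples,
  PL     = "illumina",   # scalar values are recycled across all rows
  LB     = "lib1",
  PU     = "unit1",
  stringsAsFactors = FALSE
)
```

The resulting data.frame has the required fq1, fq2 and out columns plus the optional read-group columns, so it can be passed as the first argument of run_GATK in place of the single-row example.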