workflow/inputs.md

HowTo UNMASC

Outline

flowchart LR

tbam{{tumor.bam}} --> nbam1{{normal_1.bam}} & nbam2{{normal_2.bam}} --> caller{{Variant Caller}}
tbam{{tumor.bam}} --> dots{{...}} & nbamZ{{normal_Z.bam}} --> caller
fasta{{reference.fasta}} --> caller
caller --> GATK{{GATK MergeVcfs}} --> vcf1{{vc_1.vcf}} & vcf2{{vc_2.vcf}} & dotsv{{... .vcf}} & vcfZ{{vc_Z.vcf}}
cosmic{{COSMIC.vcf}} & vcf1 & vcf2 & dotsv & vcfZ & fasta --> VEP --> anno{{anno.vcf}}
vcf1 & vcf2 & dotsv & vcfZ & anno & targ{{target.bed}} --> prepU{{"prep_UNMASC_VCF()"}}
centro{{centromere.bed}} & dictchrom{{dict_chrom.txt}} & prepU --> UNMASC{{"run_UNMASC()"}}

%% class definitions
classDef myred fill:#f44336,stroke:#f3f6f4,stroke-width:2px
classDef myblue fill:#19daf8,stroke:#f3f6f4,stroke-width:2px
classDef mygreen fill:#79d50d,stroke:#f3f6f4,stroke-width:2px
classDef mymagenta fill:#fc9ffc,stroke:#f3f6f4,stroke-width:2px
classDef myyellow fill:#f6fa13,stroke:#f3f6f4,stroke-width:2px
classDef myorange fill:#f89d3e,stroke:#f3f6f4,stroke-width:2px

%% assign classes to nodes
class tbam myred
class nbam1,nbam2,dots,nbamZ myblue
class fasta mygreen
class vcf1,vcf2,dotsv,vcfZ,anno mymagenta
class cosmic,targ,centro,dictchrom myyellow
class caller,GATK,VEP,prepU,UNMASC myorange

%% clickable nodes
click GATK "https://github.com/broadinstitute/gatk"
click cosmic "https://github.com/pllittle/UNMASC/blob/main/workflow/inputs.md#get-cosmic-vcf"
click VEP "https://uswest.ensembl.org/info/docs/tools/vep/index.html"

Setting Directories and Variables

Click to expand!

Below are fixed variables to specify.

gatk_dir=; [ -z "$gatk_dir" ] \
    && echo "Set gatk_dir, GATK directory" >&2 \
    && return 1

git_dir=; [ -z "$git_dir" ] \
    && echo "Set git_dir, location to store GitHub repos" >&2 \
    && return 1
[ ! -d $git_dir ] && mkdir $git_dir

stk2_dir=; [ -z "$stk2_dir" ] \
    && echo "Set stk2_dir, location of Strelka2 dir!" >&2 \
    && return 1

vep_dir=; [ -z "$vep_dir" ] \
    && echo "Set vep_dir, VEP directory" >&2 \
    && return 1

vep_rel=; [ -z "$vep_rel" ] \
    && echo "Set vep_rel, VEP release number, preferably 105 with GRCh37 and GRCh38 supported" >&2 \
    && return 1

vep_cache=; [ -z "$vep_cache" ] \
    && echo "Set vep_cache, VEP homo_sapiens cache, should be vep, refseq, or merged" >&2 \
    && return 1

hts_dir=; [ -z "$hts_dir" ] \
    && echo "Set hts_dir, the HTS directory" >&2 \
    && return 1

cosm_dir=; [ -z "$cosm_dir" ] \
    && echo "Set cosm_dir, directory containing COSMIC vcf" >&2 \
    && return 1

cosm_ver=; [ -z "$cosm_ver" ] \
    && echo "Set cosm_ver, COSMIC version number, e.g. 95, 101" >&2 \
    && return 1

cosm_muts=; [ -z "$cosm_muts" ] \
    && echo "Set cosm_muts, e.g. coding, noncoding, all" >&2 \
    && return 1

fasta_fn=; [ -z "$fasta_fn" ] \
    && echo "Set fasta_fn, the reference FASTA" >&2 \
    && return 1

nthreads=; [ -z "$nthreads" ] \
    && echo "Set nthreads, number of threads or cores" >&2 \
    && return 1

genome=; [ -z "$genome" ] \
    && echo "Set genome, like GRCh37 or GRCh38" >&2 \
    && return 1

Sample-specific Variables

out_dir=; [ -z "$out_dir" ] \
    && echo "Set out_dir, output directory" >&2 \
    && return 1

nbams=; [ -z "$nbams" ] \
    && echo "Set nbams, file listing all normal bam full paths" >&2 \
    && return 1
echo -e "Detected $(cat $nbams | wc -l) normal controls." >&2

tbam=; [ -z "$tbam" ] \
    && echo "Set tbam, tumor bam full path" >&2 \
    && return 1

Source scripts

Click to expand!

You are welcome to install your own programs and dependencies. I have provided below the steps and scripts I use to automate the programs related to UNMASC.

# Load dependencies

odir=$(pwd)

for repo in baSHic UNMASC somdna; do

    repo_dir="$git_dir/$repo"

    if [ ! -d "$repo_dir" ]; then
        cd "$git_dir"
        git clone https://github.com/pllittle/$repo.git >&2
        [ $? -eq 0 ] && continue
    else
        cd "$repo_dir"
        git pull >&2
        [ $? -eq 0 ] && continue
    fi
    echo -e "Error cloning/pulling $repo" >&2
    return 1

done

# Source functions

for fn in genomic callingAllVariants; do
    . "$git_dir/somdna/scripts/$fn.sh"
    [ $? -eq 0 ] && continue
    echo -e "Some error in sourcing somdna's $fn.sh" >&2
    return 1
done

. "$git_dir/UNMASC/workflow/tumor_only.sh"
[ ! $? -eq 0 ] && echo "Some error in sourcing tumor_only.sh" >&2 && return 1

cd "$odir"
unset odir repo repo_dir

Custom local installations

Click to expand!

One can control where these programs are installed by adding -a $apps_dir to the code below. apps_dir is just wherever you would like to install these programs.

install_gcc
install_libtool
install_perl
install_bzip2
install_xz
install_zlib
install_curl
install_expat
install_db
install_htslib
install_VEP -r $vep_rel
install_strelka2

If you rely on these functions for installations, these will determine hts_dir, stk2_dir, and vep_dir definitions.

Get COSMIC VCF

Click to expand!

Run the following code to obtain the appropriate COSMIC VCF. The function below will prompt the user to input COSMIC website's login and password. The generated file is needed for TO_workflow() below.

get_COSMIC_canonical -g $genome \
    -v $cosm_ver -h $hts_dir \
    -c $cosm_dir

The function definition is located here.

TO_workflow()

Click to expand!

Currently, TO_workflow is designed for Strelka2 and VEP. If others have alternate workflows, you are welcome to inspect this function's definitions to extract specific steps to execute :smile:.

TO_workflow -c $nthreads -f $fasta_fn -g $genome \
    -d $cosm_dir -e $cosm_ver -h $hts_dir -k $gatk_dir \
    -n $nbams -o $out_dir -s $stk2_dir -t $tbam \
    -v $vep_dir -r $vep_rel -a $vep_cache \
    --cosm_muts <coding,noncoding,all>

Preparing UNMASC's vcf

Click to expand!

Assuming UNMASC is successfully installed and the above TO_workflow() was run successfully, the instructions below describes the inputs to construct UNMASC's main input vcf. Otherwise, the user needs to construct vcf ensuring all required columns are available and formatted correctly (refer to ?UNMASC::run_UNMASC).

vcf = UNMASC::prep_UNMASC_VCF(
    outdir = outdir,
    DAT = DAT,
    FILTER = NULL,
    target_fn = target_fn,
    anno_fn = anno_fn,
    nlines = 100,
    ncores = 1)

Execution

Click to expand!

Arguments for run_UNMASC() with template inputs.

# Package main function
UNMASC::run_UNMASC(
    tumorID = tumorID,
    outdir = outdir,
    vcf = vcf,
    tBAM_fn = tBAM_fn,
    bed_centromere_fn = bed_centromere_fn,
    dict_chrom_fn = dict_chrom_fn,
    qscore_thres = qscore_thres,
    exac_thres = exac_thres,
    ad_thres = ad_thres,
    rd_thres = rd_thres,
    cut_BAF = cut_BAF,
    minBQ = minBQ,
    minMQ = minMQ,
    eps_thres = eps_thres,
    psi_thres = psi_thres,
    hg = hg,
    binom = binom,
    gender = gender,
    ncores = ncores)

Outputs

Click to expand!

Below are the main outputs from UNMASC to determine if all steps ran smoothly and to assess if the package or input files requires debugging. If any of these directories or files are missing, check the corresponding Rout file for any error or warning messages or flags for low quality sample information.



pllittle/UNMASC documentation built on June 1, 2025, 1 p.m.