README.md

gdiPipeline

R package to calculate the genealogical divergence index (GDI) for species delimitation analyses using BPP. Requires a maximally-split guide tree. GDI described in Jackson et al. 2017.

Tested with BPP v.4.1.4

To download and install this package

install.packages("devtools")
library(devtools)
install_github("dmacguigan/gdiPipeline")
library(gdiPipeline)

Example script

# GDI pileine with Percina kusha example data
# https://github.com/dmacguigan/gdiPipeline/tree/master/example

library(gdiPipeline)

###### PIPELINE PARAMETERS ######

###### general parameters

wd="./gdiPipeline/example"
plotColors = c(brewer.pal(12, "Paired"), "black") # for later plots of GDI
nLoci = 10
threads = 8
nreps = 4

###### input files 

# maximally split guide tree
# newick format, include semicolon after tree
treefile="Pkus.tree" 

# list of prior combinations
# one line per set of priors
# each line should contain: tau_alpha tau_beta theta_alpha theta_beta
priors="priors.txt" 

###### BPP-specific input files
# see the BPP user manual for details on how each file should be formatted
# https://github.com/bpp/bpp

map="Pkus.Imap.txt" 
heredity = "heredity.txt"
loci = "m100p_10loci.phy.txt"

# name for BPP template control file, this will be created in step 1
ctl = "ctlTemplate.ctl"

###### PIPELINE STEPS ######

# STEP 1
# create BPP control file template for bppInputs function
BPPCtlTemplate(wd)

# STEP 2
# Using a text editor, modify parameters like "burnin", "samplefreq", and/or "nsample" 
# in the ctlTemplate.ctl file if necessary before proceeding. You may also want to 
# specify "finetune" values based on preliminary runs if mixing/convergence is an issue.

# STEP 3
# create directories and files for BPP runs
# specify priors and maximally split tree
bppInputs(wd, treefile, map,
          priors,
          heredity, loci,
          ctl, nLoci,
          threads, nreps)

# STEP 4
# create task file, each line with a different command to run BPP
bppTaskFile(wd)

# STEP 5
# run the task file generated by bppTaskFile
# ideally parallelize using a job array setup
# for example: https://docs.ycrc.yale.edu/clusters-at-yale/job-scheduling/dsq/
# you may need to change the line endings of "BPPTaskFile.txt" to Unix
# if you ran step 4 on a Windows machine
# and plan to use a Linux machine for running BPP

# STEP 6
# assess mixing and convergence of BPP analyses
# diagnostic files are written in each model directory
# this may take a while to run depending on the number of analyses and the MCMC chain lengths
burnin = 0.1 # proportion of MCMC generations to discard as burnin
checkConvergence(wd=wd, nreps=nreps, burnin=burnin)

# STEP 7
# summarize the bpp results
# caluclate GDI
# plot GDI for each replicate run and binning replicate runs
burnin = 0.1 # proportion of MCMC generations to discard as burnin
gdi <- bppSummarizeGDI(wd, plotColors, nreps, burnin)

# plot GDI estimates for all species in one figure
plotByPrior(gdi, wd, nreps, priors, plotWidth=8, plotHeight=5)



dmacguigan/gdiPipeline documentation built on Oct. 31, 2023, 10:30 a.m.