README.md

parallelnewhybrid


parallelnewhybrid is an R package designed to parallelize NewHybrids analyses.

Please ensure that you have the correct version of NewHybrids installed. The source code and installation instructions can be found at https://github.com/eriqande/newhybrids.

Package Installation

parallelnewhybrid can be installed from GitHub using the R package devtools and the function install_github:

devtools::install_github("bwringe/parallelnewhybrid")

NOTE: parallelnewhybrid relies on functions from the R packages parallel, plyr, stringdist, stringr, and tidyr. The user should ensure these are installed prior to installing parallelnewhybrid (parallel ships with R; the others are available from CRAN).

if (!require("pacman")) install.packages("pacman")
pacman::p_load(parallel, plyr, stringdist, stringr, tidyr)

Function descriptions

parallelnh_xx.R

Allows NewHybrids (Anderson and Thompson 2002) to be run in parallel. A job (one NewHybrids analysis) is assigned to each of the c cores available in the computer. As each job finishes, a new analysis is assigned to the idle core. parallelnewhybrid will attempt to analyze all NewHybrids format files in the folder specified by the user through the folder.data argument. It is therefore essential that this folder contain only the files the user wishes to analyze and, optionally, their associated individual file(s). The user must also specify the length of the Markov chain Monte Carlo (MCMC) burn-in and the subsequent run length using the burnin and sweeps parameters. NOTE: There are three operating system-specific versions of the parallelnh_xx function because the operating systems handle forking of processes differently.

parallelnh version | Operating system
------------|----------
parallelnh_OSX | OS X
parallelnh_WIN | Windows
parallelnh_LINUX | Linux (Ubuntu)
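The scheduling behaviour described above (one job per core, with a new job started as soon as a core becomes free) can be sketched with the parallel package. This is only an illustration and not the package's own implementation: run_one is a hypothetical stand-in for a NewHybrids run, the file names are made up, and the choice of two workers is arbitrary.

```r
library(parallel)

# Hypothetical stand-in for a single NewHybrids run: it only sleeps briefly
# and reports which file it was given and which process handled it.
run_one <- function(f) {
  Sys.sleep(0.1)
  data.frame(file = f, pid = Sys.getpid(), stringsAsFactors = FALSE)
}

# Illustrative file names; no files are read or written.
files <- paste0("SimPops_S1R", 1:6, "_NH.txt")

# Forking is not available on Windows, so fall back to a single worker there.
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L

# mc.preschedule = FALSE starts one forked job per file and keeps at most
# n_workers running at once, so a new job starts whenever a worker is free.
res <- mclapply(files, run_one, mc.cores = n_workers, mc.preschedule = FALSE)

# Results come back in the same order as the input files.
res <- do.call(rbind, res)
print(res)
```

The Windows fallback in this sketch reflects the same forking limitation that motivates the separate operating system-specific functions.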

How to use:

Example datasets:

Example datasets have been provided as R images (.rda files). These can be loaded into your workspace using the data() function.

Example dataset | Contents
------------|---------------------------------------------------------------
SimPops_S1R1_NH | A NewHybrids format file. To analyze this file using the function parallelnh_xx, save it with the extension ".txt" to an empty folder on your hard drive, then provide parallelnh_xx with the file path to the folder. To run in parallel, after saving the file, copy it and give the copies unique names. parallelnh_xx will attempt to analyze all files which do not contain "individuals.txt" within the file name, so it is essential that only NewHybrids formatted files, and their associated individual files, be present in the folder provided to parallelnh_xx.
SimPops_S1R1_NH_individuals | The individual file associated with SimPops_S1R1_NH. A single copy of this file should be saved to the same folder in which SimPops_S1R1_NH is saved. The filename must end in "individuals.txt".
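The file selection rule in the table (every file whose name does not contain "individuals.txt" is treated as a NewHybrids data file) can be checked before a run. The following is a minimal sketch under that stated rule, not the package's internal code; the temporary folder and the empty placeholder files are created only for illustration.

```r
# Build a throwaway folder with empty placeholder files (illustration only).
folder <- tempfile("nh_example_")
dir.create(folder)
file.create(file.path(folder, c("SimPops_S1R1_NH.txt",
                                "SimPops_S1R2_NH.txt",
                                "SimPops_S1R1_NH_individuals.txt")))

all_files <- list.files(folder)

# Files without "individuals.txt" in the name would be analyzed;
# fixed = TRUE matches the text literally rather than as a regular expression.
is_individual_file <- grepl("individuals.txt", all_files, fixed = TRUE)
to_analyze <- all_files[!is_individual_file]
individual_files <- all_files[is_individual_file]

print(to_analyze)
print(individual_files)

# Remove the throwaway folder again.
unlink(folder, recursive = TRUE)
```

Listing the folder this way before calling parallelnh_xx makes it easy to spot stray files that would otherwise be passed to NewHybrids.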

parallelnh_xx

Parameter | Description
------------|---------------------------------------------------------------
folder.data | A file path to the folder in which the NewHybrids formatted files to be analyzed, and their associated individual file, reside.
where.NH | A file path to the NewHybrids installation folder. NOTE: This folder must be named "newhybrids". If it is named anything else, the function will fail.
burnin | An integer specifying how many burn-in steps NewHybrids is to run.
sweeps | An integer specifying how many sweep steps NewHybrids is to run.
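Because a wrongly named installation folder causes the function to fail, it may help to check the arguments before starting a long run. The helper below, check_nh_args, is hypothetical and not part of parallelnewhybrid; it assumes only what the table states (the folder name, and that burnin and sweeps are integers), and the example paths are made up.

```r
# Hypothetical pre-flight check; not part of parallelnewhybrid.
# Returns a character vector of problems (empty if none were found).
check_nh_args <- function(where.NH, burnin, sweeps) {
  problems <- character(0)

  # basename() ignores a trailing path separator, so ".../newhybrids/" passes.
  if (!identical(basename(where.NH), "newhybrids")) {
    problems <- c(problems, "where.NH must point to a folder named 'newhybrids'")
  }

  # A single, non-missing whole number.
  is_whole <- function(x) {
    is.numeric(x) && length(x) == 1 && !is.na(x) && x == round(x)
  }
  if (!is_whole(burnin)) problems <- c(problems, "burnin must be an integer")
  if (!is_whole(sweeps)) problems <- c(problems, "sweeps must be an integer")

  problems
}

ok  <- check_nh_args("/home/user/newhybrids/", burnin = 100, sweeps = 100)
bad <- check_nh_args("/home/user/NewHybrids", burnin = 100, sweeps = 10.5)
print(ok)
print(bad)
```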


### ANALYSIS OF EXAMPLE DATA

## Get the file path to the working directory; it is used below so that the example runs on any machine
path.hold <- getwd()

## Get the individual file included along with the parallelnewhybrid package and make it an object
sim_inds <- parallelnewhybrid::SimPops_S1R1_NH_individuals

## Get the genotype data file included along with the parallelnewhybrid package and make it an object
sim_data <- parallelnewhybrid::SimPops_S1R1_NH

## Save the individual data to the working directory as a file called "SimPops_S1R1_NH_individuals.txt"
write.table(x = sim_inds, file = paste0(path.hold, "/SimPops_S1R1_NH_individuals.txt"), row.names = FALSE, col.names = FALSE, quote = FALSE)

## Save the genotype data to the working directory as a file called "SimPops_S1R1_NH.txt"
write.table(x = sim_data, file = paste0(path.hold, "/SimPops_S1R1_NH.txt"), row.names = FALSE, col.names = FALSE, quote = FALSE)

## Create an empty folder within the working directory. Recall that parallelnewhybrid analyzes every file in the folder it is given, and it will fail if that folder contains files that are neither NewHybrids format files nor individual files.
dir.create(paste0(path.hold, "/parallelnewhybrids example"))

## Copy the individual file to the new folder
file.copy(from = paste0(path.hold, "/SimPops_S1R1_NH_individuals.txt"), to = paste0(path.hold, "/parallelnewhybrids example"))

## Copy the genotype data file to the new folder
file.copy(from = paste0(path.hold, "/SimPops_S1R1_NH.txt"), to = paste0(path.hold, "/parallelnewhybrids example"))

## Create two copies of the genotype data file to act as technical replicates of the NewHybrids simulation-based analysis. This will also serve to demonstrate the parallel capabilities of parallelnewhybrid.
file.copy(from = paste0(path.hold, "/parallelnewhybrids example/SimPops_S1R1_NH.txt"), to = paste0(path.hold, "/parallelnewhybrids example/SimPops_S1R2_NH.txt"))

file.copy(from = paste0(path.hold, "/parallelnewhybrids example/SimPops_S1R1_NH.txt"), to = paste0(path.hold, "/parallelnewhybrids example/SimPops_S1R3_NH.txt"))

## Clean up the working directory by deleting the two files
file.remove(paste0(path.hold, "/SimPops_S1R1_NH_individuals.txt"))

file.remove(paste0(path.hold, "/SimPops_S1R1_NH.txt"))

## Create an object that is the file path to the folder in which NewHybrids is installed. Note: this folder must be named "newhybrids"
your.NH <- "YOUR PATH/newhybrids/"

## Execute parallelnh. NOTE: "xx" must be replaced by the correct designation for your operating system. The burnin and sweeps values have been chosen for demonstration only.
parallelnh_xx(folder.data = paste0(path.hold, "/parallelnewhybrids example/"), where.NH = your.NH, burnin = 100, sweeps = 100)


## Clean up everything by deleting the example folder. Note: the command is commented out to prevent it being run accidentally; remove the leading "#" to run it.
# unlink(paste0(path.hold, "/parallelnewhybrids example/"), recursive = TRUE)
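The "xx" placeholder can also be resolved at run time from the operating system reported by R. The sketch below maps the system name to the three function names listed in the parallelnh version table; pick_parallelnh is a hypothetical helper and not part of the package, and it assumes that Sys.info() reports "Darwin" for OS X.

```r
# Hypothetical helper: map the operating system name to the matching
# parallelnh_xx function name from the version table.
pick_parallelnh <- function(sysname = Sys.info()[["sysname"]]) {
  switch(sysname,
         Darwin  = "parallelnh_OSX",
         Windows = "parallelnh_WIN",
         Linux   = "parallelnh_LINUX",
         stop("Unsupported operating system: ", sysname))
}

# Explicit arguments are used here so the output does not depend on the
# machine running the example.
print(pick_parallelnh("Darwin"))
print(pick_parallelnh("Linux"))

# With parallelnewhybrid installed, the chosen function could then be
# retrieved by name (left commented out because it needs the package):
# nh_fun <- getExportedValue("parallelnewhybrid", pick_parallelnh())
```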


Important Notes:

parallelnewhybrid Contributors

Reference

Anderson EC, Thompson EA. A model-based method for identifying species hybrids using multilocus genetic data. Genetics. Genetics Society of America; 2002;160: 1217-1229.

To cite the current version of parallelnewhybrid, please refer to our zenodo DOI: DOI



bwringe/parallelnewhybrid documentation built on May 13, 2019, 9:24 a.m.