knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) devtools::install_github("IDEELResearch/vcfR2plsmdmcalc") library(vcfR2plsmdmcalc) data("pfcross_subset", package = "vcfR2plsmdmcalc") pfcross_subset data("pf3k_subset", package = "vcfR2plsmdmcalc")
This vignette provides basic functionality for the function block_jackknife. Of note, this function is largely redundant with the block_jackknife function in the [vcfdo](https://github.com/IDEELResearch/vcfdo) package.
The expectation is that this function will be used to iterate through samples in a vcf. As a result, each sample is a vector. As a result, we can capitalize on vectorization with R and use the apply suite of functions. For example, below we calculate within-sample He.
gtmat <- vcfR::extract.gt(pfcross_subset, element = "GT") gtmat[gtmat == "0/0"] <- 0 gtmat[gtmat == "0/1"] <- 0.5 gtmat[gtmat == "1/1"] <- 1 gtmat <- apply(gtmat, 2, as.numeric) HeCalc <- function(gt){ 1 - ((sum(gt == 0)/sum(!is.na(gt)))^2 + (sum(gt == 1)/sum(!is.na(gt)))^2) } He <- apply(gtmat, 2, block_jackknife, statistic = HeCalc, block_size = 6)
THEREALMcCOIL ModelThis function makes use of the McCOILR package by OJ Watson which is a wrapper for the Hsiao-Han Chang's THEREALMcCOIL package (PMC5300274). However, given this structure, we cannot use the block_jackknife structure above which expects numeric or integer vectors as the input. Instead, we pass a dataset that is ready for a THEREALMcCOIL analysis (see here for file structure) and perform jackknife re-sampling on the loci. In each jackknife-replication, a specific value of MOI is drawn from the posterior distribution of the THEREALMcCOIL MCMC sampling phase output.
Below we explore this function.
gtmat <- vcfR::extract.gt(pf3k_subset, element = "GT") gtmat <- gtmat[1:100,1:40] gtmat[is.na(gtmat)] <- -1 gtmat[gtmat == "0/0"] <- 0 gtmat[gtmat == "0/1"] <- 0.5 gtmat[gtmat == "1/1"] <- 1 gtmat <- t( apply(gtmat, 2, as.numeric) ) rownames(gtmat) <- colnames(pf3k_subset@gt)[2:41] colnames(gtmat) <- paste0(pf3k_subset@fix[,1], "_", vcf@fix[,2])[1:100] ret <- sample_posterior_mccoilr_categorical(gtmat, path = paste0(getwd(), "/scratch/mccoiltests/"), output="output_catMOI.txt", maxCOI=25, threshold_ind=20, threshold_site=20, totalrun=1000, burnin=100, M0=15, e1=0.05, e2=0.05, err_method=1) ret %>% DT::datatable(., extensions='Buttons', options = list( searching = T, pageLength = 5, dom = 'Bfrtip', buttons = c('csv')))
Issues are the mccoilr_categorical function writes out. If you want to keep all of the raw data (and not overwrite it) need to loop through the different potential output files. otherwise can overwrite it but will be left with the final jackknife iteration
directory must be empty took output over from user... annoying but should fix
ret <- block_jackknife4sample_posterior_mccoilr_categorical( data = gtmat, path = path, block_size = 1, maxCOI=25, threshold_ind=20, threshold_site=20, totalrun=1000, burnin=100, M0=15, e1=0.05, e2=0.05, err_method=1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.