knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

devtools::install_github("IDEELResearch/vcfR2plsmdmcalc")
library(vcfR2plsmdmcalc)
data("pfcross_subset", package = "vcfR2plsmdmcalc")
pfcross_subset

data("pf3k_subset", package = "vcfR2plsmdmcalc")

This vignette provides basic functionality for the function block_jackknife. Of note, this function is largely redundant with the block_jackknife function in the [vcfdo](https://github.com/IDEELResearch/vcfdo) package.

Quick Start

The expectation is that this function will be used to iterate through samples in a vcf. As a result, each sample is a vector. As a result, we can capitalize on vectorization with R and use the apply suite of functions. For example, below we calculate within-sample He.

gtmat <- vcfR::extract.gt(pfcross_subset, element = "GT")
gtmat[gtmat == "0/0"] <- 0
gtmat[gtmat == "0/1"] <- 0.5
gtmat[gtmat == "1/1"] <- 1
gtmat <- apply(gtmat, 2, as.numeric)

HeCalc <- function(gt){
  1 - ((sum(gt == 0)/sum(!is.na(gt)))^2 + (sum(gt == 1)/sum(!is.na(gt)))^2)
}

He <- apply(gtmat, 2, block_jackknife, statistic = HeCalc, block_size = 6)

Running Jackknife under the THEREALMcCOIL Model

This function makes use of the McCOILR package by OJ Watson which is a wrapper for the Hsiao-Han Chang's THEREALMcCOIL package (PMC5300274). However, given this structure, we cannot use the block_jackknife structure above which expects numeric or integer vectors as the input. Instead, we pass a dataset that is ready for a THEREALMcCOIL analysis (see here for file structure) and perform jackknife re-sampling on the loci. In each jackknife-replication, a specific value of MOI is drawn from the posterior distribution of the THEREALMcCOIL MCMC sampling phase output.
Below we explore this function.

A single iteration

gtmat <- vcfR::extract.gt(pf3k_subset, element = "GT")
gtmat <- gtmat[1:100,1:40]
gtmat[is.na(gtmat)] <- -1
gtmat[gtmat == "0/0"] <- 0
gtmat[gtmat == "0/1"] <- 0.5
gtmat[gtmat == "1/1"] <- 1
gtmat <- t( apply(gtmat, 2, as.numeric) )
rownames(gtmat) <- colnames(pf3k_subset@gt)[2:41]
colnames(gtmat) <- paste0(pf3k_subset@fix[,1], "_", vcf@fix[,2])[1:100]


ret <- sample_posterior_mccoilr_categorical(gtmat, 
                                            path = paste0(getwd(), "/scratch/mccoiltests/"),
                                            output="output_catMOI.txt",
                                            maxCOI=25, threshold_ind=20, threshold_site=20,
                                            totalrun=1000, burnin=100, M0=15, 
                                            e1=0.05, e2=0.05,
                                            err_method=1)

ret %>% 
  DT::datatable(., extensions='Buttons',
                options = list(
                searching = T,
                pageLength = 5,
                dom = 'Bfrtip', 
                buttons = c('csv')))

Jackknife Resampling

Issues are the mccoilr_categorical function writes out. If you want to keep all of the raw data (and not overwrite it) need to loop through the different potential output files. otherwise can overwrite it but will be left with the final jackknife iteration

directory must be empty took output over from user... annoying but should fix

ret <- block_jackknife4sample_posterior_mccoilr_categorical( 
        data = gtmat, 
        path = path, 
        block_size = 1,
        maxCOI=25, threshold_ind=20, threshold_site=20,
        totalrun=1000, burnin=100, M0=15, 
        e1=0.05, e2=0.05,
        err_method=1)


IDEELResearch/vcfR2plsmdmcalc documentation built on May 6, 2019, 1 a.m.