knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

devtools::install_github("IDEELResearch/vcfR2plsmdmcalc")
library(vcfR2plsmdmcalc)
data("pfcross_subset", package = "vcfR2plsmdmcalc")
pfcross_subset

library(vcfR)
library(rehh)

Setup

There are two main files need to setup rehh.

haplotype file

Note, EHH expects you to have biallelic sites that are split by chromosome. Although you can have a map file with multiple chromosomes, it would be better to split it out. Either way, the package structure for rehh makes you write out to disk a lot... Could manipulate their haplohh class to avoid this...but for now we will write out as below. You will need the vcfR2plsmdmcalc::vcfR2thap wrapper function to make a "haplotype" file.

map file

In the map file, rehh requires that we have ancestral/derived alleles (i.e. that we have polarized our alleles). This is needed for within population statistics where we are trying to characterize the strength of LD (i.e. or really IBS block length) around a "focal marker". As such, polarization of the ancestral vs. derived allele matters. However, if you want to compare between populations, XP-EHH the ancestral and derived allele doesn't matter (it can be assigned at random). Or you can ignore the ancestral/derived state all together (i.e. EHHS).

Importing and Wrangling

Note, I am making a big assumption here that the referent allele can be polarized as the ancestral allele. This is almost certainly not the case for every position...but as a temporary fix. I also know that all of this data just came from Chromosome 1, so can make another heuristic by saying chrom=1. Note, rehh will make you write out to your disk-space, so you will need to change the paths here.

thap <- as.data.frame(vcfR2thap(pfcross_subset)) 
playmap <- data.frame(name = paste0("name", seq(1:nrow(pfcross_subset@gt))),
                      chrom = 1,
                      pos = as.numeric(pfcross_subset@fix[,2]),
                      anc = pfcross_subset@fix[,4],
                      der = pfcross_subset@fix[,5])

write.table(x = thap, file = "~/Desktop/test/vcfRrest/test.thap",
            sep = " ", quote = F, row.names =T, col.names = F)

write.table(x = playmap, file = "~/Desktop/test/vcfRrest/playmap.inp",
            sep = " ", quote = F, row.names =F, col.names = F)

hapobj <- data2haplohh(hap_file="~/Desktop/test/vcfRrest/test.thap",
                  map_file="~/Desktop/test/vcfRrest/playmap.inp",
                  haplotype.in.columns=TRUE,
                  recode.allele=T,
                  chr.name=1, 
                  min_perc_geno.hap=0,
                  min_perc_geno.snp=0
                  )

Taking rehh for a Spin

Not a good example since this is cross data. Change to subset of Pf3k...

calc_ehh(haplohh = hapobj,
         mrk = 50,
         limhaplo = 2,
         limehh = 0.05,
         plotehh = T
         )
calc_ehhs(haplohh = hapobj,
         mrk = 50,
         limhaplo = 2,
         limehh = 0.05,
         plotehh = T
         )


IDEELResearch/vcfR2plsmdmcalc documentation built on May 6, 2019, 1 a.m.