HaplotypeMiner

What is HaplotypeMiner?

HaplotypeMiner is an R package developed for exploring allelic diversity at genes of interest in a plant breeding context. The program minimally takes as input a dataset of SNP markers generated through various methods (e.g. genotyping-by-sequencing [GBS] or SNP arrays) and the genomic position of a gene of interest, and outputs a set of possible haplotypes defined by the genotypes of a reduced number of neighboring SNPs. The kinship and structure of the population assessed can also be used as input to HaplotypeMiner to yield more robust results.

The haplotyping model of HaplotypeMiner involves the following steps:
- SNP markers in a window of a given size around the central position of the gene are extracted and optionally filtered according to user-specified settings.
- Linkage disequilibrium (LD) blocks are identified on each side (5' and 3') of the gene center. One SNP is selected from each block to avoid redundancy and reduce the dataset to a set of informative SNPs.
- Pairs of SNPs that are in significant LD across the gene center are selected and used to define haplotypes.
- Every unique combination of alleles at the SNPs selected in the preceding step is considered a haplotype and is output by the program. The program also assigns a haplotype to every sample for which it can do so unambiguously.
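The four steps can be illustrated with a toy, base-R sketch. It is not HaplotypeMiner's implementation or API, and the function names (ld_r2, tag_snps, sketch_haplotypes) are hypothetical. It assumes genotypes coded 0/2 for homozygous calls, 1 for heterozygous calls and NA for missing data; approximates r² by the squared correlation between genotype codes; forms LD blocks greedily with a fixed r² threshold; uses a fixed r² cutoff in place of the package's test of significant LD across the gene center; and omits the optional marker filtering. See the vignette for the actual procedure.

```r
# Toy sketch of the four haplotyping steps; NOT HaplotypeMiner's API.
# Assumed coding: 0/2 = homozygous, 1 = heterozygous, NA = missing.

# Approximate r^2 as the squared correlation of genotype codes (pairwise complete).
ld_r2 <- function(x, y) {
  ok <- !is.na(x) & !is.na(y)
  if (sum(ok) < 3 || sd(x[ok]) == 0 || sd(y[ok]) == 0) return(0)
  cor(x[ok], y[ok])^2
}

# Greedy LD blocks along position-sorted columns: a SNP starts a new block
# when its r^2 with the current block's first SNP is below block_r2.
# The first SNP of each block is returned as the block's representative.
tag_snps <- function(geno, block_r2) {
  tags <- character(0)
  for (snp in colnames(geno)) {
    if (length(tags) == 0 ||
        ld_r2(geno[, tags[length(tags)]], geno[, snp]) < block_r2) {
      tags <- c(tags, snp)
    }
  }
  tags
}

sketch_haplotypes <- function(geno, pos, center, window,
                              block_r2 = 0.8, pair_r2 = 0.5) {
  # Step 1: keep SNPs within `window` bp of the gene center, sorted by position
  keep <- abs(pos - center) <= window
  geno <- geno[, keep, drop = FALSE][, order(pos[keep]), drop = FALSE]
  pos <- sort(pos[keep])
  # Step 2: one representative SNP per LD block on each side of the center
  left <- tag_snps(geno[, pos < center, drop = FALSE], block_r2)
  right <- tag_snps(geno[, pos > center, drop = FALSE], block_r2)
  # Step 3: keep SNPs involved in at least one cross-center pair with r^2 >= pair_r2
  selected <- character(0)
  for (l in left) {
    for (r in right) {
      if (ld_r2(geno[, l], geno[, r]) >= pair_r2) selected <- union(selected, c(l, r))
    }
  }
  if (length(selected) == 0) return(NULL)
  selected <- colnames(geno)[colnames(geno) %in% selected]  # positional order
  # Step 4: each unique allele combination is a haplotype; samples with a
  # missing or heterozygous call at any selected SNP stay unassigned (NA)
  sub <- geno[, selected, drop = FALSE]
  complete <- apply(sub, 1, function(g) all(!is.na(g) & g != 1))
  codes <- apply(sub, 1, paste, collapse = "")
  haplos <- sort(unique(codes[complete]))
  assignment <- ifelse(complete, match(codes, haplos), NA_integer_)
  list(snps = selected, haplotypes = haplos, assignment = assignment)
}

# Toy data: 10 inbred lines x 4 SNPs around a gene centered at 1000 bp
geno <- cbind(
  s1 = c(0, 0, 0, 2, 2, 2, 2, 0, NA, 2),
  s2 = c(0, 0, 0, 2, 2, 2, 2, 0, NA, 2),
  s3 = c(0, 0, 0, 2, 2, 2, 0, 0, 2, 1),
  s4 = c(2, 0, 2, 0, 2, 0, 2, 0, 2, 0)
)
rownames(geno) <- paste0("line", 1:10)
res <- sketch_haplotypes(geno, pos = c(900, 950, 1050, 1100),
                         center = 1000, window = 200)
res$snps        # "s1" "s3"
res$haplotypes  # "00" "20" "22"
res$assignment  # line9 and line10 are NA
```

With these toy data, s2 falls in the same LD block as s1, only the s1-s3 pair passes the cross-center cutoff, and line9 (missing call) and line10 (heterozygous call) remain unassigned.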

The underlying assumption of HaplotypeMiner is that even though GBS or SNP array datasets may not provide a comprehensive view of the variation in gene-coding or other functionally relevant sequences, they may provide polymorphic markers that are in LD with variants having a functional impact. The model implemented by HaplotypeMiner assumes that if two markers are in LD across the central position of the gene, then they are even more likely to be in LD with variants located in the gene sequence. By identifying unique combinations of such marker pairs, we therefore hope to identify haplotypes that have a one-to-one correspondence with functionally relevant alleles of genes of interest. HaplotypeMiner is primarily intended as a tool to help plant breeders assess allelic diversity at specific genes in a germplasm collection and to assist them in decision-making.

Installation

HaplotypeMiner can be installed directly in R by calling devtools::install_github("malemay/HaplotypeMiner", build_vignettes = TRUE). This fetches the package from GitHub, installs it on your computer, and builds the vignette; it requires devtools to be installed. This package is not available from CRAN through the usual install.packages() interface. You do not strictly need the build_vignettes = TRUE option to install the package, but you really should use it: the vignette is the main resource for understanding how to use the package, and its built HTML version is not provided in this repository.

HaplotypeMiner requires a few Bioconductor packages to be installed on your computer. You can install them by running the following commands in R:

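{
# BiocManager replaces the deprecated biocLite.R installer
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install(c("GenomeInfoDb", "snpStats", "SummarizedExperiment", "VariantAnnotation"))
}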

Typical usage

Using HaplotypeMiner will typically involve three steps for most users of the package:

The vignette describing the approach used by HaplotypeMiner, as well as a detailed usage example, can be accessed by running vignette("HaplotypeMiner-Overview") in R; we strongly advise users to read this vignette before using the package. The vignette also explains how to gain finer control over the workflow and output of HaplotypeMiner. Full documentation has yet to be completed for some of the functions; until then, the vignette should provide enough information for you to get started.

Known issues

Notes

If you use HaplotypeMiner as part of your work, please cite it, as numerous hours have been invested in its development. A manuscript describing the underlying model of HaplotypeMiner and results obtained with soybean will shortly be submitted for publication as a research paper. Until then, the recommended citation for the package can be obtained by running citation("HaplotypeMiner") in R.

This software is provided without any guarantee. HaplotypeMiner has only been thoroughly tested with soybean. We do not know to what extent the model will hold for polyploid species, predominantly outcrossing species, or species in which linkage disequilibrium decays more rapidly with physical distance than in soybean. If you test HaplotypeMiner with a different species, we would be happy to hear about the results and to provide advice if requested.

Issues, bug reports and questions can be shared on the GitHub page of the project or addressed to the package maintainer (see the package DESCRIPTION for contact information).



malemay/HaplotypeMiner documentation built on May 28, 2019, 2:48 p.m.