
tasselr


Tassel writes GBS genotyping data to HDF5 files. This package is an R interface to a subset of the information in these files, so users can quickly load and work with the data.

Required Packages

From CRAN:

From Bioconductor:

Install these packages, then install tasselr from GitHub using devtools (which you can install from CRAN):

devtools::install_github("vsbuffalo/tasselr")

Loading Tassel HDF5 GBS into R

First, we initialize the HDF5 file with initTasselHDF5(). By default, initTasselHDF5() assumes the HDF5 file uses Tassel 5's schema; set version='4' to change this. initTasselHDF5() loads the loci positions as a GRanges object and stores the reference and alternate alleles, which you can access with the ref() and alt() accessor functions, respectively:

> gbs <- initTasselHDF5("path/to/mygbs.h5")
> gbs
    Tassel HDF5 object at 'path/to/mygbs.h5'
    955690 loci x 2060 samples
    Number of chromosomes: 11
    Object size: 127.022 Mb

> head(alt(gbs), 20)
  3   4   5   6   8  11  13  14  15  17  18
 "T" "A" "C" "G" "T" "C" "T" "T" "G" "G" "T"   [...]

> head(ref(gbs))
[1] "C" "C" "C" "C" "C" "A"

Loading Genotypes

The loadBiallelicGenotypes() method loads the genotypes at biallelic loci and decodes each one into the number of alternate alleles it carries (0, 1, or 2):

> gbs <- loadBiallelicGenotypes(gbs)

The conversion methods are written in C++ with Rcpp so they're fast-ish.
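
To make the 0/1/2 encoding concrete, here is a minimal base-R sketch of the idea. It is not tasselr's implementation (which is the C++ code mentioned above) and it does not read Tassel's HDF5 encoding. The function name count_alt_alleles(), the two-letter call strings such as "CT", and the choice to return NA for calls that contain any allele other than ref or alt are all assumptions made for illustration.

```r
# Illustrative sketch only (hypothetical helper, not part of tasselr).
# `calls`: character matrix of two-letter diploid calls, loci in rows and
#          samples in columns. `ref`, `alt`: one allele per locus (row).
count_alt_alleles <- function(calls, ref, alt) {
  stopifnot(is.matrix(calls),
            nrow(calls) == length(ref),
            length(ref) == length(alt))
  # Split each call into its two alleles, flattened in column-major order.
  a1 <- substr(as.vector(calls), 1, 1)
  a2 <- substr(as.vector(calls), 2, 2)
  # Repeat the per-locus alleles once per sample so they line up with the
  # flattened calls.
  ref_v <- rep(ref, times = ncol(calls))
  alt_v <- rep(alt, times = ncol(calls))
  # Assumption: a call is usable only if both alleles are ref or alt.
  ok <- (a1 == ref_v | a1 == alt_v) & (a2 == ref_v | a2 == alt_v)
  n_alt <- (a1 == alt_v) + (a2 == alt_v)  # 0, 1, or 2 alternate alleles
  n_alt[!ok] <- NA_integer_
  matrix(n_alt, nrow = nrow(calls), dimnames = dimnames(calls))
}

# Toy data: two loci, three samples.
calls <- matrix(c("CC", "CT", "TT",
                  "AA", "NN", "AG"),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("locus1", "locus2"), c("s1", "s2", "s3")))
dosage <- count_alt_alleles(calls, ref = c("C", "A"), alt = c("T", "G"))
print(dosage)
```

In this toy example a homozygous reference call becomes 0, a heterozygous call 1, and a homozygous alternate call 2, which is the same shape of matrix (loci by samples, integer counts) described above.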

Use the accessor function geno() to extract this genotype matrix. Note that the number of loci and the reference and alternate alleles will change after loading genotypes, because only biallelic loci are kept. The object always stays internally consistent, so you should always use the accessor functions to access data in its slots.

Accessor functions:

Warnings

Todo

Dev Notes



vsbuffalo/tasselr documentation built on May 3, 2019, 7:08 p.m.