gdlc/dMatrix: Memory Mapped Matrices and Data Structures for Genomic Data

Genomic data sets can be very large, and holding them in RAM is often not feasible. In a memory-mapped file the data are stored on disk and only the parts being accessed are brought into memory. The ff R package implements memory-mapped arrays with very fast indexing operations, so accessing cells of an ff array is almost as fast as accessing cells of a regular matrix held in RAM.

However, the size of an ff array is limited by the largest integer supported by the system (in R, .Machine$integer.max, about 2.1 billion elements), and genomic data often exceed this limit. We are therefore developing two new classes, rDMatrix and cDMatrix, which are essentially collections of ff objects: a matrix is distributed across multiple ff objects either by rows (rDMatrix) or by columns (cDMatrix). We have developed indexing and many other methods that allow the user to work with these objects as if they were regular matrices, as well as methods that take an rDMatrix or cDMatrix as input to compute, for example, genomic relationship matrices.

The cDMatrix and rDMatrix classes were designed to hold genotype data (integer coding). The genData class contains three slots, @geno, @map and @pheno, and can be used to hold GWAS data.
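To make the idea of a column-distributed matrix concrete, here is a minimal sketch in base R. It is not the dMatrix API: the function names (make_col_chunks, locate_cols, get_block, compute_G) are hypothetical, ordinary in-memory integer matrices stand in for the ff objects so the example runs without extra packages, and the relationship matrix is simply the centered cross-product divided by the number of markers, which is only one of several possible definitions.

```r
# Sketch only: plain integer matrices stand in for ff objects, and all
# function names are hypothetical (they are not part of dMatrix).

# Split the columns of X into chunks of at most chunk_cols columns each.
make_col_chunks <- function(X, chunk_cols) {
  starts <- seq(1L, ncol(X), by = chunk_cols)
  lapply(starts, function(s) {
    e <- min(s + chunk_cols - 1L, ncol(X))
    X[, s:e, drop = FALSE]
  })
}

# Translate global column indices into (chunk, local column) pairs.
locate_cols <- function(chunks, j) {
  ends <- cumsum(vapply(chunks, ncol, integer(1)))
  chunk <- findInterval(j - 1L, ends) + 1L
  list(chunk = chunk, local = j - c(0L, ends)[chunk])
}

# Extract rows i and global columns j as a regular matrix.
get_block <- function(chunks, i, j) {
  loc <- locate_cols(chunks, j)
  out <- matrix(NA_integer_, nrow = length(i), ncol = length(j))
  for (k in seq_along(j)) {
    out[, k] <- chunks[[loc$chunk[k]]][i, loc$local[k]]
  }
  out
}

# Accumulate a centered cross-product one chunk at a time, so that only
# one chunk needs to be processed at any moment.
compute_G <- function(chunks) {
  n <- nrow(chunks[[1]])
  G <- matrix(0, n, n)
  p <- 0L
  for (ch in chunks) {
    W <- sweep(ch, 2L, colMeans(ch))  # center each marker (column)
    G <- G + tcrossprod(W)
    p <- p + ncol(ch)
  }
  G / p
}

# Toy genotypes coded 0/1/2: 6 individuals by 8 markers, 3 columns per chunk.
set.seed(1)
X <- matrix(rbinom(6 * 8, size = 2, prob = 0.3), nrow = 6, ncol = 8)
chunks <- make_col_chunks(X, chunk_cols = 3L)
block <- get_block(chunks, i = c(2, 5), j = c(1, 4, 8))
G <- compute_G(chunks)
```

Because each chunk stays well below the per-object size limit, the full matrix can exceed it, while get_block hides the bookkeeping from the caller. A row-distributed layout would follow the same pattern with the roles of rows and columns swapped.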

Package details

License: MIT + file LICENSE
Package repository: GitHub (gdlc/dMatrix)
Installation: Install the latest version of this package from within R.
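The exact command from the original page is not preserved here. As a hedged sketch, assuming the package is installed from the GitHub repository gdlc/dMatrix using the remotes package, installation would typically look like this:

```r
# Assumption: installation goes through the remotes package and the
# GitHub repository gdlc/dMatrix; adjust if the package is hosted elsewhere.
install.packages("remotes")
remotes::install_github("gdlc/dMatrix")
```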
gdlc/dMatrix documentation built on May 17, 2019, 12:12 a.m.