README.md

gdsfmt: R Interface to CoreArray Genomic Data Structure (GDS) files

LGPLv3 GNU Lesser General Public License, LGPL-3

Availability Years-in-BioC Build Status Build status Comparison is done across all Bioconductor packages over the last 6 months codecov.io

Features

This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms with hierarchical structure to store multiple scalable array-oriented data sets with metadata information. It is suited for large-scale datasets, especially for data which are much larger than the available random-access memory. The gdsfmt package offers the efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, like single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. It is also allowed to read a GDS file in parallel with multiple R processes supported by the package parallel.

Bioconductor:

Release Version: v1.26.0

http://www.bioconductor.org/packages/release/bioc/html/gdsfmt.html

Help Documents

News: v1.26.0

Package Vignettes

http://bioconductor.org/packages/release/bioc/vignettes/gdsfmt/inst/doc/gdsfmt.html

Citations

Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.

Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.

Package Maintainer

Dr. Xiuwen Zheng (zhengxwen@gmail.com)

URL

http://github.com/zhengxwen/gdsfmt

http://www.bioconductor.org/packages/gdsfmt

Installation

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("gdsfmt")
library("devtools")
install_github("zhengxwen/gdsfmt")

The install_github() approach requires that you build from source, i.e. make and compilers must be installed on your system -- see the R FAQ for your operating system; you may also need to install dependencies manually.

Copyright Notice

GDS Command-line Tools

In the R environment,

install.packages("getopt", repos="http://cran.r-project.org")
install.packages("optparse", repos="http://cran.r-project.org")
install.packages("crayon", repos="http://cran.r-project.org")

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("gdsfmt")

See More...

viewgds

viewgds is a shell script written in R (viewgds.R), to view the contents of a GDS file. The R packages gdsfmt, getopt and optparse should be installed before running viewgds, and the package crayon is optional.

Usage: viewgds [options] file

Installation with command line,

curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R > viewgds
chmod +x viewgds

## Or
wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R > viewgds
chmod +x viewgds

diffgds

diffgds is a shell script written in R (diffgds.R), to compare two files GDS files. The R packages gdsfmt, getopt and optparse should be installed before running diffgds.

Usage: diffgds [options] file1 file2

Installation with command line,

curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R > diffgds
chmod +x diffgds

## Or
wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R > diffgds
chmod +x diffgds

Examples

library(gdsfmt)

# create a GDS file
f <- createfn.gds("test.gds")

add.gdsn(f, "int", val=1:10000)
add.gdsn(f, "double", val=seq(1, 1000, 0.4))
add.gdsn(f, "character", val=c("int", "double", "logical", "factor"))
add.gdsn(f, "logical", val=rep(c(TRUE, FALSE, NA), 50))
add.gdsn(f, "factor", val=as.factor(c(NA, "AA", "CC")))
add.gdsn(f, "bit2", val=sample(0:3, 1000, replace=TRUE), storage="bit2")

# list and data.frame
add.gdsn(f, "list", val=list(X=1:10, Y=seq(1, 10, 0.25)))
add.gdsn(f, "data.frame", val=data.frame(X=1:19, Y=seq(1, 10, 0.5)))

folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "int", val=1:1000)
add.gdsn(folder, "double", val=seq(1, 100, 0.4))

# show the contents
f

# close the GDS file
closefn.gds(f)
File: test.gds (1.1K)
+    [  ]
|--+ int   { Int32 10000, 39.1K }
|--+ double   { Float64 2498, 19.5K }
|--+ character   { Str8 4, 26B }
|--+ logical   { Int32,logical 150, 600B } *
|--+ factor   { Int32,factor 3, 12B } *
|--+ bit2   { Bit2 1000, 250B }
|--+ list   [ list ] *
|  |--+ X   { Int32 10, 40B }
|  \--+ Y   { Float64 37, 296B }
|--+ data.frame   [ data.frame ] *
|  |--+ X   { Int32 19, 76B }
|  \--+ Y   { Float64 19, 152B }
\--+ folder   [  ]
   |--+ int   { Int32 1000, 3.9K }
   \--+ double   { Float64 248, 1.9K }

Also See

pygds: Python interface to CoreArray Genomic Data Structure (GDS) files

jugds.jl: Julia interface to CoreArray Genomic Data Structure (GDS) files



Try the gdsfmt package in your browser

Any scripts or data that you put into this service are public.

gdsfmt documentation built on Dec. 26, 2020, 6 p.m.