GNU Lesser General Public License, LGPL-3
This package provides a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which are portable across platforms and use a hierarchical structure to store multiple scalable array-oriented data sets with metadata. It is suited to large-scale datasets, especially data that are much larger than the available random-access memory. The gdsfmt package offers efficient operations specifically designed for integers of less than 8 bits, since a diploid genotype, such as a single-nucleotide polymorphism (SNP), usually occupies fewer bits than a byte. Data compression and decompression are available with relatively efficient random access. A GDS file can also be read in parallel by multiple R processes, supported by the package parallel.
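As a minimal sketch of the sub-byte storage and compressed random access described above, the following writes simulated genotype codes as 2-bit integers with zlib random-access compression, then reads a small slice back. The data, the node name `geno` and the temporary file are illustrative assumptions; `"ZIP_RA"` is used here because it is expected to be available in a standard build.

```r
library(gdsfmt)

# Simulated genotype codes in 0..3 (illustrative data, not shipped with the package)
set.seed(1)
geno <- sample(0:3, 10000, replace=TRUE)

# Write to a temporary file so the sketch leaves nothing behind
fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)

# Store each value in 2 bits and compress with zlib in a random-access layout
node <- add.gdsn(f, "geno", val=geno, storage="bit2", compress="ZIP_RA")

# Inspect how the node is stored
desp <- objdesp.gdsn(node)
desp$storage
desp$dim

# Random access: read 10 values starting at element 5001
part <- read.gdsn(node, start=5001, count=10)

closefn.gds(f)
```

Only the requested slice is returned by `read.gdsn()` here, which is the access pattern that matters when the full array does not fit in memory.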
Release Version: v1.26.0
http://www.bioconductor.org/packages/release/bioc/html/gdsfmt.html
News: v1.26.0
http://bioconductor.org/packages/release/bioc/vignettes/gdsfmt/inst/doc/gdsfmt.html
Zheng X, Levine D, Shen J, Gogarten SM, Laurie C, Weir BS (2012). A High-performance Computing Toolset for Relatedness and Principal Component Analysis of SNP Data. Bioinformatics. DOI: 10.1093/bioinformatics/bts606.
Zheng X, Gogarten S, Lawrence M, Stilp A, Conomos M, Weir BS, Laurie C, Levine D (2017). SeqArray -- A storage-efficient high-performance data format for WGS variant calls. Bioinformatics. DOI: 10.1093/bioinformatics/btx145.
Dr. Xiuwen Zheng (zhengxwen@gmail.com)
http://github.com/zhengxwen/gdsfmt
http://www.bioconductor.org/packages/gdsfmt
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("gdsfmt")
library("devtools")
install_github("zhengxwen/gdsfmt")
The install_github() approach requires that you build from source, i.e. make and compilers must be installed on your system (see the R FAQ for your operating system); you may also need to install dependencies manually.
In the R environment,
install.packages("getopt", repos="http://cran.r-project.org")
install.packages("optparse", repos="http://cran.r-project.org")
install.packages("crayon", repos="http://cran.r-project.org")
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("gdsfmt")
viewgds is a shell script written in R (viewgds.R) for viewing the contents of a GDS file. The R packages gdsfmt, getopt and optparse should be installed before running viewgds; the package crayon is optional.
Usage: viewgds [options] file
Installation from the command line:
curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R > viewgds
chmod +x viewgds
## Or
wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/viewgds.R > viewgds
chmod +x viewgds
diffgds is a shell script written in R (diffgds.R) for comparing two GDS files. The R packages gdsfmt, getopt and optparse should be installed before running diffgds.
Usage: diffgds [options] file1 file2
Installation from the command line:
curl -L https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R > diffgds
chmod +x diffgds
## Or
wget -qO- --no-check-certificate https://raw.githubusercontent.com/zhengxwen/Documents/master/Program/diffgds.R > diffgds
chmod +x diffgds
library(gdsfmt)
# create a GDS file
f <- createfn.gds("test.gds")
add.gdsn(f, "int", val=1:10000)
add.gdsn(f, "double", val=seq(1, 1000, 0.4))
add.gdsn(f, "character", val=c("int", "double", "logical", "factor"))
add.gdsn(f, "logical", val=rep(c(TRUE, FALSE, NA), 50))
add.gdsn(f, "factor", val=as.factor(c(NA, "AA", "CC")))
add.gdsn(f, "bit2", val=sample(0:3, 1000, replace=TRUE), storage="bit2")
# list and data.frame
add.gdsn(f, "list", val=list(X=1:10, Y=seq(1, 10, 0.25)))
add.gdsn(f, "data.frame", val=data.frame(X=1:19, Y=seq(1, 10, 0.5)))
folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "int", val=1:1000)
add.gdsn(folder, "double", val=seq(1, 100, 0.4))
# show the contents
f
# close the GDS file
closefn.gds(f)
File: test.gds (1.1K)
+ [ ]
|--+ int { Int32 10000, 39.1K }
|--+ double { Float64 2498, 19.5K }
|--+ character { Str8 4, 26B }
|--+ logical { Int32,logical 150, 600B } *
|--+ factor { Int32,factor 3, 12B } *
|--+ bit2 { Bit2 1000, 250B }
|--+ list [ list ] *
| |--+ X { Int32 10, 40B }
| \--+ Y { Float64 37, 296B }
|--+ data.frame [ data.frame ] *
| |--+ X { Int32 19, 76B }
| \--+ Y { Float64 19, 152B }
\--+ folder [ ]
   |--+ int { Int32 1000, 3.9K }
   \--+ double { Float64 248, 1.9K }
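Reading data back follows the same node hierarchy. The sketch below is one way to do it, assuming a small file laid out like part of the example above; it is rebuilt in a temporary location so the block runs on its own, and the variable names are illustrative.

```r
library(gdsfmt)

# Rebuild a small file with a top-level node and a node inside a folder
fn <- tempfile(fileext=".gds")
f <- createfn.gds(fn)
add.gdsn(f, "int", val=1:10000)
folder <- addfolder.gdsn(f, "folder")
add.gdsn(folder, "double", val=seq(1, 100, 0.4))
closefn.gds(f)

# Reopen the file; openfn.gds() opens read-only by default
f <- openfn.gds(fn)

# Read an entire node
all_int <- read.gdsn(index.gdsn(f, "int"))

# Read a slice: 5 values starting at element 101
part <- read.gdsn(index.gdsn(f, "int"), start=101, count=5)

# Nodes inside a folder are addressed by a slash-separated path
dbl <- read.gdsn(index.gdsn(f, "folder/double"))

closefn.gds(f)
```

Using `start` and `count` avoids loading the whole array when only part of a large node is needed.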
pygds: Python interface to CoreArray Genomic Data Structure (GDS) files
jugds.jl: Julia interface to CoreArray Genomic Data Structure (GDS) files