GDSArray is a Bioconductor package that represents GDS files as objects derived from the DelayedArray class of the DelayedArray package. It converts a GDS node in the file to a DelayedArray-derived data structure. The rich set of common methods and data operations defined on GDSArray makes it more R-user-friendly than working with the GDS file directly. The array data from GDS files are always returned with the first dimension being features (variants/snps) and the second dimension being samples. This layout is consistent with the assay data saved in SummarizedExperiment, and makes the GDSArray package more interoperable with other established Bioconductor data structures.
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GDSArray")
The Bioconductor package gdsfmt has provided a high-level R interface to CoreArray Genomic Data Structure (GDS) data files, which is designed for large-scale datasets, especially for data which are much larger than the available random-access memory.
The GDS format has been widely used in genetic/genomic research for high-throughput genotyping or sequencing data. Two major classes extend the base GDS file class: SNPGDSFileClass, suited for genotyping data (e.g., GWAS), and SeqVarGDSClass, designed specifically for DNA-sequencing data. The file format attribute of each data class is set accordingly; for SeqVarGDSClass files it is SEQ_ARRAY. Many functions have been written on top of these data classes for common data operations and statistical analysis.
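The snippet below is a minimal sketch of how to inspect the file format attribute. It opens the SeqArray example file with gdsfmt and reads the attributes of the root node. It assumes the attribute is stored under the name FileFormat, which is how SeqArray-created files record it. The variable names are illustrative.

```r
library(gdsfmt)

# Example GDS file shipped with the SeqArray package (a SeqVarGDSClass file)
file <- SeqArray::seqExampleFileName("gds")

# Open read-only, read the attributes attached to the root node, then close
f <- openfn.gds(file, readonly = TRUE, allow.duplicate = TRUE)
root_attr <- get.attr.gdsn(f$root)
closefn.gds(f)

# Assumption: the format is recorded under the attribute name "FileFormat"
fmt <- root_attr$FileFormat
print(fmt)
```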
GDSArray represents GDS files as DelayedArray instances. It has dim() and dimnames() methods defined, and it inherits array-like operations and methods from DelayedArray, e.g., subsetting with [.
The GDSArray() constructor takes as arguments the file path and the GDS node inside the GDS file. It always returns the object with rows being features (genes / variants / snps) and columns being "samples". This is consistent with the assay data inside SummarizedExperiment.
file <- SeqArray::seqExampleFileName("gds")
GDSArray(file, "genotype")
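The following sketch checks the row/column orientation. It counts samples and variants directly from the file with gdsfmt and then compares those counts with dim() of the GDSArray. It assumes the file has top-level sample.id and variant.id nodes, as SeqArray files do.

```r
library(GDSArray)
library(gdsfmt)

file <- SeqArray::seqExampleFileName("gds")

# Count samples and variants straight from the file (read, then close)
f <- openfn.gds(file, readonly = TRUE, allow.duplicate = TRUE)
n_sample  <- length(read.gdsn(index.gdsn(f, "sample.id")))
n_variant <- length(read.gdsn(index.gdsn(f, "variant.id")))
closefn.gds(f)

ga <- GDSArray(file, "genotype")

# Rows of the GDSArray are variants, columns are samples
dim(ga)
c(n_variant, n_sample)
```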
GDSMatrix is a 2-dimensional GDSArray, and is returned automatically by the GDSArray() constructor if the input GDS node is 2-dimensional.
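To see the class switch in practice, the sketch below builds one object from a 2-dimensional node and one from a 3-dimensional node. It uses two node names taken from this vignette, "annotation/format/DP" and "genotype", and assumes the SeqArray example file.

```r
library(GDSArray)

file <- SeqArray::seqExampleFileName("gds")

dp   <- GDSArray(file, "annotation/format/DP")  # 2-D node: variants x samples
geno <- GDSArray(file, "genotype")              # 3-D node: variants x samples x ploidy

class(dp)
class(geno)

# A GDSMatrix is still a GDSArray, so all GDSArray methods apply to it
is(dp, "GDSArray")
```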
GDSFile is a light-weight class to represent GDS files. It has a $ completion method to complete any possible GDS nodes. It can also serve as a convenient GDSArray constructor: if the path the GDSFile object currently points to is a valid GDS node, $ returns a GDSArray. Otherwise, it returns the GDSFile object with an updated path.
gf <- GDSFile(file)
gf$annotation$info
gf$annotation$info$AC
Try typing gf$ann and pressing the Tab key for completion.
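The two outcomes of $ can be told apart by checking the class of the result. In this sketch, a folder node yields another GDSFile and a leaf node yields a GDSArray. It assumes, as in the example above, that annotation/info is a folder and annotation/info/AC is an array node in the SeqArray example file.

```r
library(GDSArray)

file <- SeqArray::seqExampleFileName("gds")
gf <- GDSFile(file)

# Folder node: `$` returns a GDSFile pointing one level deeper
info <- gf$annotation$info

# Leaf node holding array data: `$` returns a GDSArray
ac <- gf$annotation$info$AC

class(info)
class(ac)
dim(ac)
```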
seed() returns the GDSArraySeed object underlying a GDSArray:
ga <- GDSArray(file, "genotype")
seed(ga)
gdsfile() returns the file path of the corresponding GDS file.
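The following sketch calls both accessors on the same object. It retrieves the seed and confirms that gdsfile() points back at the file the object was built from. The paths are compared after normalizePath() in case the stored path differs only in form. Variable names are illustrative.

```r
library(GDSArray)
library(DelayedArray)

file <- SeqArray::seqExampleFileName("gds")
ga <- GDSArray(file, "genotype")

s <- seed(ga)      # the GDSArraySeed behind the delayed array
class(s)

path <- gdsfile(ga)  # path of the GDS file backing `ga`
path
```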
gdsnodes() takes the GDS file path or a GDSFile object as input, and returns all nodes that can be converted to GDSArray instances. The returned GDS node names can be used as the name argument of the GDSArray() constructor.
gdsnodes(file)
identical(gdsnodes(file), gdsnodes(gf))
GDSArray(file, name = gdsnodes(file)[1])  # name must be a single node
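Because every name returned by gdsnodes() should be convertible, the names can be fed to the constructor in a loop. This sketch builds a GDSArray for each node of the SeqArray example file and records its number of dimensions. The helper-free vapply() call is just one way to do this.

```r
library(GDSArray)

file <- SeqArray::seqExampleFileName("gds")
nodes <- gdsnodes(file)

# Build a GDSArray for every convertible node and record how many dimensions it has
ndims <- vapply(nodes, function(node) length(dim(GDSArray(file, node))), integer(1))
ndims
```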
dimnames(GDSArray) returns an unnamed list, with the length of each element being the same as the corresponding element of dim(GDSArray).
ga <- GDSArray(file, "annotation/format/DP")
dim(ga)
class(dimnames(ga))
lengths(dimnames(ga))
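The relationship between dimnames() and dim() can be checked element by element. This sketch uses the DP node from the example above. It also shows that colnames() is simply the second element of the dimnames list.

```r
library(GDSArray)

file <- SeqArray::seqExampleFileName("gds")
dp <- GDSArray(file, "annotation/format/DP")

dn <- dimnames(dp)

# One element per dimension, each as long as that dimension
length(dn) == length(dim(dp))
all(lengths(dn) == dim(dp))

# colnames() is the second element of the dimnames list (sample identifiers)
head(colnames(dp))
```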
GDSArray instances can be subset, following the usual R
conventions, with numeric or logical vectors; logical vectors are
recycled to the appropriate length.
ga[1:3, 10:15]
ga[c(TRUE, FALSE), ]
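To make the recycling rule concrete, this sketch shows that c(TRUE, FALSE) is equivalent to selecting rows 1, 3, 5, and so on. It compares a small realized block from both subsets. It uses the DP node from the SeqArray example file, and the variable names are illustrative.

```r
library(GDSArray)

file <- SeqArray::seqExampleFileName("gds")
dp <- GDSArray(file, "annotation/format/DP")

# c(TRUE, FALSE) is recycled along the rows, keeping rows 1, 3, 5, ...
odd_logical <- dp[c(TRUE, FALSE), ]
odd_numeric <- dp[seq(1, nrow(dp), by = 2), ]

dim(odd_logical)

# Realize a small corner of each subset and compare the values
identical(as.matrix(odd_logical[1:5, 1:5]), as.matrix(odd_numeric[1:5, 1:5]))
```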
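Numeric operations inherited from DelayedArray, such as log() and rowMeans(), can be applied directly:
dp <- GDSArray(file, "annotation/format/DP")
dp
log(dp)
dp[rowMeans(dp) < 60, ]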
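As a sanity check, the delayed results can be compared with the same computations on an in-memory copy. This sketch assumes the DP matrix of the example file is small enough to realize with as.matrix(). It uses all.equal() so that tiny floating-point differences from block-wise processing are tolerated.

```r
library(GDSArray)

file <- SeqArray::seqExampleFileName("gds")
dp <- GDSArray(file, "annotation/format/DP")

# log() is recorded as a delayed operation; values are computed only when realized
log_dp <- log(dp)
block <- as.matrix(dp[1:10, 1:10])
all.equal(as.matrix(log_dp[1:10, 1:10]), log(block))

# Row means on the disk-backed object vs. on a fully realized copy
rm_disk <- rowMeans(dp)
rm_mem  <- rowMeans(as.matrix(dp))
all.equal(rm_disk, rm_mem)
```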
The GDSArraySeed class represents the 'seed' for the GDSArray object. It is not exported from the GDSArray package. Seed objects should contain the GDS file path, and are expected to satisfy the “seed contract”, i.e., to support dim() and dimnames().
seed <- GDSArray:::GDSArraySeed(file, "genotype")
seed
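The seed contract can be exercised directly. In this sketch, dim() and dimnames() are called on the seed and the results are compared with those of the corresponding GDSArray. The internal constructor is reached with ::: exactly as above, so this relies on a non-exported function that may change between package versions.

```r
library(GDSArray)

file <- SeqArray::seqExampleFileName("gds")

ga <- GDSArray(file, "genotype")
s  <- GDSArray:::GDSArraySeed(file, "genotype")  # internal, not exported

# The "seed contract": dim() and dimnames() work on the seed itself
dim(s)
lengths(dimnames(s))
identical(dim(s), dim(ga))
```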
The seed can be used to construct a DelayedArray instance. Calling the DelayedArray() constructor with a GDSArraySeed object as its argument returns the same content as the GDSArray() constructor over the same file and node.
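The equivalence can be checked on a small slice. This sketch wraps the internal seed with DelayedArray(), builds the usual GDSArray, and compares dimensions and a realized 5 x 5 x ploidy block. It again relies on the non-exported GDSArraySeed() constructor.

```r
library(GDSArray)
library(DelayedArray)

file <- SeqArray::seqExampleFileName("gds")

s  <- GDSArray:::GDSArraySeed(file, "genotype")
da <- DelayedArray(s)              # wrap the seed in a delayed array
ga <- GDSArray(file, "genotype")   # the usual constructor

class(da)
identical(dim(da), dim(ga))

# Realize the same small slice from both objects and compare
identical(as.array(da[1:5, 1:5, ]), as.array(ga[1:5, 1:5, ]))
```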