README.md
In amcdavid/PenultimateGEXContainer: Penultimate gene expression container: An interface for interchange of gene expression data

PenultimateGEXContainer: We'll reinvent this wheel at least once more.

Although there are several good classes for storing gene expression data, in many cases it's not ideal to share serialized versions of these classes. This package is meant for intermediate storage and exchange of gene expression data, sample and feature covariates, and pointers to raw data. It may eventually support idempotent conversion to and from HDF5, text files and GEO formats. Emphasis is placed on harmonizing covariates between studies, so a controlled vocabulary is available and its use is encouraged. The design case was single cell gene expression experiments, but it is hoped that the package will be useful in other contexts.

How dreadfully banal. Why attempt to define another architecture?

Good question. MAGE-TAB is getting pretty creaky. GEO/SOFT format almost works, but only for sample-level covariates. It requires some abuse to model cells, and many datasets now only offer a link to the processed data. Both of these are also missing some important (to me) experiment and sample-level covariates.

Alternatives

- Status quo: ad hockery abounds. Text formats with various headers, or whatever happened to be uploaded onto GEO.
- GEO/SOFT format: this is pretty close to what we need for sample and platform covariates, but lacks vocabulary we'd like to have access to. We want more enumerated types. ArrayExpress has a competing format (IDF, SDRF) and imports GEO experiments weekly, so we should just use that, if it's easy to parse.
- SummarizedExperiment: this maybe could work, but it lacks a controlled vocabulary and import/export outside of R, so we'd need to subclass.
- MIAME: a set of reported dimensions and metrics, not an interchange format (though MAGE-TAB...).

Controlled vocab

Intended to describe:

- technical aspects of the assay, such as platform and chemistry, in greater detail than GEO
- upstream computational aspects, such as aligner, read trimming, deduplication. These first two might be scrapable from the Protocol.
- cellular covariates such as batch, treatment, sort info
- sample covariates such as organism, tissue, cell line, age, sex. Many are available in GEO. Use MeSH/EFO where appropriate. Package ontoCAT can read them. Key:value rather than tabular?
- feature covariates: genome/transcriptome, id type (ENTREZGENE, ENSEMBL, ...)

These should be dynamically crowd-sourced with a Google Docs sheet, then validated and incorporated into the namespace as data when the package is built using data-raw.
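As an illustration of what validation against a controlled vocabulary could look like, here is a minimal base R sketch. It is not the package's implementation: the `vocab` list, its field names and allowed values, and the helper `check_vocab()` are all hypothetical, made up for this example.

```r
# Hypothetical controlled vocabulary: field name -> allowed values.
vocab <- list(
  organism = c("Homo sapiens", "Mus musculus"),
  sex      = c("female", "male", "unknown"),
  id_type  = c("ENTREZGENE", "ENSEMBL")
)

# Hypothetical helper: list every covariate entry that falls outside the
# vocabulary. Fields absent from the vocabulary and NA values are skipped.
check_vocab <- function(covariates, vocab) {
  problems <- list()
  for (field in intersect(names(covariates), names(vocab))) {
    values <- as.character(covariates[[field]])
    bad <- which(!is.na(values) & !(values %in% vocab[[field]]))
    if (length(bad) > 0) {
      problems[[field]] <- data.frame(
        row = bad, field = field, value = values[bad],
        stringsAsFactors = FALSE
      )
    }
  }
  if (length(problems) == 0) {
    return(data.frame(row = integer(0), field = character(0),
                      value = character(0), stringsAsFactors = FALSE))
  }
  out <- do.call(rbind, problems)
  rownames(out) <- NULL
  out
}

# Made-up sample covariates with two entries that need harmonizing.
samples <- data.frame(
  organism = c("Homo sapiens", "human", "Mus musculus"),
  sex      = c("female", "M", NA),
  stringsAsFactors = FALSE
)
issues <- check_vocab(samples, vocab)
print(issues)
```

Reporting offending entries rather than stopping at the first one fits the crowd-sourced workflow described above, where a sheet is corrected in bulk before the package is rebuilt.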

Use of GEO/SOFT

- Use GEO/SOFT or IDF/SDRF fields if present, possibly rewriting them if incorrect
- Make a new field, preserving GEO/SOFT
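The second option (add a new field and leave the GEO/SOFT one alone) might look like the following base R sketch. The sample table, the column name `organism_ch1`, the `synonyms` lookup and the helper `harmonize()` are assumptions for illustration only, not part of the package.

```r
# Hypothetical sample table, as it might look after parsing GEO/SOFT metadata.
geo <- data.frame(
  sample_id    = c("S1", "S2", "S3"),
  organism_ch1 = c("Homo sapiens", "homo sapiens ", "H. sapiens"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup from normalized free text to a vocabulary term.
synonyms <- c("homo sapiens" = "Homo sapiens",
              "h. sapiens"   = "Homo sapiens")

# Normalize case and whitespace, then look the value up.
# Values without a known mapping come back as NA.
harmonize <- function(x, synonyms) {
  key <- tolower(trimws(x))
  unname(synonyms[key])
}

# Write the harmonized value to a NEW column; the original GEO/SOFT column
# stays untouched so the conversion can be audited or redone later.
geo$organism <- harmonize(geo$organism_ch1, synonyms)
print(geo)
```

Keeping the original column alongside the harmonized one costs little and means a wrong mapping can be fixed without going back to GEO.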

API:

- ReadPGEX -> .txt, .hdf5
- WritePGEX -> .txt, .hdf5
- GuessPlatform(character, vocab)
- GuessSample(character, vocab)
- GuessCell(character, vocab)
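The README does not say how the Guess* functions would work. One plausible approach is fuzzy matching of free text against the vocabulary, sketched below with `utils::adist()`. The function `guess_term()`, its `max_dist` parameter and the platform list are hypothetical and should not be read as the package's actual behavior.

```r
# Hypothetical helper: map each free-text string to its closest vocabulary
# term by edit distance, or NA if nothing is close enough.
guess_term <- function(x, vocab, max_dist = 0.3) {
  # Case-insensitive edit distances: one row per input, one column per term.
  d <- utils::adist(x, vocab, ignore.case = TRUE)
  best <- apply(d, 1, which.min)
  # Scale each best distance by the longer of the two strings so that
  # max_dist acts as a relative threshold.
  rel <- d[cbind(seq_along(x), best)] / pmax(nchar(x), nchar(vocab[best]))
  ifelse(rel <= max_dist, vocab[best], NA_character_)
}

# Made-up platform vocabulary and free-text inputs.
platforms <- c("Illumina HiSeq 2500", "Illumina NextSeq 500", "Fluidigm C1")
guesses <- guess_term(
  c("illumina hiseq2500", "fluidigm c1", "unknown machine"),
  platforms
)
print(guesses)
```

Returning NA for poor matches, instead of always picking the nearest term, leaves ambiguous entries visible for manual curation.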

amcdavid/PenultimateGEXContainer documentation built on May 12, 2019, 2:35 a.m.
