README.md

PenultimateGEXContainer

Although there are several good classes for storing gene expression data, in many cases it is not ideal to share serialized versions of these classes. This package is meant for intermediate storage and exchange of gene expression data, sample and feature covariates, and pointers to raw data. It may eventually support idempotent conversion to and from HDF5, text files, and GEO formats. Emphasis is placed on harmonizing covariates between studies, so a controlled vocabulary is available and its use is encouraged. Its design case was single-cell gene expression experiments, but it is hoped that it will be useful in other contexts.

How dreadfully banal. Why attempt to define another architecture?

Good question. MAGE-TAB is getting pretty creaky. GEO/SOFT format almost works, but only for sample-level covariates. It requires some abuse to model cells, and many datasets only offer a link to the processed data now. Both of these are also missing some important (to me) experiment and sample-level covariates.

Alternatives

Controlled vocab

Intended to describe:

- technical aspects of the assay such as platform, chemistry in greater detail than GEO
- upstream computational aspects, such as aligner, read trimming, deduplication. These two might be scrapable from the Protocol.
- cellular covariates such as batch, treatment, sort info
- sample covariates such as organism, tissue, cell line, age, sex. Many available in GEO. Use MeSH/EFO where appropriate. Package ontoCat can read them. Key:value rather than tabular?
- feature covariates: Genome/transcriptome, id type (ENTREZGENE, ENSEMBL, ...)

These should be dynamically crowd-sourced with a google docs sheet. Then validated and incorporated into namespace as data when package is built using data-raw.
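As a rough illustration of what checking covariates against a controlled vocabulary could look like, here is a minimal sketch. It is written in Python purely for illustration (the package itself is R), and the vocabulary contents, the `VOCAB` table and the `validate_covariates` helper are all hypothetical, not part of the package.

```python
# Hypothetical controlled vocabulary: field name -> allowed terms.
# The real vocabulary is meant to be crowd-sourced; these entries are made up.
VOCAB = {
    "organism": {"Homo sapiens", "Mus musculus"},
    "sex": {"female", "male", "unknown"},
    "id_type": {"ENTREZGENE", "ENSEMBL"},
}


def validate_covariates(record, vocab=VOCAB):
    """Return (field, value) pairs whose value is outside the vocabulary.

    Fields that the vocabulary does not cover are left unchecked.
    """
    problems = []
    for field, value in record.items():
        allowed = vocab.get(field)
        if allowed is not None and value not in allowed:
            problems.append((field, value))
    return problems


# "F" is not an allowed term for "sex"; "tissue" is not covered, so it passes.
sample = {"organism": "Homo sapiens", "sex": "F", "tissue": "blood"}
violations = validate_covariates(sample)
```

Reporting violations rather than raising an error keeps the check usable on partially curated metadata, which seems closer to the harmonization use case described here.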

Use of GEO/SOFT

  1. Use GEO/SOFT or IDF/SDRF if present, possibly rewriting if incorrect
  2. Make a new field, preserving the original GEO/SOFT value
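The second option, adding a harmonized field while keeping the original, might be sketched as follows. This is an illustrative Python sketch only; `harmonize_field`, the `_harmonized` suffix and the example mapping are assumptions made for the example, not package behaviour.

```python
def harmonize_field(record, field, mapping, suffix="_harmonized"):
    """Return a copy of `record` with a harmonized companion field added.

    The original value under `field` is left untouched. If the raw value
    has no entry in `mapping`, no companion field is added.
    """
    out = dict(record)  # shallow copy so the input record is not modified
    raw = record.get(field)
    if raw in mapping:
        out[field + suffix] = mapping[raw]
    return out


# Hypothetical mapping from free-text GEO values to vocabulary terms.
organism_map = {"human": "Homo sapiens", "mouse": "Mus musculus"}

geo_sample = {"organism": "human", "title": "cell 17"}
harmonized = harmonize_field(geo_sample, "organism", organism_map)
```

Keeping both fields means the mapping can be corrected later without going back to the source record.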

API:

- ReadPGEX -> .txt, .hdf5
- WritePGEX -> .txt, .hdf5
- GuessPlatform(character, vocab)
- GuessSample(character, vocab)
- GuessCell(character, vocab)
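The README does not say how the `Guess*` functions would work. One plausible reading is fuzzy matching of a free-text string against the vocabulary. The sketch below shows that idea in Python using the standard library's `difflib`; the `guess_term` helper, the cutoff and the platform names are assumptions for illustration, not the package's implementation.

```python
import difflib


def guess_term(text, vocab, cutoff=0.6):
    """Return the vocabulary term closest to `text`, or None if none is close.

    Matching is case-insensitive; the term is returned in its original case.
    """
    lowered = {term.lower(): term for term in vocab}
    matches = difflib.get_close_matches(
        text.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None


# Hypothetical platform vocabulary.
platforms = ["Illumina HiSeq 2500", "Illumina NextSeq 500", "Fluidigm C1"]

guess = guess_term("illumina hiseq2500", platforms)
no_guess = guess_term("zzzz", platforms)
```

Returning `None` rather than the nearest term regardless of distance leaves unmatched strings visible for manual curation.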

PenultimateGEXContainer: We'll reinvent this wheel at least once more.



amcdavid/PenultimateGEXContainer documentation built on May 12, 2019, 2:35 a.m.