knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  out.width = '100%'
)
options(width = 70)
This vignette is aimed at those who have already familiarized themselves with RAPToR reference building.
It also serves as our own record of guidelines for extending RAPToR with new references.
If you've already built some references and want to make them available to the world (or use them more easily yourself), you're in the right place.
You will need some basic knowledge of R package development. Data-packages are, after all, packages.
This document only details how to set up your data-package to interact properly with RAPToR.
By definition, a "data-package" is an R package in which one stores large datasets (more than a few MB). This is good practice for several reasons.
Hadley Wickham gives thorough advice on organizing data in packages in his R Packages book.
RAPToR uses references which can sometimes be tedious to build, so we want to give users access to pre-built references as much as possible.
References are stored as .RData objects, which must include everything needed to make the interpolated reference: the gene expression matrix, the phenotypic data on the samples, and the interpolation parameters.
There can be as many references as needed in a data-package; we group packages by organism.
Consistent, clear and concise naming helps users (or yourself!) find their way around references.
For example, wormRef references are named with an organism code, Cel (C. elegans), followed by the developmental period covered by the reference, e.g. larval.
Document the references. It is standard practice to document data: what the data is, where it comes from, the associated publication, etc.
The structure of the Cel_larval reference object is detailed below as an example.
str(wormRef::Cel_larval, vec.len = 2)
With

g: The gene expression matrix (genes as rows, samples as columns).
p: A dataframe of phenotypic data on the samples:
    sname: sample names,
    age: developmental age of the samples (scaled),
    cov: covariate, factor indicating which of 3 time series,
    age_ini: chronological age of the samples,
    accession: sample accession ID for GEO.
geim_params: A list with necessary parameters for interpolation.
t.unit: The time unit.
cov.levels: A named list with covariate levels to interpolate as.
metadata: A named list with any extra metadata.

The documentation for Cel_larval can be accessed with ?Cel_larval.
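As a rough illustration of the structure above, the sketch below assembles a toy reference object with the same top-level fields and saves it as an .RData file. Everything in it is made up for the example: the object name Xxx_toy, the data values, the time unit, and the geim_params entries (the four element names are taken from the function factory shown later in this vignette; their values here are assumptions, not recommendations).

```r
# Toy data standing in for a real time series (all values are made up).
set.seed(1)
n_genes <- 5
n_samples <- 6

# Expression matrix: genes as rows, samples as columns
g <- matrix(
  rnorm(n_genes * n_samples),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("s", seq_len(n_samples)))
)

# Phenotypic data: one row per sample, in the same order as the columns of g
p <- data.frame(
  sname = colnames(g),
  age = seq(0, 10, length.out = n_samples),
  cov = factor(rep(c("A", "B"), each = 3)),
  stringsAsFactors = FALSE
)

# Hypothetical reference object with the fields described above
Xxx_toy <- list(
  g = g,
  p = p,
  geim_params = list(
    formula = "X ~ s(age, bs = 'cr') + cov",  # illustrative value only
    method = "gam",                           # illustrative value only
    dim_red = "pca",                          # illustrative value only
    nc = 2                                    # illustrative value only
  ),
  t.unit = "h (toy unit)",
  cov.levels = list(cov = "A"),
  metadata = list(source = "toy example, not real data")
)

# Basic consistency check: samples must match between g and p
stopifnot(identical(colnames(Xxx_toy$g), Xxx_toy$p$sname))

# Written to a temporary directory here; a data-package would keep it under data/
out_file <- file.path(tempdir(), "Xxx_toy.RData")
save(Xxx_toy, file = out_file)
```

Checking that the sample names agree between g and p before saving is a cheap way to catch reordered or dropped samples early.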
For a user to access data-package information and references directly from RAPToR, we've set up a standard system using reference names.
A few objects are necessary for this interface to work.
.prepref_ functions

.prepref_ functions (note the "dot") are the key to the interface: they prepare the reference for the user.
They must respect the naming convention .prepref_ref_name() (e.g. .prepref_Cel_larval()) and take n.inter and/or by.inter as arguments.
These functions are the backbone called by prepare_refdata() when fetching a reference, and thus should output the reference ref object with the specified parameters.
This means building the GEIM model, and calling make_ref() with the appropriate parameters and metadata.
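To make the role of the naming convention concrete, here is a minimal sketch of a name-based lookup. It is not RAPToR's actual implementation of prepare_refdata(): the environment toy_pkg stands in for a data-package namespace, fetch_ref() is a hypothetical helper, and the stand-in .prepref_ function returns a plain list rather than a real ref object.

```r
# Environment standing in for a data-package namespace (hypothetical).
toy_pkg <- new.env()

# Stand-in .prepref_ function: returns its arguments instead of a real ref object.
assign(
  ".prepref_Xxx_toy",
  function(n.inter = NULL, by.inter = NULL) {
    list(n.inter = n.inter, by.inter = by.inter)
  },
  envir = toy_pkg
)

# Hypothetical helper: derive the function name from the reference name,
# check that it exists, then call it with the user's arguments.
fetch_ref <- function(ref, env, ...) {
  fun_name <- paste0(".prepref_", ref)
  if (!exists(fun_name, envir = env, inherits = FALSE)) {
    stop("No reference named '", ref, "' found.")
  }
  prep_fun <- get(fun_name, envir = env, inherits = FALSE)
  prep_fun(...)
}

res <- fetch_ref("Xxx_toy", toy_pkg, n.inter = 100)
```

Because the lookup relies only on the name, a reference whose function is misnamed (or missing its leading dot) simply cannot be found.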
We have made a function factory to generate these functions. It takes the reference data object described above as input, and returns the corresponding .prepref_ function.
.prepref_skel <- function(data, from = NULL, to = NULL) {
  # .prepref function factory
  f <- function(n.inter = NULL, by.inter = NULL) {
    m <- RAPToR::ge_im(
      X = data$g,
      p = data$p,
      formula = data$geim_params$formula,
      method = data$geim_params$method,
      dim_red = data$geim_params$dim_red,
      nc = data$geim_params$nc
    )
    return(
      RAPToR::make_ref(m,
                       n.inter = n.inter,
                       by.inter = by.inter,
                       from = from,
                       to = to,
                       t.unit = data$t.unit,
                       cov.levels = data$cov.levels,
                       metadata = data$metadata)
    )
  }
  return(f)
}
To make .prepref_Cel_larval(), we simply include the following code in the data-package, along with the function factory.
.prepref_Cel_larval <- .prepref_skel(wormRef::Cel_larval)
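The function factory pattern itself can be tried out without RAPToR installed. The sketch below uses a stand-in factory, prepref_toy_factory(), with the same shape as .prepref_skel() but no modelling step; the reference names and fields are hypothetical. It shows that each generated function keeps its own copy of the data it was built from, while n.inter and by.inter are only supplied at call time.

```r
# Stand-in factory: same shape as .prepref_skel(), but with no RAPToR calls.
prepref_toy_factory <- function(data) {
  force(data)  # evaluate the argument now rather than at the first call of the closure
  function(n.inter = NULL, by.inter = NULL) {
    # A real .prepref_ function would build the model and the ref object here.
    list(t.unit = data$t.unit, n.inter = n.inter, by.inter = by.inter)
  }
}

# Two toy reference objects (hypothetical names and content)
toy_refs <- list(
  Xxx_early = list(t.unit = "h"),
  Xxx_late  = list(t.unit = "days")
)

# One generated function per reference, named following the .prepref_ convention
prepref_funs <- lapply(toy_refs, prepref_toy_factory)
names(prepref_funs) <- paste0(".prepref_", names(toy_refs))

out <- prepref_funs[[".prepref_Xxx_late"]](n.inter = 50)
```

The call to force() is a common precaution in R function factories: arguments are evaluated lazily, so forcing them avoids surprises when the factory is called inside a loop.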
ref_list object

RAPToR expects a ref_list object in the data-package.
This is what's displayed when calling the list_refs(datapkg) function.
library(RAPToR)
library(wormRef)
list_refs(datapkg = "wormRef")
The form/layout of this object is free, but reference names should be included somewhere, as the user needs them to access the reference through prepare_refdata().
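Since the layout is free, one simple option is a data frame with one row per reference. The sketch below is only one possible layout; the column names and reference names are hypothetical and are not those used by wormRef.

```r
# One possible layout for ref_list (hypothetical columns and entries).
ref_list <- data.frame(
  name = c("Xxx_early", "Xxx_late"),  # names users pass to prepare_refdata()
  description = c("Toy early-development reference",
                  "Toy late-development reference"),
  t.unit = c("h", "days"),
  stringsAsFactors = FALSE
)

print(ref_list)
```

Whatever the layout, keeping the exact reference names in a clearly labelled column spares users from guessing what to type.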
.plot_refs() function

This function is optional, but very useful to guide users to the correct reference for their samples.
Including a .plot_refs() function (note the "dot") in a data-package will allow it to be called by the plot_refs(datapkg) function in RAPToR.
plot_refs(datapkg = "wormRef")
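A .plot_refs() function can be as simple as a chart of the time range each reference covers. The sketch below uses base graphics and a made-up coverage table, ref_cover; the reference names, ranges and time unit are hypothetical. This text does not specify which arguments RAPToR passes to .plot_refs(), so the sketch assumes none are required and simply accepts `...`.

```r
# Hypothetical coverage table: the time span of each reference (made-up values).
ref_cover <- data.frame(
  name = c("Xxx_early", "Xxx_late"),
  from = c(0, 40),
  to = c(48, 120),
  stringsAsFactors = FALSE
)

# Minimal sketch of a .plot_refs() function: one horizontal bar per reference.
.plot_refs <- function(...) {
  n <- nrow(ref_cover)
  # Empty plotting area sized to fit all references
  plot(NA,
       xlim = range(c(ref_cover$from, ref_cover$to)),
       ylim = c(0.5, n + 0.5),
       xlab = "Developmental time (toy unit)", ylab = "",
       yaxt = "n", main = "Reference coverage")
  # One segment per reference, from its first to its last time point
  segments(x0 = ref_cover$from, y0 = seq_len(n),
           x1 = ref_cover$to, y1 = seq_len(n), lwd = 4)
  # Label each bar with the reference name
  axis(2, at = seq_len(n), labels = ref_cover$name, las = 1)
  invisible(ref_cover)
}
```

Showing overlapping ranges side by side makes it easier for users to see which reference covers the expected age of their samples.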
You're free to include any extra objects in your data-package that may be useful.
For example, the wormRef package has a Cel_devstages object with information on key developmental stages of C. elegans (which is used for building the plot in .plot_refs() above).
sessionInfo()