knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.align = "center",
  out.width = '100%'
)

options(width=70)

Introduction

This vignette is aimed at those who've already familiarized themselves with RAPToR reference building. It also serves as our own record of guidelines for adding new references to RAPToR.

If you've already built some references and want to make them available to the world (or use them more easily yourself), you're in the right place. You will need some basic knowledge of R package development. Data-packages are, after all, packages. This document only details how to set up your data-package to interact properly with RAPToR.

What's a data-package?

By definition, a "data-package" is an R package in which one stores large datasets (over a few MB). This is good practice for several reasons.

  1. If your data rarely or never changes, updates to the data-package (and thus, download of the data) will be minimal. If included in a standard package, large data can be a burden during install.
  2. The data may never be used. Why have users download data they won't need?
  3. CRAN standards limit package size to 5MB (documentation included). A large dataset is better off separated from methods and functions that may need it.
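
Because the data lives in a separate package, code that depends on it can check that the package is installed before using it. The following is a minimal sketch using base R's requireNamespace(); the helper name has_datapkg() is hypothetical and not part of RAPToR.

```r
# Hypothetical helper (not part of RAPToR): check whether a data-package
# is installed, without attaching it to the search path.
has_datapkg <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

if (has_datapkg("wormRef")) {
  message("wormRef is available.")
} else {
  message("wormRef is not installed; install it before using its references.")
}
```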

Hadley Wickham gives thorough advice on organizing data in packages in his R Packages book.

Reference data

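RAPToR uses references which can sometimes be tedious to build, so we want to give users access to pre-built references as much as possible. References are stored as .RData objects, which must include everything needed to make the interpolated reference. In the function factory shown in the .prepref_ section, these are the fields g, p, geim_params (formula, method, dim_red, nc), t.unit, cov.levels and metadata.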

There can be as many references as needed in a data-package; we group packages by organism.
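
As an illustration, a reference data object could be assembled and saved as sketched below. The field names follow those read by the function factory in the .prepref_ section; the object name Xyz_toy, all values, and the exact form of the geim_params entries are assumptions made up for this example.

```r
# Toy reference data object. Field names follow those read by the
# .prepref_ function factory; every value here is made up.
set.seed(1)
n_genes <- 5
n_samples <- 6

# Expression matrix: genes as rows, samples as columns
g <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
            dimnames = list(paste0("gene", seq_len(n_genes)),
                            paste0("s", seq_len(n_samples))))

# Sample annotation with a (hypothetical) age column
p <- data.frame(sname = colnames(g),
                age = seq(0, 50, length.out = n_samples),
                stringsAsFactors = FALSE)

Xyz_toy <- list(
  g = g,
  p = p,
  # Assumed parameter values, for illustration only
  geim_params = list(formula = "X ~ s(age, bs = 'cr')",
                     method = "gam", dim_red = "pca", nc = 2),
  t.unit = "hypothetical time unit",
  cov.levels = NULL,
  metadata = list(organism = "hypothetical")
)

# Save the object as .RData (here to a temporary directory)
out <- file.path(tempdir(), "Xyz_toy.RData")
save(Xyz_toy, file = out)
```

In an actual data-package, the saved file would typically go in the package's data/ directory.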

Being consistent, clear and concise with naming can help users (or yourself!) find their way around references. For example, wormRef references are named with an organism code, Cel (C. elegans), followed by the developmental period covered by the reference, e.g. larval.
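
The naming scheme can be written down as a one-line helper. This is only a sketch: make_ref_name() is a hypothetical function, not something RAPToR or wormRef provides.

```r
# Hypothetical helper: build a reference name from an organism code
# and the developmental period the reference covers.
make_ref_name <- function(organism, period) {
  paste(organism, period, sep = "_")
}

ref_name <- make_ref_name("Cel", "larval")
ref_name
```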

Document the references. It is standard practice to document data: what the data is, where it comes from, whether there is an associated publication, etc.
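
One common way to document a dataset is a roxygen2 block placed above the dataset's name given as a string. The sketch below assumes the package uses roxygen2; the dataset name Xyz_toy and all descriptions are placeholders.

```r
#' Hypothetical toy reference (placeholder documentation)
#'
#' Short description of what the data is and which organism and
#' developmental period it covers.
#'
#' @format A list holding the fields read by the .prepref_ function factory.
#' @source Where the data comes from (publication, accession number).
"Xyz_toy"
```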

The structure of the Cel_larval reference object is detailed below as an example.

str(wormRef::Cel_larval, vec.len = 2)

With

The documentation for Cel_larval can be accessed with ?Cel_larval.

Data-package interface with RAPToR

For a user to access data-package information and references directly from RAPToR, we've set up a standard system using reference names.

A few objects are necessary for this interface to work.

.prepref_ functions

.prepref_ functions (note the "dot") are the key to the interface: they prepare the reference for the user. They must respect the naming convention .prepref_ref_name() (e.g. .prepref_Cel_larval()) and take n.inter and/or by.inter as arguments.

These functions are the backbone called by prepare_refdata() when fetching a reference, and thus should output the reference ref object with the specified parameters. This means building the GEIM model, and calling make_ref() with the appropriate parameters and metadata.
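
The naming convention matters because it lets a function be found from the reference name alone. The sketch below shows one way such a lookup could work in base R. It is not RAPToR's actual implementation: fetch_ref(), pkg_env and the stand-in .prepref_Xyz_toy() are all hypothetical.

```r
# Sketch of a name-based lookup; NOT RAPToR's actual implementation.
# `pkg_env` stands in for a data-package namespace.
pkg_env <- new.env()

# Stand-in .prepref_ function that only echoes its arguments.
assign(".prepref_Xyz_toy",
       function(n.inter = NULL, by.inter = NULL) {
         list(n.inter = n.inter, by.inter = by.inter)
       },
       envir = pkg_env)

# Hypothetical fetcher: build the function name from the reference name,
# check that it exists, then call it with the interpolation parameters.
fetch_ref <- function(ref_name, env, n.inter = NULL, by.inter = NULL) {
  fname <- paste0(".prepref_", ref_name)
  if (!exists(fname, envir = env, inherits = FALSE)) {
    stop("No reference named '", ref_name, "' found.")
  }
  prep <- get(fname, envir = env, inherits = FALSE)
  prep(n.inter = n.inter, by.inter = by.inter)
}

res <- fetch_ref("Xyz_toy", pkg_env, n.inter = 100)
```

A reference whose .prepref_ function is misnamed would simply not be found by a lookup of this kind, which is why the convention has to be followed exactly.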

We have made a function factory to generate these functions. It takes the reference data object described above as input, and returns the corresponding .prepref_ function.

.prepref_skel <- function(data, from=NULL, to=NULL){
  # .prepref function factory
  f <- function(n.inter=NULL, by.inter=NULL){
    m <- RAPToR::ge_im(
      X = data$g,
      p = data$p,
      formula = data$geim_params$formula,
      method = data$geim_params$method,
      dim_red = data$geim_params$dim_red,
      nc = data$geim_params$nc
    )
    return(RAPToR::make_ref(m, 
                            n.inter = n.inter,
                            by.inter = by.inter,
                            from = from, 
                            to = to,
                            t.unit = data$t.unit,
                            cov.levels = data$cov.levels,
                            metadata = data$metadata)
    )
  }
  return(f)
}

To make .prepref_Cel_larval(), we simply include the following code in the data-package, along with the function factory.

.prepref_Cel_larval <- .prepref_skel(wormRef::Cel_larval)
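
From the user's side, the reference is then fetched by name through prepare_refdata(). The sketch below only runs when both RAPToR and wormRef are installed; the value n.inter = 200 is arbitrary, and the argument names are used as we understand them, so check ?prepare_refdata for the exact signature.

```r
# Fetch the reference by name, if both packages are available.
# n.inter = 200 is an arbitrary choice for illustration.
ref <- NULL
if (requireNamespace("RAPToR", quietly = TRUE) &&
    requireNamespace("wormRef", quietly = TRUE)) {
  ref <- RAPToR::prepare_refdata("Cel_larval", datapkg = "wormRef",
                                 n.inter = 200)
}
```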

ref_list object

RAPToR expects a ref_list object in the data-package. This is what's displayed when calling the list_refs(datapkg) function.

library(RAPToR)
library(wormRef)

list_refs(datapkg = "wormRef")

The form/layout of this object is free, but reference names should be included somewhere, as the user needs them to access the reference through prepare_refdata().
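
Since the layout is free, a simple data frame with one row per reference is one possible choice. The sketch below is a hypothetical layout, not the actual content of wormRef's ref_list; the second entry and the column names are made up.

```r
# Hypothetical ref_list layout: one row per reference, with the name
# users pass to prepare_refdata() in its own column.
ref_list <- data.frame(
  name = c("Cel_larval", "Xyz_toy"),
  description = c("C. elegans larval development",
                  "Made-up entry for illustration"),
  stringsAsFactors = FALSE
)

ref_list
```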

.plot_refs() function

This function is optional, but very useful to guide users to the correct reference for their samples. Including a .plot_refs() function (note the “dot”) in a data-package will allow it to be called by the plot_refs(datapkg) function in RAPToR.

plot_refs(datapkg = "wormRef")
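
A .plot_refs() function can be as simple as drawing the time span covered by each reference. The following base-graphics sketch uses made-up reference names and ranges; it also assumes the function may be called without arguments, which is not confirmed here.

```r
# Sketch of a .plot_refs() function; names and ranges are made up.
.plot_refs <- function(...) {
  refs <- data.frame(
    name = c("Xyz_early", "Xyz_late"),
    start = c(0, 40),
    end = c(50, 100),
    stringsAsFactors = FALSE
  )
  y <- seq_len(nrow(refs))

  # Empty canvas spanning all references
  plot.new()
  plot.window(xlim = range(refs$start, refs$end),
              ylim = c(0.5, nrow(refs) + 0.5))

  # One horizontal bar per reference, labelled on the y axis
  segments(x0 = refs$start, y0 = y, x1 = refs$end, y1 = y, lwd = 4)
  axis(1)
  axis(2, at = y, labels = refs$name, las = 1)
  title(main = "Reference coverage (toy data)",
        xlab = "Time (hypothetical units)")

  invisible(refs)
}
```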

Other objects

You're free to include any extra objects in your data-package that may be useful. For example, the wormRef package has a Cel_devstages object with information on key developmental stages of C. elegans (which is used for building the plot in .plot_refs() above).

SessionInfo

sessionInfo()


LBMC/RAPToR documentation built on April 6, 2023, 12:26 p.m.