Sample modifiers in pepr: imply and derive

Learn how to combine implied and derived attributes in `pepr`

This vignette shows how and why to use the derived attributes and implied attributes functionalities of the pepr package together.

For basic information about the PEP concept, see the project website.
Make sure to study the dedicated derived attributes and implied attributes vignettes before reading this one.

Problem/Goal

While either derived attributes or implied attributes alone are often sufficient to describe your samples efficiently in a PEP, the example below demonstrates how to use derived attributes to simplify and unclutter the columns of the sample_table.csv file after implying attributes for samples that follow certain patterns. Combined, the two functionalities let you build complex yet flexible sample annotation tables with little effort.

Note that attribute implication is always performed first, before the attributes are derived. This means that the newly created (implied) attributes can be used to construct attribute values during derivation. Consider the following sample annotation table for reference:

branch = "master"
library(knitr)
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive_imply",
  "sample_table_pre.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
knitr::kable(sampleAnnotationDF, format = "html")

Solution

Specifying detailed file paths/names (as presented above) is cumbersome. To make your life easier, find the patterns that the file names in the file_path column of sample_table.csv follow, imply the needed attributes, and derive the file names. This multi-step process is orchestrated by the project_config.yaml file via the sample_modifiers.derive and sample_modifiers.imply sections:

library(pepr)
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive_imply",
  "project_config.yaml",
  package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))

The *_untreated files are clearly associated with the samples that are labeled with time 0. Therefore, the value untreated is implied for the samples that have 0 in the time column. Similarly, the codes susScr11 and xenTro9 are associated with the values in the organism column. Therefore, the genome column, which consists of those two codes, is implied from the values in the organism column according to the project_config.yaml.
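To make the imply-then-derive order concrete, here is a minimal base R sketch of the same idea on a toy table. It does not use pepr and is not how the package implements these steps. The sample names, the organism values (pig, frog), the "treated" label, the `source1` key, the path template, and the `fill_template` helper are all hypothetical and chosen only for illustration.

```r
# Toy sample table (hypothetical values, not the package's example data)
samples <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  organism    = c("pig", "pig", "frog", "frog"),
  time        = c(0, 1, 0, 1),
  file_path   = "source1",  # a key to be replaced during derivation
  stringsAsFactors = FALSE
)

# Step 1: imply. Add new attributes based on the values of existing ones.
genome_by_organism <- c(pig = "susScr11", frog = "xenTro9")
samples$genome <- unname(genome_by_organism[samples$organism])
# "treated" for non-zero time points is an assumption made for this sketch
samples$condition <- ifelse(samples$time == 0, "untreated", "treated")

# Step 2: derive. Replace the source key with a path built from a template
# that references the attributes implied in step 1.
sources <- c(source1 = "data/{genome}_{condition}.fastq")  # hypothetical template

# Hypothetical helper: substitute every {attribute} placeholder with the
# value of that attribute for one sample (a one-row data frame).
fill_template <- function(template, row) {
  for (key in names(row)) {
    template <- gsub(paste0("{", key, "}"), as.character(row[[key]]),
                     template, fixed = TRUE)
  }
  template
}

samples$file_path <- vapply(
  seq_len(nrow(samples)),
  function(i) fill_template(sources[[samples$file_path[i]]], samples[i, ]),
  character(1)
)

print(samples[, c("sample_name", "genome", "condition", "file_path")])
```

Because the template refers to genome and condition, derivation can only succeed after those columns exist, which is why implication has to run first.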

Let's introduce a few modifications to the original sample_table.csv file so that the attributes genome and condition are implied and the appropriate data sources from the project_config.yaml are subsequently mapped to the values in the derived column, file_path:

sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive_imply",
  "sample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
knitr::kable(sampleAnnotationDF, format = "html")

Code

Load pepr and read in the project metadata by specifying the path to the project_config.yaml:

library(pepr)
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive_imply",
  "project_config.yaml",
  package = "pepr"
)
p = Project(projectConfig)

And inspect it:

sampleTable(p)

As you can see, the resulting samples are annotated the same way as if they had been read from the original, unwieldy annotation file, and are additionally enriched with the implied genome and condition attributes.