knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Learn sample subannotations in pepr

This vignette will show you how and why to use the subsample table functionality of the pepr package.

Problem/Goal

The examples below demonstrate how and why to use the sample subannotation functionality to provide multiple input files of the same type for a single sample.

Solutions

Example 1: basic sample subannotation table

This example demonstrates how the sample subannotation functionality is used. In this example, 2 samples have multiple input files that need merging (frog_1 and frog_2), while 1 sample (frog_3) does not. Therefore, frog_3 specifies its file in the sample_table.csv file, while the others leave that field blank and instead specify several files in the subsample_table.csv file.
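The merging idea described above can be sketched in base R without pepr. This is only an illustration under stated assumptions: the two data frames are toy stand-ins for sample_table.csv and subsample_table.csv, the file paths are hypothetical, and the loop is not how pepr implements merging internally.

```r
# Toy stand-ins for the two tables; all values are hypothetical.
sampleTableToy <- data.frame(
  sample_name = c("frog_1", "frog_2", "frog_3"),
  file = c(NA, NA, "data/frog3_data.txt"),
  stringsAsFactors = FALSE
)
subsampleTableToy <- data.frame(
  sample_name = c("frog_1", "frog_1", "frog_2", "frog_2"),
  subsample_name = c("sub_a", "sub_b", "sub_a", "sub_b"),
  file = c("data/frog1a_data.txt", "data/frog1b_data.txt",
           "data/frog2a_data.txt", "data/frog2b_data.txt"),
  stringsAsFactors = FALSE
)

# Group the subsample files by the sample they belong to.
filesBySample <- split(subsampleTableToy$file, subsampleTableToy$sample_name)

# Make `file` a list column so that one sample can hold several values.
merged <- sampleTableToy
merged$file <- as.list(merged$file)
for (name in names(filesBySample)) {
  merged$file[[which(merged$sample_name == name)]] <- filesBySample[[name]]
}

# frog_1 and frog_2 now hold two files each, frog_3 keeps its single file.
lengths(merged$file)
```

The list column is the key design point: a regular character column can store only one path per sample, so a merged attribute needs a container that allows several values per row.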

This example is made up of these components:

branch = "master"
library(pepr)
projectConfig = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable1",
"project_config.yaml",
package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
library(knitr)
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable1",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html") 
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable1",
  "subsample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")

Let's create the Project object and check whether multiple files are present for the merged samples:

projectConfig1 = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable1",
"project_config.yaml",
package = "pepr"
)
p1 = Project(projectConfig1)
# Check the files
p1Samples = sampleTable(p1)
p1Samples$file
# Check the subsample names
p1Samples$subsample_name

Then inspect the whole table stored in the p1@samples slot:

kable(p1Samples)

You can also access a single subsample by calling the getSubsample method with the appropriate combination of sample_name and subsample_name attributes. Note that this is only possible if the subsample_name column is defined in the subsample_table.csv file.

sampleName = "frog_1"
subsampleName = "sub_a"
getSubsample(p1, sampleName, subsampleName)

Example 2: subannotations and derived attributes

This example uses a subsample_table.csv file and a derived attribute to point to files. This is a rather complex example. Notice that we must include the file_id column in the sample_table.csv file and leave it blank; it is then populated for only some of the samples (frog_1 and frog_2) from the subsample_table.csv file, and stays empty for the samples that are not merged.
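To make the interplay between file_id and a derived attribute more concrete, here is a minimal sketch of template substitution in base R. The template string and the fillTemplate helper are hypothetical and exist only for this illustration; the real template is defined in the project config, and pepr's own derivation code may work differently.

```r
# Hypothetical path template; the real one is defined in the project config.
template <- "data/{sample_name}{file_id}_data.txt"

# Hypothetical helper: replace every {attribute} placeholder with its value.
fillTemplate <- function(template, attributes) {
  for (key in names(attributes)) {
    template <- gsub(paste0("{", key, "}"), attributes[[key]], template, fixed = TRUE)
  }
  template
}

# A merged sample contributes one path per file_id value.
mergedPaths <- vapply(c("a", "b"), function(id) {
  fillTemplate(template, list(sample_name = "frog_1", file_id = id))
}, character(1), USE.NAMES = FALSE)
mergedPaths

# An unmerged sample has an empty file_id, so the placeholder simply vanishes.
unmergedPath <- fillTemplate(template, list(sample_name = "frog_3", file_id = ""))
unmergedPath
```

This is why the blank file_id column has to exist in the sample table: the placeholder must resolve to something (here, an empty string) for the samples that are not merged.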

This example is made up of these components:

projectConfig = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable2",
"project_config.yaml",
package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable2",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html") 
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable2",
  "subsample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")

Let's load the project config, create the Project object, and check whether multiple files are present:

projectConfig2 = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable2",
"project_config.yaml",
package = "pepr"
)
p2 = Project(projectConfig2)
# Check the files
p2Samples = sampleTable(p2)
p2Samples$file

Then inspect the whole table stored in the p2@samples slot:

kable(p2Samples)

Example 3: subannotations and expansion characters

This example gives exactly the same results as Example 2, but uses a wildcard for frog_2 instead of listing its files in the subsample_table.csv file. Since we can't use a wildcard and a subannotation for the same sample, this requires specifying a second data source class (local_files_unmerged) that uses an asterisk (*). The outcome is the same.
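The effect of an asterisk in a file path can be tried out with base R's Sys.glob. This is a sketch under stated assumptions: the directory and file names are hypothetical, created in a temporary location, and the snippet illustrates glob matching in general rather than pepr's own expansion code.

```r
# Create a throwaway directory holding a few hypothetical data files.
dataDir <- tempfile("frog_data_")
dir.create(dataDir)
invisible(file.create(file.path(
  dataDir, c("frog2a_data.txt", "frog2b_data.txt", "frog3_data.txt")
)))

# The asterisk matches any run of characters within a file name,
# so a single pattern stands in for both frog2 files.
pattern <- file.path(dataDir, "frog2*_data.txt")
matched <- sort(basename(Sys.glob(pattern)))
matched

# Clean up the temporary files.
unlink(dataDir, recursive = TRUE)
```

Compared with a subsample table, a wildcard needs no extra rows, but it only works when the files to merge follow a predictable naming scheme.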

This example is made up of these components:

projectConfig = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable3",
"project_config.yaml",
package = "pepr"
)
.printNestedList(yaml::read_yaml(projectConfig))
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable3",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html") 
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable3",
  "subsample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")

Let's load the project config, create the Project object, and check whether multiple files are present:

projectConfig3 = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable3",
"project_config.yaml",
package = "pepr"
)
p3 = Project(projectConfig3)
# Check the files
p3Samples = sampleTable(p3)
p3Samples$file

Then inspect the whole table stored in the p3@samples slot:

kable(p3Samples)

Example 4: subannotations and multiple (separate-class) inputs

Merging applies to inputs of the same class (for example, multiple files for read1). Inputs of different classes (for example, read1 vs. read2) are handled by different attributes (columns). This example shows how to handle paired-end data while also merging the files within each read.
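The distinction between same-class and different-class inputs can be sketched with a toy subsample table in base R. The table contents and file names below are hypothetical, and the grouping with split is an illustration only, not pepr's internal implementation.

```r
# Toy subsample table for paired-end data; all file names are hypothetical.
pairedSubsamples <- data.frame(
  sample_name = c("frog_1", "frog_1", "frog_2", "frog_2"),
  read1 = c("frog1a_R1.fq", "frog1b_R1.fq", "frog2a_R1.fq", "frog2b_R1.fq"),
  read2 = c("frog1a_R2.fq", "frog1b_R2.fq", "frog2a_R2.fq", "frog2b_R2.fq"),
  stringsAsFactors = FALSE
)

# Merge within each class: every attribute is grouped by sample on its own,
# so read1 files are never mixed with read2 files.
read1BySample <- split(pairedSubsamples$read1, pairedSubsamples$sample_name)
read2BySample <- split(pairedSubsamples$read2, pairedSubsamples$sample_name)

# Each sample ends up with two read1 files and two read2 files,
# and the position in each vector keeps the mates aligned.
read1BySample$frog_1
read2BySample$frog_1
```

Keeping read1 and read2 as separate columns means the row order of the subsample table is what ties each read1 file to its read2 mate.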

This example is made up of these components:

project_config = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable4",
"project_config.yaml",
package = "pepr"
)
.printNestedList(yaml::read_yaml(project_config))
sampleAnnotation = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable4",
"sample_table.csv",
package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html") 
sampleAnnotation = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_subtable4",
  "subsample_table.csv",
  package = "pepr"
)
sampleAnnotationDF = read.table(sampleAnnotation, sep = ",", header = TRUE)
kable(sampleAnnotationDF, format = "html")

Let's load the project config, create the Project object, and check whether multiple files are present in both read columns:

projectConfig4 = system.file(
"extdata",
paste0("example_peps-", branch),
"example_subtable4",
"project_config.yaml",
package = "pepr"
)
p4 = Project(projectConfig4)
# Check the read1 and read2 columns
p4Samples = sampleTable(p4)
p4Samples$read1
p4Samples$read2

Then inspect the whole table stored in the p4@samples slot:

kable(p4Samples)


pepkit/pepr documentation built on Nov. 23, 2023, 5:54 a.m.