Reference Semantics
In mpactr: Correction of Preprocessed MS Data

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(mpactr)

Reference semantics

mpactr is built on an R6 class-system, meaning it operates on reference semantics in which data is updated in-place. Compared to a shallow copy, where only data pointers are copied, or a deep copy, where the entire data object is copied in memory, any changes to the original data object, regardless if they are assigned to a new object, result in changes to the original data object. We can see this below.

data2 <- import_data(
  example_path("cultures_peak_table.csv"),
  example_path("cultures_metadata.csv"),
  format = "Progenesis"
)


get_peak_table(data2)[, 1:5]

Where the raw data object has r nrow(get_peak_table(data2)) ions in the feature table.

data2_mispicked <- filter_mispicked_ions(data2,
  ringwin = 0.5,
  isowin = 0.01, trwin = 0.005,
  max_iso_shift = 3, merge_peaks = TRUE,
  merge_method = "sum",
  copy_object = FALSE
)

get_peak_table(data2_mispicked)[, 1:5]

Running the filter_mispicked_ions filter, with default setting copy_object = FALSE (operates on reference semantics) results in r nrow(get_peak_table(data2_mispicked)) ions in the feature table.

Even though we created an object called data2_mispicked, the original data2 object was also updated and now has r nrow(get_peak_table(data2)) ions in the feature table:

get_peak_table(data2)[, 1:5]

We recommend using the default copy_object = FALSE as this makes for an extremely fast and memory-efficient way to chain mpactr filters together (see the Filter article); however, if you would like to run the filters individually with traditional R style objects, you can set copy_object to TRUE as shown in the filter examples.

Deep copy: specifics of the R6 class

The R6 class-system operates on reference semantics in which data is updated in-place. Compared to a shallow copy, where only data pointers are copied, or a deep copy, where the entire data object is copied in memory, any changes to the original data object, regardless if they are assigned to a new object, result in changes to the original data object. R6 Accomplishes this by taking advantaging of the environment system in R. Inside R, everything is created inside a base R environment. This contains all functions, saved variables, libraries, references, etc. Using R6 classes allows us to easily add this functionality to our R package.

In general, R relies on reference semantics to store data away from the outside because R environments are a container for a copious amount of data. In a normal R session, the base R environment is the outermost environment, allowing you to access to everything you need.

Reference semantics become noticeable when you send an environmental variable to a function. In R, functions rely on call-by-value semantics. Call-by-value is described as functions treating parameterized values (values specified when calling the function) as local variables when in the function. Anything you do to the variable in the function will have no effect on the variables outside. This follows traditional copy by value semantics. However, R does not allow you to send over variables by reference due to this idea. So, you can think of functions as temporary environments in R. What makes these environments so powerful, is the fact that you can send an environment to a function, and it will not copy the environment. This allows you to send variables by reference to functions. R6 classes rely on this, and mpactr uses this for speedy execution.

Memory usage really shines when you use R6 classes vs. a traditional workflow, such as copy by value. In a traditional workflow, all of the data must be copied to call functions and compute operations, using R6 classes we can minimize that problem, improving performance for large datasets.