knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(s3examples)
R is traditionally a functional programming language, meaning that when we code in R, we tend to think in terms of functions. For instance, given a data set, what function can I create to plot a certain variable? Object-oriented programming (OOP) introduces the concept of "classes." Classes are used extensively in other programming languages such as Java, and they are fascinating because they can enforce a specific structure or behavior for an object.
In some instances, in R, you might want to create a function that works only for a specific data format. Perhaps you are interested in a specific column and want to ensure that your data contains it. For instance, let us take a look at the following dataset.
ref_file <- miplicorn::miplicorn_example("reference_AA_table.csv")
alt_file <- miplicorn::miplicorn_example("alternate_AA_table.csv")
cov_file <- miplicorn::miplicorn_example("coverage_AA_table.csv")
data <- miplicorn::read_tbl_ref_alt_cov(
  ref_file,
  alt_file,
  cov_file,
  gene == "atp6" | gene == "crt"
)
data
Let us say we want to create a function, `mutation_prevalence()`, which determines the prevalence of each mutation in the dataset. We also want to plot its output.
prevalence <- mutation_prevalence(data, 5)
prevalence
While creating this function may require some thought, it makes intuitive sense that the result will contain four columns as seen above. Two of these columns, `n_total` and `n_mutant`, are used to compute the final column, `prevalence`.
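As a minimal sketch of how the last column could follow from the other two, assume prevalence is simply the fraction `n_mutant / n_total`. That definition, the toy mutation names, and the counts below are assumptions made for illustration; they are not taken from `mutation_prevalence()` itself.

```r
# Toy counts: the mutation names and numbers are made up for illustration.
toy <- data.frame(
  mutation_name = c("mutation_a", "mutation_b"),
  n_total = c(40, 50),
  n_mutant = c(4, 25)
)

# Assumed definition: the share of counted samples that carry the mutation.
toy$prevalence <- toy$n_mutant / toy$n_total
toy
```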
Therefore, if we wanted to visualize our data, the two columns `n_total` and `n_mutant` would not matter much. Rather, we care about `mutation_name` and `prevalence`.
If we were to create a function, `plot_prevalence()`, the easiest way to code it would be to give the function a `data` argument:
plot_prevalence <- function(data) {
  # code for plotting goes here
}
Notice that in the given format, we could feed any dataset into our function. In other words, there are no checks to make sure we are feeding in the output of `mutation_prevalence()`. There are a couple of strategies we could employ to solve this issue.
First, we could check that some key columns exist.
plot_prevalence <- function(data) {
  if (!"mutation_name" %in% colnames(data)) {
    stop("Missing key column!", call. = FALSE)
  }
  # code for plotting goes here
}
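Since the plot relies on more than one column, the same check can be extended to a whole set of required names. The following is a minimal sketch; `check_columns()` is a hypothetical helper written for this post, not a function from any package.

```r
# Hypothetical helper: stop with a message that names every missing column.
check_columns <- function(data, required) {
  missing_cols <- setdiff(required, colnames(data))
  if (length(missing_cols) > 0) {
    stop(
      "Missing key column(s): ", paste(missing_cols, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(data)
}

plot_prevalence <- function(data) {
  # Both columns used by the plot must be present.
  check_columns(data, c("mutation_name", "prevalence"))
  # code for plotting goes here
}
```

Reporting all missing columns at once spares the user from fixing them one error at a time.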
Another strategy would be to use classes! If we assigned a class `mut_prev` to the output of `mutation_prevalence()`, we could easily check whether the input is of class `mut_prev`.
plot_prevalence <- function(data) {
  if (!inherits(data, "mut_prev")) {
    stop("Wrong class input!", call. = FALSE)
  }
  # code for plotting goes here
}
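For the check above to pass, the object has to carry the class in the first place. In S3, a class is just an attribute, so one way to assign it is a small constructor. The sketch below uses a plain data frame to stay self-contained; `new_mut_prev()` and the toy values are hypothetical and may differ from what the package actually does.

```r
# Hypothetical constructor: prepend "mut_prev" to the existing class vector.
new_mut_prev <- function(x) {
  stopifnot(is.data.frame(x))
  class(x) <- c("mut_prev", class(x))
  x
}

toy <- new_mut_prev(data.frame(
  mutation_name = c("mutation_a", "mutation_b"),
  prevalence = c(0.1, 0.5)
))

class(toy)
#> [1] "mut_prev"   "data.frame"
inherits(toy, "mut_prev")
#> [1] TRUE
inherits(data.frame(prevalence = 0.1), "mut_prev")
#> [1] FALSE
```

Because the original classes are kept after `"mut_prev"`, the object still behaves like a data frame wherever no `mut_prev` method exists.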
Now, using classes to solve this issue might seem like a bit of overkill. After all, why create an entire new class when you can just have an `if()` statement? The real power of object-oriented programming is the ability to use polymorphism^[The term polymorphism has been taken from Advanced R.]. What we mean by this is the ability to use the same function for many types of input. An example of this behavior is the base R function `print()`, which behaves differently depending on what it is printing.
print(c(1, 2))
print(tibble::as_tibble(c(1, 2)))
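Under the hood, `print()` is an S3 generic: it calls `UseMethod()`, which looks at the class of its argument and picks the matching method. A minimal sketch of the same mechanism, using a made-up generic called `describe()` (not part of base R or any package), could look like this:

```r
# A hypothetical generic: UseMethod() picks a method based on class(x).
describe <- function(x, ...) {
  UseMethod("describe")
}

# Fallback used when no more specific method exists.
describe.default <- function(x, ...) {
  "some other object"
}

describe.numeric <- function(x, ...) {
  paste("a numeric vector of length", length(x))
}

describe.data.frame <- function(x, ...) {
  paste("a data frame with", nrow(x), "rows")
}

describe(c(1, 2))
#> [1] "a numeric vector of length 2"
describe(data.frame(a = 1:3))
#> [1] "a data frame with 3 rows"
describe("text")
#> [1] "some other object"
```

Methods follow the naming pattern `generic.class`, which is all S3 needs to connect them to the generic.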
Using OOP, the developer can customize how certain functions behave on certain objects. Revisiting our `mut_prev` class, we could even change the way the table is printed! Another common example is plotting a dataset. Say we have developed a package that introduces three different types of datasets. We can then create a plot method for each class.
plot.class1 <- function(x) {
  # plotting code
}
plot.class2 <- function(x) {
  # plotting code
}
plot.class3 <- function(x) {
  # plotting code
}
Revisiting our class, `mut_prev`, when we call `plot()` it creates a custom plot specific to our class!
plot(prevalence)
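The same mechanism covers the printing customization mentioned earlier. As a rough sketch, a `print()` method for `mut_prev` could add a header line and then hand over to the next method in the class vector. The header text and the toy object are assumptions for illustration, and a plain data frame stands in for the tibble that the package uses.

```r
# Hypothetical print method: add a header, then defer to the data frame method.
print.mut_prev <- function(x, ...) {
  cat("# A mut_prev table:", nrow(x), "mutations\n")
  NextMethod()
  invisible(x)
}

# Toy object built by hand; values are made up.
toy <- structure(
  data.frame(
    mutation_name = c("mutation_a", "mutation_b"),
    prevalence = c(0.1, 0.5)
  ),
  class = c("mut_prev", "data.frame")
)

print(toy)
```

`NextMethod()` is what makes subclassing cheap here: the method only adds what is specific to `mut_prev` and reuses the existing printing code for everything else.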
There is a lot more to the world of OOP and many important packages leverage various OOP techniques. In fact, in R, there are even multiple OOP systems. For example, the Tidyverse is built using the S3 system whereas the Bioconductor project uses S4. Of all the available systems, S3 is regarded as the easiest system to learn.
To learn more about OOP, I would highly recommend reading the OOP chapter of Advanced R and visiting some of the articles in the {vctrs} package. The code for this post can be found in my {s3examples} package, where I have written a simple S3 subclass of a tibble.