Take a look at the following two main.R versions:
    # main.R

    ## Load house prices data
    temp_env <- new.env()
    load(file = usethis::proj_path("data", "train_set", ext = "rda"), envir = temp_env)
    data <- temp_env$train_set
    rm(temp_env)

    ## Plot important amenities
    par(mfrow = c(1, 2))
    plot(data$mpg, data$cyl, type = "p")
    boxplot(mpg ~ cyl, data = data)

    # main.R
    data <- load_house_prices_data()
    plot_important_amenities(data)
Both snippets have the same intent: they load the house prices dataset and plot it for data exploration. Notice how much cognitive load the first snippet imposes, because the reader has to mentally execute every line to work out what it does. The burden grows if the reader is not familiar with R syntax. In contrast, the second snippet hides the implementation details by wrapping them in functions. Its high-level abstractions communicate that two things happen in main.R: data is loaded, then plotted. As a result, the code is simpler to read and understand.
    load_mtcars_data <- function(){
      mtcars <- datasets::mtcars
      return(mtcars)
    }
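The loading steps from the first snippet can be wrapped in much the same way. The following is only a sketch: `load_rda_object` is a hypothetical helper name, and the temporary file stands in for the project's data/train_set.rda so that the example runs on its own.

```r
# Hypothetical helper (not part of the original code): read one named object
# from an .rda file without leaving anything behind in the caller's workspace.
load_rda_object <- function(path, name) {
  temp_env <- new.env()
  load(file = path, envir = temp_env)
  get(name, envir = temp_env, inherits = FALSE)
}

# Stand-in for data/train_set.rda: save a copy of mtcars under the name
# train_set in a temporary file.
train_set <- datasets::mtcars
demo_path <- tempfile(fileext = ".rda")
save(train_set, file = demo_path)
rm(train_set)

# The call site now reads as a single step.
data <- load_rda_object(demo_path, "train_set")

# Clean up the temporary file.
unlink(demo_path)
```

The temporary environment and its cleanup now live inside the function, so main.R no longer has to create or remove temp_env itself.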
Furthermore, the second snippet is easier to maintain and extend. These
qualities are desirable in any software application, because software systems
evolve as programmers acquire new knowledge and understanding of the problem
the software is meant to solve. This matters especially for analytic
applications, which are the result of scattershot and serendipitous
exploration. As data scientists discover new findings and signals, they
incorporate them into the analytic application. For example, the original
implementation of plot_important_attributes is:
    plot_important_attributes <- function(data){
      par(mfrow = c(1, 2))
      plot(data$mpg, data$cyl, type = "p")
      boxplot(mpg ~ cyl, data = data)
    }
Imagine a data scientist discovers, whether through client feedback or other
means, that there is another important attribute to include in the data
analysis. Moreover, to reduce confusion, the data scientist decides to modify
the plots' aesthetics so that they have titles. Then,
plot_important_attributes changes to:
    plot_important_attributes <- function(data){
      par(mfrow = c(1, 3))
      plot(data$mpg, data$hp, type = "p", main = "MPG ~ Horsepower")
      plot(data$mpg, data$cyl, type = "p", main = "MPG ~ Cylinders")
      boxplot(mpg ~ cyl, data = data, main = "MPG ~ Cylinders")
    }
With encapsulation, the data scientist was able to modify and extend the rendered plots without making any changes in main.R.
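To make this concrete, here is a minimal sketch that runs the same calling code against both implementations. `run_main` is a hypothetical stand-in for the contents of main.R, and drawing to a null PDF device is an assumption made only so the example runs without a display.

```r
# Hypothetical stand-in for main.R: it only knows the two function names.
run_main <- function() {
  data <- load_mtcars_data()
  plot_important_attributes(data)
  invisible(data)
}

load_mtcars_data <- function() {
  datasets::mtcars
}

# Original implementation: two panels.
plot_important_attributes <- function(data) {
  par(mfrow = c(1, 2))
  plot(data$mpg, data$cyl, type = "p")
  boxplot(mpg ~ cyl, data = data)
}

pdf(NULL)  # null device, so no file is written and no window is needed
result_v1 <- run_main()
panels_v1 <- par("mfrow")  # panel layout left behind by the first version

# Revised implementation: three titled panels. run_main() is not edited.
plot_important_attributes <- function(data) {
  par(mfrow = c(1, 3))
  plot(data$mpg, data$hp, type = "p", main = "MPG ~ Horsepower")
  plot(data$mpg, data$cyl, type = "p", main = "MPG ~ Cylinders")
  boxplot(mpg ~ cyl, data = data, main = "MPG ~ Cylinders")
}

result_v2 <- run_main()
panels_v2 <- par("mfrow")  # panel layout left behind by the second version
invisible(dev.off())
```

Because run_main() looks up plot_important_attributes each time it is called, redefining the function changes the rendered plots while the calling code stays the same.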