freezr automates irritating aspects of data analysis project organization

Data analysis projects require moderately complex organization, which can be daunting, especially for newcomers to the field. To make a project reproducible, the data analyst incurs even more overhead. Every time a result merits recording, their future self (or a colleague taking over) will need more than just the current graphical or text output. They'll need, at a minimum, working code and notes about the analysis. Better still would be to save package versions and other inputs or dependencies.

(examples of daunting project organization:
example 1, example 2, example 3 )

Fortunately, much of this extra work can be automated.

freeze helps record analyses for later

First, some preliminaries: this will create some fake code that you can pretend you've written.

fake_proj = file.path( getwd(), "fake_project" )
dir.create(fake_proj)
dir.create(file.path( fake_proj, "results" ))
cat("my_fun = function(x) plot(x)", file = file.path( fake_proj, "my_functions.R" ) )
cat("my_fun(1:5);print('blah')", file = file.path( fake_proj, "my_script.R" ) )

Now let's freeze it.

library(freezr)
freeze( analyses_to_run = c( "my_functions.R", "my_script.R" ),
        destination = file.path( fake_proj, "results" ) )

In this line of code, freeze has:

Now you can go add notes to the file it created, and then you're free to keep iterating on your analysis.

inventory_ functions help track results and move them downstream

The problem with freeze is that your results folder ends up having both "spaghetti" issues (time-stamped folders everywhere with small variations on the same analysis and unclear relationships) and "lasagna" issues (results buried two or three folders deeper than where you would save them by hand). The inventory_ functions help manage this big ol' scrap-heap. They let you highlight important results, assigning them short tags. Later on, you can retrieve the path to a key file or folder by referring to its tag.

Note that the inventory is basically a list that helps keep track of files. It does not contain any files -- they stay wherever you put them originally. The only inventory_-family function that actually touches files is inventory_transfer, which copies them. Every other inventory_ function only manipulates a list that keeps track of their locations.

I recommend one or two inventory_add calls per freeze call. Zero inventory calls is appropriate if nothing will ever happen downstream of that frozen analysis.

Let's move into another demo folder.

inv_demo_path = file.path(fake_proj, "inventory_demo")
dir.create(path = inv_demo_path)

Imagine you have a bunch of crap built up, including the alphabet.

ugly_folders = paste0( "long_ugly_results_folder_2017JAN11", 1:2)
ugly_folders = file.path( inv_demo_path, ugly_folders )
lapply( ugly_folders, dir.create )
write.table(letters, file.path(ugly_folders[1], "alphabet.txt"))

You're about to add Spanish characters, and you wanted to easily retrieve the alphabet. So you put it in the inventory with an easy-to-remember tag and some notes to yourself.

inventory_add( inv_location = inv_demo_path, 
               tag = "abc",
               force = T,
               filename = file.path(ugly_folders[1], "alphabet.txt"),
               extra = "This is just the alphabet.")
inventory_show( inv_location = inv_demo_path )

Then you fetch it from the inventory and save a new version somewhere else.

old_alpha_file = inventory_get( inv_location = inv_demo_path,
                                tag = "abc")
old_alpha = read.table(old_alpha_file, stringsAsFactors = F)[[1]]
write.table(sort(c(old_alpha, "rr", "ch", "ll", "ñ")), file.path(ugly_folders[2], "alphabet.txt"))

You can add a new record with language awareness and delete the old one:

inventory_add( inv_location = inv_demo_path,
               tag = "abc_eng",
               force = T,
               filename = file.path(ugly_folders[1], "alphabet.txt"),
               extra = "This is the English alphabet.")
?inventory_rm
inventory_rm( inv_location = inv_demo_path,
              tag = "abc" )
inventory_show( inv_location = inv_demo_path )

Finally, you add a record for the Spanish alphabet. Using a parent_tag, you can include the fact that the construction of the Spanish alphabet version depended on the English alphabet version.

inventory_add( inv_location = inv_demo_path,
               tag = "abc_esp",
               parent_tag = "abc_eng",
               filename = file.path(ugly_folders[2], "alphabet.txt"),
               force = T,
               extra = "This is the Spanish alphabet.")
inventory_show( inv_location = inv_demo_path )

You can check that your inventory points to files that actually exist.

?inventory_check
inventory_integrity_data = inventory_check(inv_location = inv_demo_path)

You can also transfer the inventory to a new location.

?inventory_transfer
inventory_transfer(inv_location = inv_demo_path,
                   target_location = file.path(inv_demo_path, "..", "newly_copied_inv"))


ekernf01/freezr documentation built on Feb. 8, 2022, 5:22 a.m.