assertable Template"

library(assertable); library(data.table)

Data

We will use the CO2 dataset, which has 64 rows and 5 columns of data from an experiment related to the cold tolerances of plants. First, we take in the CO2 dataset and save the whole dataset three times into three separate csv files as data/file_#.csv, with a unique id_var.

for(i in 1:3) {
  data <- CO2
  data$id_var <- i
  write.csv(data,file=paste0("../data/file_",i,".csv"),row.names=F)
}

File Check and Import

First, use check_files to make sure the files exist. We can use the system.file command to locate them within the assertable package. Then, run import_files to bring them in. We'll call the combined data object master_data.

filenames <- paste0("file_",c(1:3),".csv")
filenames <- system.file("extdata", filenames, package = "assertable")

filenames
check_files(filenames)
master_data <- import_files(filenames,FUN=fread)
head(master_data)

Checking Dimensions

This dataset should have 84 * 3 rows, and six columns: Plant, Type, Treatment, conc, uptake, and id_var.

assert_nrows(master_data,(84*3))
assert_colnames(master_data,c("plant","type","treatment","conc","uptake","id_var"))

Oops, forgot to capitalize the column names. Trying again.

assert_nrows(master_data,(84*3))
assert_colnames(master_data,c("Plant","Type","Treatment","conc","uptake","id_var"))

Checking IDs

We believe the dataset should be unique by Plant, conc, and id_var (where id_var just represents the replication number of the dataset). Let's check this.

plants <- unique(master_data$Plant)
concs <- unique(master_data$conc)
id_vars <- unique(master_data$id_var)

id_list <- list(Plant=plants, conc=concs, id_var=id_vars)
assert_ids(master_data,id_list)

Now, let's make sure that there are only two values in Type: Quebec and Mississippi. Let's also make sure that uptake and conc are more than 0 and less than 1500.

assert_values(master_data, colnames = "Type", test="in", test_val = c("Quebec","Mississippi"))
assert_values(master_data, colnames = c("uptake","conc"), test="gt", test_val = 0)
assert_values(master_data, colnames = c("uptake","conc"), test="lt", test_val = 1500)

Finally, let's assert that all values of conc must be at least 6 times the value of uptake

assert_values(master_data, colnames = "conc", test="gt", test_val = master_data$uptake * 6)

Bummer. Let's finally do some subsetting of our data.

new_data <- master_data[master_data$Type == "Quebec" & master_data$Plant %in% c("Qn2","Qn3") & uptake > 20,]

Now, let's see if our values of concs can uniquely identify our observations.

assert_ids(new_data, list(Plant=c("Qn2","Qn3"), conc=concs))

Rough, let's take 95 out of our concs level and try it again.

new_concs <- c(175,250,350,500,675,1000)
assert_ids(new_data, list(Plant=c("Qn2","Qn3"),conc=new_concs))

Let's first get the actual rows and look at them.

new_concs <- c(175,250,350,500,675,1000)
vetting_data <- assert_ids(new_data, list(Plant=c("Qn2","Qn3"),conc=new_concs), 
                           ids_only=F, warn_only=T)
print(head(vetting_data))

Hmm, we forgot to include the values of id_var in the actual id_vars argument. Now, let's try it the last time with the id_var character vector included.

new_concs <- c(175,250,350,500,675,1000)
assert_ids(new_data, list(Plant=c("Qn2","Qn3"), conc=new_concs, id_var=id_vars))

Awesome! Now you're a data wizard!



Try the assertable package in your browser

Any scripts or data that you put into this service are public.

assertable documentation built on Jan. 27, 2021, 9:07 a.m.