library(knitr)
knitr::opts_chunk$set(
  comment = "#>",
  error = FALSE,
  tidy = FALSE
)

library(vartors)

A tutorial for vartors

Raw material and main problem: a simple database

To be useful, vartors needs a simple database. A simple database is a single-table database with one variable per column, one observation per row, and the name of each variable in the first row (header).

As an example, we will use the bad_database.csv file shipped with the vartors package.

# Load the database
raw_data <- read.csv(file = system.file("examples", "bad_database.csv", package = "vartors"))

This database has 10 variables of different types and 100 observations. It looks fine at first, but checking the class of each variable reveals some problems.

str(raw_data)

We observe these issues:

To import this data frame properly in R, you have to transform each variable manually, for example with a script like this:

# Start from a copy of the raw data
clean_data <- raw_data
# Convert each variable to its intended type, one by one
clean_data$initial <- as.character(raw_data$initial)
clean_data$birth <- as.Date(raw_data$birth, format = "%Y-%m-%d")

And so on for every variable. Here it is easy because there are only 10 variables, but with 50 variables it quickly becomes tedious, time-consuming and error-prone. Furthermore, we have no information about the labels for study_levels, Q1 and Q2, so we need more information about these variables.
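One way to make the manual approach less repetitive, in plain base R, is to describe each conversion once and apply them in a loop. This is a minimal sketch, not how vartors works internally: the toy data frame and the `converters` list are hypothetical. It also shows a classic pitfall when columns were read as factors: `as.numeric()` on a factor returns the internal level codes, so the factor must go through `as.character()` first.

```r
# Toy data frame standing in for raw_data (hypothetical values),
# with strings read as factors as older versions of read.csv did
raw <- data.frame(
  initial = c("AB", "CD"),
  birth = c("1990-05-01", "1985-12-24"),
  height = c("172", "165"),
  stringsAsFactors = TRUE
)

# Hypothetical mapping: column name -> function converting it to its intended type
converters <- list(
  initial = as.character,
  birth = function(x) as.Date(as.character(x), format = "%Y-%m-%d"),
  # Go through character first, otherwise we would get the factor codes
  height = function(x) as.numeric(as.character(x))
)

# Apply every conversion to a copy of the raw data
clean <- raw
for (col in names(converters)) {
  clean[[col]] <- converters[[col]](raw[[col]])
}

str(clean)
```

This still requires writing one entry per variable, and it says nothing about labels, which is exactly the information the variable description table below is meant to capture.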

Create a description of the variables

Create a skeleton with descvars_skeleton

The idea is to be explicit about each variable. To achieve this, we create a variable description table. vartors provides the function descvars_skeleton to help you create a skeleton of this table.

# Workaround: factor detection is limited for now, so convert the columns
# that should not be factors to character before building the skeleton
raw_data$birth <- as.character(raw_data$birth)
raw_data$initial <- as.character(raw_data$initial)
raw_data$height <- as.character(raw_data$height)
raw_data$weight <- as.character(raw_data$weight)
raw_data$siblings <- as.character(raw_data$siblings)
# Build the skeleton of the variable description table
desc_skeleton <- descvars_skeleton(raw_data)
# Show its first 12 columns
kable(desc_skeleton[,1:12])

Now ask the person who gave you the database to explain each variable and fill in this variable description table. You can edit it directly in R, for example with edit():

desc_complete <- edit(desc_skeleton)

or, more conveniently, save the data.frame as a .csv file and open it in spreadsheet software such as LibreOffice. This way, you can send it to the person who sent you the database and ask them to fill it in, or fill it in together.

# row.names = FALSE avoids an extra unnamed column in the exported file
write.csv(desc_skeleton, file = "variables_description.csv", row.names = FALSE)
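By default, write.csv() writes the row names as an unnamed first column, and read.csv() reads that column back as an extra variable called X. The small base-R check below, on a hypothetical two-column table, shows the difference that `row.names = FALSE` makes when the description table makes a round trip through a spreadsheet.

```r
# Hypothetical two-row description table
desc <- data.frame(
  originalname = c("birth", "height"),
  type = c("date", "numeric")
)

with_rn <- tempfile(fileext = ".csv")
without_rn <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as a first, unnamed column
write.csv(desc, file = with_rn)
# No row names: the file only contains the real columns
write.csv(desc, file = without_rn, row.names = FALSE)

names(read.csv(with_rn))     # "X" "originalname" "type"
names(read.csv(without_rn))  # "originalname" "type"
```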

Filling in this file is the most time-consuming part of the vartors process, but if the database is well formatted, it is normally easy.

Import the variable description with import_vardef

The next step is to import this variable description table into a format that vartors can handle.

Import the table into R as a data.frame

# Path to the example csv shipped with the vartors package.
# This is specific to the tutorial: in real usage, use the path to your own file instead
path_to_vardesc <- system.file("examples", "variables_description_bad_database.csv",
                               package = "vartors")
# Import the csv
complete_vardesc <- read.csv(file = path_to_vardesc)

The result is shown below in two parts.

Complete variable description, first 8 columns:

kable(complete_vardesc[, 1:8])

Complete variable description, columns 1 to 2 and 9 to 17:

kable(complete_vardesc[, c(1:2,9:17)])

This way, each variable is explicit. Note that you should use the type not_used to discard a variable.

Then transform this data.frame into a DatabaseDef object, which vartors can understand. To do this, use import_vardef:

suppressWarnings(
  database_def_object <- import_vardef(complete_vardesc)
)

If you don't suppress the warnings, you will see messages saying that some of your rnames are not ideal but will still work.

If you want, you can print this object:

database_def_object

You can see that import_vardef parsed the variable definition table. For example, if you don't provide an rname in your table, it derives one from the varlabel column, or from originalname if there is no varlabel.
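The fallback order described above (rname, else varlabel, else originalname) can be sketched in base R. This is a hypothetical helper, `derive_rname`, written for illustration only: the exact cleaning rules vartors applies are not described here, so this sketch simply lowercases the chosen text and passes it through make.names(), which replaces invalid characters with dots and can enforce unique names.

```r
# Hypothetical helper, not the vartors implementation:
# use rname when given, else varlabel, else originalname,
# then turn the result into a syntactically valid R name
derive_rname <- function(rname, varlabel, originalname) {
  has_rname <- !is.na(rname) & nzchar(rname)
  has_label <- !is.na(varlabel) & nzchar(varlabel)
  chosen <- ifelse(has_rname, rname,
                   ifelse(has_label, varlabel, originalname))
  make.names(tolower(chosen), unique = TRUE)
}

derive_rname(
  rname = c("birth_date", NA, NA),
  varlabel = c("Date of birth", "Study level", NA),
  originalname = c("birth", "study_levels", "Q1")
)
# "birth_date" "study.level" "q1"
```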

Create the script with create_script

Now it is time to create a script. Just use the create_script method:

simple_script <- create_script(var_desc = database_def_object)

It's that simple! You now have a script you can explore:

simple_script

and you can write this script to a file:

write_file(object = simple_script, filepath = "my_import_script1.R")

The fast way

Once your variable definition table is loaded in a data.frame, you can do the whole process in a single line.

Remember that we previously imported it into a data.frame called complete_vardesc.

write_file(create_script(var_desc = complete_vardesc), filepath = "my_import_script1.R")

What about templates?

Choose a built-in template with import_template

One of the strengths of vartors is its template system. When we created an R script above without selecting a template, the create_script function chose the default one.

But maybe you want an .Rmd file so you can produce a report with knitr? And let's say you are a French user who wants a template in French.

To see which built-in templates are available, read the documentation of the import_template function:

?import_template

The Details section lists a template that should match our needs: template_fr.Rmd. To import it, call import_template and pass the name of the built-in template in the builtin argument:

rmd_template <- import_template(builtin = "template_fr.Rmd")

Then recreate a script with this template:

rmd_script <- create_script(var_desc = database_def_object, template = rmd_template)

And you have your script in Rmd!

rmd_script

Create your own template with export_template

If none of the built-in templates fits your needs, create one! Templates are just R and Rmd files with some delimiters. More information about how to write your own template is in the documentation:

?template
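To give an intuition of what "files with some delimiters" means, here is a generic placeholder-substitution sketch in base R. The `{{...}}` delimiters, the field names and the `fill_template` helper are hypothetical and chosen for illustration only; the actual vartors delimiters and fields are the ones documented in ?template.

```r
# Hypothetical template lines using made-up {{...}} delimiters
tpl <- c(
  "# Variable {{varlabel}}",
  "data${{rname}} <- as.numeric(data${{rname}})"
)

# Replace every {{key}} with its value; fixed = TRUE treats the
# delimiters literally instead of as a regular expression
fill_template <- function(lines, values) {
  for (key in names(values)) {
    lines <- gsub(paste0("{{", key, "}}"), values[[key]], lines, fixed = TRUE)
  }
  lines
}

out <- fill_template(tpl, list(varlabel = "Height (cm)", rname = "height"))
writeLines(out)
```

Filling the same template once per variable and concatenating the results is, conceptually, how a whole import script can be generated from a variable description table.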

Basically, the idea is to export a built-in template and modify it. To do this, use export_template. For example, to modify the preceding template, export it with:

export_template(builtin = "template_fr.Rmd", to = "mytemplate.Rmd")

When your template fits your needs, import it with import_template:

my_template <- import_template(path = "mytemplate.Rmd")

and then use it to produce your script:

create_script(var_desc = database_def_object, template = my_template)

What to do with this script skeleton?

The job of vartors ends once the script skeleton is created. We chose to have this package produce a script rather than the cleaned database directly, so that users can adapt the import phase precisely to their needs. So what should you do with this script?

  1. Run the script line by line to check that it works, adapt it where necessary, and then save the well-formatted data.frame.
  2. Once you have a clean script, use tools such as knitr to produce a report in HTML, PDF or another format.
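For step 1, the choice of file format matters: a .csv file stores only text, so column classes such as Date are lost and have to be re-parsed on every import, whereas saveRDS() stores the R object itself. A minimal base-R sketch, using a hypothetical two-row clean_data:

```r
# Hypothetical cleaned data frame produced by the import script
clean_data <- data.frame(
  initial = c("AB", "CD"),
  birth = as.Date(c("1990-05-01", "1985-12-24"))
)

rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")

# RDS keeps the R object as is, including column classes
saveRDS(clean_data, rds_path)
# CSV only keeps the text representation
write.csv(clean_data, csv_path, row.names = FALSE)

class(readRDS(rds_path)$birth)   # "Date"
class(read.csv(csv_path)$birth)  # not "Date": the dates come back as text
```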


jomuller/vartors documentation built on May 19, 2019, 7:26 p.m.