library(knitr)
knitr::opts_chunk$set(
  comment = "#>",
  error = FALSE,
  tidy = FALSE
)
library(vartors)
To be useful, vartors needs a simple database. A simple database is a single-table database with one variable per column, one observation per line, and the name of each variable in the first line (header).
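The sketch below checks a tiny made-up CSV against that definition using only base R. check_simple_database is a hypothetical helper and is not part of vartors. It counts the variables (columns) and observations (lines), and checks that every header name is present and unique.

```r
# Hypothetical helper (not part of vartors): basic checks that CSV text
# follows the "simple database" layout described above.
check_simple_database <- function(csv_text) {
  # check.names = FALSE keeps the header exactly as written in the file
  df <- read.csv(text = csv_text, check.names = FALSE)
  header <- names(df)
  list(
    n_variables    = ncol(df),  # one variable per column
    n_observations = nrow(df),  # one observation per line
    header_ok      = all(nzchar(header)) && anyDuplicated(header) == 0
  )
}

res <- check_simple_database("id,height,weight\n1,172,70\n2,165,58\n3,180,81")
str(res)
```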
As an example, we will use the bad_database.csv file provided in the vartors package.
# Load the database
raw_data <- read.csv(file = paste0(path.package("vartors"), "/examples/bad_database.csv"))
This database has 10 variables of different types and 100 observations. It seems to be fine, but checking the class of each variable reveals some problems.
str(raw_data)
We observe these issues:
- birth was recognized as a factor, but it is actually a date. read.csv doesn't detect dates.
- height and weight were also recognized as factors, but they are numeric. This is because missing data is coded in several ways (NA, ?, empty cells and some comments).
- study_levels, Q1 and Q2
- initial was recognized as a factor, but it should be a character. This is because read.csv has the argument stringsAsFactors = TRUE by default. That was the default before R 4.0.0. Newer versions use stringsAsFactors = FALSE, so these columns are read as character instead of factor.
To import this data frame properly in R, you have to transform each variable manually, for example with a script like this:
clean_data <- raw_data
clean_data$initial <- as.character(raw_data$initial)
clean_data$birth <- as.Date(raw_data$birth, format = "%Y-%m-%d")
You then repeat this for each variable. With only 10 variables it is manageable, but with 50 variables it quickly becomes tedious, time-consuming and error-prone. Furthermore, we have no information about the labels for study_levels, Q1 and Q2, so we need more information about these variables.
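Some of this manual work can be moved into read.csv itself. The sketch below uses a small made-up CSV, not the real bad_database.csv. It mimics two of the problems listed above: "?" used as a missing-data code, and dates stored as text. Listing every missing-data code in na.strings and turning off stringsAsFactors produces a numeric height column. Dates still need an explicit as.Date call. Free-text comments mixed into a numeric column would need extra handling that this sketch does not cover.

```r
# Made-up CSV text mimicking some problems of bad_database.csv (illustrative only)
csv_text <- "initial,birth,height
AB,1980-05-12,172
CD,1975-11-03,?
EF,1990-01-30,
GH,1985-07-21,NA"

# Naive import: the "?" prevents height from being read as a number
naive <- read.csv(text = csv_text, stringsAsFactors = TRUE)
str(naive)

# Declare every missing-data code and keep text columns as character
fixed <- read.csv(text = csv_text,
                  na.strings = c("NA", "?", ""),
                  stringsAsFactors = FALSE)
# read.csv never converts dates, so do it explicitly
fixed$birth <- as.Date(fixed$birth, format = "%Y-%m-%d")
str(fixed)
```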
descvars_skeleton
The idea is to be explicit about each variable. To achieve this, we can create a variable description table. vartors provides the function descvars_skeleton to help you create a skeleton of this table.
# Workaround for a current limitation: detection of factors is not
# implemented yet, so convert these columns to character first
cols_to_convert <- c("birth", "initial", "height", "weight", "siblings")
raw_data[cols_to_convert] <- lapply(raw_data[cols_to_convert], as.character)
library(vartors)
desc_skeleton <- descvars_skeleton(raw_data)
kable(desc_skeleton[, 1:12])
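The base R sketch below builds a comparable table by hand to make the idea concrete. Each row describes one variable and holds its original name, its detected class, and empty columns to fill in later. This is not how vartors implements it. The column names originalname, rname, varlabel and type are borrowed from this document, but the real descvars_skeleton output has more columns and may name or fill them differently.

```r
# Hypothetical sketch of a variable description skeleton built with base R.
# It only illustrates the idea; descvars_skeleton() has its own columns.
make_desc_skeleton <- function(df) {
  data.frame(
    originalname = names(df),  # name as found in the file
    rname        = "",         # R name, to be filled in later
    varlabel     = "",         # human-readable label, to be filled in later
    type         = unname(vapply(df, function(x) class(x)[1], character(1))),
    stringsAsFactors = FALSE
  )
}

example_df <- data.frame(initial = c("AB", "CD"), height = c(172, 165),
                         stringsAsFactors = FALSE)
skeleton <- make_desc_skeleton(example_df)
skeleton
```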
Next, ask the person who gave you the database to explain each variable, and fill in this variable description table. You can edit it directly, for example with edit:
desc_complete <- edit(desc_skeleton)
or, more conveniently, save the data.frame as a .csv file and open it in spreadsheet software such as LibreOffice. You can then send it to the person who sent you the database and ask them to fill it in, or fill it in together.
write.csv(desc_skeleton, file = "variables_description.csv", row.names = FALSE)
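By default, write.csv also writes the row names as an unnamed first column, and read.csv brings it back as an extra column called X. The base R sketch below uses a made-up two-row table and temporary files to show the difference. Passing row.names = FALSE keeps the file limited to the actual description columns.

```r
# Made-up description table used only for this illustration
desc <- data.frame(originalname = c("birth", "height"),
                   type = c("date", "numeric"),
                   stringsAsFactors = FALSE)

# Default: row names are written as an unnamed first column
with_rownames <- tempfile(fileext = ".csv")
write.csv(desc, file = with_rownames)
names(read.csv(with_rownames))     # includes an extra "X" column

# row.names = FALSE: only the real columns are written
without_rownames <- tempfile(fileext = ".csv")
write.csv(desc, file = without_rownames, row.names = FALSE)
names(read.csv(without_rownames))  # only the original columns
```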
Filling in this file is the most time-consuming part of the vartors process. If the database is well formatted, though, it is usually easy.
import_vardesc
The next step is to import this variable description table into a format that vartors can handle.
Import the table into R as a data frame:
# Path to the csv file in the vartors package.
# This is a special case: in real usage, use the path to your own file instead.
path_to_vardesc <- paste0(path.package("vartors"), "/examples/variables_description_bad_database.csv")
# Import the csv
complete_vardesc <- read.csv(file = path_to_vardesc)
The result is shown below in two parts.
Complete variable description, first 8 columns :
kable(complete_vardesc[, 1:8])
Complete variable description, first 2 columns and columns 9 to 17:
kable(complete_vardesc[, c(1:2,9:17)])
This way, each variable is explicit. Note that you should use the type not_used to discard a variable.
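vartors handles this itself when it generates the script. The base R sketch below only illustrates the effect of the not_used type. It assumes a hypothetical description table with originalname and type columns, and simply drops the columns declared as not_used.

```r
# Hypothetical description table: only the 'type' column matters here
vardesc <- data.frame(originalname = c("initial", "birth", "comment"),
                      type = c("character", "date", "not_used"),
                      stringsAsFactors = FALSE)
raw <- data.frame(initial = c("AB", "CD"),
                  birth = c("1980-05-12", "1975-11-03"),
                  comment = c("ok", "check"),
                  stringsAsFactors = FALSE)

# Keep only the columns whose declared type is not "not_used"
kept <- vardesc$originalname[vardesc$type != "not_used"]
clean <- raw[, kept, drop = FALSE]
names(clean)
```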
Then transform this data.frame into a DatabaseDef object, which vartors can understand. To do this, use import_vardef:
suppressWarnings(
  database_def_object <- import_vardef(complete_vardesc)
)
If you don't suppress the warnings, you will see messages saying that some rnames are not ideal but will still work.
If you want, you can print this object:
database_def_object
You can see that import_vardef parsed the variable definition table. For example, if you don't give an rname in the table, it derives one from the varlabel column, or from originalname if there is no varlabel.
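The base R sketch below shows this kind of fallback. derive_rname is a hypothetical helper written for this example, not a vartors function. vartors' own rules for building rnames, which produce the warnings mentioned earlier, may differ. make.names turns free text into a syntactically valid R name.

```r
# Hypothetical sketch: use rname when given, otherwise varlabel,
# otherwise originalname, then make the result a valid R name.
derive_rname <- function(rname, varlabel, originalname) {
  has_rname <- !is.na(rname) & nzchar(rname)
  has_label <- !is.na(varlabel) & nzchar(varlabel)
  pick <- ifelse(has_rname, rname, ifelse(has_label, varlabel, originalname))
  make.names(pick, unique = TRUE)
}

res <- derive_rname(rname        = c("initial", "", ""),
                    varlabel     = c("", "Date of birth", ""),
                    originalname = c("ini", "birth", "Q1"))
res
```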
create_script
Now create a script from this object with the create_script method:
simple_script <- create_script(var_desc = database_def_object)
It's that simple! You now have a script you can explore
simple_script
and you can write this script to a file:
write_file(object = simple_script, filepath = "my_import_script1.R")
Once your variable definition table is loaded in a data.frame, you can do the whole process in a single line.
Remember that earlier we imported it into a data.frame called complete_vardesc:
write_file(create_script(var_desc = complete_vardesc), filepath = "my_import_script1.R")
import_template
One of the strengths of vartors is its template system. When we created a script above without selecting a template, the create_script function used the default one.
But perhaps you want the script in .Rmd, so that you can produce a report with knitr?
And suppose you are a French user who wants a template in French.
To see which builtin templates are available, read the documentation of the import_template function:
?import_template
The Details section lists a template that should match our needs: template_fr.Rmd. To import it, use the import_template function and pass the name of the builtin template in the builtin argument:
rmd_template <- import_template(builtin = "template_fr.Rmd")
Then recreate a script with this template
rmd_script <- create_script(var_desc = database_def_object, template = rmd_template)
And you have your script in Rmd!
rmd_script
export_template
If none of the builtin templates fits your needs, create one!
Templates are just R and Rmd files with some delimiters. The documentation explains how to create your own template:
?template
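The exact delimiter syntax of vartors templates is documented in ?template and is not reproduced here. The sketch below uses a made-up {{name}} placeholder syntax purely to illustrate the general idea: a text template is filled from per-variable values with base R string substitution.

```r
# Hypothetical template: the {{...}} placeholders are an assumption made
# for this sketch and are NOT the delimiters used by vartors templates.
template <- c("# Import script for {{file}}",
              "{{rname}} <- as.{{type}}(raw_data${{originalname}})")

# Replace each {{key}} with its value (fixed = TRUE: no regular expressions)
fill_template <- function(lines, values) {
  for (key in names(values)) {
    lines <- gsub(paste0("{{", key, "}}"), values[[key]], lines, fixed = TRUE)
  }
  lines
}

script <- fill_template(template, list(file = "bad_database.csv",
                                       rname = "initial",
                                       type = "character",
                                       originalname = "initial"))
cat(script, sep = "\n")
```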
The basic approach is to export a builtin template and modify it. To do this, use export_template. For example, to modify the preceding template, export it with:
export_template(builtin = "template_fr.Rmd", to = "mytemplate.Rmd")
When your template fits your needs, import it with import_template:
my_template <- import_template(path = "mytemplate.Rmd")
and then use it to produce your script
create_script(var_desc = database_def_object, template = my_template)
The job of vartors ends with the creation of the script skeleton. We chose to have this package produce a script rather than the cleaned database directly, so that users can adapt the import phase precisely to their needs. So what do you do with this script?
Use knitr to produce a report in HTML, PDF or another format.