An introduction to formhub.R ========================================================
formhub.R makes is easy to download and work with datasets on formhub. After downloading, formhub.R post-processes your dataset to convert the different columns to the correct type, which it derives from the type
you specified during the creation of your XLSform. It is distributed as an R package called formhub
which is not in CRAN yet, and can be installed in the following way:
install.packages("devtools") library(devtools) install_github("formhub.R", username="SEL-Columbia")
The install_github
line will need to be re-run every time you need to update the package, which will be frequent for now, as the package is in early testing. After installation, it can be loaded like you load any other R package:
library(formhub)
At this point, we should be ready to get started, and use some of the formhub functions. Likely the most useful, and the most basic, one is called formhubDownload
. Try typing in help(formhubDownload)
in your R terminal to see what it does. We'll use it to download the good_eats
form from mberg's account in formhub, which is a public dataset and doesn't require a password. (To download data from an account with a password, simply pass it along as the third parameter).
good_eats <- formhubDownload("good_eats", "mberg")
Question: what kind of beast did we just download?
str(good_eats)
R tells us something like 'data.frame': 78 obs. of 19 variables:
as well as Formal class 'formhubData' [package ".GlobalEnv"] with 5 slots
. What this means is that formhubData objects can be dealt with data.frames (which makes them very convenient!) and well as "objects" with more properties (such as form
, which is derived from your XLSform). The form
gives formhub.R information about the exact question that was asked, and the type of the question asked (was it text
or select one
? or was it a date
?), which lets the library change the types of the values to make them right, which is basically the power of this package.
For simplicity, if you want just a data frame and not this complicated formhubData object, you can always use the data.frame
method.
good_eats_pure_data_frame <- data.frame(good_eats)
So the part where R downloaded your data for you was pretty cool. But there is more to the formhubDownload
function than just downloading. In the background, the types of each of the columns is converted according to how the data was collected.
# lets inspect the types of the first 10 columns of our downloaded data str(data.frame(good_eats)[1:10])
Notice that submit_data
and submit_date
, both of which were today
(ie, date) questions in your form, are converted to POXIXct
, which is R's date type. What does this mean? That means that we can do date-time calculations, for example, to check how long mberg has been collecting data:
max(good_eats$submit_data) - min(good_eats$submit_date)
Over a year... awesome!
Similiarly, things like select one
, imei
, and others are converted to factors, integers
and decimals
to numbers. Lets see how this compares with if we had simply just read the file as a csv without any type conversions:
good_eats2 <- read.csv("~/Downloads/good_eats_2013_05_05.csv") # lets inspect the types of the first 10 columns of our downloaded data str(good_eats2[1:10])
Everything is a factor! Why is that bad? Well, see the plots below for yourself:
# install.packages("ggplot2") if you don't have ggplot2 installed yet library(ggplot2) qplot(data=good_eats2, x=amount) # from data read in without formhub.R qplot(data=good_eats, x=amount) # from data read in using formhub.R
Okay, hopefully by now, you are sold on the usefulness of formhub.R, and see some value in it. Since this is a "basics of" document, I'll end by describing a couple of other high-level functions in formhub.R (lower-level functions will be documented over time).
formhubDownload
-- download data directly from formhub by passing form name, username, and password for private dataformhubRead
-- create a formhubData object from pre-downloaded files. The first file argument is the csv file, the second is the form.json file (which you can download from the form page on formhub). Note: unexpected things will happen if the files aren't the right ones. See the full documentation by using help(formhubRead)
.replaceHeaderNamesWithLabels
-- get a version of the data where the header row is re-written as the actual question asked.And thats really the gist of it!
This is software that has been tested by only a couple of use cases so far, and writing good code in R is pretty tricky, so there are probably bugs! If you encounter one, please go to your form page, and under "Sharing", give the username "prabhasp" "View" privileges, and file an issue on github
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.