Codebook Demo

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%"
)
ggplot2::theme_set(ggplot2::theme_bw())
set.seed(8675309)
library(faux)

Codebooks in faux follow the Psych-DS 0.1.0 format, which is still in development.

Simulated Data

When you simulate data with sim_design, the codebook function reads some parameters of the design.

pet_data <- sim_design(
  between = list(pet = c(cat = "Cat Owners", 
                         dog = "Dog Owners")),
  n = c(4, 6),
  dv = list(happy = "Happiness Score"),
  id = list(id = "Subject ID"),
  mu = c(10, 12),
  sd = 4,
  plot = FALSE
)

You can set up a codebook with the function codebook(). If you don't specify the name, it defaults to the variable name (pet_data).

cb <- codebook(pet_data)

If you just type the codebook object into the console, you'll see the info in JSON format like this.

cb

If you set return to "list", you get the codebook in an R list that prints like this.

cb <- codebook(pet_data, return = "list")
cb

But the codebook is actually a nested list formatted like this:

str(cb)

Variable parameters

Above you saw messages about the data type that codebook guesses for each column. You can override this by setting the values manually. Below, we'll create a new column called followup consisting of 0 and 1 values, change the data type of the column from integer to boolean (T/F) and also set descriptions for id, pet and followup. The id column had a description of "Subject ID" from the sim_design function, but properties set in using vardesc will override this. You can also add unobserved levels to a factor.

pet_data$followup <- sample(0:1, nrow(pet_data), TRUE)

vardesc <- list(
  dataType = list(followup = "b"),
  description = c(id = "Pet ID",
                  pet = "Pet Type",
                  followup = "Followed up 2 weeks later"
                  ),
  levels = list(pet = c(cat = "Cat Owners",
                        dog = "Dog Owners", 
                        ferret = "Ferret Owners"),
                followup = c("0" = "No", "1" = "Yes")
            )
)
cb <- codebook(pet_data, name = "pets", vardesc, return = "list")

cb

You can change the data type of a column in the codebook, but this won't affect the data itself unless you set the return argument to "data". This runs type conversion on each column and gives you a warning if type can't be converted.

Note how we had to change the levels for the variable followup because we're converting them to boolean (logical) values and how the names in the levels vector have to be strings.

vardesc$levels$followup <- c("FALSE" = "No", "TRUE" = "Yes")
converted_data <- codebook(pet_data, "pets", vardesc, return = "data")

head(converted_data)

The codebook is attached to the returned converted data as an attribute and can be accessed as follows.

attr(converted_data, "codebook")

You can set other variable parameters than name, type, description, and levels. The variable parameters that Psych-DS currently supports are: "description", "privacy", "dataType", "propertyID", "minValue", "maxValue", "levels", "ordered", "na", "naValues", "alternateName", and "unitCode". See the technical specs for descriptions of these properties. You can add custom parameters, but will get a warning.

cb <- codebook(pet_data, vardesc = list(new_param = c(id = "YES")))

If you have a column that is an ordered factor, the codebook will look like this:

dat <- data.frame(
  initial = sample(LETTERS, 10)
)

dat$initial <- factor(dat$initial, levels = LETTERS, ordered = TRUE)

alevels <- paste("The letter", LETTERS)
names(alevels) <- LETTERS

codebook(dat, vardesc = list(levels = list(initial = alevels)), return = "list")

Dataset Parameters

You can add extra parameters to the whole data set. Psych-DS supports the following: "license", "author", "citation", "funder", "url", "sameAs", "keywords", "temporalCoverage", "spatialCoverage", "datePublished", "dateCreated". As with variable parameters, you can add custom parameters and will just get a warning.

codebook(pet_data, license = "CC-BY 3.0", author = "Lisa DeBruine", source = "faux")

You can also add parameters as lists.

dat <- sim_design(plot= FALSE)

author_list <- list(
  list(
    "@type" = "Person",
    "name" = "Melissa Kline"
  ),
  list(
    "@type" = "Person",
    "name" = "Lisa DeBruine"
  )
)


codebook(dat, return = "list", 
         author = author_list,
         keywords = c("test", "demo"))

Existing Data

You can run the codebook function on existing data not created in faux, but will need to manually input column descriptions and factor levels.

vardesc <- list(
  description = list(
    mpg = "Miles/(US) gallon",
    cyl = "Number of cylinders",
    disp = "Displacement (cu.in.)",
    hp = "Gross horsepower",
    drat = "Rear axle ratio",
    wt = "Weight (1000 lbs)",
    qsec = "1/4 mile time",
    vs = "Engine",
    am = "Transmission",
    gear = "Number of forward gears",
    carb = "Number of carburetors"
  ),
  # min and max values can be set manually or from data
  # min and max are often outside the observed range
  minValue = list(mpg = 0, cyl = min(mtcars$cyl)),
  maxValue = list(cyl = max(mtcars$cyl)),
  dataType = list(
    cyl = "integer",
    hp = "integer",
    vs = "integer",
    am = "integer", 
    gear = "integer",
    carb = "integer"
  ),
  # supply levels to mark factors
  levels = list(
    vs = c("0" = "V-shaped", "1" = "straight"),
    am = c("0" = "automatic", "1" = "manual")
  )
)

codebook(mtcars, "Motor Trend Car Road Tests", 
         vardesc, return = "list")

Interactive Editing

There is an experimental argument to edit the codebook interactively. It runs the codebook function, then asks you to confirm types and edit descriptions. Only run this in the console, not in an RMarkdown file or a script meant to be run non-interactively.

cb <- codebook(mtcars, interactive = TRUE)


Try the faux package in your browser

Any scripts or data that you put into this service are public.

faux documentation built on April 20, 2023, 9:13 a.m.