Preparing data for finalfit

This vignette shows you how to read in and prepare any dataset for use with finalfit. The demonstration uses the boot::melanoma dataset. Use ?boot::melanoma to see the help page with the data description. I will use tidyverse methods, loaded with library(tidyverse). First I'll write_csv() the data to disk, just to demonstrate reading it back in.

Read data

Note the various options in read_csv(), including arguments for providing column names (col_names), variable types (col_types) and missing data identifiers (na).


# Load the tidyverse (provides write_csv() and read_csv() via readr)
library(tidyverse)

# Save example
write_csv(boot::melanoma, "boot.csv")

# Read data
melanoma = read_csv("boot.csv")

Column types

Note the output shows how the columns/variables have been parsed. For full details see ?readr::cols().
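
As a minimal sketch of how the parsed column types can be inspected after reading, the example below uses readr's spec() on a freshly read copy of the data. It assumes readr 2.0 or later (for the show_col_types argument) and writes to a temporary file rather than boot.csv; the object names csv_path, melanoma_guess and guessed are mine, for illustration only.

```r
library(readr)

# Write the example data to a temporary file so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
write_csv(boot::melanoma, csv_path)

# Let readr guess the column types; suppress the printed message
melanoma_guess <- read_csv(csv_path, show_col_types = FALSE)

# spec() retrieves the column specification readr used for this file
guessed <- spec(melanoma_guess)
print(guessed)
```

The printed specification can be copied into a col_types argument and edited, which is often quicker than writing it from scratch.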

Continuous data
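
A short sketch of declaring continuous variables explicitly, using readr's col_double() and col_integer() collectors. The choice of which melanoma columns to treat as integer is my assumption for illustration, and the temporary file stands in for boot.csv.

```r
library(readr)

# Self-contained copy of the example data
csv_path <- tempfile(fileext = ".csv")
write_csv(boot::melanoma, csv_path)

melanoma_num <- read_csv(
  csv_path,
  col_types = cols(
    .default = col_double(),   # any column not named below is read as double
    age      = col_integer(),  # whole-number values
    year     = col_integer()
  )
)
```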

Categorical data
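
Categorical variables can be read directly as factors with col_factor(), as sketched below under the assumption that the codes in the file are known in advance. Note this gives factors whose levels are the raw codes ("0", "1"); attaching readable labels is a separate step.

```r
library(readr)

# Self-contained copy of the example data
csv_path <- tempfile(fileext = ".csv")
write_csv(boot::melanoma, csv_path)

melanoma_cat <- read_csv(
  csv_path,
  col_types = cols(
    .default = col_double(),
    sex      = col_factor(levels = c("0", "1")),  # levels are matched as text
    ulcer    = col_factor(levels = c("0", "1"))
  )
)

levels(melanoma_cat$sex)
```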

Dates and times
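
The melanoma data contain no date columns, so the sketch below uses a small made-up table (the column names and values are hypothetical) to show col_date() and col_datetime() with explicit formats. Passing literal text via I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical records, not part of boot::melanoma.
# I() marks the string as literal CSV text rather than a file path.
csv_text <- I(paste(
  "id,surgery_date,followup_at",
  "1,15/03/1977,1977-03-15 09:30:00",
  "2,02/11/1978,1978-11-02 14:00:00",
  sep = "\n"
))

visits <- read_csv(
  csv_text,
  col_types = cols(
    id           = col_integer(),
    surgery_date = col_date(format = "%d/%m/%Y"),
    followup_at  = col_datetime(format = "%Y-%m-%d %H:%M:%S")
  )
)
```

Stating the format explicitly avoids ambiguity between day-first and month-first dates.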

Check data

ff_glimpse() provides a convenient overview of all data in a tibble or data frame. It is particularly important that factors are correctly specified. Hence, ff_glimpse() separates variables into continuous and categorical. As expected, no factors are yet specified in the melanoma dataset.
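
A minimal sketch of the call, assuming the finalfit package is installed. It uses boot::melanoma directly rather than the CSV copy so that it runs on its own; glimpse_out is just an illustrative name for the returned object.

```r
library(finalfit)

# Use the packaged data directly so the sketch is self-contained
melanoma <- boot::melanoma

# Prints the overview; the result is also kept for later inspection
glimpse_out <- ff_glimpse(melanoma)
```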


If you wish to see the variables in the order in which they appear in the data frame or tibble, missing_glimpse() or tibble::glimpse() are useful.
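
Both calls can be sketched as follows, assuming finalfit and tibble are installed. Again boot::melanoma is used directly so the example stands alone, and missing_out is an illustrative name.

```r
library(finalfit)

melanoma <- boot::melanoma

# One row per variable, in data-frame order, with missing-data counts
missing_out <- missing_glimpse(melanoma)
print(missing_out)

# Compact column-wise view of types and first values
tibble::glimpse(melanoma)
```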


Specify factors

Use an original description of the data (often called a data dictionary) to correctly assign and label any factor variables. This can be done in a single pipe.

melanoma %>% 
  mutate(
    # Codes follow the data description in ?boot::melanoma
    status.factor = factor(status, levels = c(1, 2, 3), 
      labels = c("Died from melanoma", "Alive", "Died from other causes")),
    sex.factor = factor(sex, levels = c(1, 0),
      labels = c("Male", "Female")),
    ulcer.factor = factor(ulcer, levels = c(1, 0),
      labels = c("Present", "Absent"))
  ) -> melanoma
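
Before moving on, it can help to confirm that each new factor lines up with its original code. The sketch below does this for sex only, using dplyr::count(); it rebuilds the factor from boot::melanoma so that it runs on its own, and the name sex_check is mine.

```r
library(dplyr)

# Recreate one recoded factor on a fresh copy of the data
melanoma <- boot::melanoma %>%
  mutate(
    sex.factor = factor(sex, levels = c(1, 0),
      labels = c("Male", "Female"))
  )

# Each original code should pair with exactly one label
sex_check <- melanoma %>%
  count(sex, sex.factor)

print(sex_check)
```

Any code that appears against NA in the factor column would indicate a level missing from the levels argument.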


Everything looks good and you are ready to start analysis.

finalfit documentation built on Nov. 17, 2023, 1:09 a.m.