process_data: Process the data
In EQUALPrognosis: Analysing Prognostic Studies

process_data

R Documentation

Process the data

Description

This takes a dataset and the metadata for the dataset and creates R data frames in a format required for the subsequent steps.

Usage

process_data(data_file_path, metadata_file_path)

Arguments

`data_file_path`	Path to the dataset
`metadata_file_path`	Path to the metadata file

Details

The metadata should contain the following information as a minimum.

`variable`	The name of the variable. This should match the column names of the dataset.
`data_type`	'numerical' for continuous variables, 'count' for count variables, 'binary' for binary categorical variables, 'nominal' for unordered categorical variables with more than 2 levels, 'ordinal' for ordered categorical variables, 'date' for variables stored as dates, and 'time' for variables containing the time of day.
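The two required columns can be sanity-checked before calling process_data. The following is a minimal sketch in base R; `check_metadata` and the toy data are hypothetical names used for illustration and are not part of EQUALPrognosis.

```r
# Hypothetical helper (not part of EQUALPrognosis): checks the two required
# metadata columns against a dataset and returns a vector of problems found.
check_metadata <- function(data, metadata) {
  allowed_types <- c("numerical", "count", "binary", "nominal",
                     "ordinal", "date", "time")
  missing_cols <- setdiff(c("variable", "data_type"), names(metadata))
  if (length(missing_cols) > 0) {
    return(paste("Missing metadata column:", missing_cols))
  }
  problems <- character(0)
  # Every metadata variable should be a column of the dataset
  unknown_vars <- setdiff(metadata$variable, names(data))
  if (length(unknown_vars) > 0) {
    problems <- c(problems, paste("Not a column of the dataset:", unknown_vars))
  }
  # Every data_type should be one of the documented values
  bad_types <- setdiff(metadata$data_type, allowed_types)
  if (length(bad_types) > 0) {
    problems <- c(problems, paste("Unknown data_type:", bad_types))
  }
  problems
}

# Toy inputs with two deliberate mistakes ("height" and "binry")
toy_data <- data.frame(age = c(54, 61), sex = c(0, 1))
toy_metadata <- data.frame(
  variable = c("age", "sex", "height"),
  data_type = c("numerical", "binry", "numerical")
)
check_metadata(toy_data, toy_metadata)
```

An empty result means that the variable names and data types are consistent with the description above; it does not guarantee that process_data will accept the file.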

Optional information includes the following.

`reference`	Reference category for binary and nominal variables. This should be a category existing in the variable.
`ordinal_levels`	The levels of ordinal data from lower to higher order, separated by ";". This must include all the levels in the data.
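To illustrate what the optional fields describe, the sketch below builds the corresponding base R factors by hand. It is not the package's implementation, and the small vectors are made up for illustration.

```r
# Illustrative only: how the optional metadata fields map onto base R factors.

# An ordinal variable and its levels as they would be written in the metadata
differ <- c(2, 3, 1, 2, NA)
ordinal_levels <- "1;2;3"
levels_low_to_high <- strsplit(ordinal_levels, ";", fixed = TRUE)[[1]]

# The listed levels must cover every value present in the data
observed <- as.character(unique(differ[!is.na(differ)]))
all_levels_listed <- all(observed %in% levels_low_to_high)

# Ordered factor, from lower to higher order
differ_ordered <- factor(differ, levels = levels_low_to_high, ordered = TRUE)

# A nominal variable with "Obs" as the reference category
rx <- c("Lev", "Obs", "Lev+5FU", "Obs")
rx_factor <- relevel(factor(rx), ref = "Obs")

levels(differ_ordered)
levels(rx_factor)
```

After `relevel`, the reference category is the first level of the factor, which is the level that R modelling functions treat as the baseline by default.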

You can use guess_data_types to obtain a starting point for the metadata; the metadata is included in the output list of the guess_data_types function.

Value

`outcome`	Whether the operation was successfully performed
`message`	Any information, particularly when the operation fails.
`data_processed`	The data, modified according to the metadata, when correct parameters are provided.

`any_type`	All fields.
`quantitative`	Fields recognised as quantitative.
`numerical`	Fields recognised as continuous.
`count`	Fields recognised as count.
`categorical`	Fields recognised as categorical data.
`nominal`	Fields recognised as nominal data.
`binary`	Fields recognised as binary data.
`ordinal`	Fields recognised as ordinal data.
`date`	Fields recognised as date.
`time`	Fields recognised as time.
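One way to get an overview of the field groups listed above is to count the fields in each. The sketch below assumes that each group is returned as a character vector of field names, which the documentation does not state explicitly; `count_fields_by_type` and the stand-in list are hypothetical and only illustrate the idea.

```r
# Hypothetical helper (not part of EQUALPrognosis). Assumes each field group
# in the returned list is a character vector of field names.
count_fields_by_type <- function(result) {
  groups <- c("any_type", "quantitative", "numerical", "count", "categorical",
              "nominal", "binary", "ordinal", "date", "time")
  # Only look at the groups that are actually present in the list
  present <- intersect(groups, names(result))
  vapply(result[present], length, integer(1))
}

# Hand-made stand-in for a returned list, for illustration only
stand_in <- list(
  any_type = c("age", "nodes", "sex"),
  quantitative = c("age", "nodes"),
  numerical = "age",
  count = "nodes",
  binary = "sex"
)
count_fields_by_type(stand_in)
```

With a real result, the same call would be `count_fields_by_type(processed_data)`, provided the assumption about the element types holds.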

Author(s)

Kurinchi Gurusamy

Examples

library(survival)
library(EQUALPrognosis)
# Use the colon dataset from the survival package as an example
# Select only the overall survival records (etype == 2, death) for these examples
data_file_path <- file.path(tempdir(), "df.csv")
write.csv(colon[colon$etype == 2, ], data_file_path, row.names = FALSE, na = "")
metadata <- {data.frame(
  variable = c("id","study","rx","sex","age",
               "obstruct","perfor","adhere","nodes","status",
               "differ","extent","surg","node4","time",
               "etype"),
  data_type = c("nominal", "nominal", "nominal", "binary", "numerical",
                "binary", "binary", "binary", "count", "binary",
                "ordinal", "ordinal", "binary", "binary", "numerical",
                "nominal"),
  reference = c(NA, NA, "Obs", 0, NA,
                0, 0, 0, NA, 0,
                NA, NA, 0, 0, NA,
                NA),
  ordinal_levels = c(NA, NA, NA, NA, NA,
                     NA, NA, NA, NA, NA,
                     "1;2;3", "1;2;3;4", NA, NA, NA,
                     NA),
  comments = NA
)}
metadata_file_path <- file.path(tempdir(), "metadata.csv")
write.csv(metadata, metadata_file_path, row.names = FALSE, na = "")
processed_data <- process_data(data_file_path, metadata_file_path)

EQUALPrognosis documentation built on Feb. 4, 2026, 5:15 p.m.