| process_data | R Documentation |
This takes a dataset and the metadata for the dataset and creates R data frames in a format required for the subsequent steps.
process_data(data_file_path, metadata_file_path)
data_file_path |
Path to the dataset |
metadata_file_path |
Path to the metadata file |
The metadata should contain at least the following information.
variable: the name of the variable, which should match the corresponding column name in the dataset.
data_type: 'numerical' for continuous variables, 'count' for count variables, 'binary' for binary categorical variables, 'nominal' for unordered categorical variables with more than 2 levels, 'ordinal' for ordered categorical variables, 'date' for variables stored as dates, and 'time' for variables containing the time of day.
Optional information includes the following.
reference: the reference category for binary and nominal variables. This should be a category that exists in the variable.
ordinal_levels: the levels of ordinal data from lowest to highest, separated by ";". This must include all the levels present in the data.
You can use the metadata included in the output list of the guess_data_types function as a starting point for your own metadata.
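The metadata rules above can be checked before calling process_data. The following is a minimal sketch in base R, not the package's own validation: check_metadata, toy_data and toy_metadata are hypothetical names introduced here for illustration, and the checks process_data performs internally may differ.

```r
# Hypothetical helper (not part of the package): checks a metadata data frame
# against a dataset using the rules described in the text above.
check_metadata <- function(data, metadata) {
  allowed_types <- c("numerical", "count", "binary", "nominal",
                     "ordinal", "date", "time")
  problems <- character(0)

  # Every metadata variable should match a column name of the dataset
  missing_cols <- setdiff(metadata$variable, names(data))
  if (length(missing_cols) > 0) {
    problems <- c(problems, paste("Not in dataset:",
                                  paste(missing_cols, collapse = ", ")))
  }

  # data_type should be one of the documented values
  bad_types <- setdiff(metadata$data_type, allowed_types)
  if (length(bad_types) > 0) {
    problems <- c(problems, paste("Unknown data_type:",
                                  paste(bad_types, collapse = ", ")))
  }

  # ordinal_levels is optional, so tolerate a missing column
  level_strings <- if ("ordinal_levels" %in% names(metadata)) {
    as.character(metadata$ordinal_levels)
  } else {
    rep(NA_character_, nrow(metadata))
  }

  # For ordinal variables, every level present in the data should be listed
  ordinal_rows <- which(metadata$data_type == "ordinal" &
                          metadata$variable %in% names(data))
  for (i in ordinal_rows) {
    variable <- metadata$variable[i]
    levels_listed <- trimws(strsplit(level_strings[i], ";", fixed = TRUE)[[1]])
    values <- as.character(data[[variable]])
    values_present <- unique(values[!is.na(values) & values != ""])
    unlisted <- setdiff(values_present, levels_listed)
    if (length(unlisted) > 0) {
      problems <- c(problems, paste0("Levels of ", variable,
                                     " missing from ordinal_levels: ",
                                     paste(unlisted, collapse = ", ")))
    }
  }
  problems
}

# Toy example: level "3" of grade is not listed in ordinal_levels
toy_data <- data.frame(age = c(54, 61, 47), grade = c("2", "3", "1"),
                       stringsAsFactors = FALSE)
toy_metadata <- data.frame(variable = c("age", "grade"),
                           data_type = c("numerical", "ordinal"),
                           ordinal_levels = c(NA, "1;2"),
                           stringsAsFactors = FALSE)
problems <- check_metadata(toy_data, toy_metadata)
print(problems)
```

An empty result means none of these three checks found a problem; it does not guarantee that process_data will accept the metadata.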
outcome |
Whether the operation was successfully performed |
message |
Any information, particularly when the operation fails. |
data_processed |
The data, modified according to the metadata, when correct parameters are provided. |
any_type |
All fields. |
quantitative |
Fields recognised as quantitative. |
numerical |
Fields recognised as continuous. |
count |
Fields recognised as count. |
categorical |
Fields recognised as categorical data. |
nominal |
Fields recognised as nominal data. |
binary |
Fields recognised as binary data. |
ordinal |
Fields recognised as ordinal data. |
date |
Fields recognised as date. |
time |
Fields recognised as time. |
Kurinchi Gurusamy
guess_data_types
library(survival)
# Use the colon dataset from the survival package as an example
# Keep only the survival records for these examples (etype == 2)
data_file_path <- paste0(tempdir(), "/df.csv")
write.csv(colon[colon$etype == 2, ], data_file_path, row.names = FALSE, na = "")
metadata <- {data.frame(
variable = c("id","study","rx","sex","age",
"obstruct","perfor","adhere","nodes","status",
"differ","extent","surg","node4","time",
"etype"),
data_type = c("nominal", "nominal", "nominal", "binary", "numerical",
"binary", "binary", "binary", "count", "binary",
"ordinal", "ordinal", "binary", "binary", "numerical",
"nominal"),
reference = c(NA, NA, "Obs", 0, NA,
0, 0, 0, NA, 0,
NA, NA, 0, 0, NA,
NA),
ordinal_levels = c(NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA,
"1;2;3", "1;2;3;4", NA, NA, NA,
NA),
comments = NA
)}
metadata_file_path <- paste0(tempdir(), "/metadata.csv")
write.csv(metadata, metadata_file_path, row.names = FALSE, na = "")
processed_data <- process_data(data_file_path, metadata_file_path)