ternP (R Documentation)
Description:

ternP() cleans a raw data frame loaded from a CSV or XLSX file. It applies a standardized set of transformations and runs validation checks so that the data is ready to pass to ternG() or ternD().
Usage:

ternP(data)
Arguments:

data: A data frame or tibble as loaded from a CSV or XLSX file (e.g. via …).
Value:

A named list with three elements:

clean_data: A tibble containing the fully cleaned dataset, ready to pass to ternG() or ternD().

sparse_rows: A tibble of the rows in clean_data where more than 50% of values are NA. These rows are retained in clean_data and are extracted here only for optional review or download. If no sparse rows exist, this is an empty tibble.
feedback: A named list of feedback items. Each element is NULL if the corresponding step was not triggered; otherwise it is a value describing what changed:

  string_na_converted: A named list with elements total (integer count of values converted) and cols (character vector of affected column names), or NULL if no string NA values were found.

  blank_rows_removed: A named list with elements count (integer) and row_indices (integer vector of the original row positions that were removed), or NULL if none.

  sparse_rows_flagged: A named list with elements count (integer) and row_indices (integer vector of row positions in clean_data with more than 50% missingness), or NULL if none.

  case_normalized_vars: A named list with elements cols (character vector of affected column names) and detail (a named list with one entry per column, each holding changed_from and changed_to character vectors that show the exact value changes), or NULL if none.

  dropped_empty_cols: A character vector of the names of columns that were dropped because they were 100% empty ("" for unnamed columns), or NULL if none.
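The sparse-row rule and the NULL-or-value convention of feedback can be illustrated in plain base R. This is not TernTables code: flag_sparse_rows() and triggered_feedback() are hypothetical helpers, and the feedback list below is hand-built with made-up values purely to mirror the documented shape.

```r
# Hypothetical helpers (NOT part of TernTables) mirroring the documented
# sparse_rows rule and the NULL-or-value convention of result$feedback.

# Return list(count, row_indices) for rows with more than 50% NA, or NULL.
flag_sparse_rows <- function(df) {
  na_share <- rowMeans(is.na(df))            # share of NA cells per row
  idx <- unname(which(na_share > 0.5))
  if (length(idx) == 0) return(NULL)
  list(count = length(idx), row_indices = idx)
}

# Keep only the feedback items that were triggered (i.e. are not NULL).
triggered_feedback <- function(feedback) {
  Filter(Negate(is.null), feedback)
}

# Illustrative data: row 2 is 2/3 missing, row 3 is 1/3 missing.
df <- data.frame(
  age   = c(54, 47, 61),
  sex   = c("Male", NA, NA),
  stage = c("II", NA, "III"),
  stringsAsFactors = FALSE
)
flag_sparse_rows(df)   # $count is 1 and $row_indices is 2

# Hand-built list with the same element names as result$feedback.
fb <- list(
  string_na_converted  = list(total = 2L, cols = c("sex", "stage")),
  blank_rows_removed   = NULL,
  sparse_rows_flagged  = flag_sparse_rows(df),
  case_normalized_vars = NULL,
  dropped_empty_cols   = NULL
)
names(triggered_feedback(fb))
```

With a real result, the same Filter() call applied to result$feedback would list only the steps that actually changed something.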
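Details:

The following transformations are applied:

- String NA values ("NA", "na", "Na", "unk") are converted to NA.
- Leading and trailing whitespace is trimmed from all character columns.
- Columns that are 100% empty (all NA) are dropped without raising an error; their names are reported in feedback$dropped_empty_cols.
- Rows where every cell is NA are removed and reported in feedback$blank_rows_removed.
- In character columns, values that differ only by capitalization (e.g. "Male" vs "MAle") are standardized to title case; the changes are reported in feedback$case_normalized_vars.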
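A minimal base-R sketch of the transformations listed above, assuming a plain data.frame whose text columns are character (not factor). clean_sketch(), normalize_case(), and to_title() are hypothetical names. Trimming whitespace before matching string NA tokens (so padded tokens such as " unk " are also caught) is a design choice of this sketch; TernTables' own implementation may order or detail the steps differently, and this sketch does not build the feedback list.

```r
# Hypothetical sketch (NOT the TernTables implementation) of the documented
# cleaning rules, using base R only.

# Capitalize the first letter and lowercase the rest. This approximates
# title case and is exact for single-word labels such as "Male".
to_title <- function(x) {
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}

# Rewrite only values whose lowercase form is shared by more than one
# distinct spelling in the column (e.g. "Male" and "MAle").
normalize_case <- function(x) {
  vals <- unique(x[!is.na(x)])
  keys <- tolower(vals)
  clashing <- unique(keys[duplicated(keys)])
  hit <- !is.na(x) & tolower(x) %in% clashing
  x[hit] <- to_title(x[hit])
  x
}

clean_sketch <- function(df, na_strings = c("NA", "na", "Na", "unk")) {
  chr <- vapply(df, is.character, logical(1))

  # 1. Trim leading/trailing whitespace in character columns.
  df[chr] <- lapply(df[chr], trimws)

  # 2. Convert string NA tokens to real NA.
  df[chr] <- lapply(df[chr], function(x) { x[x %in% na_strings] <- NA; x })

  # 3. Drop columns that are entirely NA.
  keep_col <- vapply(df, function(x) !all(is.na(x)), logical(1))
  df <- df[keep_col]

  # 4. Drop rows where every remaining cell is NA.
  df <- df[rowSums(!is.na(df)) > 0, , drop = FALSE]

  # 5. Standardize values that differ only by capitalization.
  chr <- vapply(df, is.character, logical(1))
  df[chr] <- lapply(df[chr], normalize_case)
  df
}

raw <- data.frame(
  sex   = c("Male", " MAle", "female", "unk", NA),
  grade = c("II", "na", "III ", NA, NA),
  empty = c(NA, NA, NA, NA, NA),
  stringsAsFactors = FALSE
)
clean_sketch(raw)
```

In this example the "empty" column is dropped, rows 4 and 5 become all-NA and are removed, and " MAle" becomes "Male"; "female" is left alone because no other spelling of it appears in the column.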
ternP() stops with a descriptive error if:

- Any column name matches a protected health information (PHI) pattern (e.g. MRN, DOB, FirstName). De-identified research identifiers such as patient_id, subject_id, and participant_id are explicitly excluded, as are clinical-event dates (admission date, discharge date, visit date, etc.). Only personal-identity dates such as DOB and DOD are flagged.
- Any column with a blank or whitespace-only header contains data. Completely empty unnamed columns are silently dropped and do not trigger this error.
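The two header checks can be sketched as follows. The PHI key list contains only the examples named above (MRN, DOB, DOD, FirstName) and is an assumption; the patterns TernTables actually uses are not listed on this page. check_headers_sketch() is a hypothetical name. Names are lowercased and stripped of non-alphanumeric characters, then matched exactly, so identifiers such as patient_id and event dates such as admission_date are not flagged.

```r
# Hypothetical sketch (NOT the TernTables implementation) of the two
# validation checks. phi_keys is an assumed, deliberately short list.
check_headers_sketch <- function(df,
                                 phi_keys = c("mrn", "dob", "dod", "firstname")) {
  nm <- names(df)

  # Normalize "FirstName", "first_name", "First Name" -> "firstname".
  key <- gsub("[^a-z0-9]", "", tolower(nm))
  phi <- nm[key %in% phi_keys]
  if (length(phi) > 0) {
    stop("Possible PHI column(s): ", paste(phi, collapse = ", "), call. = FALSE)
  }

  # A blank or whitespace-only header is an error only if its column has data.
  blank <- is.na(nm) | trimws(nm) == ""
  has_data <- vapply(df, function(x) any(!is.na(x)), logical(1))
  if (any(blank & has_data)) {
    stop("A column with a blank header contains data.", call. = FALSE)
  }
  invisible(df)
}

ok <- data.frame(patient_id = 1:2, admission_date = c("2020-01-01", "2020-02-01"))
check_headers_sketch(ok)   # passes silently

bad <- data.frame(MRN = 1:2, age = c(50, 60))
tryCatch(check_headers_sketch(bad), error = function(e) conditionMessage(e))
```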
See Also:

ternG() for grouped comparisons; ternD() for descriptive statistics.
Examples:

library(TernTables)

# Load a messy CSV and preprocess it
path <- system.file("extdata/csv", "tern_colon_messy.csv",
package = "TernTables")
raw <- read.csv(path, stringsAsFactors = FALSE)
result <- ternP(raw)
# Access cleaned data
result$clean_data
# Review preprocessing feedback
result$feedback
# Sparse rows (>50% missing): flagged for review but still retained in clean_data
result$sparse_rows