int_encoding_errors: Encoding Errors
In dataquieR: Data Quality in Epidemiological Research

Description

Detects errors in the character encoding of string variables.

Indicator

Usage

int_encoding_errors(
  resp_vars = NULL,
  study_data,
  label_col,
  meta_data_dataframe = "dataframe_level",
  item_level = "item_level",
  ref_encs,
  meta_data = item_level,
  meta_data_v2,
  dataframe_level
)

Arguments

`resp_vars`	variable the names of the measurement variables; if missing or `NULL`, all variables will be checked
`study_data`	data.frame the data frame that contains the measurements
`label_col`	variable attribute the name of the column in the metadata with labels of variables
`meta_data_dataframe`	data.frame the data frame that contains the metadata for the data frame level
`item_level`	data.frame the data frame that contains metadata attributes of study data
`ref_encs`	reference encodings (names are `resp_vars`)
`meta_data`	data.frame old name for `item_level`
`meta_data_v2`	character path to a workbook-like metadata file, see `prep_load_workbook_like_file` for details. ALL LOADED DATAFRAMES WILL BE PURGED, using `prep_purge_data_frame_cache`, if you specify `meta_data_v2`.
`dataframe_level`	data.frame alias for `meta_data_dataframe`

Details

Strings are stored using character encodings (code tables), nowadays typically UTF-8. However, other encodings are still in use, so strings from different systems sometimes end up mixed in the same data. This indicator checks for such problems and returns, per variable, the count of entries that do not match the reference encoding, which is estimated from the study data (the addition of a metadata field is planned).

If no encoding is specified in the metadata (column ENCODING at the item or data frame level), the encoding is guessed from the data. Otherwise, it may be any supported encoding as returned by iconvlist().
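
As a rough illustration of the kind of check described above, the following base R sketch counts, per character column, the entries whose bytes are not valid UTF-8. It is not the dataquieR implementation: the helper `count_invalid_utf8` and the example data are hypothetical, and UTF-8 is simply assumed as the reference encoding instead of being guessed from the data or read from an ENCODING column.

```r
# Hypothetical helper (not part of dataquieR): for every character column,
# count the non-missing entries whose bytes are not valid UTF-8.
count_invalid_utf8 <- function(df) {
  chr_cols <- names(df)[vapply(df, is.character, logical(1))]
  vapply(
    df[chr_cols],
    function(x) sum(!validUTF8(x[!is.na(x)])),
    integer(1)
  )
}

# "Müller" built from raw bytes, once as UTF-8 and once as Latin-1,
# so that the example does not depend on the locale of the session.
mueller_utf8   <- rawToChar(as.raw(c(0x4d, 0xc3, 0xbc, 0x6c, 0x6c, 0x65, 0x72)))
mueller_latin1 <- rawToChar(as.raw(c(0x4d, 0xfc, 0x6c, 0x6c, 0x65, 0x72)))

study <- data.frame(
  name = c(mueller_utf8, mueller_latin1, "Smith"),
  city = c("Bonn", "Kiel", NA),
  age  = c(41, 52, 63),
  stringsAsFactors = FALSE
)

invalid_counts <- count_invalid_utf8(study)
invalid_counts
# expected: name = 1, city = 0 (the numeric column age is not checked)
```

The Latin-1 byte `0xfc` is not a valid UTF-8 sequence on its own, so only that one entry is counted. A reference encoding other than UTF-8 would need a different validity test, which this sketch does not attempt.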

Value

a list with:

SummaryTable: data.frame with information on the encoding problems found
SummaryData: data.frame human-readable version of SummaryTable
FlaggedStudyData: data.frame that tells, for each entry in the study data, whether its encoding is OK. It has the same dimensions as study_data
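
To make the idea of a per-entry flag table with the same dimensions as the study data concrete, here is a minimal base R sketch. It is an assumption-laden illustration only: `flag_utf8_ok` is a hypothetical helper, UTF-8 is assumed as the reference encoding, and the coding of the flags (TRUE meaning OK) is chosen for this example and may differ from what FlaggedStudyData actually contains.

```r
# Hypothetical helper (not part of dataquieR): returns a data frame of the
# same dimensions as df. TRUE means the entry is missing, is not a string,
# or consists of valid UTF-8 bytes; FALSE means invalid UTF-8 bytes.
flag_utf8_ok <- function(df) {
  flags <- lapply(df, function(x) {
    if (is.character(x)) is.na(x) | validUTF8(x) else rep(TRUE, length(x))
  })
  as.data.frame(flags, stringsAsFactors = FALSE)
}

# Second entry of `name` holds "Müller" as Latin-1 bytes (invalid as UTF-8).
example_data <- data.frame(
  name = c("Smith", rawToChar(as.raw(c(0x4d, 0xfc, 0x6c, 0x6c, 0x65, 0x72))), NA),
  age  = c(41, 52, 63),
  stringsAsFactors = FALSE
)

flagged <- flag_utf8_ok(example_data)
flagged
```

Non-string columns are flagged as OK here so that the result keeps the shape of the input, which makes it easy to locate problematic cells by row and column.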