acc_margins | R Documentation |
acc_margins performs the calculations for the quality indicator "Unexpected distribution w.r.t. location". It combines descriptive and model-based statistics to investigate differences across the levels of an auxiliary variable.
CAT: Unexpected distribution w.r.t. location
Marginal means
Marginal means rest on model-based results, i.e. whether a marginal mean differs significantly depends on the sample size. Particularly in large studies, small and irrelevant differences may become significant. Conversely, relevant differences may remain non-significant if the sample size is low.
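The dependence on sample size can be illustrated with a small simulation. This is a minimal sketch using base R only; the helper simulate_p_value, the effect size of 0.2 standard deviations, and the group sizes are illustrative assumptions and not part of dataquieR.

```r
# Illustration only: the same small true difference between two groups,
# tested once in a small and once in a large sample.
set.seed(1)

# Hypothetical helper: simulate two groups that differ by `true_diff`
# (in SD units) and return the p-value of the group effect from lm().
simulate_p_value <- function(n_per_group, true_diff = 0.2) {
  group <- factor(rep(c("A", "B"), each = n_per_group))
  y <- rnorm(2 * n_per_group,
             mean = ifelse(group == "B", true_diff, 0),
             sd = 1)
  fit <- lm(y ~ group)
  # Row "groupB" holds the estimated difference of B versus A
  summary(fit)$coefficients["groupB", "Pr(>|t|)"]
}

p_small <- simulate_p_value(n_per_group = 20)
p_large <- simulate_p_value(n_per_group = 20000)

print(c(small_sample = p_small, large_sample = p_large))
```

With 20,000 observations per group, the small difference is detected with a very small p-value, whereas with 20 per group the same difference will often go undetected.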
Indicator
acc_margins(
resp_vars = NULL,
group_vars = NULL,
co_vars = NULL,
threshold_type = NULL,
threshold_value,
min_obs_in_subgroup = 5,
study_data,
meta_data,
label_col
)
resp_vars |
variable the name of the continuous measurement variable |
group_vars |
variable list len=1-1. the name of the observer, device or reader variable |
co_vars |
variable list a vector of covariables, e.g. age and sex for adjustment |
threshold_type |
enum empirical | user | none. If empirical is chosen, a multiplier of the scale measure is used; if user is chosen, a value of the mean or probability (binary data) has to be defined (see Implementation and use of thresholds). If none is chosen, no thresholds are displayed and no flagging of unusual group levels is applied. |
threshold_value |
numeric a multiplier or absolute value (see Implementation and use of thresholds) |
min_obs_in_subgroup |
integer from=0. optional argument if group_vars is used. This argument specifies the minimum number of observations required to include a subgroup (level) of the grouping variable in the analysis. Subgroups with fewer observations are excluded. The default is 5. |
study_data |
data.frame the data frame that contains the measurements |
meta_data |
data.frame the data frame that contains metadata attributes of study data |
label_col |
variable attribute the name of the column in the metadata with labels of variables |
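How min_obs_in_subgroup and an empirical threshold interact can be sketched as follows. This is not the dataquieR implementation: the helper flag_group_means is hypothetical, it works on unadjusted group means rather than model-based marginal means, and it assumes that the "scale measure" is the overall standard deviation, so that the limits are the overall mean plus or minus threshold_value times the SD.

```r
# Hypothetical helper, not part of dataquieR.
flag_group_means <- function(y, group,
                             threshold_value = 1,
                             min_obs_in_subgroup = 5) {
  # Use complete cases only
  ok <- !is.na(y) & !is.na(group)
  y <- y[ok]
  group <- as.character(group[ok])

  # Exclude subgroups with fewer than min_obs_in_subgroup observations
  n_per_group <- table(group)
  kept <- names(n_per_group)[n_per_group >= min_obs_in_subgroup]
  keep <- group %in% kept
  y <- y[keep]
  group <- group[keep]

  # Assumed "empirical" limits: overall mean +/- threshold_value * SD
  lower <- mean(y) - threshold_value * sd(y)
  upper <- mean(y) + threshold_value * sd(y)

  # Unadjusted mean per remaining subgroup
  group_means <- tapply(y, group, mean)
  data.frame(
    group = names(group_means),
    n = as.integer(table(group)[names(group_means)]),
    mean = as.numeric(group_means),
    flagged = as.numeric(group_means) < lower |
      as.numeric(group_means) > upper,
    row.names = NULL
  )
}

# Toy data: three observers with 5 readings each, one with only 2
y <- c(80, 82, 81, 79, 80,
       81, 80, 79, 82, 81,
       95, 96, 97, 95, 96,
       120, 121)
group <- c(rep("obs1", 5), rep("obs2", 5), rep("obs3", 5), rep("obs4", 2))

res <- flag_group_means(y, group, threshold_value = 1, min_obs_in_subgroup = 5)
print(res)
```

In this toy example the observer with only two readings is dropped before the limits are computed, so its extreme values do not influence the flagging of the remaining observers.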
Limitations
Selecting the appropriate distribution is complex. Dozens of continuous, discrete or mixed distributions are conceivable in the context of epidemiological data. Their exact exploration is beyond the scope of this data quality approach. The function above uses the helper function util_dist_selection, which discriminates four cases:
continuous data
binary data
count data with <= 20 categories
count data with > 20 categories
Nonetheless, only three different plot types are generated: the fourth case is treated as continuous data. This coarsens the original data, but the approach was chosen for the sake of clarity.
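The four cases and their mapping to three plot types can be sketched with a simple rule-based classifier. The functions classify_distribution and plot_type_for are hypothetical, and the rules (at most two distinct values for binary, non-negative whole numbers for counts) are assumptions made for illustration; the actual logic of util_dist_selection may differ.

```r
# Hypothetical classifier, not the actual util_dist_selection logic.
classify_distribution <- function(x, max_count_categories = 20) {
  x <- x[!is.na(x)]
  stopifnot(is.numeric(x), length(x) > 0)

  n_distinct <- length(unique(x))
  is_whole <- all(x == round(x))

  if (n_distinct <= 2) {
    "binary"
  } else if (is_whole && all(x >= 0)) {
    # Count data: split by the number of distinct categories
    if (n_distinct <= max_count_categories) "count_le_20" else "count_gt_20"
  } else {
    "continuous"
  }
}

# Count data with more than 20 categories is handled like continuous data,
# so only three plot types remain.
plot_type_for <- function(case) {
  if (case == "count_gt_20") "continuous" else case
}

cases <- c(
  binary     = classify_distribution(c(0, 1, 1, 0)),
  few_counts = classify_distribution(0:10),
  many_counts = classify_distribution(0:50),
  measured   = classify_distribution(c(1.5, 2.7, 3.1))
)
print(cases)
print(vapply(cases, plot_type_for, character(1)))
```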
a list with:
SummaryTable: data frame underlying the plot
SummaryData: data frame
SummaryPlot: ggplot2 margins plot
## Not run:
# runs spuriously slow on rhub
load(system.file("extdata/study_data.RData", package = "dataquieR"))
load(system.file("extdata/meta_data.RData", package = "dataquieR"))
acc_margins(resp_vars = "DBP_0",
            study_data = study_data,
            meta_data = meta_data,
            group_vars = "USR_BP_0",
            label_col = LABEL,
            co_vars = "AGE_0")
## End(Not run)