knitr::opts_chunk$set( collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE ) options(rmarkdown.html_vignette.check_title = FALSE)
<summarised_result>
formatThe <summarised_result>
format is a standard output defined in omopgenerics. The fact that it is standardised output make it a very powerful tool so multiple functions can export on the same format and built functionalities on top of it, as it can be seen in tables and plots vignettes. This standard output it can be some times hard to manipulate to do your custom analysis. visOmopResults
contains tools to tidy your <summarised_result>
object that are covered in this vignette.
<summarised_result>
visOmopResults
defines the method tidy for <summarised_result>
object, what this function does is to:
The <summarised_result>
object has the following pair columns: group_name-group_level, strata_name-strata_level, and additional_name-additional_level. These pairs use the &&&
separator to combine multiple fields, for example if you want to combine cohort_name and age_group in group_name-group_level pair: group_name = "cohort_name &&& age_group"
and group_level = "my_cohort &&& <40"
. By default if no aggregation is produced in group_name-group_level pair: group_name = "overall"
and group_level = "overall"
.
ORIGINAL FORMAT:
dplyr::tibble( group_name = c("cohort_name", c("cohort_name &&& sex"), c("sex &&& age_group")), group_level = c("acetaminophen", c("acetaminophen &&& Female"), c("Male &&& <40")) ) |> gt::gt()
The tidy format puts each one of the values as a columns. Making it easier to manipulate but at the same time the output is not standardised anymore as each <summarised_result>
object will have a different number and names of columns. Missing values will be filled with the "overall" label.
TIDY FORMAT:
dplyr::tibble( group_name = c("cohort_name", c("cohort_name &&& sex"), c("sex &&& age_group")), group_level = c("acetaminophen", c("acetaminophen &&& Female"), c("Male &&& <40")) ) |> visOmopResults::splitGroup() |> gt::gt()
<summarised_result>
object as columns:Each <summarised_result>
object has a setting attribute that relates the 'result_id' column with each different set of settings. The columns 'result_type', 'package_name' and 'package_version' are always present in settings, but then we may have some extra parameters depending how the object was created. So in the <summarised_result>
format we need to use these settings()
functions to see those variables:
ORIGINAL FORMAT:
settings
:
dplyr::tibble( result_id = c(1L, 2L), my_setting = c(TRUE, FALSE), package_name = "visOmopResults" ) |> gt::gt()
<summarised_result>
:
dplyr::tibble( result_id = c("1", "...", "2", "..."), cdm_name = c("omop", "...", "omop", "..."), " " = c("..."), additional_name = c("overall", "...", "overall", "...") ) |> gt::gt()
But in the tidy format we add the settings as columns, making that their value is repeated multiple times (there is only one row per result_id in settings, whereas there can be multiple rows in the <summarised_result>
object). The column 'result_id' is eliminated as it does not provide information anymore. Again we loose on standardisation (multiple different settings), but we gain in flexibility:
TIDY FORMAT:
dplyr::tibble( cdm_name = c("omop", "...", "omop", "..."), " " = c("..."), additional_name = c("overall", "...", "overall", "..."), my_setting = c("TRUE", "...", "FALSE", "..."), package_name = c("visOmopResults", "...", "visOmopResults", "...") ) |> gt::gt()
In the <summarised_result>
format estimates are displayed in 3 columns:
r omopgenerics::estimateTypeChoices()
.<character>
.ORIGINAL FORMAT:
dplyr::tibble( variable_name = c("number individuals", "age", "age"), estimate_name = c("count", "mean", "sd"), estimate_type = c("integer", "numeric", "numeric"), estimate_value = c("100", "50.3", "20.7") ) |> gt::gt()
In the tidy format we pivot the estimates, creating a new column for each one of the 'estimate_name' values. The columns will be casted to 'estimate_type'. If there are multiple estimate_type(s) for same estimate_name they won't be casted and they will be displayed as character (a warning will be thrown). Missing data are populated with NAs.
TIDY FORMAT:
dplyr::tibble( variable_name = c("number individuals", "age"), count = c(100L, NA), mean = c(NA, 50.3), sd = c(NA, 20.7) ) |> gt::gt()
Let's see a simple example with some toy data:
library(visOmopResults) result <- mockSummarisedResult() result |> tidy()
We have several functions to customise the tidy version of the <summarised_result>
object.
The functions split are provided independent:
splitGroup()
only splits the pair group_name-group_level columns.splitStrata()
only splits the pair strata_name-strata_level columns.splitAdditional()
only splits the pair additional_name-additional_level columns.There is also the function:
- splitAll()
that splits any pair x_name-x_level that is found on the data.
splitAll(result)
pivotEstimates()
can be used to pivot the variables that we are interested in.
The argument pivotEstimatesBy
specifies which are the variables that we want to use to pivot by, there are four options:
NULL/character()
to not pivot anything.c("estimate_name")
to pivot only estimate_name.c("variable_level", "estimate_name")
to pivot estimate_name and variable_level.c("variable_name", "variable_level", "estimate_name")
to pivot estimate_name, variable_level and variable_name.Note that variable_level
can contain NA values, these will be ignored on the naming part.
pivotEstimates( result, pivotEstimatesBy = c("variable_name","variable_level", "estimate_name") )
addSettings()
is used to add the settings that we want as new columns to our <summarised_result>
object.
The settingsColumns
argument is used to choose which are the settings we want to add.
addSettings( result, settingsColumns = "result_type" )
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.