knitr::opts_chunk$set( collapse = TRUE, warning = FALSE, message = FALSE, comment = "#>", out.width = "100%", fig.height = 6, fig.width = 8, fig.showtext = TRUE )
In Modeling and Simulation (M&S) workflows, we often need to work with multiple observed and/or simulated datasets. Some of these datasets can be linked together in a group (e.g., the simulated and the corresponding observed data) for analysis and visualization.
The DataCombined
class in {ospsuite}
provides a container to store
data, to group them, and to transform them (e.g. scale or offset). The
class accepts data coming from SimulationResults
or as a DataSet
object
(see Observed data).
Let's first generate some simulation results and load observed data.
library(ospsuite) # Simulation results simFilePath <- system.file("extdata", "Aciclovir.pkml", package = "ospsuite") sim <- loadSimulation(simFilePath) simResults <- runSimulations(sim)[[1]] outputPath <- "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)" # observed data obsData <- lapply( c("ObsDataAciclovir_1.pkml", "ObsDataAciclovir_2.pkml", "ObsDataAciclovir_3.pkml"), function(x) loadDataSetFromPKML(system.file("extdata", x, package = "ospsuite")) ) # Name the elements of the list according to the names of the loaded `DataSet`. names(obsData) <- lapply(obsData, function(x) x$name)
Typically, DataCombined
is used to create figures using the plotXXX()
functions.
The different plot types are described in Visualizations with DataCombined
.
To visualize the different functionalities of DataCombined
in this document,
we will use the plotIndividualTimeProfile()
function.
DataCombined
objectFirst, we create a new instance of DataCombined
class.
myDataCombined <- DataCombined$new()
Then we add simulated results to this object using $addSimulationResults()
:
myDataCombined$addSimulationResults( simulationResults = simResults, quantitiesOrPaths = outputPath, names = "Aciclovir Plasma", groups = "Aciclovir PVB" )
Next we add observed data to this object using $addDataSets()
:
myDataCombined$addDataSets( obsData$`Vergin 1995.Iv`, groups = "Aciclovir PVB" )
Every data, be it simulated results or from DataSet
, must have a unique name
within DataCombined
. If not specified by user, the path of simulated results
or the $name
property of the DataSet
are used as the name. Alternatively, we
can define the name when adding the data, as in the above example adding simulated
results.
There are a few things to keep in mind here:
DataSet
objects in $addDataSets()
method call.SimulationResults
in $addSimulationResults()
method call.Since the DataCombined
object can store many datasets, some of them may naturally form a grouping and you would wish to track this in the object. Both $addDataSets()
and $addSimulationResults()
allow group specification via groups
argument.
When being plotted, data sets without a grouping will appear with distinct color, line style (for simulated results), or symbol style (for observed data) with a separate entry in the legend, with the name of the data set being the legend entry.
Though it is possible to group data with different dimensions, it makes sense to
only group data sets that have the same dimension (or dimensions that can be
transformed into each other, like Amount
and Mass
) in order to be able to compare the
data. Grouping data sets with different dimensions most likely will result in an
error when trying to create plots or calculating residuals with such a DataCombined
.
Let's create a DataCombined
and add simulation results and observed data without
specifying their grouping and create a time profile:
myDataCombined <- DataCombined$new() myDataCombined$addSimulationResults( simulationResults = simResults, quantitiesOrPaths = outputPath, names = "Aciclovir Plasma" ) myDataCombined$addDataSets( obsData$`Vergin 1995.Iv` ) plotIndividualTimeProfile(dataCombined = myDataCombined)
If you do not specify groups
when you add datasets, and wish to update
groupings later, you can use the $setGroups()
method. All data within one
group will get one entry in the legend with the name of the group, and will be
plotted with the same color.
myDataCombined$setGroups( names = c("Aciclovir Plasma", obsData$`Vergin 1995.Iv`$name), groups = c("Aciclovir PVB", "Aciclovir PVB") ) plotIndividualTimeProfile(dataCombined = myDataCombined)
At any point, you can check the current names and groupings with the following active field:
myDataCombined$groupMap
Sometimes the raw data included in DataCombined
needs to be transformed using specified
offset and scale factor values.
This is supported via $setDataTransformations()
method, where you can specify the names of datasets, and offsets and scale factors. If these arguments are scalar (i.e., of length 1), then the same value will be applied to all specified names. Scale factors are unitless values by which raw data values will
be multiplied, offsets are added to the raw values without any conversion and must
therefore be in the same units as the raw data.
The internal data frame in DataCombined
will be transformed with the specified parameters, with the new columns computed as:
newXValue = (rawXValue + xOffset) * xScaleFactor
newYValue = (rawYValue + yOffset) * yScaleFactor
newErrorValue = rawErrorValue * abs(yScaleFactor)
At any point, you can check the applied offsets and scale factors with the following active field:
myDataCombined$dataTransformations
Now lets take a closer look on possible situations where you would apply any transformations. Consider the observed data from the examples above. It reports concentrations of aciclovir after single intravenous administration.
myDataCombinedTranformations <- DataCombined$new() myDataCombinedTranformations$addDataSets( obsData$`Vergin 1995.Iv` ) plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
However, we might want to use this data set with a simulation where aciclovir is administered 24 hours after simulation begin. To be able to compare simulation results with the data set, we can offset the observed data time by 24 hours. Keep in mind that the offset must be given in the same unit as the data set values are.
# Check the units of the observed time values obsData$`Vergin 1995.Iv`$xUnit myDataCombinedTranformations$setDataTransformations( forNames = obsData$`Vergin 1995.Iv`$name, xOffsets = 24 ) plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
In the next step, we want to normalize observed concentrations to a dose. We
can easily achieve this with the scale factor. In the next example, we normalize
observed values to 250 mg dose by setting the yScaleFactor
to 1/250:
myDataCombinedTranformations$setDataTransformations( forNames = obsData$`Vergin 1995.Iv`$name, yScaleFactors = 1 / 250 ) plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
Finally, offsetting the observation values might be useful when working with measurements of endogenous substrates, such as the hormone glucagon, and want to correct for the individual specific baseline levels of the hormone.
The data frame (also sometimes called as a table) data structure is central to R-based workflows, and, thus, we may wish to extract a data frame for datasets present in the object.
Internally, DataCombined
extracts data frames for observed and simulated datasets and combines them.
myDataCombined$toDataFrame()
This function returns a tibble data frame. If you wish to modify how it is printed, you can have a look at the available options here. In fact, let's change a few options and print the data frame again.
options( pillar.width = Inf, # show all columns pillar.min_chars = Inf # to turn off truncation of column titles ) myDataCombined$toDataFrame()
# change these settings only temporarily withr::with_options( list( pillar.width = Inf, # show all columns pillar.min_chars = Inf # to turn off truncation of column titles ), code = { print(myDataCombined$toDataFrame()) } )
{ospsuite}
also provides a few helper functions to modify the data frame further.
When multiple (observed and/or simulated) datasets are present in a data frame, they are likely to have different units. convertUnits()
function helps to convert them to a common unit. The function
will not modify the DataCombined
object but return a new data frame with unified
units.^[Note that if you are using DataCombined
object for plotting functions, you don't need to this conversion; the plotting functions will take care of this internally.].
convertUnits( myDataCombined, xUnit = ospUnits$Time$s, yUnit = ospUnits$`Concentration [mass]`$`µg/l` )
Grouping of simulated results with observed data inside DataCombined
also allows
to calculate the error between simulated results and observations using the
calculateResiduals()
function. The function calculates the distance between
each observed data point within the group and the simulated results. Simulation
results interpolated if no simulated value is available for a certain observed
time point. Observed data beyond simulated time is ignored, i.e., no extrapolation
is performed.
There are two ways to calculate the distance (residuals) - linear and logarithmic -
specified by the scaling
argument. For linear scaling (scaling = tlf::Scaling$lin
):
Residuals are calculated as: Simulation value - Observed value. This means that the residuals are defined by absolute differences.
For logarithmic scaling (scaling = tlf::Scaling$log
):
Residuals are calculated as: log(Simulation value) - log(Observed value) = log (Simulation Value / Observed Value). This means that the ratio of values is considered which is independent of the magnitude of the value. But for very small observed values, in particular close to 0 values, this can lead to problems, because log(10E-N) = -N can becomes large.
# Linear residuals calculateResiduals(myDataCombined, scaling = tlf::Scaling$lin)$residualValues # Logarithmic residuals calculateResiduals(myDataCombined, scaling = tlf::Scaling$log)$residualValues
To quickly calculate the total error of the DataCombined
, one can sum up the
absolute values of the residuals:
# Linear residuals totalError <- sum(abs(calculateResiduals(myDataCombined, scaling = tlf::Scaling$lin)$residualValues)) print(totalError)
DataCombined
See Visualizations with DataCombined
describing functions that can visualize data stored in DataCombined
object.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.