Introduction
In Modeling and Simulation (M&S) workflows, we often need to work with multiple observed and/or simulated datasets. Some of these datasets can be linked together in a group (e.g., the simulated and the corresponding observed data) for analysis and visualization.
The DataCombined
class in ospsuite provides a container to store data, to group them, and to transform them (e.g. scale or offset). The class accepts data coming from SimulationResults
or as a DataSet
object (see Observed data).
Let’s first generate some simulation results and load observed data.
library(ospsuite)
# Simulation results
simFilePath <- system.file("extdata", "Aciclovir.pkml", package = "ospsuite")
sim <- loadSimulation(simFilePath)
simResults <- runSimulation(sim)
outputPath <- "Organism|PeripheralVenousBlood|Aciclovir|Plasma (Peripheral Venous Blood)"
# observed data
obsData <- lapply(
c("ObsDataAciclovir_1.pkml", "ObsDataAciclovir_2.pkml", "ObsDataAciclovir_3.pkml"),
function(x) loadDataSetFromPKML(system.file("extdata", x, package = "ospsuite"))
)
# Name the elements of the list according to the names of the loaded `DataSet`.
names(obsData) <- lapply(obsData, function(x) x$name)
Typically, DataCombined
is used to create figures using the plotXXX()
functions. The different plot types are described in Visualizations with DataCombined
. To visualize the different functionalities of DataCombined
in this document, we will use the plotIndividualTimeProfile()
function.
Creating DataCombined
object
First, we create a new instance of DataCombined
class.
myDataCombined <- DataCombined$new()
Then we add simulated results to this object using $addSimulationResults()
:
myDataCombined$addSimulationResults(
simulationResults = simResults,
quantitiesOrPaths = outputPath,
names = "Aciclovir Plasma",
groups = "Aciclovir PVB"
)
Next we add observed data to this object using $addDataSets()
:
myDataCombined$addDataSets(
obsData$`Vergin 1995.Iv`,
groups = "Aciclovir PVB"
)
Every data, be it simulated results or from DataSet
, must have a unique name within DataCombined
. If not specified by user, the path of simulated results or the $name
property of the DataSet
are used as the name. Alternatively, we can define the name when adding the data, as in the above example adding simulated results.
There are a few things to keep in mind here:
- It doesn’t matter in which order observed or simulated datasets are added.
- You can add multiple
DataSet
objects in$addDataSets()
method call. - You can add only a single instance of
SimulationResults
in$addSimulationResults()
method call. - If you add a dataset with the same name as existing (e.g., adding simulation results from multiple simulations and not specifying the name, so the path of the results is used as the name), the new data will replace the existing.
Grouping
Since the DataCombined
object can store many datasets, some of them may naturally form a grouping and you would wish to track this in the object. Both $addDataSets()
and $addSimulationResults()
allow group specification via groups
argument.
When being plotted, data sets without a grouping will appear with distinct color, line style (for simulated results), or symbol style (for observed data) with a separate entry in the legend, with the name of the data set being the legend entry.
Though it is possible to group data with different dimensions, it makes sense to only group data sets that have the same dimension (or dimensions that can be transformed into each other, like Amount
and Mass
) in order to be able to compare the data. Grouping data sets with different dimensions most likely will result in an error when trying to create plots or calculating residuals with such a DataCombined
.
Let’s create a DataCombined
and add simulation results and observed data without specifying their grouping and create a time profile:
myDataCombined <- DataCombined$new()
myDataCombined$addSimulationResults(
simulationResults = simResults,
quantitiesOrPaths = outputPath,
names = "Aciclovir Plasma"
)
myDataCombined$addDataSets(
obsData$`Vergin 1995.Iv`
)
plotIndividualTimeProfile(dataCombined = myDataCombined)
If you do not specify groups
when you add datasets, and wish to update groupings later, you can use the $setGroups()
method. All data within one group will get one entry in the legend with the name of the group, and will be plotted with the same color.
myDataCombined$setGroups(
names = c("Aciclovir Plasma", obsData$`Vergin 1995.Iv`$name),
groups = c("Aciclovir PVB", "Aciclovir PVB")
)
plotIndividualTimeProfile(dataCombined = myDataCombined)
At any point, you can check the current names and groupings with the following active field:
myDataCombined$groupMap
#> # A tibble: 2 x 3
#> group name dataType
#> <chr> <chr> <chr>
#> 1 Aciclovir PVB Aciclovir Plasma simulated
#> 2 Aciclovir PVB Vergin 1995.Iv observed
Transformations
Sometimes the raw data included in DataCombined
needs to be transformed using specified offset and scale factor values.
This is supported via $setDataTransformations()
method, where you can specify the names of datasets, and offsets and scale factors. If these arguments are scalar (i.e., of length 1), then the same value will be applied to all specified names. Scale factors are unitless values by which raw data values will be multiplied, offsets are added to the raw values without any conversion and must therefore be in the same units as the raw data.
The internal data frame in DataCombined
will be transformed with the specified parameters, with the new columns computed as:
- For x values:
newXValue = (rawXValue + xOffset) * xScaleFactor
- For y values:
newYValue = (rawYValue + yOffset) * yScaleFactor
- For arithmetic error:
newErrorValue = rawErrorValue * abs(yScaleFactor)
- For geometric error: no transformation.
At any point, you can check the applied offsets and scale factors with the following active field:
myDataCombined$dataTransformations
#> # A tibble: 2 x 5
#> name xOffsets yOffsets xScaleFactors yScaleFactors
#> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 Aciclovir Plasma 0 0 1 1
#> 2 Vergin 1995.Iv 0 0 1 1
Now lets take a closer look on possible situations where you would apply any transformations. Consider the observed data from the examples above. It reports concentrations of aciclovir after single intravenous administration.
myDataCombinedTranformations <- DataCombined$new()
myDataCombinedTranformations$addDataSets(
obsData$`Vergin 1995.Iv`
)
plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
However, we might want to use this data set with a simulation where aciclovir is administered 24 hours after simulation begin. To be able to compare simulation results with the data set, we can offset the observed data time by 24 hours. Keep in mind that the offset must be given in the same unit as the data set values are.
# Check the units of the observed time values
obsData$`Vergin 1995.Iv`$xUnit
#> [1] "h"
myDataCombinedTranformations$setDataTransformations(
forNames = obsData$`Vergin 1995.Iv`$name,
xOffsets = 24
)
plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
In the next step, we want to normalize observed concentrations to a dose. We can easily achieve this with the scale factor. In the next example, we normalize observed values to 250 mg dose by setting the yScaleFactor
to 1/250:
myDataCombinedTranformations$setDataTransformations(
forNames = obsData$`Vergin 1995.Iv`$name,
yScaleFactors = 1 / 250
)
plotIndividualTimeProfile(dataCombined = myDataCombinedTranformations)
Finally, offsetting the observation values might be useful when working with measurements of endogenous substrates, such as the hormone glucagon, and want to correct for the individual specific baseline levels of the hormone.
Extracting a combined data frame
The data frame (also sometimes called as a table) data structure is central to R-based workflows, and, thus, we may wish to extract a data frame for datasets present in the object.
Internally, DataCombined
extracts data frames for observed and simulated datasets and combines them.
myDataCombined$toDataFrame()
#> # A tibble: 504 x 27
#> name group dataT~1 xValues xUnit xDime~2 yValues yUnit yDime~3 yErro~4
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
#> 1 Aciclovir ~ Acic~ simula~ 0 min Time 0 µmol~ Concen~ NAm
#> 2 Aciclovir ~ Acic~ simula~ 1 min Time 3.25 µmol~ Concen~ NAm
#> 3 Aciclovir ~ Acic~ simula~ 2 min Time 9.10 µmol~ Concen~ NAm
#> 4 Aciclovir ~ Acic~ simula~ 3 min Time 15.0 µmol~ Concen~ NAm
#> 5 Aciclovir ~ Acic~ simula~ 4 min Time 20.7 µmol~ Concen~ NAm
#> 6 Aciclovir ~ Acic~ simula~ 5 min Time 26.2 µmol~ Concen~ NAm
#> 7 Aciclovir ~ Acic~ simula~ 6 min Time 31.4 µmol~ Concen~ NAm
#> 8 Aciclovir ~ Acic~ simula~ 7 min Time 36.4 µmol~ Concen~ NAm
#> 9 Aciclovir ~ Acic~ simula~ 8 min Time 41.1 µmol~ Concen~ NAm
#> 10 Aciclovir ~ Acic~ simula~ 9 min Time 45.5 µmol~ Concen~ NAm
#> # ... with 494 more rows, 17 more variables: yErrorType <chr>,
#> # yErrorUnit <chr>, IndividualId <int>, molWeight <dbl>, lloq <dbl>,
#> # Source <chr>, File <chr>, Sheet <chr>, Molecule <chr>, Species <chr>,
#> # Organ <chr>, Compartment <chr>, `Study Id` <chr>, Gender <chr>, Dose <chr>,
#> # Route <chr>, `Patient Id` <chr>, and abbreviated variable names
#> # 1: dataType, 2: xDimension, 3: yDimension, 4: yErrorValues
This function returns a tibble data frame. If you wish to modify how it is printed, you can have a look at the available options here. In fact, let’s change a few options and print the data frame again.
options(
pillar.width = Inf, # show all columns
pillar.min_chars = Inf # to turn off truncation of column titles
)
myDataCombined$toDataFrame()
#> # A tibble: 504 x 27
#> name group dataType xValues xUnit xDimension yValues
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl>
#> 1 Aciclovir Plasma Aciclovir PVB simulated 0 min Time 0
#> 2 Aciclovir Plasma Aciclovir PVB simulated 1 min Time 3.25
#> 3 Aciclovir Plasma Aciclovir PVB simulated 2 min Time 9.10
#> 4 Aciclovir Plasma Aciclovir PVB simulated 3 min Time 15.0
#> 5 Aciclovir Plasma Aciclovir PVB simulated 4 min Time 20.7
#> 6 Aciclovir Plasma Aciclovir PVB simulated 5 min Time 26.2
#> 7 Aciclovir Plasma Aciclovir PVB simulated 6 min Time 31.4
#> 8 Aciclovir Plasma Aciclovir PVB simulated 7 min Time 36.4
#> 9 Aciclovir Plasma Aciclovir PVB simulated 8 min Time 41.1
#> 10 Aciclovir Plasma Aciclovir PVB simulated 9 min Time 45.5
#> yUnit yDimension yErrorValues yErrorType yErrorUnit IndividualId
#> <chr> <chr> <dbl> <chr> <chr> <int>
#> 1 µmol/l Concentration (molar) NA NA NA 00
#> 2 µmol/l Concentration (molar) NA NA NA 00
#> 3 µmol/l Concentration (molar) NA NA NA 00
#> 4 µmol/l Concentration (molar) NA NA NA 00
#> 5 µmol/l Concentration (molar) NA NA NA 00
#> 6 µmol/l Concentration (molar) NA NA NA 00
#> 7 µmol/l Concentration (molar) NA NA NA 00
#> 8 µmol/l Concentration (molar) NA NA NA 00
#> 9 µmol/l Concentration (molar) NA NA NA 00
#> 10 µmol/l Concentration (molar) NA NA NA 00
#> molWeight lloq Source File Sheet Molecule Species Organ Compartment
#> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 225. NA NA NA NA NA NA NA NA
#> 2 225. NA NA NA NA NA NA NA NA
#> 3 225. NA NA NA NA NA NA NA NA
#> 4 225. NA NA NA NA NA NA NA NA
#> 5 225. NA NA NA NA NA NA NA NA
#> 6 225. NA NA NA NA NA NA NA NA
#> 7 225. NA NA NA NA NA NA NA NA
#> 8 225. NA NA NA NA NA NA NA NA
#> 9 225. NA NA NA NA NA NA NA NA
#> 10 225. NA NA NA NA NA NA NA NA
#> `Study Id` Gender Dose Route `Patient Id`
#> <chr> <chr> <chr> <chr> <chr>
#> 1 NA NA NA NA NA
#> 2 NA NA NA NA NA
#> 3 NA NA NA NA NA
#> 4 NA NA NA NA NA
#> 5 NA NA NA NA NA
#> 6 NA NA NA NA NA
#> 7 NA NA NA NA NA
#> 8 NA NA NA NA NA
#> 9 NA NA NA NA NA
#> 10 NA NA NA NA NA
#> # ... with 494 more rows
ospsuite also provides a few helper functions to modify the data frame further.
When multiple (observed and/or simulated) datasets are present in a data frame, they are likely to have different units. convertUnits()
function helps to convert them to a common unit. The function will not modify the DataCombined
object but return a new data frame with unified units.1.
convertUnits(
myDataCombined,
xUnit = ospUnits$Time$s,
yUnit = ospUnits$`Concentration [mass]`$`µg/l``
)
#> # A tibble: 504 x 27
#> name group dataT~1 xValues xUnit xDime~2 yValues yUnit yDime~3 yErro~4
#> <chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
#> 1 Aciclovir ~ Acic~ simula~ 0 s Time 0 µg/l Concen~ NAm
#> 2 Aciclovir ~ Acic~ simula~ 60 s Time 733. µg/l Concen~ NAm
#> 3 Aciclovir ~ Acic~ simula~ 120 s Time 2050. µg/l Concen~ NAm
#> 4 Aciclovir ~ Acic~ simula~ 180 s Time 3382. µg/l Concen~ NAm
#> 5 Aciclovir ~ Acic~ simula~ 240 s Time 4668. µg/l Concen~ NAm
#> 6 Aciclovir ~ Acic~ simula~ 300 s Time 5901. µg/l Concen~ NAm
#> 7 Aciclovir ~ Acic~ simula~ 360 s Time 7077. µg/l Concen~ NAm
#> 8 Aciclovir ~ Acic~ simula~ 420 s Time 8194. µg/l Concen~ NAm
#> 9 Aciclovir ~ Acic~ simula~ 480 s Time 9249. µg/l Concen~ NAm
#> 10 Aciclovir ~ Acic~ simula~ 540 s Time 10242. µg/l Concen~ NAm
#> # ... with 494 more rows, 17 more variables: yErrorType <chr>,
#> # yErrorUnit <chr>, IndividualId <int>, molWeight <dbl>, lloq <dbl>,
#> # Source <chr>, File <chr>, Sheet <chr>, Molecule <chr>, Species <chr>,
#> # Organ <chr>, Compartment <chr>, `Study Id` <chr>, Gender <chr>, Dose <chr>,
#> # Route <chr>, `Patient Id` <chr>, and abbreviated variable names
#> # 1: dataType, 2: xDimension, 3: yDimension, 4: yErrorValues
Further functionalities
Grouping of simulated results with observed data inside DataCombined
also allows to calculate the error between simulated results and observations using the calculateResiduals()
function. The function calculates the distance between each observed data point within the group and the simulated results. Simulation results interpolated if no simulated value is available for a certain observed time point. Observed data beyond simulated time is ignored, i.e., no extrapolation is performed.
There are two ways to calculate the distance (residuals) - linear and logarithmic - specified by the scaling
argument. For linear scaling (scaling = tlf::Scaling$lin
):
Residuals are calculated as: Simulation value - Observed value. This means that the residuals are defined by absolute differences.
For logarithmic scaling (scaling = tlf::Scaling$log
):
Residuals are calculated as: log(Simulation value) - log(Observed value) = log (Simulation Value / Observed Value). This means that the ratio of values is considered which is independent of the magnitude of the value. But for very small observed values, in particular close to 0 values, this can lead to problems, because log(10E-N) = -N can becomes large.
# Linear residuals
calculateResiduals(myDataCombined, scaling = tlf::Scaling$lin)$residualValues
#> [1] 9.49304275 -0.14538952 0.17965548 1.84599976 3.52552442 3.64160094
#> [7] 3.15654735 2.38392729 1.48703530 0.50532597 0.08009793 -0.14892803
#> [13] -0.29395086
# Logarithmic residuals
calculateResiduals(myDataCombined, scaling = tlf::Scaling$log)$residualValues
#> [1] 0.104108908 -0.003164673 0.005480074 0.067421107 0.167165355
#> [6] 0.209421219 0.251369532 0.250984965 0.281241351 0.198767801
#> [11] 0.059638034 -0.181845923 -1.001432113
To quickly calculate the total error of the DataCombined
, one can sum up the absolute values of the residuals:
# Linear residuals
totalError <- sum(abs(calculateResiduals(myDataCombined, scaling = tlf::Scaling$lin)$residualValues))
print(totalError)
#> [1] 26.88703
Visualizations with DataCombined
See Visualizations with DataCombined
describing functions that can visualize data stored in DataCombined
object.