ospsuite offers a unified format for storing and processing numerical x-y data, implemented in the DataSet class. DataSet objects standardize the handling of (observed) data coming from different sources, such as Excel files imported with the data importer functionality of the OSPS, data loaded from *.pkml files, or manually created data. This vignette gives an overview of the options for creating DataSet objects and combining them into grouped data sets using the DataCombined class.
DataSet
A DataSet object stores numerical data pairs, typically time as x values and measurements as y values, and optionally the measurement error. All values have a dimension and a unit (see Dimensions and Units for more information). Furthermore, each DataSet must have a name. When creating a DataSet from scratch (e.g. when the user wants to manually input observed data), a name must be provided:
library(ospsuite)
#> Loading required package: rClr
#> Loading the dynamic library for Microsoft .NET runtime...
#> Loaded Common Language Runtime version 4.0.30319.42000
# Create an empty data set
dataSet <- DataSet$new("My data set")
After creation, the DataSet does not hold any data. The default dimension and unit for the x values are Time and h, respectively. The default dimension and unit for the y values are Concentration (mass) and mg/l, respectively. The dimension of the error values always corresponds to the dimension of the y values, though the units may differ.
Setting numerical values (or overwriting current values) is performed by the $setValues() method:
dataSet$setValues(
  xValues = c(1, 2, 3, 4),
  yValues = c(0, 0.1, 0.6, 10),
  yErrorValues = c(0.001, 0.001, 0.1, 1)
)
print(dataSet)
#> DataSet:
#> Name: My data set
#> X dimension: Time
#> X unit: h
#> Y dimension: Concentration (mass)
#> Y unit: mg/l
#> Error type: ArithmeticStdDev
#> Error unit: mg/l
#> Molecular weight:
#> LLOQ:
#> Meta data:
#> list()
The user can change the dimensions and units of the values. After changing the dimension, the unit is automatically set to the base unit of the dimension. Changing the dimension or unit does not transform the values.
# Print x, y, and error values
dataSet$xValues
#> [1] 1 2 3 4
dataSet$yValues
#> [1] 0.0 0.1 0.6 10.0
dataSet$yErrorValues
#> [1] 0.001 0.001 0.100 1.000
# Change the unit of x-values
dataSet$xUnit <- ospUnits$Time$min
# Print the x values - they did not change
dataSet$xValues
#> [1] 1 2 3 4
# Change dimension of y-values
dataSet$yDimension <- ospDimensions$Amount
print(dataSet)
#> DataSet:
#> Name: My data set
#> X dimension: Time
#> X unit: min
#> Y dimension: Amount
#> Y unit: µmol
#> Error type: ArithmeticStdDev
#> Error unit: µmol
#> Molecular weight:
#> LLOQ:
#> Meta data:
#> list()
# Change the units of y values and error values - they are now different!
dataSet$yUnit <- ospUnits$Amount$mol
dataSet$yErrorUnit <- ospUnits$Amount$pmol
print(dataSet)
#> DataSet:
#> Name: My data set
#> X dimension: Time
#> X unit: min
#> Y dimension: Amount
#> Y unit: mol
#> Error type: ArithmeticStdDev
#> Error unit: pmol
#> Molecular weight:
#> LLOQ:
#> Meta data:
#> list()
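Because changing a unit leaves the stored numbers untouched, any actual conversion has to be done on the values themselves. The following is a minimal sketch in base R, using hypothetical stand-in vectors (xValuesHours, yValues) instead of a real DataSet so that it runs without ospsuite; the commented lines at the end indicate how the converted values might be written back.

```r
# Hypothetical stand-ins for values read from dataSet$xValues (in h)
# and dataSet$yValues
xValuesHours <- c(1, 2, 3, 4)
yValues <- c(0, 0.1, 0.6, 10)

# Convert hours to minutes by hand
minutesPerHour <- 60
xValuesMinutes <- xValuesHours * minutesPerHour
print(xValuesMinutes)

# With ospsuite loaded, the converted values and the new unit could then be
# set together, for example:
# dataSet$setValues(
#   xValues = xValuesMinutes,
#   yValues = dataSet$yValues,
#   yErrorValues = dataSet$yErrorValues
# )
# dataSet$xUnit <- ospUnits$Time$min
```

Setting the values and the unit together keeps the numbers and their declared unit consistent.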
Two types of error values are supported: arithmetic error (default) and geometric error, the latter being given as a fraction. The user can change the error type:
# Default error type is "ArithmeticStdDev"
dataSet$yErrorType
#> [1] "ArithmeticStdDev"
# Change error type to geometric
dataSet$yErrorType <- DataErrorType$GeometricStdDev
# The error now has the dimension "Fraction"; its unit is unitless (empty string)
dataSet$yErrorUnit
#> [1] ""
# Changing error type to arithmetic will set the dimension and unit of the error
# to the same dimension and unit as the y values
dataSet$yErrorType <- DataErrorType$ArithmeticStdDev
print(dataSet)
#> DataSet:
#> Name: My data set
#> X dimension: Time
#> X unit: min
#> Y dimension: Amount
#> Y unit: mol
#> Error type: ArithmeticStdDev
#> Error unit: mol
#> Molecular weight:
#> LLOQ:
#> Meta data:
#> list()
A DataSet can store any kind of text meta data as name-value pairs. Entries are added with the addMetaData() method:
# Add new meta data entries
dataSet$addMetaData(
  name = "Molecule",
  value = "Aciclovir"
)
dataSet$addMetaData(
  name = "Organ",
  value = "Muscle"
)
# Print meta data of the DataSet
print(dataSet$metaData)
#> $Molecule
#> [1] "Aciclovir"
#>
#> $Organ
#> [1] "Muscle"
A DataSet or multiple DataSets can be converted to a data.frame (or tibble) to be processed in downstream analysis and visualization workflows:
# Create a second data set
dataSet2 <- DataSet$new(name = "Second data set")
dataSet2$setValues(
  xValues = c(1, 2, 3, 4, 5),
  yValues = c(1, 0, 5, 8, 0.1)
)
# Convert data sets to a tibble
myTibble <- dataSetToTibble(dataSets = c(dataSet, dataSet2))
print(myTibble)
#> # A tibble: 9 x 14
#> name xValues yValues yError~1 xDime~2 xUnit yDime~3 yUnit yErro~4 yErro~5
#> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 My data ~ 1 0 0.00100 Time min Amount mol Arithm~ mol
#> 2 My data ~ 2 0.1 0.00100 Time min Amount mol Arithm~ mol
#> 3 My data ~ 3 0.6 0.1 Time min Amount mol Arithm~ mol
#> 4 My data ~ 4 10 1 Time min Amount mol Arithm~ mol
#> 5 Second d~ 1 1.00 NA Time h Concen~ mg/l NA NA
#> 6 Second d~ 2 0 NA Time h Concen~ mg/l NA NA
#> 7 Second d~ 3 5.00 NA Time h Concen~ mg/l NA NA
#> 8 Second d~ 4 8.00 NA Time h Concen~ mg/l NA NA
#> 9 Second d~ 5 0.100 NA Time h Concen~ mg/l NA NA
#> # ... with 4 more variables: molWeight <dbl>, lloq <dbl>, Molecule <chr>,
#> # Organ <chr>, and abbreviated variable names 1: yErrorValues, 2: xDimension,
#> # 3: yDimension, 4: yErrorType, 5: yErrorUnit
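As an illustration of downstream processing, the sketch below computes the largest y value per data set in base R. To stay runnable without ospsuite, it uses a hand-built data frame (observed, a hypothetical name) that mirrors the name, xValues, and yValues columns printed above; with ospsuite loaded, myTibble could be used in its place.

```r
# Hand-built stand-in mirroring three columns of the tibble printed above
observed <- data.frame(
  name = c(rep("My data set", 4), rep("Second data set", 5)),
  xValues = c(1, 2, 3, 4, 1, 2, 3, 4, 5),
  yValues = c(0, 0.1, 0.6, 10, 1, 0, 5, 8, 0.1)
)

# Largest y value within each data set
maxPerDataSet <- tapply(observed$yValues, observed$name, max)
print(maxPerDataSet)
```

Note that the two data sets carry different y dimensions and units (Amount in mol versus Concentration (mass) in mg/l), so such summaries are only meaningful within a data set, not across them.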
Importing data
Creating DataSet objects from scratch is a rather advanced use case. Typically, observed data are loaded either from *.pkml files exported from PK-Sim or MoBi, or imported from Excel files. The function loadDataSetFromPKML() loads data from a *.pkml file. Its counterpart, saveDataSetToPKML(), exports any DataSet to a *.pkml file that can be loaded, e.g., in MoBi.
# Load a data set from PKML
filePath <- system.file("extdata", "ObsDataAciclovir_1.pkml", package = "ospsuite")
dataSet <- loadDataSetFromPKML(filePath = filePath)
print(dataSet)
#> DataSet:
#> Name: Vergin 1995.Iv
#> X dimension: Time
#> X unit: h
#> Y dimension: Concentration (mass)
#> Y unit: mg/l
#> Error type: ArithmeticStdDev
#> Error unit: mg/l
#> Molecular weight: 225.21
#> LLOQ:
#> Meta data:
#> $Source
#> [1] "X:\\Orga\\BTS-TD\\ET\\TP CSB\\Projects\\Internal Projects\\MagenDarm\\TestSubstanzen\\Acyclovir\\Rohdaten_Acyclovir.xls.Vergin 1995 250 mg iv"
#>
#> $File
#> [1] "Rohdaten_Acyclovir"
#>
#> $Sheet
#> [1] "Vergin 1995 250 mg iv"
#>
#> $Molecule
#> [1] "Aciclovir"
#>
#> $Species
#> [1] "Human"
#>
#> $Organ
#> [1] "Peripheral Venous Blood"
#>
#> $Compartment
#> [1] "Plasma"
#>
#> $`Study Id`
#> [1] "Vergin 1995"
#>
#> $Gender
#> [1] "Undefined"
#>
#> $Dose
#> [1] "250 mg"
#>
#> $Route
#> [1] "IV"
#>
#> $`Patient Id`
#> [1] "Iv"
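The save and load functions can be combined into a round trip. The following is a minimal sketch, assuming that saveDataSetToPKML() takes the arguments dataSet and filePath; the data set name, the values, and the temporary file path are made up for illustration.

```r
library(ospsuite)

# Build a small data set by hand (values are made up for illustration)
roundTripDataSet <- DataSet$new(name = "Round trip example")
roundTripDataSet$setValues(
  xValues = c(0, 1, 2),
  yValues = c(0, 5, 2.5)
)

# Write the data set to a temporary *.pkml file and read it back
# (argument names of saveDataSetToPKML() are an assumption)
pkmlPath <- tempfile(fileext = ".pkml")
saveDataSetToPKML(dataSet = roundTripDataSet, filePath = pkmlPath)
reloadedDataSet <- loadDataSetFromPKML(filePath = pkmlPath)

# Compare with a tolerance, since values may pass through unit conversion
# and a reduced-precision storage format on the way
valuesMatch <- isTRUE(all.equal(
  reloadedDataSet$yValues,
  roundTripDataSet$yValues,
  tolerance = 1e-6
))
print(valuesMatch)
```

A tolerance-based comparison is used instead of identical() because exact floating point equality after saving and reloading is not something this sketch can guarantee.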
Another (and probably the most important) way to create DataSet objects is by importing data from Excel files. The function loadDataSetsFromExcel() utilizes the data import functionality implemented in PK-Sim and MoBi and returns a set of DataSet objects. For a description of the supported file formats and configurations, please refer to the OSPS documentation.
Loading observed data from an Excel sheet requires an ImporterConfiguration. The configuration describes the mapping of Excel sheet columns to numerical data (e.g. which column contains the x values) or meta data (e.g. description of the applied dose). One way to obtain such a configuration is to create it in PK-Sim or MoBi, save it as an *.xml file, and load it in R with the loadDataImporterConfiguration() function:
# Load a configuration from xml file
filePath <- system.file("extdata", "dataImporterConfiguration.xml", package = "ospsuite")
importerConfiguration <- loadDataImporterConfiguration(configurationFilePath = filePath)
print(importerConfiguration)
#> DataImporterConfiguration:
#> Time column: Time [h]
#> Time unit: h
#> Time unit from column: FALSE
#> Measurement column: Concentration (mass)[ng/ml]
#> Measurement unit: ng/ml
#> Measurement unit from column: FALSE
#> Error column: Error [ng/ml]
#> Error type: ArithmeticStdDev
#> Error unit: ng/ml
#> Grouping columns: Study Id Organ Compartment Species Gender Molecule Route MW Patient Id Dose [unit]
#> Sheets:
#> Naming pattern: {Source}.{Sheet}.{Study Id}.{Organ}.{Compartment}.{Species}.{Gender}.{Molecule}.{Route}.{Molecular Weight}.{Subject Id}.{Dose}
A data importer configuration can also be created from scratch, in which case it has to be populated manually by the user. Alternatively, the user can let the software “guess” the configuration for a given Excel sheet:
# Excel file
excelFilePath <- system.file("extdata", "CompiledDataSet.xlsx", package = "ospsuite")
sheetName <- "TestSheet_1"
# Create importer configuration for the excel sheet
importerConfiguration_guessed <- createImporterConfigurationForFile(
  filePath = excelFilePath,
  sheet = sheetName
)
# Print the guessed configuration (not the one loaded from the xml file above)
print(importerConfiguration_guessed)
It is important to manually check the created configuration, as the automated configuration recognition cannot cover all possible cases.
If only specific sheets from the Excel file should be imported, they can be specified in the ImporterConfiguration. The following example loads the sheets TestSheet_1 and TestSheet_1_withMW:
# Excel file
excelFilePath <- system.file("extdata", "CompiledDataSet.xlsx", package = "ospsuite")
sheetName <- "TestSheet_1"
# Create importer configuration for the excel sheet
importerConfiguration_guessed <- createImporterConfigurationForFile(
  filePath = excelFilePath,
  sheet = sheetName
)
# Add sheet names to the configuration
importerConfiguration_guessed$sheets <- c("TestSheet_1", "TestSheet_1_withMW")
# Load data
dataSets <- loadDataSetsFromExcel(
  xlsFilePath = excelFilePath,
  importerConfigurationOrPath = importerConfiguration_guessed
)
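Once loaded, the imported data sets can be inspected and flattened for further processing. The sketch below repeats the loading step so that it is self-contained, and assumes that loadDataSetsFromExcel() returns a named list of DataSet objects; the variable importedData is a name chosen here for illustration.

```r
library(ospsuite)

# Repeat the import from the example above
excelFilePath <- system.file("extdata", "CompiledDataSet.xlsx", package = "ospsuite")
importerConfiguration_guessed <- createImporterConfigurationForFile(
  filePath = excelFilePath,
  sheet = "TestSheet_1"
)
importerConfiguration_guessed$sheets <- c("TestSheet_1", "TestSheet_1_withMW")
dataSets <- loadDataSetsFromExcel(
  xlsFilePath = excelFilePath,
  importerConfigurationOrPath = importerConfiguration_guessed
)

# Names of the imported data sets (assumed to follow the naming pattern
# of the importer configuration)
print(names(dataSets))

# Flatten all imported data sets into a single tibble
importedData <- dataSetToTibble(dataSets = dataSets)
print(unique(importedData$name))
```

Checking the names and the flattened table is a quick way to confirm that the guessed configuration grouped the rows of the sheets as intended.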
Currently, a DataImporterConfiguration created from scratch or for a specific data sheet does not support all features of the importer configuration, such as specifying the column containing the molecular weight of the measured molecule or the LLOQ values. It is therefore recommended to use importer configurations created in PK-Sim or MoBi.