---
title: |
  {width=1in}
  NMdata: A fast R package for efficient data preparation, consistency-checking and post-processing in PK/PD modeling
author: Philip Delff
date: June 2022
fontsize: 8pt
header-includes:
  - \usepackage{adjustbox}
output:
  beamer_presentation:
    slide_level: 2
    keep_tex: true
classoption: "aspectratio=169"
---
\tableofcontents[hideallsubsections]
::: columns
:::: column
NMdata is:

* An R package that can help
* Designed to fit in to the user's setup and coding preferences
::::
:::: column
NMdata is not:

* A plotting package
* A tool to retrieve details about model runs
* A calculation or simulation toolbox
* A "silo" that requires you to do things in a certain way
::::
:::
$$\vspace{.01in}$$
\includegraphics[width=.5in]{figures/worksign.png}
A pkgr configuration file (`pkgr.yml`) for installing from MPN:

```yaml
Version: 1
Threads: 1
Packages:
  - NMdata
# this will allow packages to be cached across projects for faster installs on new projects
Cache: /data/prod_vx708_001_pkgcache-2022-06-15
Repos:
  - MPN: https://mpn.metworx.com/snapshots/stable/2022-06-15
Lockfile:
  Type: renv
```
The workflow of a pharmacometrician is very technical, with many risks of errors.
Technical workload takes time from modeling, reflection, and understanding key questions.
During the first 2-3 years I spent in pharmacometrics, I must have spent half the time coding, desperately trying to get Nonmem to behave, and to understand the properties of the estimates I obtained.

Most of us develop our own ways to avoid some of the many difficulties in this process. This takes a lot of time, most often only because we don't have adequate tools at hand, or don't know about them.

I generalized some of my solutions and collected them in NMdata.

Almost every single line of code in the package is motivated by bad experiences: errors, fear of errors, time wasted on debugging and double-checking.

I have no intention of pushing these approaches on others. But if you find something interesting, feel free to take advantage.
Install from CRAN, or from MPN using pkgr.
```r
library(NMdata)
## Welcome to NMdata. Best place to browse NMdata documentation is
## https://philipdelff.github.io/NMdata/
packageVersion("NMdata")
## [1] '0.0.12'
```
Three vignettes are available so far (see the "Vignettes" tab when visiting the URL above).
For a quick overview (after installation), do:
```r
help(package="NMdata")
```

All functions and their arguments are documented in the manual pages (`?function`).
## compareCols
::: columns
:::: column
`compareCols` provides an overview of these properties (column presence and classes) for any number of data sets. `diff.only=FALSE` will give the complete list of columns in the two datasets.

A slightly modified version of the `pk` dataset has been created: `CYCLE` has been removed, and `AMT` has been recoded to character.
::::
:::: column
```{.r .smaller}
compareCols(pk,pk.reduced)
```
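For reference, a minimal sketch of how `pk.reduced` could have been constructed; the construction itself is an assumption, only the two modifications are stated above:

```r
## hypothetical construction of pk.reduced (data.table syntax)
pk.reduced <- copy(pk)                 ## avoid modifying pk by reference
pk.reduced[,CYCLE:=NULL]               ## remove CYCLE
pk.reduced[,AMT:=as.character(AMT)]    ## recode AMT to character
```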
\vspace{12pt}
Before merging or stacking, we may want to
* recode `AMT` in one of the datasets to get the class we need
* decide what to do about the missing `CYCLE` in one of the datasets
::::
:::
## Keep track of missing values
```r
missings <- listMissings(pk)
## variable Nmissing
## 1: DV 150
head(missings)
## variable row value
## 1: DV 1 <NA>
## 2: DV 11 <NA>
## 3: DV 21 <NA>
## 4: DV 31 <NA>
## 5: DV 41 <NA>
## 6: DV 51 <NA>
```
You can specify which columns to check (`cols`), grouping variables (`by`), and which strings to treat as missing (`na.strings`). From `?listMissings`:

```
Usage:
     listMissings(data, cols, by, na.strings = c("", "."), quiet = FALSE, as.fun)
```
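As a sketch, the arguments could be combined like this; the chosen `cols` and `by` values are assumptions for illustration:

```r
## list missings in selected columns, per event type
listMissings(pk, cols=c("DV","AMT"), by="EVID", na.strings=c("","."))
```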
::: columns
:::: column
`renameByContents` renames columns if a function of their contents returns `TRUE`.

`NMisNumeric` is a function that tests if the contents are numeric to Nonmem: `"1039"` (character class) will be a numeric in Nonmem, `"1-039"` will not.
::::
:::: column
All column names are capital case. We rename to lowercase those that Nonmem will not be able to interpret as numeric.

\footnotesize
```r
pk <- renameByContents(data=pk,
                       fun.test=NMisNumeric,
                       fun.rename = tolower,
                       invert.test = TRUE)
```
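A minimal illustration of `NMisNumeric` on the values mentioned above:

```r
NMisNumeric("1039")    ## TRUE  - Nonmem can read this as a number
NMisNumeric("1-039")   ## FALSE - Nonmem cannot
```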
`compareCols` shows that four columns were renamed:
```r
compareCols(pk.old,pk)
## Dimensions:
## data nrows ncols
## 1: pk.old 1502 22
## 2: pk 1502 22
##
## Columns that differ:
## column pk.old pk
## 1: EVENTU character <NA>
## 2: NAME character <NA>
## 3: TIMEUNIT character <NA>
## 4: TRTACT character <NA>
## 5: eventu <NA> character
## 6: name <NA> character
## 7: timeunit <NA> character
## 8: trtact <NA> character
##
## Columns where no differences were found: AMT, BLQ, CMT,
## CYCLE, DOSE, DV, EVID, FLAG, ID, NOMTIME, PART, PROFDAY,
## PROFTIME, STUDY, TIME, WEIGHTB, eff0, flag.
```
\normalsize ::::
:::
`mergeCheck` is a wrapper of `merge` which only accepts the result if:

* The rows that come out of the merge are the exact same as in one of the existing datasets, with only columns added from the second dataset

This limitation of the scope of the merge allows for a high degree of automated checks of consistency of the results.

This is not to say that merges beyond the scope of `mergeCheck` are not relevant or necessary. But if `mergeCheck` covers your needs, it's a real time saver in terms of automated checks.
`mergeCheck` is not a new implementation of `merge`; it's an implementation of checks. `mergeCheck` uses `merge.data.table`. The contribution is the checks that no rows are lost, duplicated, or added. The order of rows in the resulting data is always the same as in the first dataset supplied.
Is `mergeCheck` slower? `mergeCheck` is likely to be way faster than what you use already.

\framesubtitle{Example: Would your standard checks of merges capture this?}
Say we want to add a covariate from a dataset `dt.cov`. We expect the number of rows to be unchanged from `pk`. `mergeCheck` more strictly requires that we get all and only the same rows:
::: columns
:::: column
`merge`
\footnotesize
```r
## The resulting dimensions are correct
pkmerge <- merge(pk,dt.cov,by="ID")
dims(pk,dt.cov,pkmerge)
## data nrows ncols
## 1: pk 1502 22
## 2: dt.cov 150 2
## 3: pkmerge 1502 23
## But we now have twice as many rows for this subject
dims(pk[ID==31],pkmerge[ID==31])
## data nrows ncols
## 1: pk[ID == 31] 10 22
## 2: pkmerge[ID == 31] 20 23
```
::::
:::: column
`mergeCheck` throws an error... and suggests what is wrong
\footnotesize
```r
try(mergeCheck(pk,dt.cov,by="ID"))
## Rows disappeared during merge.
## Rows duplicated during merge.
## Overview of dimensions of input and output data:
## data nrows ncols
## 1: pk 1502 23
## 2: dt.cov 150 2
## 3: result 1502 24
## Overview of values of by where number of rows in x changes:
## ID N.x N.result
## 1: 31 10 20
## 2: 180 10 0
## Error in mergeCheck(pk, dt.cov, by = "ID") :
## Merge added and/or removed rows.
```
:::: \normalsize :::
If you only want to add columns by a merge, `mergeCheck` does all the necessary checks for you.
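A sketch of the intended use, assuming a hypothetical covariate table `dt.cov2` with exactly one row per `ID`:

```r
dt.cov2 <- pk[,.(ID=unique(ID))]         ## one row per subject
dt.cov2[,WT:=70]                         ## a dummy covariate
pk2 <- mergeCheck(pk,dt.cov2,by="ID")    ## passes: same rows as pk, only WT added
```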
\framesubtitle{Keep track of data exclusions - don't discard!}
It is good practice not to discard unwanted records from a dataset but to flag them and omit them in model estimation.
When reporting the analysis, we need to account for how many data records were discarded due to which criteria.
The implementation in `NMdata` is based on sequentially checking exclusion conditions.

The information is represented in one numerical column for Nonmem, and one (value-to-value corresponding) character column for the rest of us.
::: columns
:::: column
* `flagsAssign` applies the conditions sequentially, by increasing or decreasing value of `FLAG`.
* `BLQ` has to exist in `pk`.
* `FLAG=0` means that none of the conditions were met, and the row is kept in the analysis. This cannot be customized.
* In Nonmem, you can include `IGNORE=(FLAG.NE.0)` in `$DATA` or `$INFILE`.
::::
:::: column
\footnotesize
```r
dt.flags <- fread(text="FLAG,flag,condition
10,Below LLOQ,BLQ==1
100,Negative time,TIME<0")

pk <- flagsAssign(pk,tab.flags=dt.flags,subset.data="EVID==0")
## Coding FLAG = 100, flag = Negative time
## Coding FLAG = 10, flag = Below LLOQ
pk <- flagsAssign(pk,subset.data="EVID==1",flagc.0="Dosing")
```
::::
:::
## flagsCount

An overview of the number of observations disregarded due to the different conditions is then obtained using `flagsCount`.

`flagsCount` includes a `file` argument to save the table right away, as sketched below.
\footnotesize
```r
flagsCount(data=pk[EVID==0],tab.flags=dt.flags)
## flag N.left Nobs.left N.discard N.disc.cum Nobs.discard Nobs.disc.cum
## 1: All available data 150 1352 NA 0 NA 0
## 2: Negative time 150 1350 0 0 2 2
## 3: Below LLOQ 131 755 19 19 595 597
## 4: Analysis set 131 755 NA 19 NA 597
```
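A sketch of the `file` argument mentioned above (the output path is hypothetical):

```r
## write the flag overview to a file right away
flagsCount(data=pk[EVID==0],tab.flags=dt.flags,
           file="results/flags_overview.csv")
```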
::: columns :::: column
A unique identifier is needed in order to
Track rows in analysis data back to source data
Reliably combine (by merge) output with input data
It is not a problem if represented as a double
in R
Increasing
::::
:::: column
Using `data.table`:

```r
## order
setorder(pk,ID,TIME,EVID)
## add counter
pk[,ROW:=.I]
```

Using `dplyr` (I'm not very familiar with `dplyr`):

```r
pk <- pk %>%
    arrange(ID,TIME,EVID) %>%
    mutate(ROW=1:n())
```
:::: :::
::: columns
:::: column
\vspace{12pt}
The order of columns in Nonmem is important for two reasons:

* The number of variables you can read into Nonmem is restricted (may not apply to recent Nonmem versions)
* Only columns that Nonmem can interpret as numeric are usable, so they should come first
`NMorderColumns` uses a mix of recognition of column names and analysis of the column contents to sort the columns:

* Standard columns (`ID`, `TIME`, `EVID`, etc.) and usable columns first
* Columns that cannot be converted to numeric are put in the back
* Additional columns to place early (argument `first`) or late (`last`) can be specified
* See `?NMorderColumns` for more options
`NMorderColumns` does not sort rows, nor does it modify any contents of columns.
:::: :::: column \footnotesize
```r
pk.old <- copy(pk)
pk <- NMorderColumns(pk,first="WEIGHTB")
```
\normalsize
We may want to add `MDV` and rerun `NMorderColumns`; see the sketch after the table.
\footnotesize
```r
data.table(old=colnames(pk.old),new=colnames(pk))
## old new
## 1: ID ROW
## 2: NOMTIME ID
## 3: TIME NOMTIME
## 4: EVID TIME
## 5: CMT EVID
## 6: AMT CMT
## 7: DV AMT
## 8: STUDY DV
## 9: BLQ WEIGHTB
## 10: CYCLE FLAG
## 11: DOSE STUDY
## 12: PART BLQ
## 13: PROFDAY CYCLE
## 14: PROFTIME DOSE
## 15: WEIGHTB PART
## 16: eff0 PROFDAY
## 17: eventu PROFTIME
## 18: name eff0
## 19: timeunit eventu
## 20: trtact flag
## 21: FLAG name
## 22: flag timeunit
## 23: ROW trtact
## old new
```
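As noted above, we may want to add `MDV` and rerun. A minimal sketch, deriving `MDV` from missing `DV` the same way it is done later in these slides:

```r
pk[,MDV:=as.numeric(is.na(DV))]             ## MDV=1 where DV is missing
pk <- NMorderColumns(pk,first="WEIGHTB")    ## rerun to place MDV appropriately
```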
::::
\normalsize
:::
## NMcheckData: Check data syntax for Nonmem compatibility

Aim: check data for all potential Nonmem compatibility issues and other obvious errors. Findings are returned in a structure so related subsets of the data can easily be identified for further inspection.
::: columns
:::: column
* `NMcheckData` contains a very long list of checks, especially of the standard Nonmem columns (`ID`, `TIME`, `EVID`, `AMT`, `DV`, `MDV`, `RATE`, `SS`, etc.). They are all checked for allowed values (e.g. `TIME` must be non-negative, `EVID` must be one of `0:4`, etc.).
* ID-level checks (e.g. did all IDs receive doses, is time increasing, are rows disjoint?)
* All used columns are checked for Nonmem compatibility in terms of how Nonmem translates to numeric values.
* Column names are checked for uniqueness and for non-allowed characters.
* If you supply the `col.usubjid` column, the `ID` column is checked to align with `col.usubjid`.
* `NMcheckData` is based on a simple framework, making it simple to define new checks.
::::
:::: column
\scriptsize
```r
pk <- pk[ID>59]
res.check <- NMcheckData(pk)
## column check N Nid
## EVID Subject has no obs 1 1
## MDV Column not found 1 0
res.check
## row ID column check level ROW
## 1: NA 60 EVID Subject has no obs ID NA
## 2: NA NA MDV Column not found column NA
pkmod <- copy(pk)
pkmod[,MDV:=as.numeric(is.na(DV))]
pkmod[ID==60&EVID==1,CMT:=NA]
res.check <- NMcheckData(pkmod)
## column check N Nid
## EVID Subject has no obs 1 1
## CMT Missing for EVID!=3 1 1
res.check
## row ID column check level ROW
## 1: NA 60 EVID Subject has no obs ID NA
## 2: 1 60 CMT Missing for EVID!=3 row 291
```
::::
:::
::: columns
:::: column
For the final step of writing the dataset, `NMwriteData` is provided.

* `NMwriteData` never modifies the data
* Writes a csv file with appropriate options for Nonmem compatibility
* Writes an rds file for R. If you use `NMscanData` to read Nonmem results, this information can be used automatically.
* Provides a proposal for text to include in the `$INPUT` and `$DATA` sections of the Nonmem control streams.
These are the only steps involved between the supplied data set and the written csv; `scipen` is kept small to maximize precision:

\footnotesize
```r
file.csv <- fnExtension(file,".csv")
fwrite(data,na=".",quote=FALSE,row.names=FALSE,scipen=0,file=file.csv)
```
\normalsize
All arguments to `fwrite` can be modified using the `args.fwrite` argument.
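A sketch of `args.fwrite`; the values shown simply restate the defaults above, and any `fwrite` argument could go here:

```r
NMwriteData(pk,file="derived/pk.csv",
            args.fwrite=list(na=".",quote=FALSE,scipen=0))
```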
::::
:::: column
\footnotesize
```r
NMwriteData(pk,file="derived/pk.csv")
## Data written to file(s):
## derived/pk.csv
## derived/pk.rds
## For NONMEM:
## $INPUT ROW ID NOMTIME TIME EVID CMT AMT DV WEIGHTB
## FLAG STUDY BLQ CYCLE DOSE PART PROFDAY PROFTIME eff0
## $DATA derived/pk.csv
## IGN=@
## IGNORE=(FLAG.NE.0)
```
\normalsize
\vspace{12pt}
* `eff0` is the last column in `pk` that Nonmem can make use of (remember `NMisNumeric` from earlier?)
* `NMwriteData` detected the exclusion flag and suggests including it in `$DATA`.
::::
:::
::: columns
:::: column
* `NMwriteSection` is a function that replaces sections (like `$DATA` or `$TABLE`) of Nonmem control streams.
* `NMwriteData` returns a list that can be directly processed by `NMwriteSection`.
* In `NMwriteData`, several arguments modify the proposed text for the Nonmem run; see `?NMwriteData`.
* `NMwriteSection` is very useful for many other sections, like `$TABLE`, or even `$PK`. But not `$THETA` and `$OMEGA` (because they are specific to each model).
* `NMwriteSection` by default saves a backup of the overwritten control streams.
* `NMwriteSection` has a section reader counterpart in `NMreadSection`.
* `NMextractDataFile` takes a control stream/list file and extracts the input data file name/path. You can use this to identify the model runs in which to update `$DATA`.
::::
:::: column
\footnotesize
```r
nmCode <- NMwriteData(pk,file="derived/pk.csv",
                      write.csv=FALSE,
                      ### arguments that tailor text for Nonmem
                      nmdir.data="../derived",
                      nm.drop="PROFDAY",
                      nm.copy=c(CONC="DV"),
                      nm.rename=c(BBW="WEIGHTB"),
                      ## PSN compatibility
                      nm.capitalize=TRUE)
## Data _not_ written to any files.
## For NONMEM:
## $INPUT ROW ID NOMTIME TIME EVID CMT AMT CONC=DV BBW
## FLAG STUDY BLQ CYCLE DOSE PART PROFDAY=DROP PROFTIME
## EFF0
## $DATA ../derived/pk.csv
## IGN=@
## IGNORE=(FLAG.NE.0)

## example: update sections in run1*.mod
## (passing the NMwriteData result via list.sections is assumed usage)
models <- NMwriteSection(dir="../models",
                         file.pattern="run1.+\\.mod$",
                         list.sections=nmCode)
```
\normalsize
::::
:::
\framesubtitle{Ensure that the data can be traced back to the data generation script}
::: columns
:::: column
* If the argument `script` is supplied to `NMwriteData`, a little meta information is saved together with the output file(s).
* For csv files, the meta data is written to a txt file next to the csv file.
* For rds files, the meta data is attached to the object saved in the rds file.
* `NMstamp` is used under the hood. You can use `NMstamp` on any R object to attach similar meta information. Additional arguments (essentially anything) can be passed from `NMwriteData` to `NMstamp` using the argument `args.stamp`, as sketched below.
* `NMstamp` and `NMinfo` write and read an "attribute" called `NMdata`.
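A sketch of `args.stamp`; the `Description` field is a hypothetical example of the arbitrary information that can be passed through to `NMstamp`:

```r
NMwriteData(pk,file="derived/pk.csv",
            script="NMdata_Rpackage.Rmd",
            args.stamp=list(Description="Example PK dataset"))
```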
::::
:::: column
\footnotesize
```r
NMwriteData(pk,file="derived/pk.csv",
script = "NMdata_Rpackage.Rmd",quiet=T)
list.files("derived")
## [1] "pk.csv" "pk.rds" "pk_meta.txt"
## NMreadCsv reads the metadata .txt file if found
pknm <- NMreadCsv("derived/pk.csv")
NMinfo(pknm)
## $dataCreate
## $dataCreate$DataCreateScript
## [1] "NMdata_Rpackage.Rmd"
##
## $dataCreate$CreationTime
## [1] "2022-06-30 09:29:58 EDT"
##
## $dataCreate$writtenTo
## [1] "derived/pk.csv"
## The .rds file contains the metadata already
pknm2 <- readRDS("derived/pk.rds")
NMinfo(pknm2)
## $dataCreate
## $dataCreate$DataCreateScript
## [1] "NMdata_Rpackage.Rmd"
##
## $dataCreate$CreationTime
## [1] "2022-06-30 09:29:58 EDT"
##
## $dataCreate$writtenTo
## [1] "derived/pk.rds"
::::
:::
\normalsize
`NMscanData` is an automated and general reader of Nonmem results. Based on the list file (`.lst`) it will:

* Read and combine the output tables
* If wanted, combine them with the input data, restoring columns that were not exported from the Nonmem model
* If wanted, recover rows that were not processed by Nonmem (e.g. observations or subjects that are not part of the analysis)

\pause
\footnotesize
::: columns
:::: column
```r
file1.lst <- system.file("examples/nonmem/xgxr003.lst",
package="NMdata")
res0 <- NMscanData(file1.lst,merge.by.row=FALSE)
## No missing values identified
## Model: xgxr003
##
## Used tables, contents shown as used/total:
## file rows columns IDs
## xgxr003_res.txt 905/905 7/7 150/150
## xgxr003_res_vols.txt 905/905 3/7 150/150
## xgxr003_res_fo.txt 150/150 1/2 150/150
## xgxr1.csv (input) 905/1502 21/24 150/150
## (result) 905 32+2 150
## Input and output data combined by translation of
## Nonmem data filters (not recommended).
##
## Distribution of rows on event types in returned data:
## EVID CMT output result
## 0 2 755 755
## 1 1 150 150
## All All 905 905
```
::::
\pause
:::: column
```r
class(res0)
## [1] "NMdata" "data.table" "data.frame"
dims(res0)
## data nrows ncols
## 1: res0 905 34
head(res0,n=2)
## ROW ID NOMTIME TIME EVID CMT AMT DV FLAG STUDY KA
## 1: 1 31 0 0 1 1 3 0 0 1 0.1812
## 2: 11 32 0 0 1 1 3 0 0 1 0.1812
## Q PRED RES WRES V2 V3 BLQ CYCLE DOSE PART
## 1: 2307400 0 0 0 0.042 0.1785 0 1 3 1
## 2: 2307400 0 0 0 0.042 0.1785 0 1 3 1
## PROFDAY PROFTIME WEIGHTB eff0 CL EVENTU NAME
## 1: 1 0 87.031 56.461 0.7245691 mg Dosing
## 2: 1 0 100.620 45.096 0.7245691 mg Dosing
## TIMEUNIT TRTACT flag trtact model nmout
## 1: Hours 3 mg Dosing 3 mg xgxr003 TRUE
## 2: Hours 3 mg Dosing 3 mg xgxr003 TRUE
```
\normalsize
::::
:::
Using a unique row identifier for merging data is highly recommended:
\footnotesize
```r
res1 <- NMscanData(file.nm("xgxr001.lst"),merge.by.row=TRUE)
## Model: xgxr001
##
## Used tables, contents shown as used/total:
## file rows columns IDs
## xgxr001_res.txt 905/905 16/16 150/150
## xgxr1.csv (input) 905/1502 22/24 150/150
## (result) 905 38+2 150
##
## Input and output data merged by: ROW
##
## Distribution of rows on event types in returned data:
## EVID CMT output result
## 0 2 755 755
## 1 1 150 150
## All All 905 905
class(res0)
## [1] "NMdata" "data.table" "data.frame"
\normalsize
Output and input are merged by `col.row` if found. The default `col.row` is `ROW`; we shall see later how to modify this.

\framesubtitle{Example: quickly get from a list file to looking at the model}
\footnotesize
:::::::::::::: {.columns}
::: {.column width="45%"}
```r
## Using data.table for easy summarize
res1 <- NMscanData(file1.lst,merge.by.row=TRUE,
as.fun="data.table",quiet=TRUE)
## Derive geometric mean pop predictions by
## treatment and nominal sample time. Only
## use sample records.
res1[EVID==0,
gmPRED:=exp(mean(log(PRED))),
by=.(trtact,NOMTIME)]
```
::: ::: {.column width="55%"}
\normalsize
::: ::::::::::::::
:::::::::::::: {.columns} ::: {.column width="45%"}
\footnotesize
```r
res2 <- NMscanData(file1.lst,
merge.by.row=TRUE,recover.rows=TRUE)
## Model: xgxr003
##
## Used tables, contents shown as used/total:
## file rows columns IDs
## xgxr003_res.txt 905/905 7/7 150/150
## xgxr003_res_vols.txt 905/905 3/7 150/150
## xgxr003_res_fo.txt 150/150 1/2 150/150
## xgxr1.csv (input) 1502/1502 21/24 150/150
## (result) 1502 32+2 150
##
## Input and output data merged by: ROW
##
## Distribution of rows on event types in returned data:
## EVID CMT input-only output result
## 0 1 2 0 2
## 0 2 595 755 1350
## 1 1 0 150 150
## All All 597 905 1502
```
:::
::: {.column width="55%"}
```
## Warning: Removed 7 row(s) containing missing values
## (geom_path).
```
:::
::::::::::::::
## NMscanMultiple
:::::::::::::: {.columns}
::: {.column width="45%"}
\footnotesize
```r
NMdataConf(as.fun="data.table")
NMdataConf(col.row="ROW")
NMdataConf(merge.by.row=TRUE)
models <- file.nm(cc(xgxr001.lst,xgxr014.lst))
res.mult <- NMscanMultiple(files=models,quiet=T)
##
## Overview of model scanning results:
## lst
## 1: C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/nonmem/xgxr001.lst
## 2: C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/nonmem/xgxr014.lst
## nrows ncols success warning
## 1: 905 40 TRUE FALSE
## 2: 905 36 TRUE FALSE
res.mult[EVID==0&nmout==TRUE,
gmPRED:=exp(mean(log(PRED))),
by=.(model,trtact,NOMTIME)]
class(res.mult)
## [1] "NMdata" "data.table" "data.frame"
`NMscanMultiple` can search for models by matching file names to a regular expression, similarly to `NMwriteSection`; see the sketch below.
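A sketch of that pattern-based interface, assuming `dir`/`file.pattern` arguments analogous to `NMwriteSection` (directory and pattern values are assumptions):

```r
res.mult <- NMscanMultiple(dir="../models",
                           file.pattern="xgxr.+\\.lst$",quiet=TRUE)
```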
:::
::: {.column width="55%"}
\normalsize
::: ::::::::::::::
::: columns
:::: column
* By default, `NMscanData` will look for an rds file next to the csv file (same file name, only the extension `.rds` differs).
* If this is found, it will be read, providing an enriched input dataset (e.g. conserving factor levels and any other information).
* There are no checks of consistency of the rds file against the delimited file read by Nonmem. I am interested in ideas on how to do this; if we can avoid reading the csv file, it would be highly preferred.
* You get the rds automatically if using `NMwriteData`.
* Disable looking for the rds with the argument `use.rds=FALSE`. The default value of `use.rds` can be modified with `NMdataConf`. Both are sketched below.
::::
:::: column
The plots are correctly ordered by dose, because they are ordered by factor levels as in the rds input data.
\footnotesize
```r
lst <- system.file("examples/nonmem/xgxr014.lst",
                   package="NMdata")
res14 <- NMscanData(lst,quiet=TRUE)
```
::::
:::
\normalsize
::: columns
:::: column
Most important message: an `NMdata` object can be used as if it weren't one.

Methods defined for `NMdata`:

* `summary`: The information that is written to the console if `quiet=FALSE`.
* Other simple methods like `rbind` and similar are defined by dropping the `NMdata` class and then performing the operation.

`NMinfo` lists metadata from `NMdata` objects and only works on `NMdata` objects. Components in the metadata are (as available):

* `NMinfo(res1,"details")`: How was the data read and combined?
* `NMinfo(res1,"dataCreate")`: Meta data found attached to the input data file.
* `NMinfo(res1,"input.colnames")`: The translation table of column names from input to output.
* `NMinfo(res1,"input.filters")`: The "filters" (`IGNORE`/`ACCEPT`) from Nonmem and how they are applied in R.
* `NMinfo(res1,"tables")`: What tables were read and how?
* `NMinfo(res1,"columns")`: What columns were read from what tables?
::::
:::: column
\tiny
```r
class(res1)
## [1] "NMdata" "data.table" "data.frame"
NMinfo(res1,"details")
## $model
## [1] "xgxr003"
##
## $call
## [1] "NMscanData(file1.lst, merge.by.row = TRUE, as.fun = \"data.table\", "
## [2] " quiet = TRUE)"
##
## $time.NMscanData
## [1] "2022-06-30 09:30:00 EDT"
##
## $file.lst
## [1] "C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/nonmem/xgxr003.lst"
##
## $file.mod
## [1] "C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/nonmem/xgxr003.mod"
##
## $time.ok
## [1] "Not checked"
##
## $dir.data
## NULL
##
## $input.used
## [1] TRUE
##
## $rows.recovered
## [1] FALSE
##
## $merge.by.row
## [1] TRUE
##
## $col.row
## [1] "ROW"
##
## $file.input
## [1] "C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/data/xgxr1.csv"
##
## $logtime.lst
## [1] NA
##
## $mtime.lst
## [1] "2022-06-07 07:30:06 EDT"
##
## $method.time.lst
## [1] NA
##
## $mtime.mod
## [1] "2022-06-07 07:30:06 EDT"
##
## $mtime.input
## [1] "2022-06-07 07:30:06 EDT"
##
## $logtime.input
## [1] NA
##
## $method.time.inp
## [1] NA
```
::::
:::
\framesubtitle{What data was read?}
::: columns
:::: column
\scriptsize
NMinfo(res1,"tables")
## source name nrow ncol nid level
## 1: output xgxr003_res.txt 905 7 NA row
## 2: output xgxr003_res_vols.txt 905 7 150 row
## 3: output xgxr003_res_fo.txt 150 2 150 id
## 4: input xgxr1.csv 1502 24 150 <NA>
## scope has.col.row has.col.id full.length filetype
## 1: all TRUE FALSE TRUE output
## 2: all FALSE TRUE TRUE output
## 3: firstonly FALSE TRUE FALSE output
## 4: <NA> TRUE TRUE NA text
## format file.mtime
## 1: 2022-06-07 07:30:06
## 2: tF13.4 2022-06-07 07:30:06
## 3: ,1PE15.8 2022-06-07 07:30:06
## 4: <NA> 2022-06-07 07:30:06
## file
## 1: C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/nonmem/xgxr003_res.txt
## 2: C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/nonmem/xgxr003_res_vols.txt
## 3: C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/nonmem/xgxr003_res_fo.txt
## 4: C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/data/xgxr1.csv
## noheader file.logtime
## 1: FALSE NA
## 2: FALSE NA
## 3: FALSE NA
## 4: NA NA
```
::::
:::: column
(The `nrows` and `topn` arguments are passed to `print.data.table` to get a top and bottom snip of the table.)
\scriptsize
```r
print(NMinfo(res1,"columns"),nrows=20,topn=10)
## variable file source level COLNUM
## 1: ROW xgxr003_res.txt output row 1
## 2: ID xgxr003_res_vols.txt output row 2
## 3: NOMTIME xgxr1.csv input row 3
## 4: TIME xgxr1.csv input row 4
## 5: EVID xgxr1.csv input row 5
## 6: CMT xgxr1.csv input row 6
## 7: AMT xgxr1.csv input row 7
## 8: DV xgxr003_res.txt output row 8
## 9: FLAG xgxr1.csv input row 9
## 10: STUDY xgxr1.csv input row 10
## ---
## 33: model <NA> NMscanData model 33
## 34: nmout <NA> NMscanData row 34
## 35: DV xgxr003_res_vols.txt output row NA
## 36: PRED xgxr003_res_vols.txt output row NA
## 37: RES xgxr003_res_vols.txt output row NA
## 38: WRES xgxr003_res_vols.txt output row NA
## 39: ROW xgxr1.csv input row NA
## 40: ID xgxr1.csv input row NA
## 41: DV xgxr1.csv input row NA
## 42: ID xgxr003_res_fo.txt output id NA
```
::::
:::
\framesubtitle{Check of usual suspect: DATA}
::: columns
:::: column
* `NMcheckColnames` lists column names
  - As in the input data set
  - As in Nonmem `$DATA`
  - As inferred by `NMscanInput` (and `NMscanData`)
* This will help you easily check if `$DATA` matches the input data file.
* This is a new function that will be available in the next `NMdata` release.
* A more advanced idea is some automated guessing of whether mistakes were made. This is currently not on the todo list.
::::
:::: column
In this case, input column names are aligned with `$DATA`:
\footnotesize
```r
NMcheckColnames(lst)
## Read rds input data file:
## C:/Users/delff/OneDrive - Vertex Pharmaceuticals/Documents/R/win-library/4.1/NMdata/examples/data/xgxr2.rds.
## datafile INPUT nonmem result compare
## 1: ROW ROW ROW ROW OK
## 2: ID ID ID ID OK
## 3: NOMTIME NOMTIME NOMTIME NOMTIME OK
## 4: TIME TIME TIME TIME OK
## 5: EVID EVID EVID EVID OK
## 6: CMT CMT CMT CMT OK
## 7: AMT AMT AMT AMT OK
## 8: DV DV DV DV OK
## 9: FLAG FLAG FLAG FLAG OK
## 10: STUDY STUDY STUDY STUDY OK
## 11: BLQ BLQ BLQ BLQ OK
## 12: CYCLE CYCLE CYCLE CYCLE OK
## 13: DOSE DOSE DOSE DOSE OK
## 14: PART PART PART PART OK
## 15: PROFDAY PROFDAY PROFDAY PROFDAY OK
## 16: PROFTIME PROFTIME PROFTIME PROFTIME OK
## 17: WEIGHTB WEIGHTB WEIGHTB WEIGHTB OK
## 18: eff0 eff0 eff0 eff0 OK
## 19: EVENTU <NA> <NA> EVENTU <NA>
## 20: NAME <NA> <NA> NAME <NA>
## 21: TIMEUNIT <NA> <NA> TIMEUNIT <NA>
## 22: TRTACT <NA> <NA> TRTACT <NA>
## 23: flag <NA> <NA> flag <NA>
## 24: trtact <NA> <NA> trtact <NA>
## datafile INPUT nonmem result compare
```
\normalsize
::::
:::
## What do I need to do in my Nonmem runs for NMscanData?

The answer to this should be as close to "nothing" as possible; that's more or less the aim of the function.
* (As always) you just have to make sure that the information you need is present in input data and output data.
* No need to output information that is unchanged from input, but make sure to output what you need (like `IPRED`, `CWRES`, `CL`, `ETA1`, etc., which cannot be found in input). Always output the row identifier!
* Some of these values can be found in other files generated by Nonmem, but notice: `NMscanData` only uses input and output data.
* Including a unique row identifier in both input and output data is the most robust way to combine the tables. I would not settle for "most likely correct" when robustness is available.
* In `firstonly` tables, include the subject ID or the row identifier.
## NMscanData limitations

* The most important limitation to have in mind is not related to `NMscanData` itself: results can only be reliably read back if the model files and input data are preserved as they were at run time. `NMfreezeModels` does that and will be included in `NMdata` after a little more testing. Alternatively, Nonmem can be run in a wrapper script that either copies the input data, or runs `NMscanData` and saves the output in a compressed file format (like `rds`).
* Even if the limitations of `NMscanData` may be several, they are all rare. There is a very good chance you will never run into any of them.
* Not all data filter statements are implemented. Nested `ACCEPT` and `IGNORE` statements are not supported at this point. The resulting number of rows after applying filters is checked against row-level output table dimensions (if any are available).
* Disjoint rows with common `ID` values are currently not supported together with `firstonly` or `lastonly` tables. This is on the todo list.
* The `RECORDS` and `NULL` options in `$DATA` are not implemented. If using `RECORDS`, please use the `col.row` option to merge by a unique row identifier.
* Character time variables are not interpreted. If you need this, we can implement it relatively easily.
* Only output tables returning either all rows or one row per subject can be merged with input. Tables written with options like `FIRSTLASTONLY` (two rows per subject) and `OBSONLY` are disregarded with a warning (you can read them with `NMscanTables`). `LASTONLY` is treated like `FIRSTONLY`, i.e. as ID-level information if not available elsewhere.
`NMscanData` uses a few simpler functions to read all the data it can find. These functions may be useful when you don't want the fully automated approach provided by `NMscanData`:

* `NMreadTab`: read a Nonmem output table file (handles the `TABLE NO.` counter)
* `NMscanTables`: read all output tables from a model run (uses `NMreadTab`)
* `NMreadCsv`: read delimited input data, including any metadata found
* `NMscanInput`: read the input data as seen by the model (uses `NMreadCsv`)

\framesubtitle{Tailor NMdata default behavior to your setup and preferences}
::: columns
:::: column
* `NMdataConf` supports changing many default argument values, simplifying coding.
* Notice: values are reset when `library(NMdata)` or `NMdataConf(reset=TRUE)` is called.
* See all currently used values with `NMdataConf()`.
::::
:::: column
My initialization of scripts often contains this:
```r
library(NMdata)
NMdataConf(as.fun="data.table"
           ### this is the default value
          ,col.row="ROW"
           ### Recommended but _right now_ not default
          ,merge.by.row=TRUE
           ### You can switch this when script is final
          ,quiet=FALSE)
```
::::
:::
Other commonly used settings in `NMdataConf` are

* `as.fun`: a function to apply to all objects before returning them from `NMdata` functions. If you use `dplyr`/`tidyverse`, do (notice, no quotes!):

```r
library(tibble)
NMdataConf(as.fun=tibble::as_tibble)
```

* `use.input`: Should `NMscanData` combine output data with input data? (default `TRUE`)
* `recover.rows`: Should `NMscanData` include rows not processed by Nonmem? (default `FALSE`)
* `file.mod`: A function that translates the list file path to the input control stream file path. The default is to replace the extension with `.mod`; see the sketch below for an alternative.
* `check.time`: Default is `TRUE`, meaning that output (list file and tables) is expected to be newer than input (control stream and input data). If, say, you copy files between systems, this check may not make sense.
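A sketch of a custom `file.mod`, assuming control streams end in `.ctl` rather than `.mod`:

```r
## translate "run1.lst" -> "run1.ctl" when locating the control stream
NMdataConf(file.mod=function(file) fnExtension(file,".ctl"))
```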
## Why does NMdata not use options()?

R has a system for handling settings (`options()`). `NMdata` does not use that: `NMdataConf` can check both setting/argument names and values for consistency.

```r
try(NMdataConf(asfun=tibble::as_tibble))
## Error : Option not found
try(NMdataConf(use.input="FALSE"))
## Error : use.input must be logical
```
Reset all options, reset individual options by passing `NULL`, or list the current configuration with `NMdataConf`:

```r
NMdataConf(reset=TRUE)
NMdataConf(use.input=NULL, as.fun=NULL)
NMdataConf()
```
## Is NMdata qualified?

\includegraphics[width=.8\textwidth]{badges_snip_210623}
* `NMdata` contains very few calculations (the only exception may be `flagsAssign`/`flagsCount`).
* Historic bugs have mostly resulted in uninformative errors due to e.g. failure in processing text, never a wrong data set.
* `NMdata` includes 192 unit tests where results of function calls with different datasets and arguments are compared to expected results.
* Tests are consistently run before any release of the package.
* The tests are crucial in making sure that fixing one bug or introducing a new feature does not introduce new bugs.
* The testing approach is as recommended in "R Packages" by Hadley Wickham and Jennifer Bryan: https://r-pkgs.org/tests.html
* If you have a specific example you want to make sure is tested in the package, we will include the test in the package.
Help make `NMdata` more accessible and useful: if you have ideas you want to contribute, let's discuss!

Additional features:

* `NMfreezeModels`: Save Nonmem models with input data and all results to ensure reproducibility of output
::: columns
:::: column
Data creation

* `compareCols` / `renameByContents`
* `mergeCheck`
* `flagsAssign`/`flagsCount`
* `NMorderColumns`
* `NMcheckData`
* `NMwriteData` (incl. `NMstamp`/`NMinfo`)

Read/write Nonmem control streams

* `NMreadSection`/`NMwriteSection`
::::
:::: column
Retrieve data from Nonmem runs

* `NMscanData` (incl. `summary`, `NMinfo`)
* `NMscanInput`, `NMreadCsv`
* `NMscanTables`, `NMreadTab`
* `NMcheckColnames`

Adjust behavior to your preferences

* `NMdataConf`

Other

* `NMfreezeModels` (under development)
::::
:::
## NMdata functions under development

:::::::::::::: {.columns}
::: {.column width="85%"}
In order to ensure reproducibility, any output has to be produced based on archived/frozen Nonmem models.
:::
::: {.column width="15%"}
\includegraphics[width=.5in]{figures/worksign.png}
:::
::::::::::::::

* The components that need to be "frozen" are the Nonmem models, their input data, and all results
* `NMfreezeModels` does freeze these
* Limitations
:::::::::::::: {.columns}
::: {.column width="85%"}
* A function to read frozen Nonmem results and mrgsolve code to ensure that the right simulation model and parameter values are used
* `ggwrite`: Saves images in sizes made for PowerPoint, including stamps (time, source, output filename). It can save multiple plots at once, as one file (pdf) or as multiple files.
:::
::::::::::::::

## ggwrite: Flexible saving of traceable output

::: columns
:::: column
`ggwrite` is a wrapper of `png` and `pdf` (and `dev.off`) with convenience features such as:

* Predefined canvas sizes (see `?canvasSize`)
* `save` and `show` arguments for very simple conditional behavior: `save` defaults to `TRUE` if a filename is given; `show` defaults to the inverse of `save`
::::
:::: column
Save `pls1` as one file, and as multiple files named by the dose levels.
\footnotesize

```r
writeOutput <- TRUE
script <- "path/to/script.R"

ggwrite(pls1,file="results/individual_profiles.png",
        stamp=script,canvas="wide-screen",useNames=TRUE,
        save=writeOutput)

ggwrite(pls1,file="results/individual_profiles.pdf",
        stamp=script,canvas="wide-screen",useNames=TRUE,
        save=writeOutput,onefile=TRUE)

list.files("results")
```
One of the resulting files, `results/individual_profiles_trtact300mg.png`:
\begin{center}
\includegraphics[trim={30cm 0 0 14cm},clip]{results/individual_profiles_trtact300mg.png}
\end{center}
::::
:::
## execSafe: Save input data with each Nonmem run