```r
library(knitr)
opts_chunk$set(fig.path='plots/',fig.align='center',fig.show='hold',size='footnotesize',cache=F)

library(ggplot2)
theme_set(theme_bw(base_size=14)+
          theme(legend.position="bottom")+
          ## purple facet strip label background and white text
          theme(strip.background =element_rect(fill="#52247F")
               ,strip.text=element_text(color = "#ffffff"))
          )
## theme_set(theme_bw(base_size=14)+
##           theme(legend.position="bottom")+
##           theme(strip.background =element_rect(fill="#ff9dff"))
##           )
library(data.table)
unloadNamespace("NMdata")

## some setup
options(width=60)  # make the printing fit on the page
set.seed(1121)   # make the results repeatable

### shortcuts to examples in NMdata
file.data <- function(...) system.file("examples/data",..., package="NMdata")
file.nm <- function(...) system.file("examples/nonmem",..., package="NMdata")
```

```{css, echo=FALSE}
.watch-out {
  background-color: lightpink;
  border: 3px solid red;
  font-weight: bold;
}
.smaller {
  background-color: lightgreen;
  font-size: 4pt;
}
```

## Outline
\tableofcontents[hideallsubsections]


# Introduction

## What is NMdata?
::: columns
:::: column
### NMdata is 

An R package that can help

*  Create and check event-based data sets for PK/PD modeling
*  Keep Nonmem code updated to match the contents of data sets
*  Read all output data from Nonmem runs and combine it with input data
   - supply the output list file (`.lst`); the reader is flexible and automated

Designed to fit into the user's setup and coding preferences

* NMdata comes with a configuration tool that can be used to tailor default behaviour to the user's system configuration and preferences.
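
A one-line example (the `NMdataConf` call is covered in detail later in this deck):

```r
## make all NMdata functions return data.tables instead of data.frames
NMdataConf(as.fun="data.table")
```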

::::

:::: column

### NMdata is not

* A plotting package

* A tool to retrieve details about model runs

* A calculation or simulation toolbox

* A "silo" that requires you to do things in a certain way
- No tools in NMdata requires other NMdata tools to be used

::::
:::

$$\vspace{.01in}$$

* The data creation tools are relevant independently of the estimation/simulation tool.
* Latest stable release is 0.0.12 and is available on CRAN and MPN (starting from 2022-06-15 snapshot).

## NMdata 0.0.12 on MPN
\includegraphics[width=3.5in]{figures/nmdata_mpn_2022-06-27 22-03-35.png}


<!-- ## Who can find NMdata useful? -->

<!-- * The data set creation tools are relevant no matter the estimation and simulations tools. -->

<!-- * Nonmem users will find additional tools for handling the exchange of data between R and Nonmem. -->


<!-- ## About the author -->
<!-- * Pharmacometrician with experience from biostatistics -->

<!-- * Background in engineering, experience as system administrator, 15 years of R experience -->

<!-- * Very concerned with code robustness and ensuring data consistency. -->

<!-- * Authored an R package on safe data transfer from SAS to R and one on survival analysis.  -->

<!-- I hate being stuck in leg work and having too little time for modeling, -->
<!-- reflection, and understanding key questions. `NMdata` is a big help for -->
<!-- me personally in freeing time to more high-level tasks. -->


<!-- Lots of work missing on this one -->
<!-- ## Motivation -->

<!-- PK/PD modeling is technically extremely heavy. We want to do provide clarity to decision making, but spend a lot of our time in deep mud. -->

<!-- `NMdata` is my humble experience collected in efficient functions that fill some holes and help with some of the most annoying design -->

## How to update to recent MPN snapshot
Update the `pkgr.yml` file
(example: `prod_vx123_001_analysis/trunk/analysis/vx_123_001_project/pkgr.yml`):

```yaml
Version: 1
Threads: 1
Packages:
  - NMdata
# caching allows packages to be shared across projects
# for faster installs on new projects
Cache: /data/prod_vx708_001_pkgcache-2022-06-15
Repos:
  - MPN: https://mpn.metworx.com/snapshots/stable/2022-06-15
Lockfile:
  Type: renv
```

Then go to `prod_vx123_001_analysis/trunk/analysis/vx_123_001_project` and install/update packages from the linux terminal (not R):

```sh
$ cd /data/prod_vx123_001_analysis/trunk/analysis/vx_123_001_project
$ pkgr --update install
```

## Motivation
* The workflow of a pharmacometrician is very technical, with many risks of errors.

* Technical workload takes time from modeling, reflection, and
understanding key questions. 

* During my first 2-3 years in pharmacometrics, I must have spent half the time coding, desperately trying to get Nonmem to behave and to understand the properties of the estimates I obtained.

* Most of us develop our own ways around some of the many
difficulties in this process. That takes a lot of time, and most
often only because we don't have adequate tools at hand - or don't know them.

* I generalized some of my solutions and collected them in `NMdata`.

* Almost every single line of code in the package is motivated by bad
experiences: errors, fear of errors, time wasted on debugging and
double checking.

* I have no intention of pushing these approaches on others. But if
you find something interesting, feel free to take advantage.


<!-- This could become a good slide, but so far not ready at all -->
<!-- ## Overview of NMdata functionality -->
<!-- * Data creation -->
<!-- - Checking of compatibility of data.frames. -->
<!-- - Merge with automated checks  -->

<!-- * Nonmem control stream editing -->

<!-- * Retrieve data from Nonmem -->




## Getting started
Install from `CRAN` or from `MPN` using `pkgr`.
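
If NMdata is not yet installed, installation from CRAN is one line (a minimal sketch; the `pkgr` route is shown on the MPN slides above):

```r
install.packages("NMdata")
```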
```r
library(NMdata)
## while developing NMdata itself, the package is loaded with devtools:
## library(devtools); load_all()
NMdataConf(check.time=FALSE)
NMdataConf(as.fun="data.table")
```

Three vignettes are available so far (see the "Vignettes" tab on the package website).

For a quick overview (after installation), do:

```r
help(package="NMdata")
```

All functions and their arguments are documented in their manual (`?`) pages.

```r
## load the example data used throughout
pk <- readRDS(file=system.file("examples/data/xgxr2.rds",package="NMdata"))
pk[,trtact:=NULL]
## will create this in the example
pk[,ROW:=NULL]

## a reduced version to compare against
pk.reduced <- copy(pk)
pk.reduced <- pk.reduced[1:(.N%/%2)]
pk.reduced[,CYCLE:=NULL]
pk.reduced[,AMT:=as.character(AMT)]
```

# Data set creation

## Compare compatibility of data sets for rbind and merge: compareCols

::: columns
:::: column

A slightly modified version of the `pk` dataset has been created.

```r
opt.old <- options(width=50)
```

::::
:::: column

```r
compareCols(pk,pk.reduced)
options(opt.old)
```

\vspace{12pt}

Before merging or stacking, we may want to check that columns and their classes are compatible.

::::
:::

## Keep track of missing values: listMissings

```r
missings <- listMissings(pk)
head(missings)
```

You can specify which columns to check (`cols`), columns to list the missings by (`by`), and which strings count as missing (`na.strings`).

From `?listMissings`:

```
Usage:

     listMissings(data, cols, by, na.strings = c("", "."), quiet = FALSE, as.fun)
```
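
A minimal sketch using those arguments (the column choices here are only for illustration):

```r
## check only DV and AMT, and list missings by subject
missings.id <- listMissings(pk, cols=c("DV","AMT"), by="ID")
```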

## Rename columns based on contents: renameByContents

::: columns
:::: column

* `renameByContents`
* `NMisNumeric`

::::

:::: column
All column names are capital case. We rename to lowercase those that Nonmem will not be able to interpret as numeric.
\footnotesize

```r
pk.old <- copy(pk)
pk <- renameByContents(data=pk,
                       fun.test=NMisNumeric,
                       fun.rename = tolower,
                       invert.test = TRUE)
```

`compareCols` shows that four columns were renamed:

```r
compareCols(pk.old,pk)
```

\normalsize
::::

:::

## Automated checking of merges: mergeCheck

* The rows that come out of the merge must be exactly the same as in one of the existing datasets; only columns are added from the second dataset.
* `mergeCheck` is not a new implementation of merge. It is an implementation of checks.
* Is `mergeCheck` slower?

## mergeCheck

\framesubtitle{Example: Would your standard checks of merges capture this?}

```r
dt.cov <- pk[,.(ID=unique(ID))]
dt.cov[,COV:=sample(1:5,size=.N,replace=TRUE)]
## deliberately duplicate one ID to create a subtle merge problem
dt.cov <- dt.cov[c(1,1:(.N-1))]
```

Say we want to add a covariate from `dt.cov`. We expect the number of rows to be unchanged from `pk`. `mergeCheck` more strictly requires that we get all, and only, the same rows:

::: columns
:::: column

### Without mergeCheck

\footnotesize

```r
## The resulting dimensions are correct
pkmerge <- merge(pk,dt.cov,by="ID")
dims(pk,dt.cov,pkmerge)

## But we now have twice as many rows for this subject
dims(pk[ID==31],pkmerge[ID==31])
```

::::
:::: column

### mergeCheck throws an error

...and suggests what is wrong.
\footnotesize

```r
try(mergeCheck(pk,dt.cov,by="ID"))
```

::::
\normalsize
:::

### Conclusion

If you only want to add columns by a merge, `mergeCheck` does all the necessary checks for you.
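
As a minimal sketch, rebuilding the covariate table with exactly one row per subject lets `mergeCheck` pass and return the merged data:

```r
## a covariate table with exactly one row per ID merges cleanly
dt.cov.ok <- pk[,.(ID=unique(ID))]
dt.cov.ok[,COV:=sample(1:5,size=.N,replace=TRUE)]
pk.cov <- mergeCheck(pk,dt.cov.ok,by="ID")
dims(pk,pk.cov)  ## same number of rows, one column added
```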

## Exclusion flags

\framesubtitle{Keep track of data exclusions - don't discard!}

### flagsAssign

::: columns
:::: column

::::

:::: column
\footnotesize

```r
pk[,`:=`(FLAG=NULL,flag=NULL)]
dt.flags <- fread(text="FLAG,flag,condition
10,Below LLOQ,BLQ==1
100,Negative time,TIME<0")

pk <- flagsAssign(pk,tab.flags=dt.flags,subset.data="EVID==0")
pk <- flagsAssign(pk,subset.data="EVID==1",flagc.0="Dosing")
```

::::

:::

### flagsCount

```r
opts <- options(width=100)
```

\footnotesize

```r
flagsCount(data=pk[EVID==0],tab.flags=dt.flags)
options(opts)
```

# Finalize data for Nonmem

## Advice: always include a unique row identifier

::: columns
:::: column

### Why

A unique identifier is needed in order to merge Nonmem output tables reliably back onto the input data.

The identifier should be a numeric column that Nonmem can read and output unchanged.

::::
:::: column

### Sort rows and add a row counter

```r
## data.table: order, then add counter
setorder(pk,ID,TIME,EVID)
pk[,ROW:=.I]
```

```r
## dplyr equivalent
pk <- pk %>%
    arrange(ID,TIME,EVID) %>%
    mutate(ROW=1:n())
```

::::
:::

## NMorderColumns

::: columns
:::: column
\vspace{12pt}

* The order of columns in the Nonmem data set is important for two reasons: `$INPUT` must match the column order of the data file, and columns that Nonmem cannot interpret as numeric are best placed last.

::::
:::: column
\footnotesize

```r
pk.old <- copy(pk)
pk <- NMorderColumns(pk,first="WEIGHTB")
```

\normalsize
We may want to add MDV and rerun `NMorderColumns`.
\footnotesize

```r
data.table(old=colnames(pk.old),new=colnames(pk))
```

::::
\normalsize
:::

## NMcheckData: Check data syntax for Nonmem compatibility

::: columns
:::: column

* `NMcheckData` runs a long list of checks, especially of the standard Nonmem columns (`ID`, `TIME`, `EVID`, `AMT`, `DV`, `MDV`, `RATE`, `SS`, etc.). They are all checked for allowed values (e.g. `TIME` must be non-negative, `EVID` must be one of 0:4, etc.).

::::
:::: column
\scriptsize

```r
pk <- pk[ID>59]
res.check <- NMcheckData(pk)
res.check
## introduce problems to see how they are reported
pkmod <- copy(pk)
pkmod[,MDV:=as.numeric(is.na(DV))]
pkmod[ID==60&EVID==1,CMT:=NA]
res.check <- NMcheckData(pkmod)
res.check
```

::::
:::

## NMwriteData

::: columns
:::: column

For the final step of writing the dataset, `NMwriteData` is provided.

The csv writer is very simple. These are the only steps involved between the supplied data set and the written csv:

\footnotesize

```r
file.csv <- fnExtension(file,".csv")
fwrite(data,na=".",quote=FALSE,row.names=FALSE,scipen=0,file=file.csv)
```

\normalsize

All arguments to `fwrite` can be modified using the `args.fwrite` argument.
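
For instance, a minimal sketch (assuming the `args.fwrite` list is passed on to `fwrite`) overriding the string written for missing values:

```r
## hypothetical override of the NA string in the written csv
NMwriteData(pk,file="derived/pk.csv",
            args.fwrite=list(na="-99"))
```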

::::
:::: column
\footnotesize

```r
NMwriteData(pk,file="derived/pk.csv")
```

\normalsize

\vspace{12pt}

::::
:::

## Update Nonmem control streams: NMwriteSection

::: columns
:::: column

### Tips

::::
:::: column

\footnotesize

```r
nmCode <- NMwriteData(pk,file="derived/pk.csv",
                      write.csv=FALSE,
### arguments that tailor text for Nonmem
                      nmdir.data="../derived",
                      nm.drop="PROFDAY",
                      nm.copy=c(CONC="DV"),
                      nm.rename=c(BBW="WEIGHTB"),
                      ## PSN compatibility
                      nm.capitalize=TRUE)
## example: pick run1*.mod
models <- list.files("../models",
                     pattern="run1.+\\.mod$",
                     full.names=TRUE)
## update $INPUT and $DATA
lapply(models,NMwriteSection,list.sections=nmCode)
## update $INPUT only
lapply(models,
       NMwriteSection,section="INPUT",newlines=nmCode$INPUT)
## or let NMwriteSection find the models itself
NMwriteSection(dir="../models",
               file.pattern="run1.+\\.mod$",
               section="INPUT",
               newlines=nmCode$INPUT)
```

\normalsize

::::
:::

## Automated documentation of data

\framesubtitle{Ensure that the data can be traced back to the data generation script}

::: columns
:::: column

* If the argument `script` is supplied to `NMwriteData`, a little meta information is saved together with the output file(s).

::::
:::: column

\footnotesize

```r
NMwriteData(pk,file="derived/pk.csv",
            script = "NMdata_Rpackage.Rmd",quiet=TRUE)
list.files("derived")
## NMreadCsv reads the metadata .txt file if found
pknm <- NMreadCsv("derived/pk.csv")
NMinfo(pknm)
## The .rds file contains the metadata already
pknm2 <- readRDS("derived/pk.rds")
NMinfo(pknm2)
```

::::
:::

\normalsize

# Retrieving data from Nonmem runs

## NMscanData

`NMscanData` is an automated and general reader of Nonmem results.

Based on the list file (`.lst`), it finds and reads all output tables, reads the input data, and combines them:

\pause
\footnotesize


::: columns
:::: column

```r
file1.lst <- system.file("examples/nonmem/xgxr003.lst",
                         package="NMdata")
res0 <- NMscanData(file1.lst,merge.by.row=FALSE)
```

::::
\pause
:::: column

```r
class(res0)
dims(res0)
head(res0,n=2)
```

\normalsize
::::
:::

## Remember the unique row identifier

Using a unique row identifier for merging data is highly recommended:

\footnotesize

```r
res1 <- NMscanData(file.nm("xgxr001.lst"),merge.by.row=TRUE)
class(res1)
```

\normalsize

## NMscanData

\framesubtitle{Example: quickly get from a list file to looking at the model}

\footnotesize
:::::::::::::: {.columns}
::: {.column width="45%"}

```r
## Using data.table for easy summaries
res1 <- NMscanData(file1.lst,merge.by.row=TRUE,
                   as.fun="data.table",quiet=TRUE)
## Derive geometric mean pop predictions by
## treatment and nominal sample time. Only
## use sample records.
res1[EVID==0,
     gmPRED:=exp(mean(log(PRED))),
     by=.(trtact,NOMTIME)]
```

:::
::: {.column width="55%"}

\normalsize

```r
## plot individual observations and geometric
## mean pop predictions. Split (facet) by treatment.
ggplot(subset(res1,EVID==0))+
    geom_point(aes(TIME,DV))+
    geom_line(aes(NOMTIME,gmPRED),colour="red")+
    scale_y_log10()+
    facet_wrap(~trtact,scales="free_y",ncol=2)+
    labs(x="Hours since administration",
         y="Concentration (ng/mL)")
```

:::
::::::::::::::

## Recover discarded rows

:::::::::::::: {.columns}
::: {.column width="45%"}

```r
NMdataConf(as.fun="data.table")
system.file("examples/nonmem/xgxr014.lst", package="NMdata")
```

\footnotesize

```r
res2 <- NMscanData(file1.lst,
                   merge.by.row=TRUE,recover.rows=TRUE)
```

:::

::: {.column width="55%"}

## Derive another data.table with geometric mean pop predictions by
## treatment and nominal sample time. Only use sample records.
res2[EVID==0&nmout==TRUE,
                  gmPRED:=exp(mean(log(PRED))),
                  by=.(trtact,NOMTIME)]
## plot individual observations and geometric mean pop
## predictions. Split by treatment.
ggplot(res2[EVID==0])+
    geom_point(aes(TIME,DV,colour=flag))+
    geom_line(aes(NOMTIME,gmPRED))+
    scale_y_log10()+
    facet_wrap(~trtact,scales="free_y",ncol=2)+
    labs(x="Hours since administration",y="Concentration (ng/mL)")

::: ::::::::::::::

## Compare models using NMscanMultiple

A wrapper of `NMscanData` that reads and stacks multiple models.

:::::::::::::: {.columns}
::: {.column width="45%"}
\footnotesize

```r
NMdataConf(as.fun="data.table")
NMdataConf(col.row="ROW")
NMdataConf(merge.by.row=TRUE)
## manual approach: read models one by one, then stack.
## notice fill is an option to rbind with data.table
lst.1 <- system.file("examples/nonmem/xgxr001.lst",
                     package="NMdata")
lst.2 <- system.file("examples/nonmem/xgxr014.lst",
                     package="NMdata")
res1.m <- NMscanData(lst.1,quiet=TRUE)
res2.m <- NMscanData(lst.2,quiet=TRUE,
                     modelname="single-compartment")

res.mult <- rbind(res1.m,res2.m,fill=TRUE)
res.mult[EVID==0&nmout==TRUE,
         gmPRED:=exp(mean(log(PRED))),
         by=.(model,trtact,NOMTIME)]
## NMdata class gone because of rbind
class(res.mult)

## NMscanMultiple does the same in one call
models <- file.nm(c("xgxr001.lst","xgxr014.lst"))
res.mult <- NMscanMultiple(files=models,quiet=TRUE)
## Deriving geometric mean PRED vs time for each
## model and treatment
res.mult[EVID==0&nmout==TRUE,
         gmPRED:=exp(mean(log(PRED))),
         by=.(model,trtact,NOMTIME)]
```

:::
::: {.column width="55%"}
\normalsize

```r
ggplot(res.mult,aes(NOMTIME,gmPRED,colour=model))+
    geom_point(aes(TIME,DV),
               alpha=.5,colour="grey")+
    geom_line(size=1.1)+
    scale_y_log10()+
    labs(x="Hours since administration",y="Concentration (ng/mL)")+
    facet_wrap(~trtact,scales="free_y",ncol=2)
```

:::
::::::::::::::

## Preserve all input data properties

::: columns
:::: column

By default, `NMscanData` will look for an rds file next to the csv file (same file name, only the extension `.rds` differs).

::::
:::: column

The plots are correctly ordered by dose - because they are ordered by the factor levels in the rds input data.

\footnotesize

```r
lst <- system.file("examples/nonmem/xgxr014.lst",
                   package="NMdata")
res14 <- NMscanData(lst,quiet=TRUE)
## Derive another data.table with geometric mean pop predictions by
## treatment and nominal sample time. Only use sample records.
res14[EVID==0&nmout==TRUE,
      gmPRED:=exp(mean(log(PRED))),
      by=.(trtact,NOMTIME)]
## plot individual observations and geometric mean pop
## predictions. Split by treatment.
ggplot(res14[EVID==0])+
    geom_point(aes(TIME,DV,colour=flag))+
    geom_line(aes(NOMTIME,gmPRED))+
    scale_y_log10()+
    facet_wrap(~trtact,scales="free_y",ncol=2)+
    labs(x="Hours since administration",y="Concentration (ng/mL)")
```

::::
:::

\normalsize

## The NMdata class

::: columns
:::: column

Most important message: an NMdata object can be used as if it weren't one.

Methods defined for NMdata:

Other simple methods, like rbind, are defined by dropping the NMdata class and then performing the operation.

::::
:::: column
\tiny

```r
class(res1)
NMinfo(res1,"details")
```

::::

:::

## The NMdata class

\framesubtitle{What data was read?}

::: columns
:::: column

### Table-specific information

\scriptsize

```r
NMinfo(res1,"tables")
```

::::
:::: column

### Column-specific information

(The `nrows` and `topn` arguments are passed to `print.data.table` to get a top and bottom snip of the table.)
\scriptsize

```r
print(NMinfo(res1,"columns"),nrows=20,topn=10)
```

::::
:::

## What to do when Nonmem results seem meaningless?

\framesubtitle{Check the usual suspect: DATA}

::: columns
:::: column

`NMcheckColnames` lists column names

- As in the input data set
- As in Nonmem `$DATA`
- As inferred by `NMscanInput` (and `NMscanData`)

This will help you easily check whether `$DATA` matches the input data file. This is a new function that will be available in the next NMdata release. A more advanced idea is automated guessing of whether mistakes were made; this is currently not on the todo list.

::::
:::: column

In this case, input column names are aligned with `$DATA`.
\footnotesize

```r
NMcheckColnames(lst)
```

\normalsize
::::
:::

## What should I do for my models to be compatible with NMscanData?

## NMscanData limitations

The most important limitation to have in mind is not related to `NMscanData` itself.

Even if the limitations of `NMscanData` may be several, they are all rare. There is a very good chance you will never run into any of them.

## Data read building blocks

`NMscanData` uses a few simpler functions to read all the data it can find. These functions may be useful when you don't want the fully automated package provided by `NMscanData`; see the sketch below.
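
A minimal sketch of the lower-level readers (reusing `file1.lst` from the earlier examples; check your NMdata version for the exact set of helpers):

```r
## read all output tables from a run
tabs <- NMscanTables(file1.lst)
## read the input data as Nonmem sees it (interpreting $DATA and $INPUT)
inp <- NMscanInput(file1.lst)
```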

# Configuration of NMdata defaults

## NMdataConf

\framesubtitle{Tailor NMdata default behavior to your setup and preferences}

::: columns

:::: column

::::

:::: column

My initialization of scripts often contains this:

```r
library(NMdata)
NMdataConf(as.fun="data.table"
### this is the default value
          ,col.row="ROW"
### Recommended but _right now_ not default
          ,merge.by.row=TRUE
### You can switch this when the script is final
          ,quiet=FALSE)
```

::::
:::

Other commonly used settings in `NMdataConf` are:

```r
library(tibble)
NMdataConf(as.fun=tibble::as_tibble)
```

## Why does NMdata not use options()?

R has a built-in system for handling settings (`options()`). NMdata does not use it: `NMdataConf` checks both argument names and values, so typos and invalid values fail immediately:

```r
try(NMdataConf(asfun=tibble::as_tibble))
try(NMdataConf(use.input="FALSE"))
```

## How is NMdata qualified?

```r
library(devtools)
res.test <- test()
Ntests <- sum(sapply(res.test,function(x)length(x$results)))
```

\includegraphics[width=.8\textwidth]{badges_snip_210623}

# Next steps for NMdata

## Next steps for NMdata

## Summary

::: columns
:::: column

### Data creation

### Read/write Nonmem control streams

### Adjust behavior to your preferences

### Other

::::
:::

# NMdata functions under development

## NMfreezeModels

:::::::::::::: {.columns}
::: {.column width="85%"}
In order to ensure reproducibility, any output has to be produced based on archived/frozen Nonmem models.
:::
::: {.column width="15%"}
\includegraphics[width=.5in]{figures/worksign.png}
:::
::::::::::::::

The components that need to be "frozen" are:

NMfreezeModels does freeze:

### Limitations

## Safe model reader

:::::::::::::: {.columns}
::: {.column width="85%"}

* A function to read frozen Nonmem results and mrgsolve code, to ensure that the right simulation model and parameter values are used

:::
::::::::::::::

# Other tools

## tracee

### ggwrite: Flexible saving of traceable output

Saves images in sizes made for powerpoint, including stamps (time, source, output filename). It can save multiple plots at once, as one file (pdf) or as multiple files.

::: columns
:::: column

`ggwrite` is a wrapper of `png` and `pdf` (and `dev.off`) with convenience features such as stamping and batch saving.

```r
## install.packages("tracee",repos="https://cloud.r-project.org")
library(tracee)
```

\footnotesize

```r
writeOutput <- TRUE
script <- "path/to/script.R"
p1 <- ggplot(res1,aes(PRED,DV,colour=TRTACT))+geom_point()+
    geom_abline(slope=1)+
    scale_x_log10()+scale_y_log10()
ggwrite(p1,file="results/pred_dv.png",
        script=script,
        save=writeOutput)
```

::::
:::

## execSafe: Save input data with each Nonmem run


