knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ,fig.width=7) ## this change data.table syntax. I think we can do without. ## knitr::opts_chunk$set(tidy.opts=list(width.cutoff=60), tidy=TRUE)
## library(devtools) ## load_all("C:/Users/delff/working_copies/NMdata")
library(NMdata) library(ggplot2) theme_set(theme_bw()+theme(legend.position="bottom"))
Built r Sys.Date()
using NMdata r packageVersion("NMdata")
.
Getting data from R to Nonmem and back can be tedious and time consuming. The way data is exported from Nonmem means that a few tables may have to be combined, and then some variables may still be missing (like character variables which may be in the input data file). Most modellers develop some habits over time to avoid some issues. But still, it may be time demanding even for experienced modelers to pick up a model developed by someone else or by themselves earlier in their career just because they need to understand what data exported and how. This vignette focuses on how NMdata provides a very general solution for what needs to be trivial: get a dataset out of a Nonmem run, including discarded information from the input data. And "general" means that it will work out of the box in as many cases as possible. It will do its best to understand how the data can be read and combined, (ideally) regardless of the way the model was written.
In brevity, the most important steps are
It should not be to hard to do. But with the large degree of flexibility Nonmem offer, there are a lot of caveats to be aware of, again especially if the model wasn't written by your own (current) standards.
Try NMscanData on any model:
res0 <- NMscanData(NMdata_filepath("examples/nonmem/xgxr001.lst")) class(res0)
NMscanData has read and combined input and output data and returned
a data.table. The NMdata
class allows for keeping additional
information in the object, but it can be treated as a data.table. If
you are not used to data.table and don't want to use them, use the as.fun=as.data.frame
argument and get a
data.frame
back. In fact, you can get any class back (including the
popular tibble
). See the FAQ for how to do this, and for how to
change the general behaviour of NMdata
functions to return the class
you prefer. NMdata
uses data.table
under the hood but all
functions will work even if you don't.
NMscanData
writes a little information about what has been read. We
are being recommended merging by a unique row identifier. The default
of interpreting Nonmem code is not recommended. It is the default
because it does not rely on assumptions about the input data. It
happens that this dataset and one of the output tables contain a row
identifier (a row counter in the column called ROW
). Let's follow
the recommendation and use this column for merging. The following
checks that we get the same result from the two methods.
res1 <- NMscanData(NMdata_filepath("examples/nonmem/xgxr001.lst"),col.row="ROW") all.equal(res0,res1)
All features shown below will work whether you supply col.row
or
not. We use col.row
because it is more robust and always
recommended.
Let's have a quick look at the data we got back. The following is again done with data.table but the comments in the code should make it clear what happens.
The data used for the example is a PK single ascending dose data set borrowed from the xgxr package, kindly allowed by the xgxr team.
## trtact is a character. Make it a factor with levels ordered by numericaldose level. res1[,trtact:=reorder(trtact,DOSE)] ## Derive another data.table with geometric mean pop predictions by ## treatment and nominal sample time. Only use sample records. res1.mean <- res1[EVID==0,.(gmPRED=exp(mean(log(PRED)))),by=.(trtact,NOMTIME)] ## plot individual observations and geometric mean pop predictions. Split by treatment. ggplot(res1[EVID==0])+ geom_point(aes(TIME,DV))+ geom_line(aes(NOMTIME,gmPRED),data=res1.mean,colour="red")+ scale_y_log10()+ facet_wrap(~trtact,scales="free_y")+ labs(x="Hours since administration",y="Concentration (ng/mL)")
You see from the plot that the obtained dataset contains both model
predictions (i.e. from output tables) and a character variable,
trtact
(i.e. from input data). NMscanData has read both and combined
them.
In a proper PK/PD analysis we need to explore the data at multiple variability levels. What we looked at above is at dosing and sampling level (one row per dosing or sampling event). NMdata provides very useful functions to extract information at other levels of variability. Before extracting the subject-level information we will add an individual exposure measure to the dataset. We will use the emperical Bayes' estimate of the individual maximum concentration. This is derived as the maximum prediction across the sample times - it may be better to simulate the model at a richer time scale to get better precision.
res1[,Cmax:=max(IPRED),by=.(ID)] res1.id <- findCovs(res1,cols.id="ID") dim(res1.id) ggplot(res1.id,aes(WEIGHTB,Cmax/DOSE,colour=trtact))+ geom_point()+ labs(x="Bodyweight at baseline (kg)")
If your model includes occasion variability, you probably also want to look at
## we have no occasion variability in this data ## res1.id.occ <- findCovs(res1,cols.id=c("ID","OCC"))
Let's use the same function to see the variables that are constant across the whole dataset
findCovs(res1)
Let's take a look at what is in the
res1.id
generated above. It is a mix of variables that vary at subject level and varibles
that are constant across the full dataset.
dim(res1.id) head(res1.id,2)
findCovs
has a counterpart in findVars
which finds variables that
do vary within constant values of optional columns. To get only the ones that
vary within the dataset (i.e. they are truely subject-level
variables), we can do
res1.id2 <- findVars(res1.id) dim(res1.id2) head(res1.id2,2)
findVars
supports the cols.id argument too. So you can use
findVars(res1,cols.id="ID")
for find variables that are non-constant
within (at least one value of) ID. cols.id
can be of arbitrary length.
Of course, we most often know what covariates or other subject-level
variables to look at, and we would not search for them after running a
nonmem model. But in the situation where you are looking at someone
else's work or you are doing a meta analysis across models where the
data has been coded slightly differently, these simple tools can be
very useful. Also, remember that if a variable is returned by these
functions, you know that they fullfill the variability requirement. We
know that variables in res1.id
above non-varying within ID, meaning
that no ID has more than one value of the variable. NA
counts as any other value, so if a value is returned for subject in res1.id, this subject does not have NA's in res1
.
You may have wondered why there is so little data on the small doses
in the plot of observations and geometric mean population predictions
above. The reason is that observations below the lower limit of
quantification have been disregarded. For this and many other reasons,
it is very common to use ACCEPT
and IGNORE
statements in the
$INPUT
section of nonmem control streams. NMscanData can include
this data in the returned data object as well. Let's redo the plot
above taking advantage of this option:
res2 <- NMscanData(NMdata_filepath("examples/nonmem/xgxr014.lst"), col.row="ROW",recover.rows=TRUE) ## Derive another data.table with geometric mean pop predictions by ## treatment and nominal sample time. Only use sample records. res2.mean <- res2[EVID==0&nmout==TRUE, .(gmPRED=exp(mean(log(PRED)))), by=.(trtact,NOMTIME)] ## plot individual observations and geometric mean pop ## predictions. Split by treatment. ggplot(res2[EVID==0])+ geom_point(aes(TIME,DV,colour=flag))+ geom_line(aes(NOMTIME,gmPRED),data=res2.mean)+ scale_y_log10()+ facet_wrap(~trtact,scales="free_y")+ labs(x="Hours since administration",y="Concentration (ng/mL)")
A couple of details are different compared to when we read res1
. For
now, notice one thing changed when calculating the population geometric
means by nominal time and dose. We included nmout==TRUE
. nmout
is
a boolean column created by NMscanData expressing whether the rows
were in the output data (nmout==TRUE
) or they were recovered from
the input data. PRED
is obviously not calculated for data rows
neglected by Nonmem, so we want to keep those out of the
calculation. Let's see how many were recovered
## this is just a long-format representation of ## with(res1,table(nmout,flag)) using data.table. res1[,.N,by=.(nmout,flag)]
So 905 were part of the analysis.
The recover.rows
argument can be used independently of whether
col.row
is used.
You may have noticed a column called model
in some of the output
above. NMscanData by default adds this column for convenience when
working with multiple models. You can specify both column
name and content as arguments in NMscanData. If you don't, the column
will be called model
, and the model name taken from the lst file
name (say, xgxr001
). In the
following we use this to compare population predictions from two
different models. We read them again just to show the use of the
argument to name the models ourselves.
## notice fill is an option to rbind with data.table res1.m <- NMscanData(NMdata_filepath("examples/nonmem/xgxr001.lst"), col.row="ROW") res2.m <- NMscanData(NMdata_filepath("examples/nonmem/xgxr014.lst"), col.row="ROW",name="single-compartment") res.mult <- rbind(res1.m,res2.m,fill=T) res.mult.mean <- res.mult[EVID==0&nmout==TRUE, .(gmPRED=exp(mean(log(PRED)))), by=.(model,trtact,NOMTIME)] ggplot(res.mult.mean,aes(NOMTIME,gmPRED,colour=model))+ geom_line()+ scale_y_log10()+ facet_wrap(~trtact,scales="free_y")
A unique row identifier is not needed for NMscanData to run and in
most cases succesfully and correctly complete all the steps
above. However, the most robust way to use NMscanData is to include a
unique row identifier in both input and at least one full-length
output table. Then tell NMscanData to use the row identifier using the
col.row
argument. The advantage is that less nonmem code has to be
interpreted for this method. However, notice that no checks are being
done if the row identifier has been modified during the nonmem run. If
you tell NMscanData that the column called ROW
is a to be used as
the row identifier, and ROW is being modified in say $PK
before
being output in an output table, this will mess up the result. But if
you keep the good practice of including a unique row identifier in
your dataset, the advice is to include it in at least one output table
and tell NMscanData to use it by the argument col.row
.
In the code above when plotting the population predictions together
with individual observations, we saw that character variables
(the treatment as dose with unit) was read from the input data file
and was used for splitting the plots. However, in order to sort the
plots naturally by increasing dose level, we had to run reorder
on
the variable. This is because the input data is read from a csv
file. NMscanData will not code any character variables as factors when
reading from text files.
Since the character representation of treatment was already prepared,
it would be natural to encode the factor levels already at that
point. In order to preserve such information, we can use R's native
rds format. If the argument use.rds
is TRUE
, NMscanData will look for
an rds file next to the input data file (which is a delimited text
file) the exact same name as the text file except the extension must
be ".rds" rather than say ".csv" (for nonmem and NMscanData, the
extension of the delimited text file doesn't matter). If it finds the
rds file, this will be used instead. No checks are done of whether the
contents are similar in any way to the delimited text file which is
ignored in this case.
Return to the example above creating the dataset dat2
. Notice the
message from NMscanData that an rds file was found. This is why we
could sort the plots correctly on the dose level with reordering the
factor levels first. There are three advantages of using rds files
fread
from data.table
which is extremely fast for delimited files so in many cases this
difference can be small).data.table
)
so file access (I/O) is likely to be the main bottleneck.If you write Nonmem datasets with the NMdata::NMwriteData
, you can
get an rds file automatically, exactly where NMscanData
will look
for it. Creating datasets are out of the scope of this vignette, though.
By default, NMscanData expects a psn-style naming of input and output
control streams. This means that if the input control stream is called
model.mod, the returned control stream is called model.lst. If you use
a different setup, you can use the file.mod
argument. Do one of the
following if needed:
input.txt
and the output
is called output.txt
:out2in <- function(file) file.path(dirname(file),"input.txt") res <- NMscanData("path/to/output.txt",file.mod=out2in)
So far, the merge has been very straightforward, but in many situations, choices have to be made.
The following main principles are followed
recover.rows
argument), no information from output is merged onto these rowsThe lst file was scanned for output tables, and they were all read (including interpreting the possible firstonly option). The input data has been used based on the DATA and INPUT sections of the control stream. The key steps in this process are available as independent functions.
NMreadTab: Read an Nonmem output table based on the path to the output table file.
NMscanTables: Read all output data files defined in a Nonmem run. Return a list of tables (as data.frames or data.tables).
NMtransInput: Read input data based on a nonmem file. Data will be processed and named like the Nonmem model. ACCEPT and IGNORE filters can be applied as well. There are a few limitations to this functionality at this point.
The answer to this should be as close to "nothing" as possible - that's more or less the aim of the function. You just have to make sure that the information that you need is present in input data and output data. No need to output information that is unchanged from input, but make sure to output what you need (like IPRED, CWRES, CL, ETA1 etc which cannot be found in input). Some of these values can be found from other files generated by Nonmem but notice: NMscanData uses only input and output data.
It is recommended to always use a unique row identifier in both input and output data. This is the most robust way to merge back with input data. In firstonly tables, include the subject ID. Again, everything will most likely work even if you don't, I personally don't like relying on "most likely" when I can just as well have robustness.
If merging with input data, the input data must be available as was when the model was run. If you want to avoid this potential issue, Nonmem can be run in a wrapper script that either copies the input data, or runs NMscanData and saves the output in a compressed file format (like rds or zip).
Nested ACCEPT and IGNORE statements are not supported at this point. The resulting number of rows after applying filters is checked against row-level output table dimensions (if any available). In other words, you have to be unlucky to run into trouble without an error. But it is alway recommended to use a unique row identifier in both input and output tables in order to avoid relying on interpretation of Nonmem code.
The RECORDS and NULL options in $DATA are not implemented. If using
RECORDS, please use the col.row
option to merge by a unique row
identifier.
Nonmem supports a clocktime input format for a column called TIME in input data. Based on a day counter and a character ("00:00") clock format, Nonmem can calculate the individual time since first record. This behaviour is not mimicked by NMscanData, and the only ways to get TIME in this case are to either include it in an output TABLE or to code the translation yourself after calling NMscanData. Of course, this is on the todo list.
The FIRSTLASTONLY
option in $TABLE
results in two rows per
subject. For now, this cannot be merged with input, and these tables
are disregarded with a warning (you can read them with
NMscanTables
). LASTONLY
is treated like FIRSTONLY
, i.e. as
ID-level information if not available elsewhere.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.