NMcheckData | R Documentation |
Check data in various ways for compatibility with Nonmem. Some findings are reported even though they would not make Nonmem fail, because they are typical dataset issues.
NMcheckData(
data,
file,
covs,
covs.occ,
cols.num,
col.id = "ID",
col.time = "TIME",
col.dv = "DV",
col.mdv = "MDV",
col.cmt = "CMT",
col.amt = "AMT",
col.flagn,
col.row,
col.usubjid,
cols.dup,
type.data = "est",
na.strings,
return.summary = FALSE,
quiet = FALSE,
as.fun
)
data |
The data to check. data.frame, data.table, tibble, anything that can be converted to data.table. |
file |
Alternatively to checking a data object, you can use
file to specify a control stream to check. This can either be
a (working or non-working) input control stream or an output
control stream. In this case, |
covs |
Columns that contain subject-level covariates. They are expected to be non-missing, numeric and not varying within subjects. |
covs.occ |
A list specifying columns that contain
subject:occasion-level covariates. They are expected to be
non-missing, numeric and not varying within combinations of
subject and occasion. |
cols.num |
Columns that are expected to be present, numeric and non-NA. If a character vector is given, the columns are expected to be used in all rows. If a column is only used for a subset of rows, use a list and name the elements by subsetting strings. See examples. |
col.id |
The name of the column that holds the subject identifier. Default is "ID". |
col.time |
The name of the column holding actual time. |
col.dv |
The name of the column holding the dependent
variable. For now, only one column can be specified, and
|
col.mdv |
The name of the column holding the binary indicator
of the dependent variable missing. Default is |
col.cmt |
The name(s) of the compartment column(s). These will be checked to be positive integers for all rows. They are also used in checks for row duplicates. |
col.amt |
The name of the dose amount column. |
col.flagn |
Optionally, the name of the column holding
numeric exclusion flags. Default value is |
col.row |
A column with a unique value for each row. Using such a
column is recommended if possible. Default
( |
col.usubjid |
Optional unique subject identifier. It is recommended to keep a unique subject identifier (typically a character string including an abbreviated study name and the subject id) from the clinical datasets in the analysis set. If you supply the name of the column holding this identifier, NMcheckData will check that it is non-missing, that it is unique within values of col.id (i.e. that the analysis subject IDs are unique across actual subjects), and that col.id is unique within the unique subject ID (a violation of the latter is less likely). |
cols.dup |
Additional column names to consider in search of duplicate events. col.id, col.cmt, col.evid, and col.time are always considered if found in data, and cols.dup is added to this list if provided. |
type.data |
"est" for estimation data (default), and "sim"
for simulation data. Differences are that |
na.strings |
Strings to be accepted when trying to convert
characters to numerics. This will typically be a string that
represents missing values. Default is ".". Notice, actual NA,
i.e. not a string, is allowed independently of na.strings. See
|
return.summary |
If TRUE (not default), the table summary
that is printed if |
quiet |
Keep quiet? Default is not to. |
as.fun |
The default is to return data as a
|
The following checks are performed. The term "numeric" does not refer to a numeric representation in R, but compatibility with Nonmem. The character string "2" is in this sense a valid numeric, "id2" is not.
Column names must be unique and not contain special characters
If an exclusion flag is used (for ACCEPT/IGNORE in Nonmem), elements must be non-missing and integers. Notice, if an exclusion flag is found, the rest of the checks are performed on rows where that flag equals 0 (zero) only.
If a unique row identifier is found, it has to be non-missing, increasing integers.
col.time (TIME), EVID, col.id (ID), col.cmt (CMT), and col.mdv (MDV): If present, elements must be non-missing and numeric.
col.time (TIME) must be non-negative
EVID must be in {0,1,2,3,4}.
CMT must be positive integers. However, can be missing or zero for EVID==3.
MDV must be the binary (1/0) representation of is.na(DV) for observation records (EVID==0).
AMT must be 0 or NA for EVID 0 and 2
AMT must be positive for EVID 1 and 4
DV must be numeric
DV must be missing for EVID in {1,4}.
If found, RATE must be a numeric, equaling -2 or non-negative for dosing events.
If found, SS must be a numeric, equaling 0 or 1 for dosing records.
If found, ADDL must be a non-negative integer for dosing records. II must be present.
If found, II must be a non-negative integer for dosing records. ADDL must be present.
ID must be positive, and values cannot be disjoint (all records for each ID must follow each other). This is technically not a requirement in Nonmem, but disjoint IDs are most often an error. Use a second ID column if you deliberately want to soften this check.
TIME cannot be decreasing within ID, unless EVID in {3,4}.
All IDs must have doses (EVID in {1,4})
All IDs must have observations (EVID==0)
IDs should not have leading zeros since these will be lost when Nonmem reads and then writes the data.
If a unique row identifier is used, this must be non-missing, increasing, integer
Character values must not contain commas (they will mess up writing/reading csv)
Columns specified in covs argument must be non-missing, numeric and not varying within subjects.
Columns specified in covs.occ must be non-missing, numeric and not varying within combinations of subject and occasion.
Columns specified in cols.num must be present, numeric and non-NA.
If a unique subject identifier column (col.usubjid) is provided, col.id must be unique within values of col.usubjid and vice versa.
Events should not be duplicated. For all rows, the combination of col.id, col.cmt, col.evid, and col.time, plus the optional columns specified in cols.dup, must be unique. In other words, if a subject (col.id) has, say, two observations (col.evid) in the same compartment (col.cmt) at the same time (col.time), this is considered a duplicate. The exception is if there is a reset event (col.evid is 3 or 4) in between the two rows. cols.dup can be used to add columns to this analysis. This is useful for different assays run on the same compartment (say a DVID column) or maybe stacked datasets. If col.cmt is of length>1, this search is repeated for each cmt column.
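The idea behind the duplicate-event check can be sketched in base R. This is only an illustration and not the code NMcheckData uses: the helper find_duplicate_events and the toy data are hypothetical, the column names are assumed to be the defaults, and the exception for reset events (EVID 3 or 4) is deliberately left out.

```r
# Hypothetical helper, not part of NMdata: flag every row whose key columns
# are repeated in another row. Reset events (EVID 3 or 4) are NOT handled here.
find_duplicate_events <- function(data, cols = c("ID", "CMT", "EVID", "TIME")) {
  cols <- intersect(cols, names(data))  # use only the columns found in data
  key <- data[, cols, drop = FALSE]
  # duplicated() marks later repeats; scanning from the end marks the first one too
  duplicated(key) | duplicated(key, fromLast = TRUE)
}

# Toy data: rows 2 and 3 are two observations in CMT 2 at TIME 1 for ID 1
dat <- data.frame(
  ID   = c(1, 1, 1, 1),
  TIME = c(0, 1, 1, 2),
  EVID = c(1, 0, 0, 0),
  CMT  = c(1, 2, 2, 2),
  DVID = c(NA, 1, 2, 1)
)

# With the default key columns, rows 2 and 3 are flagged
dup.default <- find_duplicate_events(dat)
# Adding DVID to the key (the role played by cols.dup) separates the two assays
dup.dvid <- find_duplicate_events(dat, cols = c("ID", "CMT", "EVID", "TIME", "DVID"))
```

In this toy data the two rows at TIME 1 only stop counting as duplicates once DVID is part of the key, which is the situation cols.dup is meant for.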
A table with findings
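## Not run:
library(NMdata)
library(data.table)
dat <- readRDS(system.file("examples/data/xgxr2.rds", package="NMdata"))
## make sure dat is a data.table so := can be used below
setDT(dat)
NMcheckData(dat)
## add LLOQ to observation rows only
dat[EVID==0,LLOQ:=3.5]
## expecting LLOQ only for samples
NMcheckData(dat,cols.num=list(c("STUDY"),"EVID==0"=c("LLOQ")))
## End(Not run)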
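The notion of "numeric" used in the checks (compatible with Nonmem rather than stored as numeric in R) can also be illustrated without NMdata. The sketch below is an approximation under stated assumptions: is_nm_numeric is a hypothetical helper, not an NMdata function, and it relies on R's as.numeric(), which may accept some strings that Nonmem would not.

```r
# Hypothetical helper, not part of NMdata: TRUE where a value is acceptable
# as a number, i.e. it is an actual NA, is one of na.strings, or converts
# to a number with as.numeric().
is_nm_numeric <- function(x, na.strings = ".") {
  x.chr <- as.character(x)
  converted <- suppressWarnings(as.numeric(x.chr))
  is.na(x) | x.chr %in% na.strings | !is.na(converted)
}

# "2" is a valid numeric in this sense, "id2" is not;
# "." and actual NA are accepted as missing values
res <- is_nm_numeric(c("2", "id2", ".", NA, "1e3"))
```

Here only "id2" is rejected. Passing a different na.strings vector changes which strings are tolerated as missing, while actual NA is always accepted, mirroring the description of the na.strings argument.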