extractConcTime_mult: Pull concentration-time data from multiple Simcyp Simulator...

View source: R/extractConcTime_mult.R

extractConcTime_mult    R Documentation

Pull concentration-time data from multiple Simcyp Simulator output files

Description

extractConcTime_mult is meant to be used in conjunction with ct_plot_overlay or ct_plot_mult to create graphs from multiple Simcyp Simulator output files. If you list multiple files, multiple tissues, and/or multiple compounds to extract (see options below), this will extract all possible combinations of them. For example, if you ask for data from the files "sim1.xlsx" and "sim2.xlsx" and also ask for "substrate" and "primary metabolite 1", you will get the substrate and primary metabolite 1 data from both files. NOTE: If ANY of the Excel files you wish to extract data from are open, this WILL CRASH and WILL NOT save whatever progress it has made so far. Be sure to close all of the source Excel files before running it.
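To picture what "all possible combinations" means, here is a base-R sketch that enumerates the file × compound × tissue combinations from the example above with expand.grid(). It only lists the combinations; it does not read any files, and it is not meant to show how the package loops internally.

```r
# Illustration only: every file x compound x tissue combination requested
sim_files <- c("sim1.xlsx", "sim2.xlsx")
compounds <- c("substrate", "primary metabolite 1")
tissues   <- "plasma"

combos <- expand.grid(File = sim_files, Compound = compounds,
                      Tissue = tissues, stringsAsFactors = FALSE)
combos
nrow(combos)  # 2 files x 2 compounds x 1 tissue = 4
```

Adding a second tissue, e.g., tissues = c("plasma", "liver"), would double the number of combinations, which is one reason extractions with many tissues take a while.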

Usage

extractConcTime_mult(
  sim_data_files = NA,
  obs_to_sim_assignment = NA,
  ct_dataframe = NA,
  overwrite = FALSE,
  tissues = "plasma",
  compoundsToExtract = "all",
  conc_units_to_use = "ng/mL",
  time_units_to_use = "hours",
  returnAggregateOrIndiv = "aggregate",
  adjust_obs_time = FALSE,
  existing_exp_details = NA,
  obs_data_files = NA,
  ...
)

Arguments

sim_data_files

a character vector of simulator output files, each in quotes and encapsulated with c(...); NA (default) to extract concentration-time data for all the Excel files in the current folder; or "recursive" to extract concentration-time data for all the Excel files in the current folder and all subfolders. Example of acceptable input: c("sim1.xlsx", "sim2.xlsx"). Include the path with the file names if the files are located somewhere other than your working directory. If some of your Excel files are not regular simulator output, e.g., they are sensitivity analyses or files where you were doing some calculations, those files will be skipped.
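If you want to preview which files NA or "recursive" would start from, base R's list.files() gives a rough approximation. The helper list_xlsx() below is hypothetical and not part of the package; it assumes simulator output ends in ".xlsx" and does not attempt the package's additional step of skipping files that aren't regular simulator output.

```r
# Hypothetical helper (not part of the package): list the .xlsx files that
# NA (current folder only) or "recursive" (subfolders too) would start from
list_xlsx <- function(path = ".", recursive = FALSE) {
  list.files(path, pattern = "\\.xlsx$", recursive = recursive,
             full.names = TRUE)
}

# Demo folder with one file at the top level and one in a subfolder
demo_dir <- file.path(tempdir(), "ct_demo")
dir.create(file.path(demo_dir, "sub"), recursive = TRUE, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("sim1.xlsx", "notes.txt")),
                      file.path(demo_dir, "sub", "sim2.xlsx")))

basename(list_xlsx(demo_dir))                    # "sim1.xlsx"
basename(list_xlsx(demo_dir, recursive = TRUE))  # "sim1.xlsx" "sim2.xlsx"
```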

obs_to_sim_assignment

the assignment of which observed files go with which simulated files. (NA, which is the default, means no observed data will be extracted.) There are four ways to supply this:

"use existing_exp_details"

If you have already extracted the simulation experimental details with the function extractExpDetails_mult and you included observed data overlay files in your simulations, we can use that information to figure out which observed Excel file goes with which simulation, as long as each of those XML files has its corresponding Excel file in the same location. Note that this requires you to also supply the argument existing_exp_details. This works even if the user who ran the simulation is different from the user extracting the data: e.g., if the original path was something like "C:/Users/FridaKahlo/Rose project simulations" and the current username (the result from running Sys.info()["user"]) is "DiegoRivera", this will look in the folder "C:/Users/DiegoRivera/Rose project simulations" for your XML files. This assumes that the file path starts with "C:/Users/CurrentUserName/..."; if it does not, the username will not be changed.

a character vector of the observed data files, each in quotes and encapsulated with c(...)

If all the observed data can be compared to all the simulated data, then an example of acceptable input would be: obs_to_sim_assignment = "obsdata1.xlsx". However, if you would like to specify which observed file goes with which simulated file, you can do this with a named character vector, e.g., c("obsdata1.xlsx" = "simfileA.xlsx", "obsdata2.xlsx" = "simfileB.xlsx"). If one observed file needs to match more than one simulated file but not all of them, list those simulated files in a single quoted string, separated by commas, e.g., obs_to_sim_assignment = c("obs data 1.xlsx" = "mdz-5mg-qd.xlsx, mdz-5mg-qd-fa08.xlsx", "obs data 2.xlsx" = "mdz-5mg-qd-cancer.xlsx, mdz-5mg-qd-cancer-fa08.xlsx"). Pay close attention to the position of commas and quotes there! This can get a bit confusing, in our opinion, so if you need to link specific observed and simulated files, you may want to try one of the other options; they can be easier to follow but require more typing.

a data.frame with one column for the observed files and one column for the simulated files they each match

The data.frame must have column names of "ObsFile" and "File" for the observed and simulated files, respectively. Here's an example of acceptable input: obs_to_sim_assignment = data.frame(ObsFile = c("obsdata1.xlsx", "obsdata2.xlsx"), File = c("simfileA.xlsx", "simfileB.xlsx")). Each row should contain one observed file and one simulated file, so if you want to compare a single observed file to multiple simulated files, you'll need to repeat the observed file, e.g., obs_to_sim_assignment = data.frame(ObsFile = c("obsdata1.xlsx", "obsdata2.xlsx", "obsdata2.xlsx", "obsdata2.xlsx"), File = c("simfileA.xlsx", "simfileB.xlsx", "simfileC.xlsx", "simfileD.xlsx")).

a csv file with one column for the observed files and one column for the simulated files they each match

The setup of this csv file should be just like that described for supplying a data.frame, so one row for each pair of simulated and observed files you want to compare to each other. Supply this as a character string, like this: obs_to_sim_assignment = "My obs to sim assignments.csv"

For whichever option you choose, include the observed files' paths if they are located somewhere other than your working directory. The observed data files should be the Excel files that are ready to be converted to XML files, not the files that contain only the digitized time and concentration data. This function assumes that the dosing intervals for the observed data match those in the simulated data. See "Details" for more info.
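Because the named-vector option is easy to get wrong, it can help to expand it into one row per observed/simulated pair, which is the same layout as the data.frame and csv options. The base-R sketch below does that and then writes the pairs to a csv. Splitting on commas is our approximation of how such strings are read, not the package's own parsing code; in real use you would save the csv in your working directory and supply obs_to_sim_assignment = "My obs to sim assignments.csv".

```r
# Named character vector: names are observed files, values are simulated
# files (several simulated files share one comma-separated string)
obs_to_sim <- c(
  "obs data 1.xlsx" = "mdz-5mg-qd.xlsx, mdz-5mg-qd-fa08.xlsx",
  "obs data 2.xlsx" = "mdz-5mg-qd-cancer.xlsx, mdz-5mg-qd-cancer-fa08.xlsx"
)

# Expand to one row per observed/simulated pair (the data.frame layout)
sims <- strsplit(obs_to_sim, ",\\s*")
assignments <- data.frame(
  ObsFile = rep(names(obs_to_sim), lengths(sims)),
  File    = unlist(sims, use.names = FALSE),
  stringsAsFactors = FALSE
)
print(assignments)

# Save the same pairs as a csv; row.names = FALSE keeps only the ObsFile and
# File columns (a temporary folder is used here for the demo)
csv_path <- file.path(tempdir(), "My obs to sim assignments.csv")
write.csv(assignments, csv_path, row.names = FALSE)
check <- read.csv(csv_path, stringsAsFactors = FALSE)
names(check)  # "ObsFile" "File"
```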

ct_dataframe

(optional) a data.frame that contains previously extracted concentration-time data. This should NOT be in quotes. Because we can see scenarios where you might want to extract some concentration-time data, play around with those data, and then later decide to pull more concentration-time data for comparisons, this data.frame can already exist. When that is the case, this function will add data to that data.frame. It will not overwrite existing data unless overwrite is set to TRUE. However, it also will NOT reopen any simulation files that are already included in ct_dataframe to look for new tissues or compounds. If you want to add tissues or compounds that you previously did NOT extract without overwriting the concentration-time data you already have, we recommend running a separate instance of extractConcTime_mult and then using dplyr::bind_rows to add the new data to the existing ct_dataframe.
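The bind_rows step recommended above stacks the new extraction under the existing data. In real use, the new data would come from something like a second extractConcTime_mult call with tissues = "liver"; here two small toy data.frames stand in for both extractions so the sketch runs without simulator files. Their column names are illustrative only and do not match the real output.

```r
library(dplyr)

# Toy stand-ins for two extractions (columns are illustrative, not the
# real extractConcTime_mult output)
existing_ct <- data.frame(File = "sim1.xlsx", Tissue = "plasma",
                          Compound = "substrate", Time = c(0, 1, 2),
                          Conc = c(0, 5.2, 4.1))
new_ct <- data.frame(File = "sim1.xlsx", Tissue = "liver",
                     Compound = "substrate", Time = c(0, 1, 2),
                     Conc = c(0, 20.3, 15.8))

# Stack the newly extracted tissue under the data you already had
ConcTimeData <- bind_rows(existing_ct, new_ct)
table(ConcTimeData$Tissue)
```

One advantage of bind_rows over base rbind is that it fills any columns present in only one of the data.frames with NA instead of stopping with an error.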

overwrite

TRUE or FALSE (default) on whether to re-extract the concentration-time data from output files that are already included in ct_dataframe. Since pulling data from Excel files is slow, by default, this will not overwrite existing data and instead will only add data from any Excel files that aren't already included. A situation where you might want to set this to TRUE would be when you have changed input parameters for simulations and re-run them.

tissues

From which tissue(s) should the desired concentrations be extracted? Default is "plasma" for typical plasma concentration-time data. Other options are "blood" or any tissues included in "Sheet Options", "Tissues" in the simulator. All possible options:

First-order absorption models

"plasma", "blood", "unbound blood", "unbound plasma", "additional organ", "adipose", "bone", "brain", "feto-placenta", "gut tissue", "heart", "kidney", "liver", "lung", "muscle", "pancreas", "peripheral blood", "peripheral plasma", "peripheral unbound blood", "peripheral unbound plasma", "portal vein blood", "portal vein plasma", "portal vein unbound blood", "portal vein unbound plasma", "skin", or "spleen".

ADAM models

"stomach", "duodenum", "jejunum I", "jejunum II", "jejunum III" (only applies to rodents), "jejunum IV" (only applies to rodents), "ileum I", "ileum II", "ileum III", "ileum IV", "colon", "faeces", "gut tissue", "cumulative absorption", "cumulative fraction released", or "cumulative dissolution".

ADC simulations

NOT YET SET UP. If you need this, please contact Laura Shireman.

Not case sensitive. Acceptable input is all tissues desired as a character vector, e.g., tissues = c("plasma", "blood", "liver") or, if you want all possible tissues and you've got some time to kill, "all". That will make R check for all sorts of possible permutations of tab names, so it does take a while.

compoundsToExtract

For which compound(s) do you want to extract concentration-time data? Options are:

  • "all" (default) for all the possible compounds in the simulation (substrate, metabolites, inhibitors, and ADC-related compounds),

  • "substrate",

  • "primary metabolite 1",

  • "primary metabolite 2",

  • "secondary metabolite",

  • "inhibitor 1" – this can be an inducer, inhibitor, activator, or suppressor, but it's labeled as "Inhibitor 1" in the simulator,

  • "inhibitor 2" for the 2nd inhibitor listed in the simulation,

  • "inhibitor 1 metabolite" for the primary metabolite of inhibitor 1,

  • "conjugated protein" for DAR1-DARmax for an antibody-drug conjugate; observed data with DV listed as "Conjugated Protein Plasma Total" will match these simulated data,

  • "total protein" for DAR0-DARmax for an ADC; observed data with DV listed as "Total Protein Conjugate Plasma Total" will match these simulated data, or

  • "released payload" for the released drug from an ADC, which shows up as primary metabolite 1 in Simulator output files.

Input to this argument should be all desired compounds as a character vector, e.g., c("substrate", "primary metabolite 1"). Note: If your compound is a therapeutic protein or ADC, we haven't tested this very thoroughly, so please be extra careful to check that you're getting the correct data.
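Since extracting from many files can be slow, you might check your compound names against the options listed above before starting a long run. This is a plain base-R sketch; the vector below is simply copied from that list, and the check is not something the package requires.

```r
# Options for compoundsToExtract, copied from the list above
compound_options <- c("all", "substrate", "primary metabolite 1",
                      "primary metabolite 2", "secondary metabolite",
                      "inhibitor 1", "inhibitor 2", "inhibitor 1 metabolite",
                      "conjugated protein", "total protein", "released payload")

my_compounds <- c("substrate", "primary metabolite 1", "primary metabolite")

# Entries not in the list (here, the incomplete "primary metabolite")
unknown <- setdiff(my_compounds, compound_options)
unknown
```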

conc_units_to_use

concentration units to use so that all data will be comparable. Options are the same as the ones in the Excel form for PE data entry. Default is "ng/mL". Note: ADAM model data concentration units are not converted because there are simply too many units to manage easily, so please check that the units are what you expected in the end.

time_units_to_use

time units to use so that all data will be comparable. Options are "hours" (default), "days", "weeks", or "minutes".
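Arithmetically, changing time units is just a scale factor. The helper hours_to() below is hypothetical (not part of the package) and assumes the starting times are in hours; it shows what each option in time_units_to_use amounts to.

```r
# Hypothetical helper (not from the package): convert times in hours to one
# of the units accepted by time_units_to_use
hours_to <- function(time_h, unit = c("hours", "days", "weeks", "minutes")) {
  unit <- match.arg(unit)
  switch(unit,
         hours   = time_h,
         days    = time_h / 24,
         weeks   = time_h / (24 * 7),
         minutes = time_h * 60)
}

hours_to(c(0, 12, 24, 168), "days")  # 0.0 0.5 1.0 7.0
hours_to(336, "weeks")               # 2
hours_to(1.5, "minutes")             # 90
```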

returnAggregateOrIndiv

Return aggregate and/or individual simulated concentration-time data? Options are "aggregate" (default), "individual", or "both". Aggregated data are not calculated here but are pulled from the simulator output rows labeled as "Population Statistics".

adjust_obs_time

TRUE or FALSE (default) for whether to adjust the time listed in the observed data file to match the last dose administered. This only applies to multiple-dosing regimens. If TRUE, the graph will show the observed data overlaid with the simulated data such that the dose in the observed data was administered at the same time as the last dose in the simulated data. If FALSE, the observed data will start at whatever times are listed in the Excel file.
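For a regular multiple-dose regimen, the shift described above amounts to adding the time of the last simulated dose to each observed time. The sketch below assumes a simple case (observed times measured from the observed dose, simulated doses every tau hours starting at 0 h); the numbers are made up and this is not the package's own code.

```r
# Hypothetical illustration: shift observed times so the observed dose lines
# up with the last simulated dose
tau     <- 24   # simulated dosing interval in hours (assumed)
n_doses <- 7    # number of simulated doses (assumed)
last_sim_dose_time <- (n_doses - 1) * tau   # first dose at 0 h, so 144 h

obs_time <- c(0, 0.5, 1, 2, 4, 8, 24)       # hours after the observed dose
adjusted_time <- obs_time + last_sim_dose_time
adjusted_time  # 144.0 144.5 145.0 146.0 148.0 152.0 168.0
```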

existing_exp_details

If you have already run extractExpDetails_mult to get all the details from the "Input Sheet" (e.g., when you ran extractExpDetails_mult you said exp_details = "Input Sheet" or exp_details = "all"), you can save some processing time by supplying that object here, unquoted. If left as NA, this function will run extractExpDetails behind the scenes to figure out some information about your experimental setup.

obs_data_files

TO BE DEPRECATED. This is the same argument as obs_to_sim_assignment; we just renamed it to try to be clearer about what the argument does and in what order you should list the files.

...

other arguments passed to the function extractConcTime

Details

Regarding dose intervals for observed data: The observed data files don't include information on dosing intervals or dose numbers, which makes it a little tricky to figure out which dose number a given time in an observed data file should have. If the compound IDs in the simulated data match those in the observed data, we will assume that the dosing intervals are the same. This was coded with the assumption that the dosing interval would be a round number (subjects aren't getting dosed on the half hour, for example), so if that's not the case, these dose number assignments will be off.
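When the observed and simulated dosing intervals are assumed to match, one simple way to assign a dose number to each time is a floor-based rule. The helper assign_dose_number() below is hypothetical; the package's internal logic is not shown in this documentation and may differ, for instance in how it handles intervals that are not round numbers.

```r
# Hypothetical helper (not the package's code): dose number for each time,
# assuming doses start at first_dose_time and repeat every tau hours
assign_dose_number <- function(time, tau, first_dose_time = 0) {
  floor((time - first_dose_time) / tau) + 1
}

t_obs <- c(0, 6, 23.5, 24, 30, 48, 71)
assign_dose_number(t_obs, tau = 24)  # 1 1 1 2 2 3 3
```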

Value

Returns a large data.frame with multiple sets of concentration-time data, formatted the same way as output from the function extractConcTime

Examples

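# Add unbound plasma data from two files to an existing ConcTimeData
# data.frame; with overwrite = FALSE, files already in ConcTimeData are skipped
ConcTimeData <- extractConcTime_mult(
  sim_data_files = c("MyFile1.xlsx", "MyFile2.xlsx"),
  ct_dataframe = ConcTimeData,
  overwrite = FALSE,
  tissues = "unbound plasma"
)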


shirewoman2/Consultancy documentation built on Feb. 18, 2025, 10 p.m.