knitr::opts_chunk$set( collapse = TRUE, echo = FALSE, comment = "#>", fig.path = "man/figures/README-", out.width = "50%" ) library(tidyverse) library(SimcypConsultancy)
The SimcypConsultancy R package provides tools for analyzing and reporting PBPK data from the Simcyp Simulator. Our goals in making this package are to:
Important: The SimcypConsultancy R package is separate from the Simcyp R package and serves a different purpose. The SimcypConsultancy package does not interact with the simulator and instead is focused on report-writing tasks.
You can install the SimcypConsultancy package from GitHub like this:
devtools::install_github(repo = "shirewoman2/Consultancy", upgrade = "never")
Current version: r packageVersion("SimcypConsultancy")
NOTE: The SimcypConsultancy package requires that you have tidyverse loaded.
A great place to start for getting help:
launch_package_index()
You do not need to read this document from start to finish to follow it. Instead, please do skip around to only the parts that interest you.
TABLE OF CONTENTS
We're using example files in this document, but you generally can substitute whatever Simcyp Simulator files you're working with instead of the examples. For help on any functions, type a question mark followed by the name of the function. For example, here's how to get help on the ct_plot function:
?ct_plot
Let's dive into some examples!
As we mentioned, one of our goals was to increase accuracy. For this reason, several functions in the SimcypConsultancy package are set up for checking or reporting data in an automated fashion.
First, we'll extract data about how the simulations were set up from the simulation Excel results files and from the workspaces themselves.
Details <- extractExpDetails_mult(sim_data_files = NA, exp_details = "all")
Now that we've got that information pulled into memory, we can save the data in an Excel file to allow for easy comparisons between simulations. Say you want to know some details about how the experiment was set up such as the start and end times of the simulation, what the substrates and inhibitors were and the dosing regimens. You can use the function "annotateDetails" to do this sort of task. Here's how you'd get the first few rows with information on the trial design:
annotateDetails(Details, simulator_section = "Trial Design") %>% head() %>% formatTable_Simcyp(fontsize = 8)
How about some information on what parameters were used for setting up absorption but only for the substrate? We're again only going to show the first few rows of this, which is what "head() %>%" does in the code below.
annotateDetails(Details, simulator_section = "Absorption", compoundID = "substrate") %>% head() %>% formatTable_Simcyp(fontsize = 8)
To see all the possible experimental details, please see the file "All possible experimental details to use with extractExpDetails.csv" (Bold text is a link) or type this into the console:
view(ExpDetailDefinitions)
For more options and examples, please see the help files for extractExpDetails, extractExpDetails_mult, and annotateDetails and please see "Checking-simulation-experimental-details.docx" (bold text is a link)
If someone else set up some workspaces that they would like you to run but those workspaces included observed data XML files on OneDrive, the Simulator won't be able to find that file unless you change the path to include your user name instead of the original person's user name. Here's how to quickly change all the paths for all the workspaces in a folder so that the Simulator can find the observed data files with your name in the path:
make_xml_path_mine()
That will permanently change the workspaces, so please make sure that's what you want.
If you have run some simulations using the workflow tool in the Simulator, it will include a date/time stamp on the output file name. To remove those quickly, try this to remove that stamp from all the files in your current folder:
remove_file_timestamp()
The R Working Group has a few other tricks up our sleeves for changing workspace parameters quickly, but they do require some care since they permanently change workspace files and since only some parameters work well when changed in an automated fashion. If you find yourself needing to change something like the dose, the inhibition parameters, the induction parameters, etc. for a bunch of workspaces and don't want to spend hours tediously opening, changing, saving, and closing them all manually, please talk to Laura Shireman.
The function "ct_plot" will allow you to automatically graph almost any concentration-time data present in Simulator output Excel files. While this function has many options, the default settings will generally create decent graphs that comply with Consultancy Team report templates.
Let's make a graph of substrate concentration-time data in plasma, including some observed data, and let's save the output as "My conc-time graph 1.png".
First, extract the data from the Simulator output file and also from either the observed-data XML overlay file or the Excel PE template file. We're using the function "extractConcTime" to extract the data and the function "ct_plot" to make the graphs.
LMV <- extractConcTime(sim_data_file = "letermovir MD.xlsx", obs_data_file = "Observed data files/letermovir obs data.xlsx") ct_plot(ct_dataframe = LMV, figure_type = "trial means", save_graph = "My conc-time graph example 1.png")
Let's zoom in on the last two doses -- doses 7 and 8 here -- and change the figure type to get percentiles instead of trial means. We'll save this file, too, and this time, let's save it as a Word file so that we can get figure heading and caption data already filled in and ready for copying and pasting into a report. If you supply the output from the function "extractExpDetails_mult" here, you'll get a more informative figure heading and caption. Any text that you'll want to edit when you paste this into a report will be in bold.
ct_plot(ct_dataframe = LMV, time_range = "doses 7 to 8", figure_type = "percentiles", existing_exp_details = Details, save_graph = "My conc-time graph example 2.docx")
Next, let's make a graph of substrate concentration-time data in plasma with and without the presence of a compound in the Inhibitor 1 position in the simulation. First, just like with the other graph, we'll extract the data from our simulation Excel results file and then we'll actually make the graph.
Let's change the figure type to show only the mean predicted concentrations. Let's include a legend, label it for the fact that our perpetrator molecule is an inducer and not an inhibitor (the default), and let's only make the semi-log plot here. We'll save the file so that it is 3 inches high and 5 inches wide. (All of these options are described in more detail in the example files in the folder "Concentration-time plots 1 - one sim at a time".
MDZ <- extractConcTime(sim_data_file = "QD MDZ QD RIF.xlsx", obs_data_file = "Observed data files/QD MDZ QD RIF fake observed data.xlsx", existing_exp_details = Details) ct_plot(ct_dataframe = MDZ, time_range = "last dose", figure_type = "means only", linear_or_log = "log", legend_position = "right", legend_label = "Inducer", save_graph = "My conc-time graph example 3.docx", fig_height = 3, fig_width = 5)
For more options and examples, please see the help file for ct_plot (type ?ct_plot
into the console) and see "Concentration-time-plot-examples-1.docx".
You can also make concentration-time plots with multiple data sets overlaid for easy comparisons.
First, let's get some data. We're using a variation on the function "extractConcTime" called "extractConcTime_mult" that allows us to pull data for multiple files, tissues, and compounds all at once.
CT <- extractConcTime_mult( sim_data_files = c( "Example simulator output SD MDZ plus BID RTV.xlsx"), tissues = c("plasma", "blood", "unbound plasma"), ct_dataframe = "CT", compoundsToExtract = c("substrate", "inhibitor 1"), existing_exp_details = Details)
Next, graph those data, coloring the lines by whether the perpetrator was present and breaking up the graph into small multiples by the tissue and by what compound we're plotting. Let's give a sense of the variability of the data by showing transparent bands for the 5^th^ to 95^th^ percentiles. We'll save this graph as "My overlaid conc-time graph example 1.docx". Since we're saving this as a Word file, we'll again get some figure heading and caption text filled in.
ct_plot_overlay(ct_dataframe = CT, colorBy_column = Inhibitor, facet1_column = Tissue, facet2_column = Compound, figure_type = "percentile ribbon", color_set = "Set 1", save_graph = "My overlaid conc-time graph example 1.docx", fig_width = 6, fig_height = 6)
For more options and examples, please the help file for ct_plot_overlay (type ?ct_plot_overlay
into the console) and see "Concentration-time-plot-examples-3.docx".
You can apply most of the settings from the "ct_plot" function to another function, "enz_plot", which will automatically make enzyme abundance plots for you. Just as with the ct_plot function, we're first going to extract the data using a separate function, this one called "extractEnzAbund".
For this graph, let's include a legend, color the lines according to whether the perpetrator is present using the colors we want, note in the legend that the perpetrator is an inducer rather than the default inhibitor, and let's save the output, too.
CYP3A4_liver <- extractEnzAbund(sim_data_file = "Example simulator output - MD MDZ MD EFV.xlsx", existing_exp_details = Details) enz_plot(CYP3A4_liver, figure_type = "percentile ribbon", line_type = "solid", legend_position = "right", legend_label = "Inducer", line_color = c("dodgerblue3", "darkorchid4"), save_graph = "Enzyme abundance graph example.png", fig_height = 3.5, fig_width = 6)
For more options and examples, please the help file for enz_plot (type ?enz_plot
into the console) and see "Enzyme-abundance-plot-examples.docx".
One of the tasks we would like to automate and make less prone to copy/paste errors is creating tables for reports that summarize simulated PK parameters. The function pk_table does that.
For this, we'll use a file from a hands-on workshop demonstration with letermovir.
pk_table( sim_data_file = "letermovir MD.xlsx", existing_exp_details = Details) %>% formatTable_Simcyp()
If you save the output to a Word file, which is what we recommend, you will automatically get a table heading and caption, which will include information about which compound, tissue, and simulation was used to obtain the data, so the table will only include the columns "Statistic", "AUCtau" and "Cmax".
The default setting is to list geometric means and CVs, but you can see arithmetic instead by specifying that with "mean_type". Additionally, let's see the range of trial means (includeTrialMeans = TRUE) but not the CVs (includeCV = FALSE).
pk_table( sim_data_file = "letermovir MD.xlsx", mean_type = "arithmetic", includeTrialMeans = TRUE, includeCV = FALSE, existing_exp_details = Details, checkDataSource = FALSE) %>% formatTable_Simcyp()
If a perpetrator molecule was included, the table will automatically pull parameters sensible for that scenario. This time, let's additionally ask to check the data source to check where in the simulation Excel results file the data are from. First, here's the table:
MyTable <- pk_table( sim_data_file = "QD MDZ QD RIF.xlsx", existing_exp_details = Details, checkDataSource = TRUE) MyTable$Table %>% formatTable_Simcyp()
Now, to see where those parameters came from, let's look at the other item that got included in our output when we said "checkDataSource = TRUE".
MyTable$QC %>% formatTable_Simcyp(fontsize = 8)
This is a data.frame listing the files, tabs, columns, and rows where the data were found for the purposes of QCing.
For more options and examples, please see the help files for pk_table and see "PK-tables.docx".
If you are interested in comparing the results from two simulations, e.g., comparing fasted vs. fed or healthy subjects vs. subjects with hepatic impairment, please see the instruction file "Calculating-PK-ratios.docx"
This section under construction. We'll add examples here soon.
There are several options for setting up forest plots; here is one example using example data included in the package: BufForestData_20mg.
forest_plot(forest_dataframe = BufForestData_20mg, y_axis_labels = Inhibitor1, y_axis_title = "Perpetrator", graph_title = "Effects of various DDI perpetrator\ndrugs on bufuralol PK", show_numbers_on_right = TRUE, rel_widths = c(3, 1), save_graph = "Bufuralol forest plot.png", fig_height = 6, fig_width = 7.25)
For more information, please see the instruction file "Examples-for-making-forest-plots.docx"
One quick way to check how well your model is capturing observed data is to visually compare simulated PK values to observed. The function so_graph
will use input from making PK tables (from the function pk_table
) to create these graphs, including making Guest-style graphs when the PK parameter in question is a ratio for a drug-drug interaction.
so_graph(PKtable = SOdata, axis_title_y = "Predicted", axis_title_x = "Observed")
For more information, please see the instruction file "Examples-for-simulated-vs-observed-graphs.docx"
The SimcypConsultancy package function inductFit
can fit four possible induction models to in vitro induction data:
Equation 1. E~max~ model
$$ \text{fold induction} = 1 + \frac{E_{max} \times I}{EC_{50} + I} $$
where I is the concentration of the inducer, E~max~ is the maximum change in fold induction, and EC~50~ is the concentration of the inducer that elicits half the maximum fold-change in induction.
Use model = "Emax"
in the inductFit
call to fit this model to your data.
Equation 2. E~max~ slope model
$$ \text{fold induction} = 1 + E_{max} \times \frac{I^\gamma}{EC_{50}^\gamma + I^\gamma} $$
where I is the concentration of the inducer, E~max~ is the maximum change in fold induction, EC~50~ is the concentration of the inducer that elicits half the maximum fold-change in induction, and gamma is the Hill equation coefficient describing the slope.
Syntax for function: model = "EmaxSlope"
Equation 3. Slope model
$$ \text{fold induction} = 1 + I \times n $$
where I is the concentration of the inducer and n is the slope.
Syntax for function: model = "slope"
Equation 4. Sigmoidal three-parameter model
$$ \text{fold induction} = \frac{Ind_{max}}{1 + e^{ \frac{IndC_{50}-I}{\gamma}}} $$
where I is the concentration of the inducer, Ind~max~ is the maximum fold induction, IndC~50~ is the concentration of the inducer that elicits half the maximum fold induction, and gamma is the Hill equation coefficient describing the slope.
Syntax for function: model = "Sig3Param"
Important note: The sigmoidal three-parameter model is the ONLY model that determines Ind~max~ rather than E~max~. The Simcyp Simulator uses Ind~max~, so please note that Ind~max~ = E~max~ + 1
Let's fit some induction data to each of 4 possible models: Indmax, Indmax slope, slope, or sigmoid 3 parameter. For details on what these are, please see the instructions fitting induction data.
For now, we'll call on some pre-existing dummy data we made up for the SimcypConsultancy package called IndData
.
We'll make an object "MyIndFits" to store the output and then look at that output one piece at a time.
MyIndFits <- inductFit(IndData, conc_column = Concentration_uM, fold_change_column = FoldInduction, model = "all", include_fit_stats = FALSE, rounding = "significant 3")
Now that we have run the function, let's see the graphs.
MyIndFits$Graph
You can adjust the look of the graphs; please see the help file or the example Word document for examples.
Let's see the fitted parameters:
MyIndFits$Fit %>% formatTable_Simcyp()
If you're interested in seeing how the demographics of a population vary -- either for comparing multiple simulated populations or a simulated population to an observed population -- there are a couple of data visualization tools in the package to help you do that. We'll focus here on the one for comparing multiple simulated populations, and we'll compare healthy subjects to subjects with varying degrees of liver disease.
First, we'll extract demographic data from our simulations. All of these included the "Demographic Data" tab in the results, which is where R is looking for demographic information. If you don't include that in your results, this will not work.
Demog <- extractDemog(sim_data_files = c("mdz-5mg-sd-hv.xlsx", "mdz-5mg-sd-cpa.xlsx", "mdz-5mg-sd-cpb.xlsx", "mdz-5mg-sd-cpc.xlsx"), demog_dataframe = Demog)
Here's an example of the kind of visual comparisons we can make, focusing on just two parameters: a boxplot comparing liver weights and a scatter plot of height vs. overall body weight.
demog_plot_sim(demog_dataframe = Demog, colorBy_column = File, color_labels = c("mdz-5mg-sd-hv.xlsx" = "healthy", "mdz-5mg-sd-cpa.xlsx" = "Child-Pugh A", "mdz-5mg-sd-cpb.xlsx" = "Child-Pugh B", "mdz-5mg-sd-cpc.xlsx" = "Child-Pugh C"), color_set = "greens", demog_parameters = c("LiverWt_g", "Height vs weight"), variability_display = "boxplot", ncol = 1, save_graph = "Demographics - hepatic impairment.png", fig_height = 6, fig_width = 5)
Note: We're working on the legend but having some trouble with it repeating things at the moment.
For more information and examples, please see Examples-for-checking-subject-demographics.docx
Under construction. See the help file for make_Simcyp_inputs_table
.
Under construction. See the help file for trial_means_plot
and see the instruction file "Trial-means plots.docx".
Example:
ObservedCmax <- data.frame(PKparameter = "Cmax_dose1", ObsValue = c(20, 22.5, 16), ObsVariability = c("10 to 29", "18-25", "12-30"), Study = c("Kolev et al., 2025", "Dinh et al., 2023", "Thakur et al., 2020")) trial_means_plot(sim_data_file = "mdz-5mg-sd.xlsx", color_option = "by study", color_set = "viridis", point_shape = 16, y_axis_limits_lin = c(0, 60), observed_PK = ObservedCmax, legend_position = "bottom", save_graph = "Cmax trial means plot.png", fig_height = 5, fig_width = 6)
Under construction. See the help file for make_simulation_directory
.
Example:
make_simulation_directory(existing_exp_details = Details, save_table = "simulation directory.xlsx")
We'll supply the Excel file with the SA data to the argument "SA_file", we'll tell R which dependent variable we want to see (Cmax), give the graph a title (the "\n" bit is how we can insert a manual carriage return), and then specify whether to save the graph by supplying a file name to "save_graph".
sensitivity_plot(SA_file = "SA on fa - mdz 5mg sd.xlsx", dependent_variable = "Cmax", graph_title = "Sensitivity analysis:\nfa effect on Cmax", save_graph = "SA graph.png")
Other dependent variables are also possible; please see the help file for all options.
Here's the same sensitivity analysis file but looking at oral clearance and setting something manually for the independent variable label on the x axis.
sensitivity_plot(SA_file = "SA on fa - mdz 5mg sd.xlsx", dependent_variable = "CLpo", ind_var_label = "fa for the substrate")
And here is an example of plotting plasma concentrations from a sensitivity-analysis file:
sensitivity_plot(SA_file = "SA on fa - mdz 5mg sd.xlsx", dependent_variable = "plasma", linear_or_log = "both vertical", time_range = c(0, 12))
Please see the help file for more information by typing ?sensitivity_plot
into the console.
Below are descriptions of all the functions in the package, and the functions most users will want to use most frequently are in bold and listed first in each section.
NB: Most functions that actually change workspaces are only available in the beta version of the package.
If you encounter a bug, if you would like help using any of these functions, or if there's an additional feature you would like to have, please contact a member of the R Working Group.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.