demog_plot: Make plots for comparing populations across simulations

demog_plot    R Documentation

Make plots for comparing populations across simulations

Description

demog_plot will make a series of graphs comparing parameters across simulated and, if provided, observed populations. All the parameters available on the "Demographic Data" tab of a Simcyp Simulator output Excel file are available for making comparisons, and you must obtain the input data for making these graphs by running extractDemog. If you are looking at the distribution of one parameter across populations, you can display that with either boxplots or kernel density plots (like a smoothed histogram). If you want to compare the relationship between pairs of parameters across simulations, scatter plots will be used. (Only certain pairs of parameters are available; please see the notes for the argument 'demog_parameters'.)

Usage

demog_plot(
  demog_dataframe,
  obs_demog_dataframe = NA,
  sims_to_include = "all",
  demog_parameters = NA,
  variability_display = "kernel density",
  colorBy_column,
  color_set = "default",
  color_labels = NA,
  legend_label_color = NA,
  legend_position = "right",
  graph_title = "Demographics",
  alpha = 0.8,
  ncol = NULL,
  nrow = NULL,
  facet_by_sex = TRUE,
  facet_column_additional,
  border_facets = TRUE,
  graph_labels = TRUE,
  pad_x_axis = TRUE,
  pad_y_axis = TRUE,
  return_indiv_graphs = FALSE,
  save_graph = NA,
  fig_height = 8,
  fig_width = 6
)

Arguments

demog_dataframe

the output from running extractDemog. If you would like to include observed data, you can either provide them in the same data.frame here and include a column titled "SorO" with "simulated" or "observed" to denote what kind of data each row contains, or you can supply them separately to the argument 'obs_demog_dataframe', whichever is easier for you. Either way, the column names in the observed data MUST MATCH the column names in the simulated data so that we know what's what. For example, if the observed data has a column named "WEIGHT", we need that to become "Weight_kg" to match what's in the simulated data.

obs_demog_dataframe

optionally supply observed demographic data for comparison. The columns you want to compare must have exactly the same names as the columns in the simulated data so that we know what's what.
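The column-matching requirement can be sketched in base R as below. This is only an illustration under stated assumptions: the toy data.frames, the observed column names (AGE, WEIGHT, SEX), the "F"/"M" sex coding, and the "observed data" value in the File column are all hypothetical, and real extractDemog output contains many more columns.

```r
# Toy stand-in for extractDemog output (hypothetical values)
sim_demog <- data.frame(
  File = "sim-1.xlsx",
  Age = c(34, 52, 47),
  Weight_kg = c(70.2, 81.5, 64.8),
  Sex = c("F", "M", "F"),
  stringsAsFactors = FALSE
)

# Hypothetical observed data with clinical-style column names
obs_raw <- data.frame(
  AGE = c(29, 61),
  WEIGHT = c(75.0, 68.3),
  SEX = c("M", "F"),
  stringsAsFactors = FALSE
)

# Map the observed names (left) onto the simulated names (right)
name_map <- c(AGE = "Age", WEIGHT = "Weight_kg", SEX = "Sex")
obs_demog <- obs_raw
names(obs_demog) <- unname(name_map[names(obs_raw)])

# Check that every observed column now has a counterpart in the simulated data
stopifnot(all(names(obs_demog) %in% names(sim_demog)))

# Option 1: keep obs_demog separate and supply it to obs_demog_dataframe.
# Option 2: stack everything into one data.frame flagged by a "SorO" column.
sim_demog$SorO <- "simulated"
obs_demog$SorO <- "observed"
obs_demog$File <- "observed data"  # assumed placeholder label
combined <- rbind(sim_demog, obs_demog[, names(sim_demog)])
```

Either obs_demog (option 1) or combined (option 2) would then be the object to pass along, depending on which route is easier for you.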

sims_to_include

optionally specify which simulations to include. These must be included in demog_dataframe in the column "File".

demog_parameters

demographic parameters to include. Options:

  • Individual parameters, which will be displayed as either a kernel density plot or a boxplot depending on your choice for variability_display:

    • "Age" (age in years)

    • "AGP_gL" (alpha-1-acid glycoprotein in g/L; "AGP" is fine)

    • "BMI_kgm2" ("BMI" is fine)

    • "BrainWt_g" (brain weight; "Brain" is fine)

    • "BSA_m2" (body surface area in m2; "BSA" is fine)

    • "CardiacOut" (cardiac output in L/h; "Cardiac" is fine)

    • "Creatinine_umolL" (creatinine in umol/L; "Creatinine" is fine)

    • "GFR_mLminm2" (glomerular filtration rate in mL/min/m2; "GFR" is fine)

    • "Haematocrit" (haematocrit)

    • "Height_cm" (height in cm; "Height" is fine)

    • "HSA_gL" (human serum albumin in g/L; "HSA" is fine)

    • "KidneyWt_g" (kidney weight; "Kidney" is fine)

    • "LiverWt_g" (liver weight; "Liver" is fine)

    • "Sex" (graph shows the percent female by population)

    • "Weight_kg" (weight in kg; "Weight" is fine)

    • "RenalFunction" (renal function as calculated by the GFR in mL/min/m squared body surface area divided by the reference GFR for that sex: 120 for female subjects and 130 for male subjects as of V23 of the Simcyp Simulator)

  • Comparisons of two parameters, which will create a scatter plot:

    • "Weight vs Height"

    • "Height vs Age"

    • "Weight vs Age"

    • "Sex vs Age"

If you want only a subset of those, list them in a character vector, e.g., demog_parameters = c("Age", "Height_cm", "Weight_kg"). Plots will be in the order you list.
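To make the "RenalFunction" definition above concrete, here is a minimal sketch of the arithmetic in base R. The toy data and the "F"/"M" coding of the Sex column are assumptions for illustration; the reference values of 120 and 130 are the ones quoted above for V23.

```r
# Reference GFR by sex, as quoted above for V23 of the Simcyp Simulator
ref_gfr <- c(F = 120, M = 130)

# Hypothetical subjects; "F"/"M" coding is an assumption
toy <- data.frame(
  Sex = c("F", "M", "M"),
  GFR_mLminm2 = c(96, 130, 65),
  stringsAsFactors = FALSE
)

# Renal function = individual GFR / reference GFR for that subject's sex
toy$RenalFunction <- toy$GFR_mLminm2 / unname(ref_gfr[as.character(toy$Sex)])
toy
```

A value of 1 means the subject's GFR equals the reference for their sex; values below 1 indicate reduced renal function relative to that reference.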

variability_display

How should the variability be shown? Options are "kernel density" (default, a type of smoothed histogram) or "boxplot". Any demographic parameters requested in the form of "X vs Y", e.g., "Weight vs Height", will always be shown as scatter plots.
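For a feel of what the two display options summarize, the base R sketch below computes a kernel density estimate and the five boxplot statistics for the same simulated weights. It is independent of demog_plot; the data are randomly generated for illustration only.

```r
set.seed(1)
weights <- rnorm(200, mean = 75, sd = 12)  # hypothetical body weights in kg

# Kernel density: a smooth curve over the whole distribution
dens <- density(weights)

# Boxplot: five numbers (lower whisker, Q1, median, Q3, upper whisker)
box <- boxplot.stats(weights)$stats

range(dens$x)
box
```

The density curve shows the full shape of the distribution (e.g., skew or multiple peaks), while the boxplot condenses it to a few quantiles, which can be easier to compare when there are many simulations.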

colorBy_column

the column in demog_dataframe that should be used for determining the colors of the lines and/or points. This should be unquoted, e.g., colorBy_column = File. If left blank, we will color by the simulation file name.

color_set

the set of colors to use. Options:

"default"

a set of colors from Cynthia Brewer et al. from Penn State that are friendly to those with red-green colorblindness. The first three colors are green, orange, and purple. This can also be referred to as "Brewer set 2". If there are only two unique values in the colorBy_column, then Brewer set 1 will be used since red and blue are still easily distinguishable but also more aesthetically pleasing than green and orange.

"Brewer set 1"

colors selected from the Brewer palette "set 1". The first three colors are red, blue, and green.

"ggplot2 default"

the default set of colors used in ggplot2 graphs (ggplot2 is an R package for graphing)

"rainbow"

colors selected from a rainbow palette. The default palette is limited to something like 6 colors, so if you have more than that, that's when this palette is most useful. It's not very useful when you only need a couple of colors.

"blue-green"

a set of blues fading into greens. This palette can be especially useful if you are comparing a systematic change in some continuous variable – for example, increasing dose or predicting how a change in intrinsic solubility will affect concentration-time profiles – because the direction of the trend will be clear.

"blues"

a set of blues fading from sky to navy. Like "blue-green", this palette can be especially useful if you are comparing a systematic change in some continuous variable.

"greens"

a set of greens fading from chartreuse to forest. Like "blue-green", this palette can be especially useful if you are comparing a systematic change in some continuous variable.

"purples"

a set of purples fading from lavender to aubergine. Like "blue-green", this palette can be especially useful if you are comparing a systematic change in some continuous variable.

"reds"

a set of reds from pink to brick. Great for showing systematic changes in a continuous variable.

"Tableau"

uses the standard Tableau palette; requires the "ggthemes" package

"viridis"

colors from the eponymous package by Simon Garnier, ranging from purple to blue to green to yellow in a manner that is "printer-friendly, perceptually uniform and easy to read by those with colorblindness", according to the package author

a character vector of colors

If you'd prefer to set all the colors yourself to exactly the colors you want, you can specify those colors here. An example of how the syntax should look: color_set = c("dodgerblue3", "purple", "#D8212D") or, if you want to specify exactly which item in colorBy_column gets which color, you can supply a named vector. For example, if you're coloring the lines by the compound ID, you could do this: color_set = c("substrate" = "dodgerblue3", "inhibitor 1" = "purple", "primary metabolite 1" = "#D8212D"). If you'd like help creating a specific gradation of colors, please talk to a member of the R Working Group about how to do that using colorRampPalette.
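As a starting point for building your own gradation, here is a minimal sketch with colorRampPalette from base R (grDevices). The anchor colors reuse the ones in the example above, and the file names used to name the vector are hypothetical.

```r
# colorRampPalette() returns a function; call that function with the
# number of colors you need
blue_to_red <- colorRampPalette(c("dodgerblue3", "purple", "#D8212D"))
my_colors <- blue_to_red(5)

# Optionally name the colors so that each value in colorBy_column gets a
# specific color (these file names are hypothetical)
names(my_colors) <- paste0("dose-", c(10, 25, 50, 100, 200), "mg.xlsx")
my_colors
```

The resulting character vector of hex codes, named or unnamed, is the kind of object that could then be supplied as color_set = my_colors.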

color_labels

optionally specify a character vector for how you'd like the labels for whatever you choose for colorBy_column to show up in the legend. For example, use color_labels = c("file 1.xlsx" = "healthy subjects", "file 2.xlsx" = "renally impaired subjects") to indicate which simulations represent what. The order in the legend will match the order designated here.

legend_label_color

optionally indicate on the legend something explanatory about what the colors represent. For example, if colorBy_column = File and legend_label_color = "Population", that will make the label above the file names in the legend more explanatory than just "File". The default is to use whatever the column name is for colorBy_column. If you don't want a label for this legend item, set this to "none".

legend_position

specify where you want the legend to be. Options are "left", "right" (default), "bottom", "top", or "none" if you don't want one at all. Note: If you include labels on your graphs (graph_labels = TRUE), we recommend NOT putting the legend on the left or the top because the labels wind up on the outside compared to the legend, and it just looks dorky.

graph_title

title to use on the plots

alpha

how transparent to make the points, from 0, which is completely transparent and invisible (so I don't know why you'd want that but, hey, you do you), to 1, which is fully opaque. The default is 0.8.

ncol

optionally specify the number of columns. If left as NULL, a reasonable guess will be used.

nrow

optionally specify the number of rows. If left as NULL, a reasonable guess will be used.

facet_by_sex

TRUE (default) or FALSE for whether to break up the graphs into facets based on the sex of the subjects

facet_column_additional

optionally specify an additional column to facet the graphs by horizontally. If facet_by_sex is set to TRUE, the graphs will be broken up vertically by sex.

border_facets

TRUE (default) or FALSE for whether to include a border around the facets if the graphs are broken up by the sex of the subjects

graph_labels

TRUE (default) or FALSE for whether to include labels (A, B, C, etc.) for each of the small graphs.

pad_x_axis

optionally add a smidge of padding to the x axis (default is TRUE, which includes some generally reasonable padding). If changed to FALSE, the y axis will be placed right at the beginning of your x axis. If you want a specific amount of x-axis padding, set this to a number; the default is c(0.02, 0.04), which adds 2% more space to the left side and 4% more to the right side of the x axis. If you only specify one number, we'll assume that's the percent you want added to the left side.

pad_y_axis

optionally add a smidge of padding to the y axis (default is TRUE, which includes some generally reasonable padding). As with pad_x_axis, if changed to FALSE, the x axis will be placed right at the bottom of your data, possibly cutting a point in half. If you want a specific amount of y-axis padding, set this to a number; the default is c(0.02, 0), which adds 2% more space to the bottom and nothing to the top of the y axis. If you only specify one number, we'll assume that's the percent you want added to the bottom.
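To see what the padding fractions amount to, here is a small worked example in base R. It assumes the padding is a fraction of the data range added beyond each end of the axis (the same idea as multiplicative axis expansion in ggplot2); whether demog_plot computes it in exactly this way is an assumption, and pad_limits is a hypothetical helper written only for this illustration.

```r
# Hypothetical helper: axis limits after adding fractional padding
pad_limits <- function(x, pad = c(0.02, 0.04)) {
  rng <- range(x, na.rm = TRUE)
  width <- diff(rng)
  c(lower = rng[1] - pad[1] * width,
    upper = rng[2] + pad[2] * width)
}

ages <- c(20, 45, 70)  # data spanning 50 units

x_limits <- pad_limits(ages, pad = c(0.02, 0.04))  # x-axis default
y_limits <- pad_limits(ages, pad = c(0.02, 0))     # y-axis default
x_limits
y_limits
```

With a range of 50 units, 2% is 1 unit and 4% is 2 units, so the padded limits extend slightly past the data on the padded sides and sit exactly at the data where the padding is 0.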

return_indiv_graphs

TRUE or FALSE (default) for whether to return a list of each of the individual graphs

save_graph

optionally save the output graph by supplying a file name in quotes here, e.g., "Demographics comparisons.png". Acceptable graphical file extensions are "eps", "ps", "jpeg", "jpg", "tiff", "png", "bmp", or "svg". Do not include any slashes, dollar signs, or periods in the file name other than the period before the file extension. Leaving this as NA means the file will not be saved to disk.

fig_height

figure height in inches; default is 8

fig_width

figure width in inches; default is 6

Value

a set of ggplot2 graphs

Examples

# none yet

shirewoman2/Consultancy documentation built on June 1, 2025, 6:05 p.m.