
The subscreen (subgroup screening) package has been developed to systematically analyze data, e.g., from clinical trials, for subgroup effects and to visualize the outcome for all evaluated subgroups simultaneously. The visualization is done by a shiny application called Subgroup Explorer. Typically, shiny applications are hosted on a dedicated shiny server, but due to the sensitivity of patient data in clinical trials, which are usually protected by informed consent, uploading these data to an external server is prohibited. Therefore, we provide our tool as a stand-alone application that can be launched from any local machine on which the data are stored.
Table of contents:
subscreencalc Input
subscreencalc Output
subscreenvi
subscreenshow
Identifying outcome-relevant subgroups has now become as simple as possible. The formerly lengthy and tedious search for the needle in a haystack is replaced by a single, comprehensive and coherent presentation.
The central result of a subgroup screening is a diagram in which each dot stands for a subgroup. The diagram can show thousands of them. The position of a dot is determined by the sample size of the subgroup and the statistical measure of the treatment effect in that subgroup: the sample size is shown on the horizontal axis, while the treatment effect is displayed on the vertical axis. Furthermore, the diagram shows a horizontal line for the overall study result. For small subgroups, which are found on the left side of the plot, larger random deviations from the overall study effect are expected, while the deviation from the study mean tends to be smaller for larger subgroups. Therefore, for studies without conspicuous subgroup effects, the dots are expected to form a funnel. Any deviation from this funnel shape may hint at conspicuous subgroups.
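The funnel argument can be checked with a small simulation. The sketch below is not part of subscreen: it assumes a continuous outcome with no treatment effect at all, uses random subsets of hypothetical sizes 50, 200 and 1000 as stand-ins for subgroups, and measures the treatment effect as a simple difference in means. The spread of the subgroup effects around the overall result shrinks as the subgroup size grows, which is exactly the funnel shape.

```r
set.seed(42)
n   <- 2000
trt <- rbinom(n, 1, 0.5)  # 1 = verum, 0 = comparator (assumed coding)
y   <- rnorm(n)           # outcome without any treatment effect

# Hypothetical subgroup sizes; 200 random "subgroups" per size
sizes <- rep(c(50, 200, 1000), each = 200)

# Difference in means between arms within each random subgroup
effect <- vapply(sizes, function(s) {
  idx <- sample.int(n, s)
  mean(y[idx][trt[idx] == 1]) - mean(y[idx][trt[idx] == 0])
}, numeric(1))

# Overall study effect (the horizontal reference line)
overall <- mean(y[trt == 1]) - mean(y[trt == 0])

# Spread of subgroup effects per size: decreases with size
spread <- sapply(c(50, 200, 1000), function(s) sd(effect[sizes == s]))

# To draw the funnel: plot(sizes, effect); abline(h = overall)
```

Plotting `effect` against `sizes` with a horizontal line at `overall` reproduces, in miniature, the kind of diagram described above.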
subscreencalc() takes the following arguments:
data: data frame with study data
eval_function: name of the evaluation function for data analysis
subjectid: name of the variable in data that contains the subject identifier (defaults to 'subjid')
factors: character vector containing the names of the variables that define the subgroups (required)
max_comb: maximum number of factor combination levels used to define subgroups (defaults to 3)
nkernel: number of kernels for parallelization (defaults to 1)
par_functions: character vector of names of functions used in eval_function that need to be exported to the cluster (needed only if nkernel > 1)
verbose: logical value to switch on/off output of computational information (defaults to TRUE)
factorial: logical value to switch on/off calculation of factorial contexts (defaults to FALSE)
use_complement: logical value to switch on/off calculation of complement subgroups (defaults to FALSE)
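An example of an evaluation function, computing the hazard ratio for PFS:
```r
library(survival)  # provides coxph() and Surv()

hazardratio <- function(D) {
  # Cox model of PFS on treatment; subgroups where the fit warns or fails
  # (e.g. too few events) yield NA instead of aborting the screening
  HRpfs <- tryCatch(
    exp(coxph(Surv(timepfs, event.pfs) ~ trt, data = D)$coefficients[[1]]),
    warning = function(w) NA,
    error = function(e) NA
  )
  HRpfs <- 1 / HRpfs                          # report the inverse hazard ratio
  HR.pfs <- round(HRpfs, 2)
  HR.pfs <- pmin(pmax(HR.pfs, 0.00001), 10)   # clamp to [0.00001, 10]; NA stays NA
  data.frame(HR.pfs)
}
```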
Applied to each subgroup, this function returns a one-row data frame and thereby adds a target variable column named `HR.pfs` to the results.
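The same pattern works for other endpoints. The following sketch is a hypothetical evaluation function for a binary response that needs only base R. The column names `trt` (1 = verum, 0 = comparator) and `resp` (1 = responder) are assumptions for illustration, as are the output names `n.subjects` and `rate.diff`. Besides the treatment effect, it also returns the subgroup size, which is the usual reference variable on the x-axis of the Subgroup Explorer.

```r
# Hypothetical evaluation function: difference in response rates.
# Assumed columns: trt (1 = verum, 0 = comparator), resp (1 = responder).
rate_diff <- function(D) {
  r1 <- D$resp[D$trt == 1]
  r0 <- D$resp[D$trt == 0]
  # If one arm is empty in this subgroup, no comparison is possible
  d_rate <- if (length(r1) > 0 && length(r0) > 0) mean(r1) - mean(r0) else NA_real_
  # One row, one column per target variable
  data.frame(n.subjects = nrow(D), rate.diff = round(d_rate, 3))
}

D <- data.frame(trt = c(1, 1, 0, 0), resp = c(1, 0, 0, 0))
rate_diff(D)
```

Returning `NA` for subgroups where one arm is missing mirrors the `tryCatch()` fallback in `hazardratio`, so a single degenerate subgroup does not stop the whole screening.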
subscreenvi() takes the following arguments:
data: data frame containing the dependent and independent variables
y: name of the column in data that contains the dependent variable
cens: name of the column in data that contains the censoring variable, if y is an event time (default=NULL)
trt: name of the column in data that contains the treatment variable (default=NULL)
x: vector that contains the names of the columns in data with the independent variables (default=NULL, i.e. all remaining variables)
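The meaning of the default `x = NULL` ("all remaining variables") can be spelled out in a few lines. The helper below is hypothetical and not part of subscreen; it only illustrates the stated rule. One practical consequence: if data also contains a subject identifier, it would count as a "remaining variable", so it is worth removing it or passing `x` explicitly.

```r
# Hypothetical helper: which columns count as independent variables
# when x is left at its default (NULL)?
resolve_x <- function(data, y, cens = NULL, trt = NULL, x = NULL) {
  if (is.null(x)) {
    # All columns except the dependent, censoring and treatment variables
    x <- setdiff(names(data), c(y, cens, trt))
  }
  x
}

d <- data.frame(time = 1, status = 1, trt = 1, sex = "m", age = 50)
resolve_x(d, y = "time", cens = "status", trt = "trt")
```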
subscreenshow() takes the following arguments:
scresults: SubScreenResult object with results from a subscreencalc call
variable_importance: variable importance object calculated via subscreenvi to unlock the 'variable importance' tab in the app
host: host name or IP address for shiny display
port: port number for shiny display
NiceNumbers: list of numbers used for a 'nice' scale
windowTitle: title which is shown in the browser tab
graphSubtitle: subtitle for the explorer graph
favour_label_verum_name: verum name used in the labels of the explorer graph
favour_label_comparator_name: comparator name used in the labels of the explorer graph
None of the parameters is required to start the app.
By entering subscreenshow() into the R console, the app starts on the upload screen.
The app itself is explained in more detail in chapter 3.
If a SubScreenResult object is passed via the 'scresults' parameter of subscreenshow(), the app starts directly on the Explorer page. In this case, a third input mode called 'Uploaded data via function call' appears on the upload page. Since different data sets can be used in the same session, you can use this option to re-upload the data set used in the original function call or just to see the data set information.
Since the calculation of factorial contexts has changed in recent package versions, the check for 'context calculation performed' also includes a check for the newest package version. For older versions, features such as the ASMUS tab are no longer supported.
It is important to note that the subgroup screening does not only consider subgroups defined by one single factor, e.g., sex or age group. The strength of the Subgroup Explorer is that it considers combinations, e.g., 'old' men from Europe or 'young' Asian women. It is possible to analyze all combinations of two factors, three factors, four factors, etc. Typically, it makes sense to limit this to a maximum of five factors, since combinations of more than five factors define subgroups that are often empty, extremely small, or difficult to interpret.
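Why a limit is needed becomes clear when counting the subgroups. The sketch below counts the non-empty subgroups at each combination level (the role of `max_comb`) by enumerating every set of k factors with `combn()` and counting the distinct level combinations observed in the data. It is an illustration under the assumption that a subgroup is one observed combination of factor levels; it does not claim to reproduce exactly how subscreencalc() enumerates subgroups.

```r
# Count non-empty subgroups defined by combinations of up to max_comb factors
count_subgroups <- function(data, factors, max_comb = 3) {
  max_comb <- min(max_comb, length(factors))
  counts <- vapply(seq_len(max_comb), function(k) {
    combos <- combn(factors, k, simplify = FALSE)  # all sets of k factors
    # For each factor set, count the distinct level combinations present
    sum(vapply(combos, function(f) nrow(unique(data[, f, drop = FALSE])),
               integer(1)))
  }, integer(1))
  setNames(counts, paste0("level_", seq_len(max_comb)))
}

d <- expand.grid(sex = c("m", "f"), age = c("young", "old"),
                 region = c("EU", "US", "Asia"))
count_subgroups(d, c("sex", "age", "region"), max_comb = 3)
```

Even with three small factors there are 7 + 16 + 12 = 35 subgroups; with realistic numbers of factors and levels, the count grows combinatorially with each additional level, while the subgroups themselves become smaller.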
Clicking on a single dot selects a subgroup, which then appears in red. If multiple dots are close to each other, all subgroups within a small area around the mouse click are detected and shown in a list, from which one specific subgroup can be selected; it then appears in red. Hovering the mouse over any dot shows an information box.
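The idea of selecting all dots within a small area around a click can be sketched as a radius search. The app's internal implementation is not shown in this document; the helper below is hypothetical, uses Euclidean distance, and assumes both axes are on a common (e.g. rescaled) scale. The radius plays the role of the 'click/select radius' in the Display options.

```r
# Hypothetical helper: indices of dots within `radius` of a click, nearest first
select_near_click <- function(x, y, click_x, click_y, radius) {
  d <- sqrt((x - click_x)^2 + (y - click_y)^2)
  hits <- which(d <= radius)
  hits[order(d[hits])]
}

x <- c(0, 1, 1.05, 5)
y <- c(0, 1, 1, 5)
select_near_click(x, y, click_x = 1.02, click_y = 1, radius = 0.1)
```

A single hit corresponds to selecting that subgroup directly; several hits correspond to the list of candidate subgroups from which the user picks one.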
When a subgroup is selected, several lists with more information about it appear below the plot.
An interaction plot appears on the right side of the main plot, if a subgroup has a complete (or pseudo-complete) factorial context.
For more details about the concept of a factorial context see chapter 3.5.1.
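As a rough illustration, the sketch below checks whether a set of factors has a complete factorial context. It assumes that the factorial context of a subgroup consists of all subgroups defined by the same factors with every combination of their levels, and that the context is complete when none of these subgroups is empty. Pseudo-complete contexts are not covered, and the helper name is hypothetical.

```r
# Hypothetical check: are all level combinations of `factors` present in data?
is_complete_context <- function(data, factors) {
  observed <- nrow(unique(data[, factors, drop = FALSE]))
  possible <- prod(vapply(factors, function(f) length(unique(data[[f]])),
                          integer(1)))
  observed == possible
}

full    <- expand.grid(sex = c("m", "f"), age = c("young", "old"))
partial <- data.frame(sex = c("m", "m", "f"), age = c("young", "old", "young"))
is_complete_context(full, c("sex", "age"))     # all 4 combinations observed
is_complete_context(partial, c("sex", "age"))  # 'f'/'old' is missing
```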
Several options for the appearance of the diagram are available and explained in chapter 3.2.4.
To memorize a subgroup, use the 'Memorize' button in the table of the 'Selected Subgroups' tab. All memorized subgroups appear in green in the Subgroup Explorer graph.
The switch button above the list of memorized subgroups controls whether their labels are shown in the graph.
When this switch is on, the subgroup information for all memorized subgroups is drawn into the diagram. Since there might not be enough space for the information texts of many subgroups, it can help to increase the axis sizes via 'Display Options'.
The drop-down combo boxes in the Variable options allow switching between different target variables (y-axis), changing the reference variable (x-axis, in general the number of subjects/observations), or selecting a specific subgroup factor and a corresponding value to be highlighted in the plot ('Subgroup Filter').
The 'Subgroup level(s)' slider determines which levels of subgroup factor combinations are displayed.
The maximum of this slider is set by the parameter `max_comb` in `subscreencalc()`.
The brightness of the gray dots corresponds to the number of factors defining each subgroup: dots with more factors are brighter than those with fewer factors.
It is also possible to change the axis limits and, if all values of the target variable are positive, to switch the y-axis to a logarithmic scale.
All options come with short help texts, which are shown when hovering over the question mark symbol next to them.
Within the Display options the user can change dot size and click/select radius.
The dot size can be set to depend on the number of subjects in each subgroup.
The colour brightness is also adjustable.
Many colors can be customized by the user. Besides the dot colors, the overall app appearance can be switched to a 'print' version in which the background appears in light gray.
The sizes of the subgroups of a factorial context play an important role in how reliable the provided information is, but they are not the only criterion. The ratio of the treatment group sizes within the subgroups plays a role as well.
A big subgroup with drastically imbalanced treatment groups may be considered to provide less reliable information than a smaller subgroup with nearly balanced treatment groups.
For simplicity, however, ASMUS is based on subgroup sizes only.
Although the reliability of the information says nothing about the reproducibility of the treatment effect, it makes sense to include it in the screening strategy.
Even if a treatment effect is remarkable and the subgroup-defining factors explain it reasonably, the subgroup is not worth pursuing if the reliability of the provided information is poor.
Again, it is difficult to define a crisp cut point between subgroup sizes that are big enough to provide reliable information and those that are not.
It is much easier to define two numbers, rel1 and rel2, such that subgroup sizes
- less than rel1 are truly too small,
- greater than rel2 are truly big enough,
- between rel1 and rel2 are big enough with a certain degree of truth.
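The text does not specify how the truth value rises from 0 to 1 between rel1 and rel2. A minimal sketch, assuming a linear ramp, looks as follows; the function name is hypothetical, and ASMUS may shape this curve differently.

```r
# Fuzzy truth value for "the subgroup is big enough", assuming a linear ramp
reliability_truth <- function(n, rel1, rel2) {
  stopifnot(rel2 > rel1)
  pmin(1, pmax(0, (n - rel1) / (rel2 - rel1)))
}

reliability_truth(c(10, 20, 50, 80, 100), rel1 = 20, rel2 = 80)
```

Sizes at or below rel1 get truth 0, sizes at or above rel2 get truth 1, and a size halfway between them gets 0.5.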
The truth value for the remarkability of the treatment effect and the truth value for the reliability of the provided information are combined with a logical “and”.
Among the many proposals in the literature for calculating a logical “and” in fuzzy logic (minimum, algebraic product, drastic product, etc.), we selected the algebraic product because it is simple and convex.
The convexity is desirable because a lower truth value for the remarkability requires compensation by a higher truth value for the reliability, and vice versa.
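The compensation property can be seen by comparing the three fuzzy "and" operators named above on the same inputs. This is a generic illustration of the standard definitions, not code from the package. With the minimum, raising the reliability has no effect once the remarkability is the smaller value; with the algebraic product, it raises the combined truth value.

```r
fuzzy_and_min     <- function(a, b) pmin(a, b)
fuzzy_and_prod    <- function(a, b) a * b
fuzzy_and_drastic <- function(a, b) ifelse(a == 1, b, ifelse(b == 1, a, 0))

remarkability <- 0.5
reliability   <- c(0.6, 0.9)

fuzzy_and_min(remarkability, reliability)      # 0.5 0.5: no compensation
fuzzy_and_prod(remarkability, reliability)     # 0.30 0.45: compensation
fuzzy_and_drastic(remarkability, reliability)  # 0 0
```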
The central question is when the subgroup-defining factors explain the treatment effect reasonably.
It can only be answered by experts in pharmacology.
Therefore, in ASMUS, the user selects whether assessability is based on complete factorial contexts only or on both complete and pseudo-complete factorial contexts.
The user also specifies the numbers a and b for the remarkability and the reliability criteria.
For a given subgroup, it is determined whether it is assessable.
The truth values for the treatment effect and the size of the subgroup are calculated and multiplied (algebraic product as a fuzzy logical “and”).
If the subgroup is assessable and the product of the truth values exceeds a user-defined threshold, the subgroup is proposed for evaluation of whether its subgroup-defining factors explain the treatment effect reasonably.
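The screening steps above can be sketched as follows. This is a simplified illustration, not the ASMUS implementation: the thresholds `rem1`/`rem2` for the remarkability are hypothetical names, both truth values are assumed to rise linearly, the assessability flag is assumed to be precomputed per subgroup, and neither the direction tickbox nor the multiplicity value is modelled.

```r
# Simplified ASMUS-style screening (hypothetical names and linear ramps)
asmus_screen <- function(subgroups, rem1, rem2, rel1, rel2, threshold) {
  ramp <- function(v, lo, hi) pmin(1, pmax(0, (v - lo) / (hi - lo)))
  truth_rem <- ramp(subgroups$effect, rem1, rem2)  # remarkability
  truth_rel <- ramp(subgroups$n, rel1, rel2)       # reliability (size only)
  subgroups$score <- truth_rem * truth_rel         # algebraic product
  # Keep assessable subgroups whose combined truth exceeds the threshold
  keep <- subgroups$assessable & subgroups$score > threshold
  subgroups[keep, , drop = FALSE]
}

sg <- data.frame(id = 1:4, effect = c(1, 2, 2, 3), n = c(100, 100, 30, 100),
                 assessable = c(TRUE, TRUE, TRUE, FALSE))
asmus_screen(sg, rem1 = 1, rem2 = 3, rel1 = 20, rel2 = 80, threshold = 0.25)
```

In this toy example, subgroup 3 has the same effect as subgroup 2 but is too small to pass, and subgroup 4 has the largest effect but is excluded because it is not assessable.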
The direction of remarkability can be changed via a tickbox.
The multiplicity value must be between 0 and 1 and influences the steepness and shape of the curve.
When all settings have been made, the number of subgroups that are remarkable and reliable according to the selection is displayed, and the 'Continue' button appears in green.
After clicking 'Continue', the second page of ASMUS opens, where the remarkable and reliable subgroups can be analysed in more detail.
