designSampleSizeClassificationPlots: Visualization for sample size calculation in classification

Description

Illustrates the mean classification accuracy and protein importance under different sample sizes through a predictive accuracy plot and a protein importance plot.

Usage

designSampleSizeClassificationPlots(
  data,
  list_samples_per_group,
  num_important_proteins_show = 10,
  protein_importance_plot = TRUE,
  predictive_accuracy_plot = TRUE,
  x.axis.size = 10,
  y.axis.size = 10,
  protein_importance_plot_width = 3,
  protein_importance_plot_height = 3,
  predictive_accuracy_plot_width = 4,
  predictive_accuracy_plot_height = 4,
  ylimUp_predictive_accuracy = 1,
  ylimDown_predictive_accuracy = 0,
  address = ""
)

Arguments

data

A list of outputs from the function designSampleSizeClassification. Each element holds the results for one sample size. The input should include at least two simulation results with different sample sizes.

list_samples_per_group

A vector of the simulated sample sizes. This argument is required. Its length should equal the number of simulation results in the input ‘data’.
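The length requirement above can be checked before plotting. The following is a minimal sketch in base R; `check_plot_inputs` and `placeholder_results` are hypothetical names introduced here for illustration and are not part of MSstatsSampleSize.

```r
# Hypothetical helper (not part of MSstatsSampleSize): verifies that the
# two inputs line up before they are passed to the plotting function.
check_plot_inputs <- function(data, list_samples_per_group) {
  stopifnot(is.list(data), is.numeric(list_samples_per_group))
  if (length(data) < 2) {
    stop("Provide at least two simulation results with different sample sizes.")
  }
  if (length(data) != length(list_samples_per_group)) {
    stop("length(data) must equal length(list_samples_per_group).")
  }
  invisible(TRUE)
}

# Placeholder list standing in for real designSampleSizeClassification outputs.
placeholder_results <- list(list(), list(), list(), list())
check_plot_inputs(placeholder_results, c(10, 25, 50, 100))
```

In practice, `placeholder_results` would be replaced by the list of simulation results built in the Examples section.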

num_important_proteins_show

The number of proteins to show in protein importance plot.

protein_importance_plot

TRUE (default) draws the protein importance plot.

predictive_accuracy_plot

TRUE (default) draws the predictive accuracy plot.

x.axis.size

Size of x-axis labels in predictive accuracy plot and protein importance plot. Default is 10.

y.axis.size

Size of y-axis labels in predictive accuracy plot and protein importance plot. Default is 10.

protein_importance_plot_width

Width of the saved pdf file for protein importance plot. Default is 3.

protein_importance_plot_height

Height of the saved pdf file for protein importance plot. Default is 3.

predictive_accuracy_plot_width

Width of the saved pdf file for predictive accuracy plot. Default is 4.

predictive_accuracy_plot_height

Height of the saved pdf file for predictive accuracy plot. Default is 4.

ylimUp_predictive_accuracy

The upper limit of y-axis for predictive accuracy plot. Default is 1. The range should be 0 to 1.

ylimDown_predictive_accuracy

The lower limit of y-axis for predictive accuracy plot. Default is 0.0. The range should be 0 to 1.

address

The name of the folder that will store the results. The default is the current working directory. Any other folder must already exist under the current working directory. The output pdf files are created automatically with the default names ‘PredictiveAccuracyPlot.pdf’ and ‘ProteinImportancePlot.pdf’. The address argument specifies where to store the files and can also add a prefix to the file names. If address=FALSE, the plots are not saved as pdf files but are shown in the plotting window.
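As a rough sketch of the two modes of address described above, the code below first creates an output folder and then either saves the plots with a file-name prefix or displays them on screen. The folder name `sample_size_plots` and the prefix `run1_` are hypothetical, and the resulting file names are an assumption based on the description of address as a file-name prefix. The calls are guarded so that they only run when MSstatsSampleSize is installed and the objects from the Examples section exist.

```r
# Hypothetical folder name; the folder must exist before plotting.
out_dir <- "sample_size_plots"
dir.create(out_dir, showWarnings = FALSE)

# Only run the plotting calls when the package and the simulation
# results (built as in the Examples section) are available.
if (requireNamespace("MSstatsSampleSize", quietly = TRUE) &&
    exists("multiple_sample_sizes") && exists("list_samples_per_group")) {

  # Save both plots as pdf files inside out_dir. Assumption: the files are
  # named with the "run1_" prefix, e.g. run1_PredictiveAccuracyPlot.pdf.
  MSstatsSampleSize::designSampleSizeClassificationPlots(
    data = multiple_sample_sizes,
    list_samples_per_group = list_samples_per_group,
    address = file.path(out_dir, "run1_")
  )

  # Show the plots in the plotting window instead of saving them.
  MSstatsSampleSize::designSampleSizeClassificationPlots(
    data = multiple_sample_sizes,
    list_samples_per_group = list_samples_per_group,
    address = FALSE
  )
}
```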

Details

This function visualizes the results of sample size calculation for classification. The mean predictive accuracy and mean protein importance under each sample size are taken from the input ‘data’, which is the output of the function designSampleSizeClassification.

To illustrate the mean predictive accuracy and protein importance under different sample sizes, the function generates two types of plots as pdf files:

(1) The predictive accuracy plot. The x-axis represents the different sample sizes and the y-axis represents the mean predictive accuracy. The reported sample size per condition can be used to design future experiments.

(2) The protein importance plot, which includes multiple subplots. The number of subplots equals the length of ‘list_samples_per_group’. Each subplot shows the top ‘num_important_proteins_show’ most important proteins under one sample size. The y-axis of each subplot is the protein name and the x-axis is the mean protein importance under that sample size.
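The selection of the top proteins for one subplot can be illustrated with a small base R sketch. The importance scores and protein names below are placeholders made up for illustration, not real results, and the sort-and-truncate step is an assumption about how such a ranking could be done rather than the package's actual implementation.

```r
# Placeholder mean importance scores for five proteins (illustration only).
mean_importance <- c(P1 = 0.10, P2 = 0.45, P3 = 0.05, P4 = 0.30, P5 = 0.20)

# Number of proteins to show, mirroring num_important_proteins_show.
num_important_proteins_show <- 3

# Rank proteins by decreasing importance and keep the top entries.
top_proteins <- head(sort(mean_importance, decreasing = TRUE),
                     num_important_proteins_show)
names(top_proteins)
```

One such ranking would be computed per simulated sample size, giving one subplot each.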

Value

The predictive accuracy plot shows the mean predictive accuracy under different sample sizes. The x-axis represents the different sample sizes and the y-axis represents the mean predictive accuracy.

The protein importance plot includes multiple subplots. The number of subplots equals the length of ‘list_samples_per_group’. Each subplot shows the top ‘num_important_proteins_show’ most important proteins under one sample size. The y-axis of each subplot is the protein name and the x-axis is the mean protein importance under that sample size.

Author(s)

Ting Huang, Meena Choi, Olga Vitek.

Examples

data(OV_SRM_train)
data(OV_SRM_train_annotation)

# simulate different sample sizes
# 1) 10 biological replicates per group
# 2) 25 biological replicates per group
# 3) 50 biological replicates per group
# 4) 100 biological replicates per group
list_samples_per_group <- c(10, 25, 50, 100)

# save the simulation results under each sample size
multiple_sample_sizes <- list()
for(i in seq_along(list_samples_per_group)){
    # run simulation for each sample size
    simulated_datasets <- simulateDataset(data = OV_SRM_train,
                                          annotation = OV_SRM_train_annotation,
                                          num_simulations = 10, # simulate 10 times
                                          expected_FC = "data",
                                          list_diff_proteins =  NULL,
                                          select_simulated_proteins = "proportion",
                                          protein_proportion = 1.0,
                                          protein_number = 1000,
                                          samples_per_group = list_samples_per_group[i],
                                          simulate_valid = FALSE,
                                          valid_samples_per_group = 50)

    # run classification performance estimation for each sample size
    res <- designSampleSizeClassification(simulations = simulated_datasets,
                                          parallel = TRUE)

    # save results
    multiple_sample_sizes[[i]] <- res
}

## make the plots
designSampleSizeClassificationPlots(data = multiple_sample_sizes,
                                    list_samples_per_group = list_samples_per_group)

MSstatsSampleSize documentation built on Nov. 8, 2020, 4:53 p.m.