designSampleSizeHypothesisTestingPlot: Sample size calculation plot for hypothesis testing

View source: R/designSampleSizeHypothesisTestingPlot.R

Description

Calculate the sample size for future experiments based on an intensity-based linear model.

Usage

designSampleSizeHypothesisTestingPlot(
  data,
  annotation,
  desired_FC = "data",
  select_testing_proteins = "proportion",
  protein_proportion = 1,
  protein_number = 1000,
  FDR = 0.05,
  power = 0.9,
  height = 5,
  width = 5,
  address = ""
)

Arguments

data

Protein abundance data matrix. Rows are proteins and columns are biological replicates (samples).

annotation

Group information for the samples in data. Columns ‘BioReplicate’ (sample ID) and ‘Condition’ (group information) are required. The ‘BioReplicate’ values must match the column names of ‘data’.
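The requirement above can be checked before calling the function. The following is a minimal sketch in base R, assuming a toy abundance matrix; `check_annotation` is a hypothetical helper written for illustration and is not part of MSstatsSampleSize.

```r
# Toy inputs (hypothetical): 3 proteins x 4 samples.
set.seed(1)
abundance <- matrix(rnorm(12, mean = 20), nrow = 3,
                    dimnames = list(paste0("P", 1:3), paste0("S", 1:4)))
annotation <- data.frame(
  BioReplicate = paste0("S", 1:4),
  Condition = c("control", "control", "case", "case"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a package function): stops with a message if a
# required column is missing or a sample column has no annotation row.
check_annotation <- function(abundance, annotation) {
  required <- c("BioReplicate", "Condition")
  missing_cols <- setdiff(required, colnames(annotation))
  if (length(missing_cols) > 0) {
    stop("annotation is missing column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  unmatched <- setdiff(colnames(abundance),
                       as.character(annotation$BioReplicate))
  if (length(unmatched) > 0) {
    stop("samples without annotation: ",
         paste(unmatched, collapse = ", "))
  }
  invisible(TRUE)
}

annotation_ok <- check_annotation(abundance, annotation)
```

Running a check like this first gives a clearer error message than a failure inside the model fitting step.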

desired_FC

The range of the desired fold change. The first option (default) is "data", which estimates the range directly from the input ‘data’ as the minimal and maximal fold changes observed in it. The second option is a vector containing the lower and upper values of the desired fold change (for example, c(1.25, 1.75)).

select_testing_proteins

The criterion for selecting the proteins used for hypothesis testing and sample size calculation. The variance (and the range of the desired fold change if desired_FC = "data") for sample size calculation is estimated from the selected proteins. It can be 1) "proportion", a proportion of the total number of proteins in the input data, or 2) "number", a fixed number of proteins. With "proportion", the user should provide a value for the ‘protein_proportion’ option; with "number", a value for the ‘protein_number’ option.

protein_proportion

Proportion of the total number of proteins in the input data to test. For example, suppose the input data has 1,000 proteins and the user selects ‘protein_proportion=0.1’. Proteins are ranked in decreasing order of their mean abundance across all the samples, and the top 1,000 * 0.1 = 100 proteins are selected for testing. Default is 1.0, which means that all the proteins are used.

protein_number

Number of proteins to test. For example, ‘protein_number=1000’. Proteins are ranked in decreasing order of their mean abundance across all the samples, and the top ‘protein_number’ proteins are selected for testing. Default is 1000.
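The ranking described for ‘protein_proportion’ and ‘protein_number’ can be sketched in base R as follows. This is an illustration on simulated data, not the package's internal code: `select_top_proteins` is a hypothetical helper, and rounding the proportion-based count and capping ‘protein_number’ at the number of available proteins are assumptions.

```r
# Simulated abundance matrix (hypothetical): 10 proteins x 6 samples.
set.seed(42)
abundance <- matrix(rnorm(10 * 6, mean = 20, sd = 2), nrow = 10,
                    dimnames = list(paste0("P", 1:10), paste0("S", 1:6)))

# Hypothetical helper (not a package function): rank proteins by mean
# abundance across samples and keep the top ones.
select_top_proteins <- function(abundance,
                                select_testing_proteins = "proportion",
                                protein_proportion = 1,
                                protein_number = 1000) {
  mean_abundance <- rowMeans(abundance, na.rm = TRUE)
  ranked <- names(sort(mean_abundance, decreasing = TRUE))
  n_keep <- if (select_testing_proteins == "proportion") {
    # Assumption: the count is rounded to the nearest integer.
    round(length(ranked) * protein_proportion)
  } else {
    # Assumption: the count is capped at the number of proteins available.
    min(protein_number, length(ranked))
  }
  abundance[ranked[seq_len(n_keep)], , drop = FALSE]
}

top_by_proportion <- select_top_proteins(abundance, "proportion",
                                         protein_proportion = 0.3)
top_by_number <- select_top_proteins(abundance, "number",
                                     protein_number = 4)
```

Restricting the test set to high-abundance proteins changes the variance estimate that feeds the sample size calculation, so the choice of proportion or number affects the resulting plot.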

FDR

A pre-specified false discovery rate (FDR) to control the overall proportion of false positives. Default is 0.05.

power

A pre-specified statistical power, defined as the probability of detecting a true fold change. Provide the average power you expect. Default is 0.9.

height

Height of the saved pdf file. Default is 5.

width

Width of the saved pdf file. Default is 5.

address

Name of the folder in which to store the results. The default, "", is the current working directory. Any other folder must already exist under the current working directory. The output pdf file is created automatically with the default name ‘HypothesisTestingSampleSizePlot.pdf’. The address argument specifies where the file is stored and can also add a prefix to the file name. If address=FALSE, the plot is not saved as a pdf file but is shown in the plot window.

Details

The function fits an intensity-based linear model to the input ‘data’. It then uses the fitted models and the fold changes estimated from them to calculate the sample size for hypothesis testing through the ‘designSampleSize’ function from the MSstats package. It outputs the minimal number of biological replicates per condition needed to achieve the expected FDR and power under different fold changes.
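To give a feel for how the required number of replicates depends on the fold change, here is a minimal sketch using the textbook two-group normal approximation. It is not the calculation performed by ‘designSampleSize’: `approx_n_per_group` is a hypothetical helper, the variance value is made up, abundances are assumed to be on a log2 scale, and a plain significance level `alpha` stands in for the FDR-based threshold.

```r
# Hypothetical helper (not a package function): per-group sample size from
# the two-sample normal approximation
#   n = 2 * sigma2 * (z_{1 - alpha/2} + z_{power})^2 / delta^2
# where delta is the log2 fold change (assumes log2-scale abundances).
approx_n_per_group <- function(fold_change, sigma2,
                               alpha = 0.05, power = 0.9) {
  delta <- log2(fold_change)
  z_alpha <- qnorm(1 - alpha / 2)  # two-sided significance threshold
  z_beta <- qnorm(power)           # quantile matching the target power
  ceiling(2 * sigma2 * (z_alpha + z_beta)^2 / delta^2)
}

# Illustrative inputs: three fold changes and an assumed variance of 0.25.
fold_changes <- seq(1.25, 1.75, by = 0.25)
n_needed <- approx_n_per_group(fold_changes, sigma2 = 0.25)
```

Under this approximation, smaller fold changes require more replicates per condition, which is the general trend the sample size plot displays.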

Value

Sample size plot for hypothesis testing: the minimal number of biological replicates per condition needed to achieve the expected FDR and power under different fold changes.

A data frame with columns desiredFC, numSample, FDR, power and CV.

Author(s)

Ting Huang, Meena Choi, Olga Vitek

Examples

library(MSstatsSampleSize)

# load the example data sets shipped with the package
data(OV_SRM_train)
data(OV_SRM_train_annotation)

# sample size plot for hypothesis testing
HT_res <- designSampleSizeHypothesisTestingPlot(data = OV_SRM_train,
                                                annotation = OV_SRM_train_annotation,
                                                desired_FC = "data",
                                                select_testing_proteins = "proportion",
                                                protein_proportion = 1.0,
                                                protein_number = 1000,
                                                FDR = 0.05,
                                                power = 0.9)

# data frame with columns desiredFC, numSample, FDR, power and CV
head(HT_res)

MSstatsSampleSize documentation built on Nov. 8, 2020, 4:53 p.m.