designSampleSizeHypothesisTestingPlot: Sample size calculation plot for hypothesis testing



Description

Calculate the sample size for future experiments based on an intensity-based linear model.

Usage

designSampleSizeHypothesisTestingPlot(
  data,
  annotation,
  log2Trans = FALSE,
  desired_FC = "data",
  protein_rank = "mean",
  protein_select = "high",
  protein_quantile_cutoff = 0,
  FDR = 0.05,
  power = 0.9,
  height = 5,
  width = 5,
  address = ""
)

Arguments

data

Protein abundance data matrix. Rows are proteins and columns are biological replicates (samples).

annotation

Group information for the samples in ‘data’. Three columns are required: ‘Run’ (MS run), ‘BioReplicate’ (biological subject ID) and ‘Condition’ (group). The ‘Run’ values should match the column names of ‘data’. Multiple ‘Run’ values may come from the same ‘BioReplicate’.
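The expected shape of ‘data’ and ‘annotation’ can be sketched with a toy input. Everything below (protein names, run names, conditions, values) is hypothetical and only illustrates the required columns and the match between ‘Run’ and the column names of ‘data’.

```r
# Hypothetical toy data: 3 proteins (rows) x 4 MS runs (columns).
set.seed(1)
toy_data <- matrix(
  rnorm(12, mean = 20, sd = 1),
  nrow = 3,
  dimnames = list(paste0("Protein", 1:3), paste0("Run", 1:4))
)

# One row per run; 'Run' must match the column names of the data matrix.
toy_annotation <- data.frame(
  Run = colnames(toy_data),
  BioReplicate = c("S1", "S2", "S3", "S4"),
  Condition = c("control", "control", "disease", "disease"),
  stringsAsFactors = FALSE
)

# Sanity checks worth running before calling the function.
stopifnot(all(c("Run", "BioReplicate", "Condition") %in% colnames(toy_annotation)))
stopifnot(setequal(toy_annotation$Run, colnames(toy_data)))
```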

log2Trans

Default is FALSE. If TRUE, the input ‘data’ is log-transformed with base 2.

desired_FC

The range of the desired fold change. The first option (default) is "data", which estimates the range directly from the input ‘data’ as the minimal and maximal fold changes observed in it. The second option is a vector containing the lower and upper values of the desired fold change (for example, c(1.25, 1.75)).

protein_rank

The criterion used to rank the proteins in the input ‘data’. It can be 1) "mean", the mean protein abundance over all the samples, 2) "sd", the standard deviation of protein abundance over all the samples, or 3) "combined", which uses both the mean abundance and the standard deviation. The proteins in the input ‘data’ are ranked based on ‘protein_rank’, and the user can then select a subset of proteins for hypothesis testing and sample size calculation.

protein_select

Select proteins with "low" or "high" mean abundance, standard deviation (variance), or their combination for hypothesis testing and sample size calculation. The variance (and the range of the desired fold change if ‘desired_FC = "data"’) is estimated from the selected proteins. If ‘protein_rank = "mean"’ or ‘protein_rank = "sd"’, ‘protein_select’ should be "low" or "high". Default is "high", meaning that proteins with high abundance or high standard deviation are selected. If ‘protein_rank = "combined"’, ‘protein_select’ has two elements: the first corresponds to the mean abundance and the second to the standard deviation (variance). In that case the default is c("high", "low"), which selects proteins with high abundance and low variance.

protein_quantile_cutoff

Quantile cutoff(s) for selecting proteins for hypothesis testing and sample size calculation. For example, when ‘protein_rank = "mean"’, ‘protein_select = "high"’ and ‘protein_quantile_cutoff = 0.1’, proteins are ranked based on their mean abundance across all the samples. Then, the top 10 ... Default is 0.0, which means that all the proteins are used. If ‘protein_rank = "combined"’, ‘protein_quantile_cutoff’ has two cutoffs: the first corresponds to the cutoff for the mean abundance and the second to the cutoff for the standard deviation (variance). In that case the default is c(0.0, 1.0), which means that all the proteins are used.
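The following sketch shows one reading of the quantile cutoffs that is consistent with the stated defaults (0.0 with "high", and c(0.0, 1.0) with c("high", "low"), both keeping all proteins). It assumes that "high" keeps proteins at or above the cutoff quantile and "low" keeps proteins at or below it. The helper ‘select_by_quantile’ and the toy matrix are hypothetical and are not part of MSstatsSampleSize; the package's actual selection rule may differ.

```r
# Hypothetical helper, not part of MSstatsSampleSize.
# ASSUMPTION: "high" keeps values >= the cutoff quantile,
#             "low"  keeps values <= the cutoff quantile.
select_by_quantile <- function(stat, select = c("high", "low"), cutoff = 0) {
  select <- match.arg(select)
  threshold <- stats::quantile(stat, probs = cutoff, names = FALSE)
  if (select == "high") stat >= threshold else stat <= threshold
}

# Hypothetical toy abundance matrix: 10 proteins x 4 runs.
set.seed(1)
toy <- matrix(
  rnorm(40, mean = 20, sd = 2),
  nrow = 10,
  dimnames = list(paste0("Protein", 1:10), paste0("Run", 1:4))
)
protein_mean <- rowMeans(toy)
protein_sd <- apply(toy, 1, sd)

# Single criterion: a cutoff of 0 with "high" keeps every protein.
keep_mean <- select_by_quantile(protein_mean, "high", 0)

# Combined criterion: c("high", "low") with cutoffs c(0, 1) also keeps every protein.
keep_combined <- select_by_quantile(protein_mean, "high", 0) &
  select_by_quantile(protein_sd, "low", 1)

rownames(toy)[keep_combined]
```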

FDR

A pre-specified false discovery rate (FDR) used to control the overall rate of false positives. Default is 0.05.

power

A pre-specified statistical power, defined as the probability of detecting a true fold change. Provide the average power you expect. Default is 0.9.

height

Height of the saved pdf file. Default is 5.

width

Width of the saved pdf file. Default is 5.

address

The name of the folder that will store the results. The default folder is the current working directory. Any other folder must already exist under the current working directory. An output pdf file is created automatically with the default name ‘HypothesisTestingSampleSizePlot.pdf’. The ‘address’ argument specifies where the file is stored and can also be used to modify the beginning of the file name. If ‘address = FALSE’, the plot is not saved as a pdf file but is shown in the plot window instead.
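As a sketch of how the non-default options described above fit together, the call below uses a user-supplied fold change range, the "combined" ranking with its documented defaults, and ‘address = FALSE’ to show the plot instead of saving it. The argument values are taken from the descriptions above; the object name ‘HT_custom’ is arbitrary, and the call only runs if MSstatsSampleSize is installed.

```r
# Runs only when the package is available; otherwise the block is skipped.
if (requireNamespace("MSstatsSampleSize", quietly = TRUE)) {
  library(MSstatsSampleSize)
  data(OV_SRM_train)
  data(OV_SRM_train_annotation)

  HT_custom <- designSampleSizeHypothesisTestingPlot(
    data = OV_SRM_train,
    annotation = OV_SRM_train_annotation,
    log2Trans = FALSE,
    desired_FC = c(1.25, 1.75),            # lower and upper desired fold change
    protein_rank = "combined",             # rank by mean abundance and sd
    protein_select = c("high", "low"),     # high abundance, low variance
    protein_quantile_cutoff = c(0.0, 1.0), # documented default: keep all proteins
    FDR = 0.05,
    power = 0.9,
    address = FALSE                        # show the plot, do not write a pdf
  )
  head(HT_custom)
}
```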

Details

The function fits an intensity-based linear model on the input ‘data’. It then uses the fitted models and the fold changes estimated from them to calculate the sample size for hypothesis testing through the ‘designSampleSize’ function from the MSstats package. It outputs the minimal number of biological replicates per condition needed to achieve the expected FDR and power under different fold changes.
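To illustrate the general relationship between fold change and required replicates, the sketch below uses base R's ‘stats::power.t.test’ for a two-group comparison. This is a swapped-in technique for illustration, not the calculation performed by ‘designSampleSize’: the standard deviation is a hypothetical value rather than one estimated from a fitted model, and 0.05 is used as a per-test significance level, which is not the same as controlling the FDR.

```r
# Illustration only: stats::power.t.test, NOT MSstats::designSampleSize.
# ASSUMPTIONS: hypothetical log2-scale standard deviation; 0.05 is a per-test
# significance level here, not an FDR.
assumed_sd <- 0.5
fold_changes <- seq(1.25, 1.75, by = 0.25)

n_per_group <- vapply(fold_changes, function(fc) {
  res <- stats::power.t.test(
    delta = log2(fc),   # effect size on the log2 scale
    sd = assumed_sd,
    sig.level = 0.05,
    power = 0.9
  )
  ceiling(res$n)        # round up to whole replicates per group
}, numeric(1))

# Larger fold changes need fewer replicates per group.
data.frame(fold_change = fold_changes, n_per_group = n_per_group)
```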

Value

Sample size plot for hypothesis testing: a plot of the minimal number of biological replicates per condition needed to achieve the expected FDR and power under different fold changes.

A data frame with columns desiredFC, numSample, FDR, power and CV.

Author(s)

Ting Huang, Meena Choi, Olga Vitek

Examples

library(MSstatsSampleSize)

# example data sets shipped with the package
data(OV_SRM_train)
data(OV_SRM_train_annotation)

# sample size plot for hypothesis testing
HT_res <- designSampleSizeHypothesisTestingPlot(data = OV_SRM_train,
                                                annotation = OV_SRM_train_annotation,
                                                log2Trans = FALSE,
                                                desired_FC = "data",
                                                protein_rank = "mean",
                                                protein_select = "high",
                                                protein_quantile_cutoff = 0.0,
                                                FDR = 0.05,
                                                power = 0.9)

# data frame with columns desiredFC, numSample, FDR, power and CV
head(HT_res)

Vitek-Lab/MSstatsSampleSize documentation built on Aug. 28, 2020, 10:39 a.m.