1. Introduction

The Power analysis module supports sample size estimation and power analysis for designing population-based or clinical metabolomic studies. As metabolomics is becoming a more accessible and widely used tool, methods to ensure proper experimental design are crucial to allow for accurate and robust identification of metabolites linked to disease, drugs, environmental or genetic differences. Traditional power analysis methods are unsuitable for metabolomics data as the high-throughput nature of this data means that it is highly dimensional and often correlated. Further, the number of metabolites identified greatly outnumbers the sample size. Thus, modified methods of power analysis are needed to address such concerns. One solution is to use the average power of all metabolites, and to correct for multiple testing using methods such as the false discovery rate (FDR) instead of raw p-values. MetaboAnalystR uses the SSPA R package to perform power analysis (van Iterson et al. 2009, PMID: 19758461).

2 Pilot Data

The power analysis uses the entire set of pilot data to estimate effect size distribution, average power, and sample size. The power analysis module accepts either a compound concentration table, spectral binned data, or a peak intensity table. The format of the data must be specified, identifying whether the samples are in rows or columns, and whether or not the data is paired. The data may either be .csv or .txt files. The pilot data follows the uploading, processing, filtering, and normalization steps as per other modules. Please refer to the "Introduction to MetaboAnalystR" vignette for details.

For the tutorial, we will be using a dataset consisting of concentrations of 77 urine samples from cancer patients (cachexic vs. control) measured by 1H NMR - Eisner et al. 2010.

mSet<-InitDataObjects("conc", "power", FALSE);
mSet<-Read.TextData(mSet, "https://www.metaboanalyst.ca/MetaboAnalyst/resources/data/human_cachexia.csv", "rowu", "disc");
mSet<-SanityCheckData(mSet);
mSet<-ReplaceMin(mSet);
mSet<-PreparePrenormData(mSet);
mSet<-Normalization(mSet, "NULL", "NULL", "NULL", "PIF_178", ratio=FALSE, ratioNum=20);

2.1 Power Analysis

To begin power analysis, you will use the InitPowerAnal function, which uses the SSPA function pilotData to create an object of class PilotData which stores all the values needed to perform the power analysis.

The PlotPowerStat will create an image containing four exploratory plots of the pilot-data, the top two will be bar charts, and the bottom two will be quantile-quantile (QQ) plots of the test-statistics and the p-values, respectively. This image is created to allow users to visually inspect that their data follows a standard normal distribution. For the test-statistic bar chart (top right), you should expect to see a bell-curve. For the p-value bar chart, you should expect that the two groups to perform the power analysis upon are different from eachother, resulting in the majority of the p-values hovering around 0. As for the qq-plots, if the test-statistics and the p-values follow a normal distribution, the data points will follow the straight line.

# Initiate the power analysis 
mSet<-InitPowerAnal(mSet, "NA")

# View the exploratory plots
mSet<-PlotPowerStat(mSet, "powerstat", format="png", dpi=300, width=NA)

Following the above steps, you will once again use the InitPowerAnal function, specifying the two groups for power analysis. After, you will use PerformPowerProfiling to perform the power analysis. The ultimate aim of power analysis is to determine the minimum sample size used to detect an effect size of interest. The sample size x statistical power relationship will be used to guide future study design based upon the pilot data. The function requires that you specify the false-discovery rate (FDR), and the maximum sample size (between 60-1000). The false-discovery rate will represent the significance criterion or the alpha level.

The PlotPowerProfile will create a plot of the predicted power curve. From these plots, you will be able to determine the best minimal sample size and its associated predicted power for future studies. This provides invaluable insight for proper experimental design, as well as strengthen the ability to detect true differences within a metabolomic data set.

mSet<-InitPowerAnal(mSet, "cachexic vs. control")

# Perform power analysis, specifying the FDR and the max sample size
mSet<-PerformPowerProfiling(mSet, 0.1, 200)

# Plot the power profile
mSet<-PlotPowerProfile(mSet, 0.1, 200, "powerprofile", format="png", dpi=300, width=NA)

3. Sweave Report

Following analysis, a comprehensive report can be generated which contains a detailed description of each step performed in the R package, embedded with graphical and tabular outputs. To prepare the sweave report, please use the PreparePDFReport function. You must ensure that you have the nexessary Latex libraries to generate the report (i.e. pdflatex, LaTexiT). The object created must be named mSet, and specify the user name in quotation marks.

PreparePDFReport(mSet, "My Name")


xia-lab/MetaboAnalystR3.0 documentation built on May 6, 2020, 11:03 p.m.