Yen-An Chen 2019-05-07
The purpose of this package is to take a series of data sets, each containing p biomarkers stored as the columns of a data frame, compute a p-value for every biomarker in every data set, and then pool these p-values across data sets using one or more methods chosen by the user.
There are two ways to install the MetaAnalysis package from source:
In a terminal application, enter the command:
R CMD INSTALL [PATH_TO_FILE]/MetaAnalysis_0.1.tar.gz
In RStudio (or any R console), enter the command:
install.packages("[PATH_TO_FOLDER]/MetaAnalysis_0.1.tar.gz", repos = NULL, type = "source")
The package accepts no fewer than two and no more than five data frames. In addition, it will not run unless all the data frames have the same number of columns. The number of rows does not have to be the same across data sets; however, each data frame must include a group column, placed before all the biomarker columns, that assigns every row to a group.
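The input requirements above can be expressed as a few checks. The following is a minimal sketch in R, not the package's own validation code; the function name check_inputs is hypothetical, and it assumes the group column comes first so that every remaining column is a biomarker.

```r
# Hypothetical sketch of the documented input requirements;
# this is not the package's own validation code.
check_inputs <- function(...) {
  dfs <- list(...)
  # Between two and five data frames are allowed
  if (length(dfs) < 2 || length(dfs) > 5) {
    stop("Input data error: supply between 2 and 5 data frames")
  }
  if (!all(vapply(dfs, is.data.frame, logical(1)))) {
    stop("Input data error: every input must be a data frame")
  }
  # All data frames must have the same number of columns
  n_cols <- vapply(dfs, ncol, integer(1))
  if (length(unique(n_cols)) != 1) {
    stop("Input data error: data frames have different numbers of columns")
  }
  # Assumption: column 1 is the group column, the rest are biomarkers
  invisible(n_cols[1] - 1L)  # number of biomarker columns
}
```

For example, check_inputs(data1, data2, data3) would return 5 invisibly for the simulated data used later in this document, and would stop with an error if a sixth data frame were added.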
For each biomarker in each data set, the package tests for a difference between groups and records the resulting p-value. The statistical test used is chosen as described in the next section.
The second function of this package is to take the collection of p-values and pool them by biomarker. Each biomarker can be pooled with four methods (described in the "Pooling Methods" section): Fisher, Stouffer, minimum p-value (minP), and maximum p-value (maxP). The package returns a matrix of pooled p-values for whichever of these methods are specified in the "method" parameter.
If a data frame contains more than two groups, we first check normality; if the data are normal we perform a one-way ANOVA, otherwise we use the Kruskal-Wallis test.
If a data frame contains exactly two groups, we also first check normality. If the data are normal, we then check whether the groups have equal variances and perform the equal- or unequal-variance t-test accordingly; if the data are not normal, we perform the Wilcoxon rank-sum test.
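The decision rules above can be sketched as a single R function. This is a hypothetical illustration, not the package's implementation: the name choose_test is made up, and the specific checks are assumptions, since the text does not say which normality or equal-variance tests the package uses. Here "normal" means every group passes a Shapiro-Wilk test (shapiro.test), equal variance is judged with an F test (var.test, two groups only), and the same alpha is reused for both checks.

```r
# Hypothetical sketch of the test-selection rules; the normality and
# equal-variance checks below are assumptions, not the package's code.
choose_test <- function(values, group, alpha = 0.05) {
  group <- factor(group)
  # Assumption: data count as normal only if every group passes
  # Shapiro-Wilk (shapiro.test needs 3 to 5000 values per group)
  is_normal <- all(tapply(values, group,
                          function(v) shapiro.test(v)$p.value > alpha))
  if (nlevels(group) > 2) {
    if (is_normal) {
      # One-way ANOVA F test with pooled variance
      list(test = "Anova",
           p.value = oneway.test(values ~ group, var.equal = TRUE)$p.value)
    } else {
      list(test = "Kruskal", p.value = kruskal.test(values, group)$p.value)
    }
  } else if (nlevels(group) == 2) {
    if (is_normal) {
      # Assumption: F test for equality of the two variances
      eq_var <- var.test(values ~ group)$p.value > alpha
      list(test = if (eq_var) "T(eq)" else "T(uneq)",
           p.value = t.test(values ~ group, var.equal = eq_var)$p.value)
    } else {
      list(test = "Wilcox", p.value = wilcox.test(values ~ group)$p.value)
    }
  } else {
    stop("need at least two groups")
  }
}
```

Applied to one biomarker and the group column of a data frame, e.g. choose_test(data2$X1, data2$group), it returns the test label (using the same labels as the package output) together with its p-value.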
In the final result, the user can see which statistical test was performed for each biomarker in each data set. The tests are labeled as follows: one-way ANOVA as Anova, Kruskal-Wallis test as Kruskal, equal-variance t-test as T(eq), unequal-variance t-test as T(uneq), and Wilcoxon rank-sum test as Wilcox.
Fisher's Method
Fisher's method sums the log-transformed p-values obtained from the individual studies. With $K$ studies and $p_i$ the p-value from study $i$, Fisher's statistic $\chi_{Fisher}^{2} = -2 \sum_{i=1}^K \log(p_i)$ follows a $\chi^2$ distribution with $2K$ degrees of freedom under the null hypothesis. Smaller p-values contribute larger values to Fisher's statistic.
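Fisher's statistic and its pooled p-value take two lines of base R. A minimal sketch, with a hypothetical function name fisher_pool; the pooled p-value is taken as the upper-tail probability of the $\chi^2_{2K}$ distribution, since large statistics correspond to small individual p-values:

```r
# Minimal sketch of Fisher's method (fisher_pool is a hypothetical name)
fisher_pool <- function(p) {
  stat <- -2 * sum(log(p))                               # chi^2_Fisher
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)   # upper-tail p-value
}

# X1 p-values taken from the res$p.matrix output shown later
fisher_pool(c(0.8391408, 0.2160643, 0.6228494))
```

For these three p-values the statistic is about 4.362 with 6 degrees of freedom, and fisher_pool returns about 0.628, in line with the Fisher row of res$pooled.p.matrix in the example.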
Stouffer's Method
Stouffer's method sums the inverse-normal-transformed p-values. Stouffer's statistic $T = \sum_{i=1}^K \frac{\Phi^{-1}(p_i)}{\sqrt{K}}$, where $\Phi$ is the cdf of the standard normal distribution, follows a standard normal distribution under the null hypothesis. As with Fisher's method, smaller p-values contribute more to Stouffer's statistic: each small $p_i$ pulls $T$ toward the lower tail.
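A matching sketch for Stouffer's method (stouffer_pool is a hypothetical name). Because $\Phi^{-1}(p_i)$ is strongly negative when $p_i$ is small, evidence against the null hypothesis shows up in the lower tail, so this sketch assumes the pooled p-value is $\Phi(T)$, computed with pnorm:

```r
# Minimal sketch of Stouffer's method (stouffer_pool is a hypothetical name)
stouffer_pool <- function(p) {
  t_stat <- sum(qnorm(p)) / sqrt(length(p))  # T, standard normal under H0
  pnorm(t_stat)                              # lower-tail probability Phi(T)
}

stouffer_pool(c(0.8391408, 0.2160643, 0.6228494))
```

Applied to the X1 p-values from the example it gives about 0.618, consistent with the Stouffer row of the example output.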
Minimum p-value Method
The minimum p-value method takes the minimum p-value among the $K$ studies as the test statistic. Under the null hypothesis it follows a beta distribution with shape parameters $\alpha = 1$ and $\beta = K$.
Maximum p-value Method
The maximum p-value method takes the maximum p-value as the test statistic. Under the null hypothesis it follows a beta distribution with shape parameters $\alpha = K$ and $\beta = 1$.
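Both order-statistic methods reduce to evaluating a beta cdf. In this sketch (function names hypothetical), the pooled p-value is the beta cdf at the observed statistic, i.e. $1 - (1 - p_{\min})^K$ for minP and $p_{\max}^K$ for maxP, computed with pbeta:

```r
# Minimal sketches of the minP and maxP methods (hypothetical names)
minp_pool <- function(p) pbeta(min(p), shape1 = 1, shape2 = length(p))
maxp_pool <- function(p) pbeta(max(p), shape1 = length(p), shape2 = 1)

x1 <- c(0.8391408, 0.2160643, 0.6228494)
minp_pool(x1)
maxp_pool(x1)
```

For the X1 p-values these give about 0.518 (minP) and 0.591 (maxP), matching the corresponding rows of the example output.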
You can use meta.analysis(..., method) to compute pooled p-values for each biomarker with four different methods: Fisher, Stouffer, maximum p-value, and minimum p-value.
The "method" argument can be a single string or a vector containing any of the following pooling commands:
'Fisher' (Fisher's method)
'Stouffer' (Stouffer's method)
'minP' (minimum p-value method)
'maxP' (maximum p-value method)
Note that method names are case sensitive. If "method" is not specified, all four methods are used, i.e. c('Fisher', 'Stouffer', 'minP', 'maxP').
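Pooling is applied column by column: each biomarker's p-values across data sets are combined into one value per method. The sketch below is a hypothetical helper, pool_matrix, not meta.analysis() itself; it mirrors the documented behavior of a case-sensitive "method" argument that defaults to all four methods, and returns a method-by-biomarker matrix shaped like res$pooled.p.matrix.

```r
# Hypothetical helper: pool a (data set x biomarker) p-value matrix into a
# (method x biomarker) matrix. A sketch, not the package's meta.analysis().
pool_matrix <- function(p.matrix,
                        method = c("Fisher", "Stouffer", "minP", "maxP")) {
  pool_fns <- list(
    Fisher   = function(p) pchisq(-2 * sum(log(p)), df = 2 * length(p),
                                  lower.tail = FALSE),
    Stouffer = function(p) pnorm(sum(qnorm(p)) / sqrt(length(p))),
    minP     = function(p) pbeta(min(p), shape1 = 1, shape2 = length(p)),
    maxP     = function(p) pbeta(max(p), shape1 = length(p), shape2 = 1)
  )
  method <- unique(method)
  # Names are matched exactly, so 'fisher' or 'MinP' are rejected
  unknown <- setdiff(method, names(pool_fns))
  if (length(unknown) > 0) {
    stop("unknown method(s): ", paste(unknown, collapse = ", "))
  }
  out <- matrix(NA_real_, nrow = length(method), ncol = ncol(p.matrix),
                dimnames = list(method, colnames(p.matrix)))
  for (m in method) {
    # Pool each column (biomarker) across the rows (data sets)
    out[m, ] <- apply(p.matrix, 2, pool_fns[[m]])
  }
  out
}

# p.matrix values copied from the example output later in this document
pm <- rbind(Data1 = c(0.8391408, 0.3286983, 0.9805006, 0.3796453, 0.6302030),
            Data2 = c(0.2160643, 0.8853164, 0.8557419, 0.2849488, 0.3575808),
            Data3 = c(0.6228494, 0.6347043, 0.5886171, 0.5202245, 0.5911930))
colnames(pm) <- paste0("X", 1:5)
round(pool_matrix(pm), 4)
```

Calling pool_matrix(pm, c('Fisher', 'minP')) returns only those two rows, while pool_matrix(pm, 'fisher') stops with an error because names are matched exactly.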
meta.analysis() automatically checks whether all inputs meet these requirements.
Consider a scenario: we have collected data from 3 different institutions. Each observation belongs to a certain group and has numerical values for 5 different biomarkers. We simulate data for illustration.
library(MetaAnalysis)
set.seed(123) ## Fix seed for reproducible result
p <- 5
# data1: 200 rows, a group column with 3 groups, and 5 biomarker columns
data1 <- data.frame(group=sample(1:3,200,replace=TRUE),matrix(rnorm(p*200),ncol=p))
head(data1)
#> group X1 X2 X3 X4 X5
#> 1 1 -0.71040656 -0.7152422 -0.60189285 -0.7282191 -1.0141142
#> 2 3 0.25688371 -0.7526890 -0.99369859 -1.5404424 -0.7913139
#> 3 2 -0.24669188 -0.9385387 1.02678506 -0.6930946 0.2995937
#> 4 3 -0.34754260 -1.0525133 0.75106130 0.1188494 1.6390519
#> 5 3 -0.95161857 -0.4371595 -1.50916654 -1.3647095 1.0846170
#> 6 1 -0.04502772 0.3311792 -0.09514745 0.5899827 -0.6245675
data2 and data3 are generated in the same fashion.
# data2: 150 rows, a group column with 2 groups, and 5 biomarker columns
data2 <- data.frame(group=sample(1:2,150,replace=TRUE),matrix(rnorm(p*150),ncol=p))
# data3: 400 rows, a group column with 4 groups, and 5 biomarker columns
data3 <- data.frame(group=sample(1:4,400,replace=TRUE),matrix(rnorm(p*400),ncol=p))
To run the meta-analysis, pass the data frames as the first arguments (here data1, data2, and data3), and optionally specify the pooling method(s) to use, as introduced above. The significance level can also be customized with the alpha argument.
res = meta.analysis(data1, data2, data3, method=c('Fisher', 'Stouffer', 'minP', 'maxP'), alpha = 0.05)
meta.analysis() reports an input data error if the number of data frames is outside the allowed range (two to five) or if the data frames do not all have the same number of columns.
Any row with missing values is dropped by the package.
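Dropping incomplete rows can be sketched with complete.cases. This illustrates the documented behavior and is not the package's code; drop_incomplete is a hypothetical name, and it assumes that a missing value in any column, including the group column, is grounds for dropping the row.

```r
# Hypothetical sketch: drop rows containing any NA from each input data frame
drop_incomplete <- function(...) {
  lapply(list(...), function(df) df[complete.cases(df), , drop = FALSE])
}

d <- data.frame(group = c(1, 2, 1), x1 = c(0.5, NA, 1.2))
drop_incomplete(d, d)
```

Each returned data frame keeps only its complete rows, so the data sets may end up with different row counts, which the package allows.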
The p.matrix component of the result contains the p-value for each biomarker (column) in each data set (row):
res$p.matrix
#> X1 X2 X3 X4 X5
#> Data1 0.8391408 0.3286983 0.9805006 0.3796453 0.6302030
#> Data2 0.2160643 0.8853164 0.8557419 0.2849488 0.3575808
#> Data3 0.6228494 0.6347043 0.5886171 0.5202245 0.5911930
The test.performed component shows which test produced each of these p-values:
res$test.performed
#> X1 X2 X3 X4 X5
#> Data1 Kruskal Anova Anova Kruskal Anova
#> Data2 Wilcox T(eq) T(eq) T(eq) T(eq)
#> Data3 Anova Anova Kruskal Kruskal Anova
The pooled.p.matrix component contains the pooled p-value for each biomarker (column) under each pooling method (row):
res$pooled.p.matrix
#> X1 X2 X3 X4 X5
#> Fisher 0.6278110 0.7601185 0.9651910 0.4511969 0.6724216
#> Stouffer 0.6176331 0.7378441 0.9734358 0.3171520 0.5455228
#> minP 0.5182282 0.6974805 0.9303792 0.6343955 0.7348721
#> maxP 0.5908870 0.6938979 0.9426351 0.1407902 0.2502888
To visualize the "trend" of each pooling method across biomarkers, we can plot the pooled p-value matrix with the matplot function. Here we simulate five data sets with 20 biomarkers each:
set.seed(123)
p <- 20
data1 <- data.frame(group=sample(1:3,20,replace=TRUE), matrix(rnorm(p*20),ncol=p))
data2 <- data.frame(group=sample(1:2,15,replace=TRUE), matrix(rnorm(p*15),ncol=p))
data3 <- data.frame(group=sample(1:4,40,replace=TRUE), matrix(rnorm(p*40),ncol=p))
data4 <- data.frame(group=sample(1:3,50,replace=TRUE), matrix(rnorm(p*50),ncol=p))
data5 <- data.frame(group=sample(1:2,40,replace=TRUE), matrix(rnorm(p*40),ncol=p))
res <- meta.analysis(data1, data2, data3, data4, data5,
                     method = c('Fisher', 'maxP', 'Stouffer', 'minP'),
                     alpha = 0.05)
pooled.p.matrix <- res$pooled.p.matrix
cols <- c('black', 'green', 'red', 'blue')
# One line per pooling method (row of pooled.p.matrix), biomarkers along the x-axis
matplot(t(pooled.p.matrix), type = 'l', col = cols, lty = 'solid',
        xlab = 'Biomarker', ylab = 'Pooled p-value')
# Place the legend inside the plot region so it is always visible
legend('topright', legend = rownames(pooled.p.matrix), col = cols, lty = 'solid')