In smccrave/PMmccrave1: AMS 597 Final Project

Description

This package was written to complete a final project for AMS 597. Students were tasked with writing an R package that implements the statistical methods described in the paper "A Simple and Robust Method for Partially Matched Samples Using the P-Values Pooling Approach", Stat Med (2013). The paper discusses several methods for handling complete samples and partially matched samples.

Installation

This package is available only through GitHub. You may install the package PMmccrave by running the following code:

install.packages("devtools")  # skip if devtools is already installed
library(devtools)
install_github("smccrave/PMmccrave")
library(PMmccrave)

Functions

Liptak's Weighted Z-test
Kim et al.’s Modified t-statistic
Looney and Jones’ Corrected Z-test
Lin and Stivers’ MLE Based Test Under Heteroscedasticity
Ekbohm’s MLE-based Test Under Homoscedasticity

Liptak's Weighted Z-Test

Liptak's Z-test is essentially Stouffer's method with added weights. Consider the p-value of the weighted Z-test,

$p_z = 1 - \Phi \left( \frac{\sum\limits_{i=1}^k w_i Z_i}{\sqrt{\sum\limits_{i=1}^k w_i^2}}\right)$
where $Z_i = \Phi^{-1}(1-p_i)$, $p_i$ is the p-value of the i-th of $k$ studies, $w_i$ are the weights, and $\Phi$ and $\Phi^{-1}$ denote the standard normal cumulative distribution function and its inverse. Liptak suggested that the weights "should be chosen proportional to the 'expected' difference between the null hypothesis and the real situation and inversely proportional to the standard deviation of the statistic used in the i-th experiment" (Liptak, 1958). He also suggested that when nothing is known about the studies beyond their sample sizes $n_i$, $\sqrt{n_i}$ can be used as the weight. Won et al. later verified this claim, showing that the test is most powerful when the weights are set to the effect size (expected difference) divided by the known or estimated standard error (Won et al., 2009). In practice, the most common and feasible choices are to weight by the inverse of the estimated standard error or by the square root of the sample size (Zaykin, 2011).
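
As a minimal sketch of the pooling formula above, the following R function combines one-sided p-values with user-supplied weights. The name `liptak_pool` and its arguments are hypothetical and are not taken from the package's exported API; equal weights (Stouffer's method) are assumed as the default.

```r
# Hypothetical helper, not the package's exported function:
# pools one-sided p-values with Liptak's weighted Z-test.
liptak_pool <- function(p, w = rep(1, length(p))) {
  stopifnot(length(p) == length(w), all(p > 0 & p < 1), all(w > 0))
  z <- qnorm(p, lower.tail = FALSE)        # Z_i = Phi^{-1}(1 - p_i)
  z_pooled <- sum(w * z) / sqrt(sum(w^2))  # weighted sum, rescaled to unit variance
  pnorm(z_pooled, lower.tail = FALSE)      # p_z = 1 - Phi(z_pooled)
}

# Example: two studies weighted by the square root of their sample sizes
p <- c(0.04, 0.20)
n <- c(25, 16)
liptak_pool(p, w = sqrt(n))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the rescaling by $\sqrt{\sum w_i^2}$.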

Kim et al.’s modified t-statistic

The modified t-statistic $t_3$ of Kim et al. takes the form

$t_3 = \frac{n_1 \bar D + n_H(\bar T - \bar N)}{\sqrt{n_1 S_D^2 + n_H^2 (S_N^2 / n_3 + S_T^2 / n_2)}}$,
where $\bar D$ is the mean difference of the $n_1$ paired samples, and $\bar T$ and $\bar N$ are the tumor and normal means of the $n_2$ and $n_3$ unmatched samples, respectively. $S_D$, $S_T$ and $S_N$ are the corresponding sample standard deviations, and $n_H$ is the harmonic mean of $n_2$ and $n_3$. The null distribution of $t_3$ is approximated by a standard Gaussian distribution (Kuan and Huang, 2013).
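
The statistic can be sketched in R as follows. This is an illustration only: the function name `kim_t3`, the convention of coding missing observations as `NA` in two equal-length vectors, and the two-sided p-value are assumptions, not the package's documented interface. The unmatched normal variance $S_N^2$ is paired with $n_3$, following Kuan and Huang (2013).

```r
# Hypothetical helper, not the package's exported function.
# tumor, normal: equal-length numeric vectors, NA where a subject is unobserved.
kim_t3 <- function(tumor, normal) {
  stopifnot(length(tumor) == length(normal))
  paired <- !is.na(tumor) & !is.na(normal)
  d  <- tumor[paired] - normal[paired]          # n1 paired differences
  tu <- tumor[!is.na(tumor) & is.na(normal)]    # n2 unmatched tumor values
  nu <- normal[is.na(tumor) & !is.na(normal)]   # n3 unmatched normal values
  n1 <- length(d); n2 <- length(tu); n3 <- length(nu)
  stopifnot(n1 >= 2, n2 >= 2, n3 >= 2)          # variances need two or more values
  nH  <- 2 / (1 / n2 + 1 / n3)                  # harmonic mean of n2 and n3
  num <- n1 * mean(d) + nH * (mean(tu) - mean(nu))
  den <- sqrt(n1 * var(d) + nH^2 * (var(nu) / n3 + var(tu) / n2))
  t3  <- num / den
  # Two-sided p-value from the standard normal approximation (assumed here)
  c(statistic = t3, p.value = 2 * pnorm(-abs(t3)))
}

# Example with 4 pairs, 3 tumor-only and 3 normal-only observations
tumor  <- c(5.1, 6.0, 5.5, 6.2, 5.8, 6.1, 5.9, NA, NA, NA)
normal <- c(4.8, 5.1, 5.0, 5.9, NA, NA, NA, 4.7, 5.2, 5.0)
kim_t3(tumor, normal)
```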

Looney and Jones’s corrected Z-test

The corrected Z-test is based on a modified variance estimation for the standard Z-test in which we account for the correlation among the $n_1$ matched pairs. Consider the $Z_{corr}$ function,

$Z_{corr} = \frac{\bar T^{\ast} - \bar N^{\ast}}{\sqrt{S_T^{\ast 2}/(n_1+n_2) + S_N^{\ast 2}/(n_1+n_3) - 2 n_1 S_{TN_1}/[(n_1+n_2)(n_1+n_3)]}}$
where $\bar T^{\ast}$ and $\bar N^{\ast}$ are the tumor and normal means of the $n_1+n_2$ and $n_1+n_3$ combined (matched and unmatched) samples, respectively, and $S_T^{\ast}$ and $S_N^{\ast}$ are the corresponding sample standard deviations. $S_{TN_1}$ is the sample covariance of the $n_1$ paired samples. $Z_{corr}$ reduces to the paired-sample Z-test when $n_2 = n_3 = 0$ and to the two-sample Z-test when $n_1 = 0$ (Kuan and Huang, 2013).
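
A minimal R sketch of the corrected Z-test follows. The function name `looney_jones_z`, the `NA` coding of missing observations, and the two-sided p-value are assumptions made for illustration; the package's own function may differ. The covariance term is divided by the product $(n_1+n_2)(n_1+n_3)$.

```r
# Hypothetical helper, not the package's exported function.
# tumor, normal: equal-length numeric vectors, NA where a subject is unobserved.
looney_jones_z <- function(tumor, normal) {
  stopifnot(length(tumor) == length(normal))
  paired <- !is.na(tumor) & !is.na(normal)
  t_all <- tumor[!is.na(tumor)]      # n1 + n2 tumor values (matched and unmatched)
  n_all <- normal[!is.na(normal)]    # n1 + n3 normal values (matched and unmatched)
  n1 <- sum(paired); nt <- length(t_all); nn <- length(n_all)
  stopifnot(nt >= 2, nn >= 2)
  # The covariance needs at least two pairs; the correction is dropped otherwise
  s_tn <- if (n1 >= 2) cov(tumor[paired], normal[paired]) else 0
  v <- var(t_all) / nt + var(n_all) / nn - 2 * n1 * s_tn / (nt * nn)
  z <- (mean(t_all) - mean(n_all)) / sqrt(v)
  # Two-sided p-value from the standard normal distribution (assumed here)
  c(statistic = z, p.value = 2 * pnorm(-abs(z)))
}

tumor  <- c(5.1, 6.0, 5.5, 6.2, 5.8, 6.1, 5.9, NA, NA, NA)
normal <- c(4.8, 5.1, 5.0, 5.9, NA, NA, NA, 4.7, 5.2, 5.0)
looney_jones_z(tumor, normal)
```

When every subject is paired, the variance in the denominator equals $S_D^2/n_1$, so the statistic coincides with the usual paired-sample statistic, matching the reduction described above.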

Lin and Stivers’s MLE based test under heteroscedasticity

The Lin and Stivers MLE-based test under heteroscedasticity is a procedure, based on a modified maximum likelihood estimator and the simple mean difference, for testing two correlated means with missing data under a bivariate Gaussian assumption. The test statistic takes the form

$Z_{LS} = \frac{f(\bar T_1 -\bar T) - g(\bar N_1 -\bar N) + \bar T - \bar N}{\sqrt{V_1}}$
where
$V_1 = \frac{\left[f^2/n_1 + (1-f)^2/n_2\right] S_{T_1}^2 (n_1-1) + \left[g^2/n_1 + (1-g)^2/n_3\right] S_{N_1}^2 (n_1-1) - 2fg S_{TN_1}(n_1-1)/n_1}{n_1-1}$
$f = n_1(n_1+n_3+n_2 S_{TN_1}/S_{T_1}^2) \left[(n_1+n_2)(n_1+n_3)-n_2 n_3 r^2\right]^{-1}$
$g = n_1(n_1+n_2+n_3 S_{TN_1}/S_{N_1}^2) \left[(n_1+n_2)(n_1+n_3)-n_2 n_3 r^2\right]^{-1}$
$r = S_{TN_1}/(S_{T_1}S_{N_1})$.
$\bar T_1$ and $\bar N_1$ are the tumor and normal means of the $n_1$ paired samples, and $S_{T_1}$ and $S_{N_1}$ are the corresponding sample standard deviations. Under the null hypothesis, $Z_{LS}$ approximately follows a $t$ distribution with $n_1$ degrees of freedom (Kuan and Huang, 2013).
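
The computation can be sketched in R as below. The function name `lin_stivers`, the `NA` coding of missing observations, and the two-sided p-value are illustrative assumptions rather than the package's documented interface. The sketch uses $S_{TN_1}/S_{T_1}^2$ in $f$ and $S_{TN_1}/S_{N_1}^2$ in $g$, the symmetric pair of weights that minimizes the variance of the weighted mean difference.

```r
# Hypothetical helper, not the package's exported function.
# tumor, normal: equal-length numeric vectors, NA where a subject is unobserved.
lin_stivers <- function(tumor, normal) {
  stopifnot(length(tumor) == length(normal))
  paired <- !is.na(tumor) & !is.na(normal)
  t1 <- tumor[paired]; nm1 <- normal[paired]    # n1 paired values
  tu <- tumor[!is.na(tumor) & is.na(normal)]    # n2 unmatched tumor values
  nu <- normal[is.na(tumor) & !is.na(normal)]   # n3 unmatched normal values
  n1 <- length(t1); n2 <- length(tu); n3 <- length(nu)
  stopifnot(n1 >= 2, n2 >= 1, n3 >= 1)
  s_t2 <- var(t1); s_n2 <- var(nm1); s_tn <- cov(t1, nm1)
  r <- s_tn / sqrt(s_t2 * s_n2)
  denom <- (n1 + n2) * (n1 + n3) - n2 * n3 * r^2
  f <- n1 * (n1 + n3 + n2 * s_tn / s_t2) / denom
  g <- n1 * (n1 + n2 + n3 * s_tn / s_n2) / denom
  v1 <- ((f^2 / n1 + (1 - f)^2 / n2) * s_t2 * (n1 - 1) +
         (g^2 / n1 + (1 - g)^2 / n3) * s_n2 * (n1 - 1) -
         2 * f * g * s_tn * (n1 - 1) / n1) / (n1 - 1)
  z <- (f * (mean(t1) - mean(tu)) - g * (mean(nm1) - mean(nu)) +
        mean(tu) - mean(nu)) / sqrt(v1)
  # Two-sided p-value from a t distribution with n1 degrees of freedom (assumed)
  c(statistic = z, p.value = 2 * pt(-abs(z), df = n1))
}

tumor  <- c(5.1, 6.0, 5.5, 6.2, 5.8, 6.1, 5.9, NA, NA, NA)
normal <- c(4.8, 5.1, 5.0, 5.9, NA, NA, NA, 4.7, 5.2, 5.0)
lin_stivers(tumor, normal)
```

Swapping the roles of tumor and normal exchanges $f$ and $g$ and flips the sign of the statistic, which is a convenient consistency check on the two weights.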

Ekbohm’s MLE-based test under homoscedasticity

The Ekbohm test is used when the variances of tumor and normal are equal. In that case we consider the following MLE-based test statistic:

$Z_E = \frac{f^{\ast}(\bar T_1 - \bar T) - g^{\ast}(\bar N_1 - \bar N) + \bar T - \bar N}{\sqrt{V_1^{\ast}}}$
where
$V_1^{\ast} = \hat \sigma^2 \, \frac{2n_1(1-r)+(n_2+n_3)(1-r^2)}{(n_1+n_2)(n_1+n_3)-n_2n_3r^2}$
$\hat \sigma^2 = \frac{S_{T_1}^2(n_1-1)+S_{N_1}^2(n_1-1)+(1+r^2)[S_T^2(n_2-1)+S_N^2(n_3-1)]}{2(n_1-1)+(1+r^2)(n_2+n_3-2)}$
$f^{\ast} = n_1(n_1+n_3+n_2r) \left[(n_1+n_2)(n_1+n_3)-n_2n_3r^2\right]^{-1}$
$g^{\ast} = n_1(n_1+n_2+n_3r) \left[(n_1+n_2)(n_1+n_3)-n_2n_3r^2\right]^{-1}$.
Here $r$, $S_{T_1}$ and $S_{N_1}$ are computed from the $n_1$ paired samples, and $S_T$ and $S_N$ are the sample standard deviations of the $n_2$ and $n_3$ unmatched samples. Ekbohm showed that $Z_E$ can be approximated by a $t$ distribution with $n_1$ degrees of freedom (Kuan and Huang, 2013).
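
A minimal R sketch of this statistic is given below. The function name `ekbohm`, the `NA` coding of missing observations, and the two-sided p-value are assumptions for illustration and are not taken from the package. The variance term is written with the pooled estimate $\hat\sigma^2$, which keeps the statistic unit-free.

```r
# Hypothetical helper, not the package's exported function.
# tumor, normal: equal-length numeric vectors, NA where a subject is unobserved.
ekbohm <- function(tumor, normal) {
  stopifnot(length(tumor) == length(normal))
  paired <- !is.na(tumor) & !is.na(normal)
  t1 <- tumor[paired]; nm1 <- normal[paired]    # n1 paired values
  tu <- tumor[!is.na(tumor) & is.na(normal)]    # n2 unmatched tumor values
  nu <- normal[is.na(tumor) & !is.na(normal)]   # n3 unmatched normal values
  n1 <- length(t1); n2 <- length(tu); n3 <- length(nu)
  stopifnot(n1 >= 2, n2 >= 2, n3 >= 2)          # variances need two or more values
  r <- cor(t1, nm1)                             # r = S_TN1 / (S_T1 * S_N1)
  denom <- (n1 + n2) * (n1 + n3) - n2 * n3 * r^2
  f <- n1 * (n1 + n3 + n2 * r) / denom
  g <- n1 * (n1 + n2 + n3 * r) / denom
  # Pooled variance estimate under the equal-variance assumption
  sigma2 <- (var(t1) * (n1 - 1) + var(nm1) * (n1 - 1) +
             (1 + r^2) * (var(tu) * (n2 - 1) + var(nu) * (n3 - 1))) /
            (2 * (n1 - 1) + (1 + r^2) * (n2 + n3 - 2))
  v1 <- sigma2 * (2 * n1 * (1 - r) + (n2 + n3) * (1 - r^2)) / denom
  z <- (f * (mean(t1) - mean(tu)) - g * (mean(nm1) - mean(nu)) +
        mean(tu) - mean(nu)) / sqrt(v1)
  # Two-sided p-value from a t distribution with n1 degrees of freedom (assumed)
  c(statistic = z, p.value = 2 * pt(-abs(z), df = n1))
}

tumor  <- c(5.1, 6.0, 5.5, 6.2, 5.8, 6.1, 5.9, NA, NA, NA)
normal <- c(4.8, 5.1, 5.0, 5.9, NA, NA, NA, 4.7, 5.2, 5.0)
ekbohm(tumor, normal)
```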

Works Cited

Ekbohm G. On comparing means in the paired case with incomplete data on both responses. Biometrika. 1976; 63(2):299–304.

Kim B, Kim I, Lee S, Kim S, Rha S, Chung H. Statistical methods of translating microarray data into clinically relevant diagnostic information in colorectal cancer. Bioinformatics. 2004; 21(4):517–528. [PubMed: 15374865]

Kuan PF, Huang B. A simple and robust method for partially matched samples using the p-values pooling approach. Statistics in Medicine. 2013; 32(19):3247–3259. doi:10.1002/sim.5758.

Lin P, Stivers L. On differences of means with incomplete data. Biometrika. 1974; 61(2):325–334.

Liptak T. On the combination of independent tests. Magyar Tudományos Akadémia Matematikai Kutató Intézetének Közleményei. 1958; 3:171–197.

Looney S, Jones P. A method for comparing two normal means using combined samples of correlated and uncorrelated data. Statistics in Medicine. 2003; 22:1601–1610. [PubMed: 12704618]

Zaykin D. Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. Journal of Evolutionary Biology. 2011; 24:1836–1841. doi:10.1111/j.1420-9101.2011.02297.x.