farm.test: Main function performing factor-adjusted robust test for...
In kbose28/FarmTest: Factor Adjusted Robust Multiple Testing


This function conducts robust statistical tests for the means of multivariate data after adjusting for known or unknown latent factors, using the methods in Fan et al. (2017) and Zhou et al. (2017). It uses Huber's loss function (Huber, 1964) to robustly estimate the data parameters.

farm.test(X, H0 = NULL, fx = NULL, Kx = NULL, Y = NULL, fy = NULL,
  Ky = NULL, alternative = c("two.sided", "lesser", "greater"),
  alpha = NULL, robust = TRUE, cv = TRUE, tau = 2, verbose = FALSE,
  ...)

`X`	an n x p data matrix with each row being a sample. You wish to test a hypothesis for the mean of each column of `X`.
`H0`	an optional p x 1 vector of the true values of the means (or of the differences in means if you are performing a two-sample test). The default is the zero vector.
`fx`	an optional factor matrix with each column being a factor for `X`. Same number of rows as `X`.
`Kx`	an optional number of factors to be estimated for `X`. If not supplied, it is estimated internally. Must satisfy Kx >= 0.
`Y`	an optional data matrix that must have the same number of columns as `X`. You wish to test the equality of the means of each column of `X` and `Y`.
`fy`	an optional factor matrix with each column being a factor for `Y`. Same number of rows as `Y`. Only used for a two sample test.
`Ky`	an optional number of factors to be estimated for `Y`. If not supplied, it is estimated internally.
`alternative`	an optional character string specifying the alternative hypothesis; must be one of "two.sided" (default), "greater" or "lesser". You can specify just the initial letter.
`alpha`	an optional level for controlling the false discovery rate (in decimals). Default is 0.05. Must be in (0,1).
`robust`	a boolean, specifying whether or not to use robust estimators for mean and variance. Default is TRUE.
`cv`	a boolean, specifying whether or not to run cross-validation for the tuning parameter. Default is TRUE. Only used if `robust` is TRUE.
`tau`	a positive multiplier (`tau > 0`) for the tuning parameter of the Huber loss function. Default is 2. Only used if `robust` is TRUE and `cv` is FALSE. See Details.
`verbose`	a boolean specifying whether to print runtime updates to the console. Default is FALSE.
`...`	Arguments passed to the `farm.FDR` function.

alternative = "greater" is the alternative that X has a larger mean than Y.
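How the three choices of `alternative` map to p-values can be sketched in base R. This is an illustration only: `p_value` is a hypothetical helper, and it assumes a standardized test statistic `z` with a standard normal reference distribution, which is not necessarily what `farm.test` does internally.

```r
# Hypothetical helper (not part of FarmTest): p-value of a standardized
# statistic z, ASSUMING a standard normal reference distribution.
p_value <- function(z, alternative = c("two.sided", "lesser", "greater")) {
  # match.arg allows partial matching, so "g" resolves to "greater"
  alternative <- match.arg(alternative)
  switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),            # both tails
    lesser    = pnorm(z),                      # lower tail
    greater   = pnorm(z, lower.tail = FALSE)   # upper tail
  )
}

z <- c(-2.5, 0, 1.96)
p_value(z, "two.sided")
p_value(z, "g")  # initial letter only
```

The use of `match.arg` mirrors the documented behaviour that the initial letter alone is enough to select an alternative.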

If some of the underlying factors are known but additional unobserved confounding factors are suspected, proceed in two steps. Suppose the data follow X = μ + Bf + Cg + u, where f is observed and g is unobserved. First, pass the data {X, f} into the main function and use the output to construct the residuals Xres = X - Bf. Then pass Xres into the main function without any factors. The output of this second step is the final answer to the testing problem.
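The residual construction in this two-step procedure can be sketched in base R. As a stand-in for the first `farm.test` call, the loadings on the observed factors are estimated here by ordinary least squares; that substitution, the simulated data, and the names `B_hat` and `Xres` are assumptions of this sketch, not the package's own estimator.

```r
set.seed(1)
n <- 60; p <- 20
f  <- matrix(rnorm(n * 2), nrow = n)         # observed factors
g  <- matrix(rnorm(n), nrow = n)             # unobserved factor
B  <- matrix(runif(p * 2, -1, 1), nrow = p)  # loadings on f
C  <- matrix(runif(p, -1, 1), nrow = p)      # loadings on g
mu <- c(rep(2, 3), rep(0, p - 3))
u  <- matrix(rnorm(n * p), nrow = n)
X  <- rep(1, n) %*% t(mu) + f %*% t(B) + g %*% t(C) + u

# Step 1 (stand-in): regress every column of X on an intercept and f.
# With FarmTest one would instead call farm.test(X, fx = f) and take the
# estimated loadings from the output.
coefs <- lm.fit(cbind(1, f), X)$coefficients  # (1 + 2) x p matrix
B_hat <- t(coefs[-1, , drop = FALSE])         # p x 2 estimated loadings

# Remove only the contribution of the observed factors; the mean stays in.
Xres <- X - f %*% t(B_hat)

# Step 2 (not run here): pass the residuals without any factors,
# e.g. farm.test(Xres), so that the remaining factor g is estimated.
```

Because only the observed-factor term is subtracted, `Xres` still carries μ and Cg, which is what the second call needs in order to test the means and adjust for the unobserved factor.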

For a two-sample test, the output values means, stderr, n, nfactors and loadings are all lists containing two items, one for X and one for Y, indicated by the prefixes X. and Y. respectively.

The data matrix must have at least 4 rows and at least 4 columns for the latent factors to be calculated.

For details about multiple comparison correction, see farm.FDR.

The tuning parameter is tau * sigma * (optimal rate), where sigma is the standard deviation of the data and the optimal rate is the rate for the tuning parameter given in Fan et al. (2017).
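To make the role of the tuning parameter concrete, here is a minimal base R sketch of a Huber mean computed by iteratively reweighted averaging. The function `huber_mean` is hypothetical, and the `rate` used below is an illustrative placeholder rather than the rate from Fan et al. (2017); the package's own implementation may differ.

```r
# Hypothetical helper: Huber estimate of location for a given threshold.
# Residuals larger than `threshold` in absolute value are down-weighted.
huber_mean <- function(x, threshold, tol = 1e-8, max_iter = 200) {
  m <- median(x)  # robust starting value
  for (i in seq_len(max_iter)) {
    r <- x - m
    # weight 1 inside the threshold, threshold / |r| outside it
    w <- pmin(1, threshold / pmax(abs(r), .Machine$double.eps))
    m_new <- sum(w * x) / sum(w)
    if (abs(m_new - m) < tol) break
    m <- m_new
  }
  m_new
}

set.seed(100)
n <- 200
x <- c(rnorm(n - 5), rep(50, 5))  # five gross outliers
sigma <- sd(x)
rate  <- sqrt(n / log(n))         # ASSUMED placeholder rate, for illustration

# threshold = tau * sigma * rate, for a small and a large multiplier tau
c(sample_mean = mean(x),
  huber_tau_small = huber_mean(x, 0.05 * sigma * rate),
  huber_tau_large = huber_mean(x, 2 * sigma * rate))
```

A larger `tau` raises the threshold, so fewer observations are down-weighted and the estimate moves towards the sample mean; a smaller `tau` clips more aggressively.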

An object with S3 class farm.test containing:

`means`	estimated means
`stderr`	estimated standard errors
`pvalue`	unadjusted p values
`rejected`	the indices of rejected hypotheses, along with their corresponding p values, and adjusted p values, ordered from most significant to least significant
`alldata`	all the indices of the tested hypotheses, along with their corresponding p values, adjusted p values, and a column with 1 if declared significant and 0 if not
`loadings`	estimated factor loadings
`nfactors`	the number of (estimated) factors
`significant`	the number of means that are found significant
`...`	further arguments passed to methods. For complete list use the function `names` on the output object

Huber, P.J. (1964). "Robust Estimation of a Location Parameter." The Annals of Mathematical Statistics, 35, 73–101.

Fan, J., Ke, Y., Sun, Q. and Zhou, W-X. (2017). "FARM-Test: Factor-Adjusted Robust Multiple Testing with False Discovery Control", https://arxiv.org/abs/1711.05386.

Zhou, W-X., Bose, K., Fan, J. and Liu, H. (2017). "A New Perspective on Robust M-Estimation: Finite Sample Theory and Applications to Dependence-Adjusted Multiple Testing", Annals of Statistics, to appear, https://arxiv.org/abs/1711.05381.

farm.FDR, print.farm.test

set.seed(100)
p = 100
n = 50
epsilon = matrix(rnorm( p*n, 0,1), nrow = n)
B = matrix(runif(p*3,-2,2), nrow=p)
fx = matrix(rnorm(3*n, 0,1), nrow = n)
mu = rep(0, p)
mu[1:5] = 2
X = rep(1,n)%*%t(mu)+fx%*%t(B)+ epsilon
output = farm.test(X, cv=FALSE)#robust, no cross-validation
output

#other robustification options
output = farm.test(X, robust = FALSE, verbose=FALSE) #non-robust
output = farm.test(X, tau = 3, cv=FALSE, verbose=FALSE) #robust, no cross-validation, specified tau
#output = farm.test(X) #robust, cross-validation, longer running

#two sample test
n2 = 25
epsilon = matrix(rnorm( p*n2, 0,1), nrow = n2)
B = matrix(rnorm(p*3,0,1), nrow=p)
fy = matrix(rnorm(3*n2, 0,1), nrow = n2)
Y = fy%*%t(B)+ epsilon
output = farm.test(X=X,Y=Y, robust=FALSE) #non-robust
output = farm.test(X=X,Y=Y,Kx=0, cv = FALSE) #robust, no cross-validation, no factors for X
names(output$means)