gof_cal: Calculate goodness-of-fit test statistics

View source: R/gof_functions.R

gof_cal    R Documentation

This function calculates goodness-of-fit test statistics

Description

For ordered p-values p_{(1)} < p_{(2)} < ... < p_{(n)}, define S_n(t) = \sum_{i=1}^{n} 1\{p_{(i)} \le t\}, the number of p-values no larger than t. The statistic T_{(i)} at p_{(i)} is then defined as one of the following.

Score statistic: T_{(i)} = (S_n(p_{(i)}) - n p_{(i)}) / \sqrt{n p_{(i)} (1 - p_{(i)})}

Wald statistic: T_{(i)} = (S_n(p_{(i)}) - n p_{(i)}) / \sqrt{S_n(p_{(i)}) (n - S_n(p_{(i)})) / n}

Log-likelihood ratio: T_{(i)} = n [ p_{(i)} \log(n p_{(i)} / S_n(p_{(i)})) + (1 - p_{(i)}) \log(n (1 - p_{(i)}) / (n - S_n(p_{(i)}))) ]
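The three single statistics above can be sketched in base R for one column of p-values. This is a minimal illustration written directly from the formulas, not the package's implementation; the helper name single_stats is hypothetical.

```r
# Hypothetical helper, not part of the package: evaluates the three
# single statistics at every ordered p-value of one column.
single_stats <- function(p) {
  p <- sort(p)                       # ordered p-values p_(1) <= ... <= p_(n)
  n <- length(p)
  # S_n(p_(i)): number of p-values less than or equal to p_(i)
  s <- vapply(p, function(t) sum(p <= t), numeric(1))
  score <- (s - n * p) / sqrt(n * p * (1 - p))
  wald  <- (s - n * p) / sqrt(s * (n - s) / n)
  llr   <- n * (p * log(p * n / s) + (1 - p) * log((1 - p) * n / (n - s)))
  data.frame(p = p, S = s, score = score, wald = wald, likelihood_ratio = llr)
}

single_stats(c(0.01, 0.20, 0.50, 0.90))
```

At the largest p-value, S_n equals n, so the Wald and log-likelihood ratio formulas divide by zero and return Inf in this sketch.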

Usage

gof_cal(
  pm,
  t0ratio = 0.1,
  gof_method = NA,
  single_statistic = "score",
  accumulate_option = "max",
  filter = 0,
  weight_option = "none",
  weight = 1
)

Arguments

pm

is a matrix of p-values or weighted p-values. Each row represents a gene (independent tests) and each column represents a dataset (e.g. a permutation or an observation). pm should have more than one row; if it has only one row, a warning message is produced.

t0ratio

is the ratio that defines the region c(0, t0ratio) of p-values used to calculate the statistic.

gof_method

is the option for the set-based analysis method. Available options include "higher_criticism", "berk_jones", "skat", and "einmahl_mckeague"; see our publication for more details.

single_statistic

is the option for the single statistic. It can be one of c("score","likelihood_ratio","wald"). It is ignored if gof_method is assigned.

accumulate_option

is the option for the combining method. It can be one of c("max","sum","rsum","topsum","max2","sum2","rsum2","topsum2"). It is ignored if gof_method is assigned; in that case "max2", "sum2", "rsum2", and "topsum2" are replaced by "max", "sum", "rsum", and "topsum".

filter

is the threshold used to exclude extremely small p-values so that they do not drive the whole signal. Default: 0.

weight_option

defines how external prior information is used as weights. It can be "none", "in", or "out": "none" assigns no weight, "in" assigns the weights to each individual p-value, and "out" assigns the weights to each statistic T.

weight

is a vector that provides a weight for each gene.

Details

The max method calculates T_max = max T_{(i)}, where 0 < i < t0ratio x n.

The sum method calculates T_sum = sum T_{(i)}, where 0 < i < t0ratio x n.

The rsum method calculates T_rsum = 1/n x sum T_{(i)}, where 0 < i <= i_max and i_max is the index of the maximum of the T values.

The topsum method calculates T_topsum = 1/n x max T_{(r)}, where r ranges over the subset of indices whose T values are in the top t0ratio proportion of all T values.
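The max, sum, and rsum rules described above can be sketched in base R as follows. This is a rough illustration under stated assumptions, not the package's code: the helper name accumulate_sketch is hypothetical, the input is assumed to be a vector of single statistics already ordered by p-value, and i_max is assumed to be the index of the maximum over all T values. The topsum rule is left out because its description admits more than one reading.

```r
# Hypothetical helper, not part of the package: combines a vector of
# single statistics T_(1), ..., T_(n), ordered by increasing p-value.
accumulate_sketch <- function(t_stat, t0ratio = 0.1) {
  n <- length(t_stat)
  i <- seq_len(n)
  region <- i < t0ratio * n          # indices with 0 < i < t0ratio x n
  i_max <- which.max(t_stat)         # assumption: maximum taken over all T values
  c(
    max  = max(t_stat[region]),
    sum  = sum(t_stat[region]),
    rsum = sum(t_stat[i <= i_max]) / n
  )
}

t_stat <- c(3, 5, 1, 4, 2, 0, 1, 2, 1, 0,
            2, 1, 0, 1, 2, 1, 0, 1, 2, 1)
accumulate_sketch(t_stat, t0ratio = 0.25)
```

With n = 20 and t0ratio = 0.25, the region covers indices 1 to 4, so max and sum use only the first four statistics, while rsum sums up to the position of the overall maximum and divides by n.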

Value

a numeric vector in which each element is a higher criticism value calculated from one column of pm.
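To illustrate the shape of the return value, the sketch below applies a score statistic with the max rule to each column of a p-value matrix and returns one number per column. It is a simplified stand-in written for illustration only: hc_by_column is a hypothetical name, and filtering, weighting, and the other options of gof_cal are not modelled.

```r
# Hypothetical stand-in for illustration: score statistic with the "max"
# rule only, applied column by column.
hc_by_column <- function(pm, t0ratio = 0.1) {
  apply(pm, 2, function(p) {
    p <- sort(p)
    n <- length(p)
    s <- rank(p, ties.method = "max")            # S_n(p_(i))
    t_stat <- (s - n * p) / sqrt(n * p * (1 - p))
    max(t_stat[seq_len(n) < t0ratio * n])        # 0 < i < t0ratio x n
  })
}

set.seed(1)
a <- matrix(runif(200), nrow = 50, ncol = 4)
out <- hc_by_column(a)
length(out)   # one value per column of the input matrix
```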

References

Donoho, D., & Jin, J. (2004). Higher Criticism for Detecting Sparse Heterogeneous Mixtures. The Annals of Statistics, 32(3), 962–994.

Examples

# 50 genes (rows) by 4 datasets (columns) of uniform p-values
a <- matrix(runif(200, 0, 1), ncol = 4, nrow = 50)
gof_cal(a)

mqzhanglab/wHC documentation built on April 1, 2022, 6:28 p.m.