# sscore: Distance-based Kernel Score Test In KMDA: Kernel-Based Metabolite Differential Analysis

## Description

This function tests whether a metabolite set is differentially expressed using a stratified kernel-based score test.

## Usage

    sscore(x, y, lower, upper, m)

## Arguments

| Argument | Description |
|----------|-------------|
| x | numeric measurements of metabolite abundance level. |
| y | 0/1 response indicating whether a subject is in the case group or the control group. |
| lower | lower bound of the kernel parameter. |
| upper | upper bound of the kernel parameter. |
| m | number of grid points selected in the interval [lower, upper]. |

## Details

Let $x$ be a $p \times n$ matrix in which each column is a subject, and let $y$ be an $n \times 1$ 0/1 vector of group labels. This function tests whether this $p$-metabolite set is differentially expressed between the two groups (see Zhan et al. (2015) for details). It works as follows.

A score test can be applied when the kernel parameter $\rho$ is known. First, fit the null logistic model $\mathrm{logit}(\Pr(y=1)) = \beta_0$ to obtain the estimate $\hat{\beta}_0$, and let $\hat{\mu}_0 = \mathrm{invlogit}(\hat{\beta}_0)$. Second, calculate the $n \times n$ kernel matrix $K(\rho)$ with entries $K(\rho)_{ij} = k(x_i, x_j, \rho)$, where $x_i$ is the $i$th column of $x$ and $k(\cdot)$ is the stratified kernel function skernel. Third, calculate the test statistic $Q(\rho)$ as

$$Q(\rho) = (y - \hat{\mu}_0)^T K(\rho) (y - \hat{\mu}_0).$$

A standardized version $S(\rho)$ of $Q(\rho)$ is $S(\rho) = [Q(\rho) - \mu_Q] / \sigma_Q$, where $\mu_Q$ and $\sigma_Q$ are the mean and standard deviation of $Q(\rho)$ under the null hypothesis. More details can be found in Liu et al. (2008).
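The three steps for a known $\rho$ can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `toy_kernel` is a hypothetical Gaussian stand-in for skernel, `q_stat` is an illustrative helper name, and the data are simulated. Only the unstandardized statistic $Q(\rho)$ is computed, since the formulas for $\mu_Q$ and $\sigma_Q$ are given in Liu et al. (2008) rather than here.

```r
# Toy data (illustrative only): p = 3 metabolites in rows, n = 6 subjects in columns.
set.seed(1)
x <- matrix(rnorm(3 * 6), nrow = 3)
y <- c(0, 0, 0, 1, 1, 1)

# Hypothetical Gaussian stand-in for the package's stratified kernel (an assumption).
toy_kernel <- function(xi, xj, rho) exp(-sum((xi - xj)^2) / rho)

q_stat <- function(x, y, rho, kernel = toy_kernel) {
  # Step 1: intercept-only logistic fit; plogis() is the inverse logit.
  beta0 <- coef(glm(y ~ 1, family = binomial))[[1]]
  mu0 <- plogis(beta0)

  # Step 2: n x n kernel matrix with K[i, j] = k(x_i, x_j, rho).
  n <- ncol(x)
  K <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      K[i, j] <- kernel(x[, i], x[, j], rho)
    }
  }

  # Step 3: quadratic form in the null-model residuals.
  r <- y - mu0
  list(mu0 = mu0, K = K, Q = drop(t(r) %*% K %*% r))
}

res <- q_stat(x, y, rho = 1)
res$Q
```

For the intercept-only model, the fitted $\hat{\mu}_0$ coincides with the sample proportion of cases, `mean(y)`, so the `glm` call could be replaced by that shortcut.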

When the kernel parameter $\rho$ is unknown, suppose it takes values in [lower, upper]. Davies (1977) and Davies (1987) proposed a test based on the process $\{S(\rho), \rho \in [\mathrm{lower}, \mathrm{upper}]\}$. This test has a rejection region of the form $\{\sup_{\mathrm{lower} \le \rho \le \mathrm{upper}} S(\rho) > c\}$. Using this test, an upper bound for the p-value is given by

$$\Phi(-M) + V \exp\left(-\tfrac{1}{2} M^2\right) / \sqrt{8\pi},$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution, $M$ is the maximum of $S(\rho)$ over the range of $\rho$, $V = |S(\rho_1) - S(\mathrm{lower})| + |S(\rho_2) - S(\rho_1)| + \cdots + |S(\mathrm{upper}) - S(\rho_m)|$ is the total variation of $S(\rho)$ over the interval [lower, upper], and $\rho_1, \ldots, \rho_m$ are the $m$ grid points in that interval.
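Given the standardized statistics evaluated along the grid, the bound is a short computation. The sketch below assumes a hypothetical helper `davies_upper_bound` that takes the values $S(\mathrm{lower}), S(\rho_1), \ldots, S(\rho_m), S(\mathrm{upper})$ in grid order and uses the exponent $-M^2/2$ as in Davies (1987). Capping the result at 1 is an added convenience, not something stated in the source, and the input values are made up for illustration.

```r
# S: standardized statistics in grid order: S(lower), S(rho_1), ..., S(rho_m), S(upper).
davies_upper_bound <- function(S) {
  M <- max(S)              # maximum of S(rho) over the grid
  V <- sum(abs(diff(S)))   # total variation across consecutive grid points
  bound <- pnorm(-M) + V * exp(-M^2 / 2) / sqrt(8 * pi)
  min(1, bound)            # cap at 1 (an assumption added here)
}

# Hypothetical S values, for illustration only.
S_toy <- c(0.5, 1.2, 2.4, 2.1, 1.0)
p_upper <- davies_upper_bound(S_toy)
p_upper
```

When $S(\rho)$ is constant over the grid, $V = 0$ and the bound reduces to the usual one-sided normal p-value $\Phi(-M)$; the second term is the price paid for searching over $\rho$.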

## Value

A p-value for the test of whether the metabolite set is differentially expressed between the two groups.

## References

Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 64, 247-254.

Davies, R. B. (1987). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika, 74, 33-43.

Liu, D., Ghosh, D., & Lin, X. (2008). Estimation and testing for the effect of a genetic pathway on a disease outcome using logistic kernel machine regression via logistic mixed models. BMC Bioinformatics, 9(1), 292.

Zhan, X., Patterson, A. D., & Ghosh, D. (2015). Kernel approaches for differential expression analysis of mass spectrometry-based metabolomics data. BMC Bioinformatics, 16(1), 77.

## See Also

invlogit, skernel

## Examples

    library(KMDA)
    data(hcc)
    x = hcc[1:3, 3:57]  ## This metabolite-set contains the first three metabolites in the hcc dataset.
    y = c(rep(0, 35), rep(1, 20))
    sscore(x, y, 10^-3, 10^3, 10)