drmdel: Fit a density ratio model

Description Usage Arguments Value References Examples

View source: R/drmdel.R

Description

Fit a semiparametric density ratio model (DRM) to m+1 (m>=1) samples using maximum dual empirical likelihood method.

Denote the population cumulative distribution functions of the m+1 samples as F_k(x)'s, k = 0, 1, ..., m. We pick F_0(x) as a baseline distribution. The DRM assumes that the ratio of the density of each non-baseline distribution to the density of the baseline distribution satisfies

dF_k(x)/dF_0(x) = exp(alpha + beta^T*q(x)), k = 1, ..., m

where q(x) is a pre-specified d-dimensional basis function of data, and alpha, beta are model parameters. No parametric form for baseline distribution F_0 is assumed.

Usage

1
2
drmdel(x, n_samples, basis_func, g_null=NULL,
       g_null_jac=NULL, par_dim_null=NULL, ...)

Arguments

x

a vector formed by concatenating multiple samples, x_0, x_1, ..., x_m, in the order of baseline sample (x_0), non-baseline sample 1 (x_1), ..., non-baseline sample m (x_m).

n_samples

a vector of length m+1 specifying the sizes of the multiple samples, in the order of 0, 1, ..., m.

basis_func

basis function q(x) of the DRM; must either be an integer between 1 and 11 or a function of the data, x. The integers represents built-in basis-functions:

1: q(x) = x.

2: q(x) = log(|x|).

3: q(x) = sqrt(|x|).

4: q(x) = x^2.

5: q(x) = (x, x^2); Normal model.

6: q(x) = (x, log(|x|)); Gamma model.

7: q(x) = (log(|x|), sqrt(|x|), x).

8: q(x) = (log(|x|), sqrt(|x|), x^2).

9: q(x) = (log(|x|), x, x^2).

10: q(x) = (sqrt(|x|), x, x^2).

11: q(x) = (log(|x|), sqrt(|x|), x, x^2).

If the basis function one wants to use is in the above list, one should use the built-in function to maximize the speed of model fitting.

g_null

the function specifying the null hypothesis about DRM parameter beta if there is one; The default is NULL.

g_null_jac

a funciton specifying the Jacobian matrix of g_null, which must return a matrix of dimension m*d by dim(par_null), if available. The default is NULL.

par_dim_null

dimension of the parameter vector in null hypothesis if there is one. The default is NULL. If one is carrying out a hypothesis testing problem with a fully specified null hypothesis, one should specify g_null_jac=NULL and par_dim_null=0.

...

further arguments to be passed to the R function optim for maximizing the dual empirical likelihood. See help(optim) for details. In the drmdel function, by default, the "control$method" and "control$maxit" arguments of optim are set to "BFGS" and 10000, respectively.

Value

drm_info

a list of basic information about the DRM:

m: number of samples - 1.

d: dimension of the basis function.

n_smaples: the input vector of length m+1 specifying the size of each sample.

n_total: total sample size.

basis_func: the input basis function of the DRM.

rho: sample proportion: n_samples/n_total.

mele

maximum empirical likelihood estimator (MELE) of the model parameters. The output is a vector organized in the following form:

(alpha[1], beta[1,1], beta[1,2], ..., beta[1,d], alpha[2], beta[2,1], beta[2,2], ..., beta[2,d], ..., alpha[m], beta[m,1], beta[m,2], ..., beta[m,d]).

info_mat

estimated information matrix.

negldl

negative log dual empirical likelihood evaluated at mele.

mele_null

mele of the parameters under the null hypothesis, if available.

negldl_null

negative log dual empirical likelihood evaluated at mele under the null hypothesis, if available.

delr

the value of the dual empirical likelihood ratio statistic. If no null hypotheis (g_null) is given, this is simply -2*negldl.

df

degrees of freedom of the chi-square limiting distribution for DELR statistic under the null.

p_val

p-value of the DELR test.

p_est

estimated dF_k(x)'s at the observed data points, under the DRM. This is a data frame with the following three columns:

k: label for the populations, k = 0, 1, ..., m.

x: data points; at which 'x' value dF_k(x) is estimated.

p_est: estimated dF_k(x).

NOTE: To estimate the density of F_k(x), it is recommended to use densityDRM function.

cdf_est

estimated CDFs, F_k(x)'s, at the observed data points, under the DRM. This is a data frame with the following three columns:

k: label for the populations, k = 0, 1, ..., m.

x: data points; at which 'x' value F_k(x) is estimated.

cdf_est: estimated F_k(x).

NOTE: To estimate CDF dF_k(x), it is recommended to use cdfDRM function instead of looking at this output.

References

S. Cai, J. Chen and J. V. Zidek (2014), Hypothesis testing in the presence of multiple samples under density ratio models. Eprint, arXiv:1309.4740

A. Keziou and S. Leoni-Aubin (2008), On empirical likelihood for semiparametric two-sample density ratio models. Journal of Statistical Planning and Inference, 138:915-928.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
# Data generation
set.seed(25)
n_samples <- c(100, 200, 180, 150, 175)  # sample sizes
x0 <- rgamma(n_samples[1], shape=5, rate=1.8)
x1 <- rgamma(n_samples[2], shape=12, rate=1.2)
x2 <- rgamma(n_samples[3], shape=12, rate=1.2)
x3 <- rgamma(n_samples[4], shape=18, rate=5)
x4 <- rgamma(n_samples[5], shape=25, rate=2.6)
x <- c(x0, x1, x2, x3, x4)

# Fit a DRM with the basis function q(x) = (x, log(abs(x))),
# which is the basis function for gamma family.

# There are 11 built-in basis function in drmdel(). And q(x)
# = (x, log(abs(x))) is the 6th basis function, so we can
# fit the model by specifying basis_func=6 in drmdel() as
# follows:
drmfit <- drmdel(x=x, n_samples=n_samples, basis_func=6)
names(drmfit)

# A brief summary of the DRM fit
summaryDRM(drmfit)

# Another way of specifying basis function for drmdel() is
# to pass a user-specified R function to the basis_func
# argument of the drmdel() function.
# NOTE: If the basis function one wants to use is included
# in the built-in function list, one should use the built-in
# functions by passing an integer between 1 to 11 to the
# drmdel() function, because the computation will be faster
# with a built-in function than with a user-specified
# function.
basis_gamma <- function(x) return(c(x, log(abs(x))))
drmfit1 <- drmdel(x=x, n_samples=n_samples,
                  basis_func=basis_gamma)

# One can see the summary of this DRM fit is exactly the
# same as that of the previous fit with basis_func=6
summaryDRM(drmfit1)

Example output

[1] "drm_info" "mele"     "info_mat" "negldl"   "delr"     "df"       "p_val"   
[8] "p_est"    "cdf_est" 
Basic information about the DRM:
    Number of samples (m+1): 5 
    Basis function: 6 
    Dimension of the basis function (d): 2 
    Sample sizes: 100 200 180 150 175 
    Sample proportions (rho): 0.124 0.248 0.224 0.186 0.217 

Maximum empirical likelihood estimator (MELE):
   alpha[] beta[,1] beta[,2]
F1  -22.14  -0.1935     13.5
F2  -19.49   0.0243     11.4
F3   -4.95  -4.6389     17.8
F4  -32.88  -1.2457     22.8

Default hypothesis testing problem:
    H_0: All distribution functions, F_k's, are equal.
    H_1: One of the distribution functions is different from the others.
Dual empirical likelihood ratio statistc (DELR): 1035.261 
Degree of freedom: 8 
p-value: 0 

Summary statistics of the estimated F_k's (mean, var -- variance, sd -- standard deviation, Q1 -- first quartile, Q3 -- third quartile, IQR -- inter-quartile range):
    mean  var    sd   Q1    Q3  IQR
F0  2.64 1.59 1.262 1.75  3.25 1.51
F1 10.20 7.70 2.776 8.11 11.82 3.71
F2 10.29 9.31 3.051 7.99 11.96 3.98
F3  3.61 0.71 0.843 3.06  4.20 1.14
F4  9.61 4.10 2.024 8.07 10.96 2.89
Basic information about the DRM:
    Number of samples (m+1): 5 
    Basis function:
function (x) 
return(c(x, log(abs(x))))
<bytecode: 0x2c011a8>
    Dimension of the basis function (d): 2 
    Sample sizes: 100 200 180 150 175 
    Sample proportions (rho): 0.124 0.248 0.224 0.186 0.217 

Maximum empirical likelihood estimator (MELE):
   alpha[] beta[,1] beta[,2]
F1  -22.14  -0.1935     13.5
F2  -19.49   0.0243     11.4
F3   -4.95  -4.6389     17.8
F4  -32.88  -1.2457     22.8

Default hypothesis testing problem:
    H_0: All distribution functions, F_k's, are equal.
    H_1: One of the distribution functions is different from the others.
Dual empirical likelihood ratio statistc (DELR): 1035.261 
Degree of freedom: 8 
p-value: 0 

Summary statistics of the estimated F_k's (mean, var -- variance, sd -- standard deviation, Q1 -- first quartile, Q3 -- third quartile, IQR -- inter-quartile range):
    mean  var    sd   Q1    Q3  IQR
F0  2.64 1.59 1.262 1.75  3.25 1.51
F1 10.20 7.70 2.776 8.11 11.82 3.71
F2 10.29 9.31 3.051 7.99 11.96 3.98
F3  3.61 0.71 0.843 3.06  4.20 1.14
F4  9.61 4.10 2.024 8.07 10.96 2.89

drmdel documentation built on Oct. 25, 2021, 5:07 p.m.

Related to drmdel in drmdel...