DensityTestim: Non-combinatorial T-estimation Hold-Out for density...


Description

Main function:

Estimation of the density of a given sample by a Hold-Out procedure derived from the T-estimation using the algorithms introduced in Magalhães and Rozenholc (2014).

The sample is divided into a training sample, used to build a set of candidate estimators via the TBuildList function, and a validation sample, used to select one estimator from this set by T-estimation as introduced in Birgé (2006).

Usage

DensityTestim(X,p=1/2,family=NULL,test=c('birge','baraud'),theta=1/4,
	last=c('full','training'),plot=TRUE,verbose=TRUE,wlegend=TRUE,kerneltab=NULL,
	Dmax=NULL,bwtab=NULL,do.MLHO=FALSE,do.LSHO=FALSE,start=c('LSHO','MLHO'),csqrt=1,
	H2dist=NULL,allImageX2=NULL,flist=NULL,...)			

Arguments

X

numeric vector. The sample to which the density T-estimation Hold-Out procedure is applied.

p

proportion of the sample used in the training sample X[1:ceiling(p*n)] to build the family of estimators. Default 1/2.

family

estimator family name(s). If family is NULL (default), use family = c("Kernel", "RegularHisto", "IrregularHisto", "Parametric").

test

either 'birge' (default) or 'baraud'. Controls the test used in the T-estimation. The default 'birge' implements T-estimation as introduced in Birgé (2006), while 'baraud' uses a modified version based on the test derived from Baraud (2011).

theta

parameter which controls the radius of test balls. Has to be smaller than 1/2 (cf. Magalhães and Rozenholc (2014)). Default 1/4.

last

either 'training' or 'full' (default). Controls whether the resulting estimator is built with the training sample only or with the full sample.

plot

logical (default TRUE), controls whether plots are displayed.

verbose

logical (default TRUE), controls if the estimator description is printed.

wlegend

logical (default TRUE); controls if a legend is written on the plot.

kerneltab

vector of all desired kernel types. Only required when 'family' contains 'Kernel'.

If NULL (default), use kerneltab = "epanechnikov".

Dmax

maximum number of bins. Only required when 'family' contains 'RegularHisto' or 'IrregularHisto'.

If NULL (default), use Dmax=ceiling(n/log(n)).

bwtab

vector of bandwidth values. Only required when the family argument contains 'Kernel'.

If NULL (default), use bwtab = diff(range(X))/2/(ceiling(n/log(n)):1).

do.MLHO

logical (default FALSE). If TRUE, the Maximum Likelihood Hold-Out is computed.

do.LSHO

logical (default FALSE). If TRUE, the Least-Squares Hold-Out is computed.

start

starting point of the algorithm, either 'LSHO' (default) or 'MLHO'.

csqrt

numeric (default 1). If 0, the exact T-estimation is computed. Otherwise a faster but approximate T-estimator is computed, based on estimators separated by a Hellinger distance larger than csqrt/sqrt((1-p)*n). See Magalhães and Rozenholc (2014) for more details.

H2dist

not documented. For simulation purposes only.

allImageX2

not documented. For simulation purposes only.

flist

not documented. For simulation purposes only.

...

for other options when plot is TRUE, as in the plot function.
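
As a rough illustration of the data-dependent defaults listed above, the following base R sketch recomputes them by hand for a simulated sample. It does not call the package. The variable names (`Dmax_default`, `bwtab_default`, `sep_threshold`) are hypothetical, the formulas are copied from the argument descriptions rather than from the package source, and reading the constant in the csqrt description as the csqrt argument itself is an assumption.

```r
# Sketch only: recompute the documented defaults by hand (base R, no package call).
set.seed(42)
X <- rbeta(1000, 5, 2)   # simulated sample
n <- length(X)
p <- 1/2                 # documented default training proportion
csqrt <- 1               # documented default

# Default maximum number of bins: ceiling(n/log(n)).
Dmax_default <- ceiling(n / log(n))

# Default bandwidth grid: half the sample range divided by Dmax, ..., 2, 1,
# so the grid increases from diff(range(X))/(2*Dmax) up to diff(range(X))/2.
bwtab_default <- diff(range(X)) / 2 / (Dmax_default:1)

# Separation threshold when csqrt > 0 (assumption: the constant is csqrt).
sep_threshold <- csqrt / sqrt((1 - p) * n)

print(Dmax_default)
print(range(bwtab_default))
print(sep_threshold)
```

With n = 1000 the default grid has 145 bandwidths, so restricting bwtab or Dmax by hand may be worth considering when computation time matters.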

Details

More details about the algorithm and its implementation may be found in Magalhães and Rozenholc (2014).
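
The hold-out split itself can be sketched in base R. The training indices `1:ceiling(p*n)` come from the description of the p argument; taking the validation sample to be all remaining observations is an assumption based on the Description section, and the names `X_train` and `X_valid` are hypothetical.

```r
# Sketch only: the hold-out split as described in this page (base R, no package call).
set.seed(1)
X <- rlnorm(500)
n <- length(X)
p <- 3/4

n_train <- ceiling(p * n)
X_train <- X[1:n_train]        # documented: X[1:ceiling(p*n)]
X_valid <- X[-(1:n_train)]     # assumption: the remaining observations

print(c(training = length(X_train), validation = length(X_valid)))
```

Because the split uses the first observations of X, a sample stored in sorted order would yield unrepresentative training and validation samples. This page does not say whether the function shuffles internally, so shuffling beforehand with `X <- sample(X)` is a cautious choice.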

Value

DensityTestim returns a list with components

THO

descriptor of the T-Hold-Out estimate.

MLHO

descriptor of the Maximum Likelihood Hold-Out estimate if do.MLHO=TRUE

LSHO

descriptor of the Least-Squares Hold-Out estimate if do.LSHO=TRUE

M

number of considered estimators

comput

number of tests needed to select the T-Hold-Out

total

M*(M-1)/2, the number of pairs among the M estimators (for comparison with comput).

H2dist

not documented. For simulation purposes only.

allImageX2

not documented. For simulation purposes only.

flist

not documented. For simulation purposes only.

Moreover, if plot=TRUE, the chosen estimator is plotted together with the one chosen by the LSHO (the default starting point).
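
The relation between the M, comput and total components can be checked by hand. In the base R sketch below, `total_tests` and `fraction_run` are hypothetical helper names, and the list `res` is filled with made-up numbers purely for illustration. It is not output from DensityTestim.

```r
# Sketch only: relate the documented components M, comput and total (base R).

# 'total' is documented as M*(M-1)/2, i.e. the number of unordered pairs.
total_tests <- function(M) M * (M - 1) / 2

# Fraction of the pairwise tests that were run, given a list holding
# 'comput' and 'total' components.
fraction_run <- function(res) res$comput / res$total

# Hypothetical values for illustration only (not real package output).
res <- list(M = 100, comput = 600, total = total_tests(100))

print(res$total)           # 4950
print(fraction_run(res))
```

A small ratio would indicate that far fewer tests were run than there are pairs of estimators.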

Author(s)

Nelo Magalhães and Yves Rozenholc.

References

N. Magalhães and Y. Rozenholc, "A non-combinatorial algorithm for T-estimation Hold-Out" (2014)

L. Birgé, "Model selection via testing: an alternative to (penalized) maximum likelihood estimators.", Ann. Institut Henri Poincaré Probab. et Statist., 42, 273–325, (2006)

See Also

TBuildList, TBuildRegularHisto, TBuildIrregularHisto, TBuildKernel, TBuildParametric

Examples

## Not run: 
	
### load the package
library(Density.T.HoldOut)

### Estimation of the beta density with parameters 5 and 2 from a sample of size 1000:
X=rbeta(1000,5,2)
DensityTestim(X)
x = seq(min(X),max(X),l=500)
lines(x,dbeta(x,5,2),col='green',lty=3)
title('T-estimation and Least-Squares Hold-Out')


### Estimation of the lognormal density from a sample of size 500 via a set of regular 
### histograms and parametric estimators built with 3/4 of the sample;
### the final estimator is the one built with the training sample only
### (valid values of 'last' are 'full' and 'training'):
X=rlnorm(500)
DensityTestim(X,p=3/4,family=c('RegularHisto','Parametric'),last=c('training'))
x = seq(min(X),max(X),l=500)
lines(x,dlnorm(x),col='green',lty=3)
title('T-estimation and Least-Squares Hold-Out')


### Estimation of the chi-square density with 5 degrees of freedom from a sample of 
### size 250 via a set of regular and irregular histograms and kernel estimators with 
### triangular and epanechnikov kernels, start from the maximum likelihood HO estimator:
X=rchisq(250,5)
DensityTestim(X,family=c('RegularHisto','IrregularHisto','Kernel'),
	kerneltab=c('triangular','epanechnikov'),start=c('MLHO'))
x = seq(min(X),max(X),l=500)
lines(x,dchisq(x,5),col='green',lty=3)
title('T-estimation and Max. Likelihood Hold-Out')


### Estimation of a normal mixture from a sample of size 1000 via a set of kernel 
### estimators, provide also the maximum likelihood HO estimator:
n=ceiling(runif(1)*1000)
X=c(rnorm(n,mean=5,sd=0.1),rnorm(1000-n))
DensityTestim(X,family=c('Kernel'),do.MLHO=TRUE)
x = seq(min(X),max(X),l=500)
lines(x,n/1000*dnorm(x,mean=5,sd=0.1)+(1000-n)/1000*dnorm(x),col='green',lty=3)
title('T-estimation, Least-Squares and Max. Likelihood Hold-Out')


### Estimation of the gaussian density from a sample of size 500 via a set of regular 
### and irregular histograms estimators, start from the maximum likelihood HO estimator,
### uses the greedy version with constant 1/16:
X=rnorm(500)
DensityTestim(X,family=c('RegularHisto','IrregularHisto'),start=c('MLHO'),csqrt=1/16)
x = seq(min(X),max(X),l=500)
lines(x,dnorm(x),col='green',lty=3)
title('T-estimation and Max. Likelihood Hold-Out')

## End(Not run)

Density.T.HoldOut documentation built on May 2, 2019, 2:32 a.m.