Main function:
Estimation of the density of a given sample by a Hold-Out procedure derived from the T-estimation using the algorithms introduced in Magalhães and Rozenholc (2014).
The sample is divided into one training sample, used to build a set of potential estimators via the TBuildList function, and one validation sample, used to select one estimator from this set using T-estimation as introduced in Birgé (2006).
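The hold-out idea described above can be illustrated in base R. The sketch below is not the T-estimation test used by DensityTestim: it swaps in a plain least-squares hold-out criterion, builds kernel estimators on the training half with stats::density, and picks the one with the smallest criterion on the validation half. The bandwidth grid and the names bw_grid, ls_criterion and best_h are assumptions made for this illustration, and the bw argument of density() is not claimed to match the package's bwtab parametrization.

```r
# Sketch only: least-squares hold-out selection among kernel estimators.
# This is NOT the T-estimation test; it only illustrates the hold-out idea.
set.seed(42)
X <- rbeta(1000, 5, 2)
n <- length(X)
p <- 1/2

n_train    <- ceiling(p * n)
training   <- X[1:n_train]          # used to build the candidate estimators
validation <- X[(n_train + 1):n]    # used to select one of them

# Hypothetical bandwidth grid, chosen for this illustration only.
bw_grid <- c(0.01, 0.02, 0.05, 0.1, 0.2)

# Least-squares criterion: integral of f^2 minus twice the mean of f
# evaluated at the validation points (smaller is better).
ls_criterion <- function(h) {
  d  <- density(training, bw = h, kernel = "epanechnikov", n = 1024)
  dx <- d$x[2] - d$x[1]
  int_f2 <- sum(d$y^2) * dx         # Riemann approximation of the integral
  f_val  <- approx(d$x, d$y, xout = validation, yleft = 0, yright = 0)$y
  int_f2 - 2 * mean(f_val)
}

crit   <- sapply(bw_grid, ls_criterion)
best_h <- bw_grid[which.min(crit)]
```

T-estimation replaces the single criterion above with pairwise tests between candidate estimators, which is what the test and theta arguments control.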
X |
numeric vector. The sample to which the density T-estimation Hold-Out procedure is applied. |
p |
proportion of the sample used in the training sample X[1:ceiling(p*n)] to build the family of estimators. Default 1/2. |
family |
estimator family name(s). If family is NULL (default), use family = c("Kernel", "RegularHisto", "IrregularHisto", "Parametric"). |
test |
either 'birge' (default) or 'baraud'. Controls the test used in the T-estimation. The default value 'birge' implements T-estimation as introduced in Birgé (2006), while 'baraud' uses a modified version based on the test derived from Baraud (2011). |
theta |
parameter which controls the radius of test balls. Has to be smaller than 1/2 (cf. Magalhães and Rozenholc (2014)). Default 1/4. |
last |
either 'training' or 'full' (default). Controls whether the resulting estimator is built with the training sample only or with the full sample. |
plot |
logical (default TRUE), controls whether plots are displayed. |
verbose |
logical (default TRUE), controls if the estimator description is printed. |
wlegend |
logical (default TRUE); controls if a legend is written on the plot. |
kerneltab |
vector of all desired kernel types. Only required when 'family' contains 'Kernel'. If NULL (default), use kerneltab = "epanechnikov". |
Dmax |
maximum number of bins. Only required when 'family' contains 'RegularHisto' or 'IrregularHisto'. If NULL (default), use Dmax=ceiling(n/log(n)). |
bwtab |
vector of bandwidth values. Only required when the family argument contains 'Kernel'. If NULL (default), use bwtab = diff(range(X))/2/(ceiling(n/log(n)):1). |
do.MLHO |
logical (default FALSE). If TRUE, the Maximum Likelihood Hold-Out is computed. |
do.LSHO |
logical (default FALSE). If TRUE, the Least-Squares Hold-Out is computed. |
start |
starting point of the algorithm, either 'LSHO' (default) or 'MLHO'. |
csqrt |
numeric (default 1). If 0, the exact T-estimation is computed. Otherwise, a faster but approximate T-estimator is computed, based on estimators separated by a Hellinger distance larger than c/sqrt((1-p)*n), where c is the value of csqrt (see Magalhães and Rozenholc (2014) for more details). |
H2dist |
not documented. For simulation purposes only. |
allImageX2 |
not documented. For simulation purposes only. |
flist |
not documented. For simulation purposes only. |
... |
for other options when plot is TRUE, as in the plot function. |
More details about the algorithm and its implementation may be found in Magalhães and Rozenholc (2014).
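As a rough illustration of the documented defaults, the following base R sketch reproduces the training index range X[1:ceiling(p*n)] and the default values given for Dmax and bwtab. It assumes that n denotes the full sample size length(X) and that the validation sample consists of the remaining observations; neither point is stated explicitly above, and the variable names are chosen for this sketch only.

```r
# Sketch only: reproduces the documented default formulas in base R.
# Assumption: n is the full sample size length(X).
set.seed(1)
X <- rbeta(1000, 5, 2)
n <- length(X)
p <- 1/2                              # documented default proportion

n_train    <- ceiling(p * n)
training   <- X[1:n_train]            # documented: X[1:ceiling(p*n)]
validation <- X[(n_train + 1):n]      # assumed: the remaining observations

# Documented defaults used when Dmax and bwtab are NULL.
Dmax  <- ceiling(n / log(n))
bwtab <- diff(range(X)) / 2 / (Dmax:1)
```

With these formulas the bandwidth grid is increasing and its largest value is half the range of the sample.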
DensityTestim returns a list with components
THO |
descriptor of the T-Hold-Out estimate. |
MLHO |
descriptor of the Maximum Likelihood Hold-Out estimate if do.MLHO=TRUE |
LSHO |
descriptor of the Least-Squares Hold-Out estimate if do.LSHO=TRUE |
M |
number of considered estimators |
comput |
number of tests needed to select the T-Hold-Out |
total |
M*(M-1)/2, the number of pairs among the M estimators. |
H2dist |
not documented. For simulation purposes only. |
allImageX2 |
not documented. For simulation purposes only. |
flist |
not documented. For simulation purposes only. |
Moreover if plot=TRUE, the chosen estimator is plotted together with the one chosen by the LSHO (default).
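The components M, comput and total can be combined to see how many of the possible pairwise tests were actually carried out. The helper below is a minimal sketch: test_savings is a hypothetical name, and the list res only mimics the documented fields with made-up placeholder numbers so that the snippet runs without the package. In practice res would be the list returned by DensityTestim.

```r
# Sketch only: summarise the number of tests performed versus possible.
# `test_savings` is a hypothetical helper, not part of the package.
test_savings <- function(res) {
  # total is documented as M*(M-1)/2
  stopifnot(res$total == res$M * (res$M - 1) / 2)
  c(performed = res$comput,
    possible  = res$total,
    fraction  = res$comput / res$total)
}

# Placeholder values for illustration only, not output of DensityTestim.
res <- list(M = 100, comput = 350, total = 100 * 99 / 2)
out <- test_savings(res)
```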
Nelo Magalhães and Yves Rozenholc.
N. Magalhães and Y. Rozenholc, "A non-combinatorial algorithm for T-estimation Hold-Out" (2014)
L. Birgé, "Model selection via testing: an alternative to (penalized) maximum likelihood estimators.", Ann. Institut Henri Poincaré Probab. et Statist., 42, 273–325, (2006)
TBuildList, TBuildRegularHisto, TBuildIrregularHisto, TBuildKernel, TBuildParametric
## Not run:
### load the package
library(Density.T.HoldOut)
### Estimation of the beta density with parameters 5 and 2 from a sample of size 1000:
X=rbeta(1000,5,2)
DensityTestim(X)
x = seq(min(X),max(X),l=500)
lines(x,dbeta(x,5,2),col='green',lty=3)
title('T-estimation and Least-Squares Hold-Out')
### Estimation of the lognormal density from a sample of size 500 via a set of regular
### histograms and parametric estimators built with 3/4 of the sample,
### provide as final estimator the one built with the training sample only:
X=rlnorm(500)
DensityTestim(X,p=3/4,family=c('RegularHisto','Parametric'),last='training')
x = seq(min(X),max(X),l=500)
lines(x,dlnorm(x),col='green',lty=3)
title('T-estimation and Least-Squares Hold-Out')
### Estimation of the chi-square density with 5 degrees of freedom from a sample of
### size 250 via a set of regular and irregular histograms and kernel estimators with
### triangular and epanechnikov kernels, start from the maximum likelihood HO estimator:
X=rchisq(250,5)
DensityTestim(X,family=c('RegularHisto','IrregularHisto','Kernel'),
kerneltab=c('triangular','epanechnikov'),start=c('MLHO'))
x = seq(min(X),max(X),l=500)
lines(x,dchisq(x,5),col='green',lty=3)
title('T-estimation and Max. Likelihood Hold-Out')
### Estimation of a normal mixture from a sample of size 1000 via a set of kernel
### estimators, provide also the maximum likelihood HO estimator:
n=ceiling(runif(1)*1000)
X=c(rnorm(n,mean=5,sd=0.1),rnorm(1000-n))
DensityTestim(X,family=c('Kernel'),do.MLHO=TRUE)
x = seq(min(X),max(X),l=500)
lines(x,n/1000*dnorm(x,mean=5,sd=0.1)+(1000-n)/1000*dnorm(x),col='green',lty=3)
title('T-estimation, Least-Squares and Max. Likelihood Hold-Out')
### Estimation of the Gaussian density from a sample of size 500 via a set of regular
### and irregular histogram estimators, start from the maximum likelihood HO estimator,
### use the greedy version with constant 1/16:
X=rnorm(500)
DensityTestim(X,family=c('RegularHisto','IrregularHisto'),start=c('MLHO'),csqrt=1/16)
x = seq(min(X),max(X),l=500)
lines(x,dnorm(x),col='green',lty=3)
title('T-estimation and Max. Likelihood Hold-Out')
## End(Not run)