Description Usage Arguments Value Author(s) References Examples
Implementations of different HPLBs for TV as described in (Michel et al., 2020).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
t |
a numeric vector value corresponding to a natural ordering of the observations. For a two-sample test 0-1 numeric values values should be provided. |
rho |
a numeric vector value providing an ordering. This could be a binary classifier, a regressor, a witness function from a MMD kernel or anything else that would witness a distributional difference. |
s |
a numeric vector value giving split points on t. |
estimator.type |
a character value indicating which estimator to use. One option out of:
|
alpha |
a numeric value giving the overall type-I error control level. |
tv.seq |
a sequence of values between 0 and 1 used as the grid search for the total variation distance in case of tv-search. |
custom.bounding.seq |
a list of bounding functions respecting the order of tv.seq used in case of estimator.type "custom-tv-search". |
direction |
a character vector value made of "left" or "right" giving which distribution witness count to estimate (t<=s or t>s?). |
cutoff |
a numeric value. This is the cutoff used if bayes estimators are used. The theory suggests to use 1/2 but this can be changed. |
verbose.plot |
a boolean value for additional plots. |
seed |
an integer value. The seed for reproducibility. |
... |
additional parameters for the function |
a list
containing the relevant lower bounds estimates. For the total variation distance the relevant entry is tvhat
.
Loris Michel, Jeffrey Naef
L. Michel, J. Naef and N. Meinshausen (2020). High-Probability Lower Bounds for the Total Variation Distance
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | ## libs
library(HPLB)
library(ranger)
library(distrEx)
## reproducibility
set.seed(0)
## Example 1: TV lower bound based on two samples (bayes estimator), Gaussian mean-shift example
n <- 100
means <- rep(c(0,2), each = n / 2)
x <- stats::rnorm(n, mean = means)
t <- rep(c(0,1), each = n / 2)
bayesRate <- function(x) {
return(stats::dnorm(x, mean = 2) /
(stats::dnorm(x, mean = 2) + stats::dnorm(x, mean = 0)))
}
# estimated HPLB
tvhat <- HPLB(t = t, rho = bayesRate(x), estimator.type = "bayes")
# true TV
TotalVarDist(e1 = Norm(2,1), e2 = Norm(0,1))
## Example 2: optimal mixture detection (adapt estimator), Gaussian mean-shift example
n <- 100
mean.shift <- 2
t.train <- runif(n, 0 ,1)
x.train <- ifelse(t.train>0.5, stats::rnorm(n, mean.shift), stats::rnorm(n))
rf <- ranger::ranger(t~x, data.frame(t=t.train,x=x.train))
n <- 100
t.test <- runif(n, 0 ,1)
x.test <- ifelse(t.test>0.5, stats::rnorm(n, mean.shift), stats::rnorm(n))
rho <- predict(rf, data.frame(t=t.test,x=x.test))$predictions
## out-of-sample
tv.oos <- HPLB(t = t.test, rho = rho, s = seq(0.1,0.9,0.1), estimator.type = "adapt")
## total variation values
tv <- c()
for (s in seq(0.1,0.9,0.1)) {
if (s<=0.5) {
D.left <- Norm(0,1)
} else {
D.left <- UnivarMixingDistribution(Dlist = list(Norm(0,1),Norm(mean.shift,1)),
mixCoeff = c(ifelse(s<=0.5, 1, 0.5/s), ifelse(s<=0.5, 0, (s-0.5)/s)))
}
if (s < 0.5) {
D.right <- UnivarMixingDistribution(Dlist = list(Norm(0,1),Norm(mean.shift,1)),
mixCoeff = c(ifelse(s<=0.5, (0.5-s)/(1-s), 0), ifelse(s<=0.5, (0.5/(1-s)), 1)))
} else {
D.right <- Norm(mean.shift,1)
}
tv <- c(tv, TotalVarDist(e1 = D.left, e2 = D.right))
}
## plot
oldpar <- par(no.readonly =TRUE)
par(mfrow=c(2,1))
plot(t.test,x.test,pch=19,xlab="t",ylab="x")
plot(seq(0.1,0.9,0.1), tv.oos$tvhat,type="l",ylim=c(0,1),xlab="t", ylab="TV")
lines(seq(0.1,0.9,0.1), tv, col="red",type="l")
par(oldpar)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.