This function provides two methods for detecting influential observations in high-dimensional regression settings: a single detection technique and a multiple detection technique.
predictors | Matrix with the values of the predictors.
response | Numeric vector of the response variable.
nsample | Number of random subsets. The default is 5.
ssize | Size of the random subsets. The default is half of the initial sample size.
vtau | Numeric vector of the asymmetric values. Choosing three asymmetric points within the quartiles is suggested, for example 0.25, 0.5 and 0.75.
alpha_shide | Significance level for the single detection method. The default is 0.05.
alpha_swamp | Significance level for the swamping stage. The default is 0.1.
alpha_mask | Significance level for the masking stage. The default is 0.01.
alpha_validate | Significance level for the validation stage. The default is 0.01.
method | Detection method to use. There are two options: single or multiple.
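To give a feel for what the asymmetric values in vtau control, the sketch below computes sample expectiles at the levels 0.25, 0.5 and 0.75. It assumes, based on the title of the second reference, that the asymmetric points act as expectile levels. The helper `sample_expectile` and the `*_demo` objects are hypothetical names written for illustration; they are not part of the package and do not reproduce its internal computations.

```r
# Hypothetical helper (not part of hidetify): sample expectile of y at level tau,
# computed by iteratively reweighted averaging.
sample_expectile <- function(y, tau, tol = 1e-10, max_iter = 100) {
  m <- mean(y)
  for (k in seq_len(max_iter)) {
    # observations above the current estimate get weight tau, the rest 1 - tau
    w <- ifelse(y > m, tau, 1 - tau)
    m_new <- sum(w * y) / sum(w)
    if (abs(m_new - m) < tol) break
    m <- m_new
  }
  m_new
}

set.seed(1)
y_demo <- rnorm(200)                 # illustrative data only
vtau_demo <- c(0.25, 0.5, 0.75)      # same levels as in the example on this page
expectiles_demo <- sapply(vtau_demo, function(tau) sample_expectile(y_demo, tau))
```

At level 0.5 all weights are equal, so the expectile is the sample mean; levels below and above 0.5 pull the estimate toward the lower and upper tail respectively.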
A data frame with two variables:
ind | Index of each subject in the sample.
outlier_ind | Influential observation indicator: 1 if the observation is influential, 0 otherwise.
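As a minimal sketch of how the returned data frame might be used, the snippet below builds a mock object with the two documented columns and extracts the flagged indices. The object `out_demo` and its values are made up for illustration; a real result would come from a call to the function.

```r
# Mock return value with the documented structure (illustrative data only)
out_demo <- data.frame(
  ind = 1:8,
  outlier_ind = c(1, 0, 0, 1, 0, 0, 0, 1)
)

# indices of the observations flagged as influential
flagged <- out_demo$ind[out_demo$outlier_ind == 1]

# indices of the remaining observations, e.g. for refitting a model without the flagged ones
clean_rows <- out_demo$ind[out_demo$outlier_ind == 0]
```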
Amadou Barry barryhafia@gmail.com
Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2020). Asymmetric influence measure for high dimensional regression. Communications in Statistics - Theory and Methods.
Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2021). An algorithm-based multiple detection influence measure for high dimensional regression using expectile. arXiv:2105.12286 [stat].
## Simulate a dataset where the first 10 observations are influential
library(MASS)  # provides mvrnorm()
# the vector of asymmetric points
vtau <- c(0.25,0.5,0.75)
# the parameter of interest
beta_param <- c(3,1.5,0,0,2,rep(0,1000-5))
# the contamination parameter
gama_param <- c(0,0,1,1,0,rep(1,1000-5))
# Covariance matrix for the predictor distribution: entry (i, j) is 0.5^|i - j|
sigmain <- 0.5^abs(outer(1:1000, 1:1000, "-"))
# set the seed
set.seed(13)
# the predictor matrix
x <- mvrnorm(100, rep(0, 1000), sigmain)
# the error variable
error_var <- rnorm(100)
# the response variable
y <- x %*% beta_param + error_var
y <- as.numeric(y)
### Generate influential observations
# the contaminated response variable
youtlier <- y
youtlier[1:10] <- x[1:10,] %*% (beta_param + 1.2*gama_param) + error_var[1:10]
youtlier <- as.numeric(youtlier)
# number of random subsets
nsample <- 5
# the size of the random subset
ssize <- 100/2
# initial clean set
est_clean_set <- 1:100
# the significance level for the single detection method
alpha_shide <- 0.05
# the significance level for the swamping stage
alpha_swamp <- 0.1
# the significance level for the masking stage
alpha_mask <- 0.01
# the significance level for the validation stage
alpha_validate <- 0.01
# the method of detection
method <- "single"
out <- hidetify(
  x,
  youtlier,
  nsample,
  ssize,
  vtau,
  alpha_shide,
  alpha_swamp,
  alpha_mask,
  alpha_validate,
  method = method
)
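Because the simulation contaminates observations 1 to 10 by construction, the result can be checked against that known truth. The sketch below shows one way to do so; `detection_summary` is a hypothetical helper, not part of the package, and `out_mock` is a made-up result so the snippet runs without the package installed.

```r
# Hypothetical helper (not part of hidetify): compare flags with known truth
detection_summary <- function(out, true_idx) {
  flagged <- out$ind[out$outlier_ind == 1]
  c(
    detected = sum(flagged %in% true_idx),         # truly influential and flagged
    missed = sum(!(true_idx %in% flagged)),        # truly influential but not flagged
    false_alarms = sum(!(flagged %in% true_idx))   # flagged but not influential
  )
}

# With the simulated data above one would call: detection_summary(out, 1:10)
# Mock result with the documented columns (illustrative values only):
out_mock <- data.frame(
  ind = 1:100,
  outlier_ind = as.integer(1:100 %in% c(1:9, 42))
)
summary_mock <- detection_summary(out_mock, 1:10)
```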