ease_swamping: Compute the min of the min of a sequence of asymmetric...


View source: R/ease_swamping.R

Description

This function is part of the algorithm that identifies multiple influential observations in high-dimensional linear regression. It computes the min of the min of the asymmetric influence measure to ease the swamping effect.
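The "min of the min" aggregation can be pictured with a small toy example. This is only a sketch under stated assumptions, not the package's implementation: the array `infl` holds made-up influence values (one per observation, random subset, and asymmetric point), and the order of the two minima (subsets first, then asymmetric points) is assumed. Whatever the order, nested minima equal the overall minimum, so an observation keeps a large value only if its measure is large in every subset and at every asymmetric point, which is what makes the rule conservative.

```r
set.seed(1)

n_obs     <- 6                    # toy number of observations
n_subsets <- 4                    # toy number of random subsets
asymvec   <- c(0.25, 0.5, 0.75)   # asymmetric points, as in the Examples

# Hypothetical influence values (NOT computed by hidetify):
# dimensions are observation x subset x asymmetric point
infl <- array(runif(n_obs * n_subsets * length(asymvec)),
              dim = c(n_obs, n_subsets, length(asymvec)))

# Inner min: for each observation and asymmetric point, min over subsets
min_over_subsets <- apply(infl, c(1, 3), min)

# Outer min: for each observation, min over asymmetric points
min_of_min <- apply(min_over_subsets, 1, min)

min_of_min
```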

Usage

ease_swamping(
  x,
  y, 
  xquant, 
  yquant, 
  inv_rob_sdx, 
  rob_sdy, 
  number_subset, 
  size_subset, 
  est_clean_set, 
  asymvec,
  ep=0.1,
  alpha
  )

Arguments

x

Matrix with the values of the predictors.

y

Numeric vector of the response variable.

xquant

Matrix with the quantiles of the predictors.

yquant

Numeric vector of the quantiles of the response variable.

inv_rob_sdx

Numeric vector of the inverse of the median absolute deviation of the predictors.

rob_sdy

Median absolute deviation of the response variable.

number_subset

Number of random subsets.

size_subset

Size of the random subsets. The default is half of the initial sample size.

est_clean_set

The subject id of the estimated clean subset. The default is the initial sample.

asymvec

Numeric vector of the asymmetric values. The suggested choice is three asymmetric points between the first and third quartiles, such as 0.25, 0.5, and 0.75.

ep

Threshold value to ensure that the estimated clean set is not empty. The default value is 0.1.

alpha

Significance level.
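To make the roles of `number_subset`, `size_subset`, and `est_clean_set` concrete, the following sketch draws random subsets from the current clean set. It assumes subsets are sampled without replacement from `est_clean_set`; the package may draw them differently, so treat this as an illustration of the arguments rather than a description of the internals.

```r
set.seed(13)

est_clean_set <- 1:100                      # initial clean set: the whole sample
number_subset <- 5                          # number of random subsets
size_subset   <- length(est_clean_set) / 2  # half of the initial sample size

# One column per random subset; each column holds size_subset distinct ids
# (assumption: sampling without replacement from the clean set)
subsets <- replicate(number_subset, sample(est_clean_set, size_subset))

dim(subsets)
```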

Value

An index vector identifying the estimated non-influential observations, obtained using a conservative approach.
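Because the returned vector lists the observations judged non-influential, the flagged observations are its complement within the sample. The sketch below uses a made-up vector `est_clean` in place of an actual return value, so the result is illustrative only.

```r
n <- 100                     # sample size

# Hypothetical return value (NOT actual output of ease_swamping):
# indices of the observations judged non-influential
est_clean <- 11:100

# Observations not in the clean set are the ones flagged as influential
flagged <- setdiff(seq_len(n), est_clean)

flagged
```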

Author(s)

Amadou Barry barryhafia@gmail.com

References

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2020). Asymmetric influence measure for high dimensional regression. Communications in Statistics - Theory and Methods.

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2021). An algorithm-based multiple detection influence measure for high dimensional regression using expectile. arXiv:2105.12286 [stat].

Examples

## Simulate a dataset where the first 10 observations are influential
library(MASS)      # provides mvrnorm()
library(hidetify)  # provides ease_swamping()
# the vector of asymmetric points
asymvec  <- c(0.25,0.5,0.75)

# the parameter of interest
beta_param <- c(3,1.5,0,0,2,rep(0,1000-5))

# the contamination parameter 
gama_param <- c(0,0,1,1,0,rep(1,1000-5))

# Covariance matrix for the predictor distribution
sigmain <- diag(rep(1,1000))
for (i in 1:1000)
{
  for (j in i:1000) 
  {
    sigmain[i,j] <- 0.5^(abs(j-i))
    sigmain[j,i] <- sigmain[i,j]
  }
}

# set the seed
set.seed(13)

# the predictor matrix
x  <- mvrnorm(100, rep(0, 1000), sigmain)

# the error variable
error_var <- rnorm(100)

# the response variable
y  <- x %*% beta_param + error_var
y <- as.numeric(y)

### Generate influential observations

# the contaminated response variable
youtlier <- y
youtlier[1:10] <- x[1:10,] %*% (beta_param +  1.2*gama_param)  + error_var[1:10]
youtlier <- as.numeric(youtlier)

# the quantiles of the predictors
xquant <- apply(x,2,quantile,asymvec)

# the quantiles of the contaminated response variable
yquant <- quantile(youtlier,asymvec)

# the inverse of the MAD of the predictors
inv_rob_sdx <- 1/apply(x,2,mad)

# the MAD of the contaminated response variable
rob_sdy <- mad(youtlier)

# the number of random subsets
number_subset <- 5

# the size of random subsets
size_subset <- 100/2

# the initial clean set
est_clean_set <- 1:100

# the significance level
alpha <- 0.05

# the function to run
est_clean_set_ease_swamping <-
  ease_swamping(
    x, 
    youtlier, 
    xquant, 
    yquant, 
    inv_rob_sdx, 
    rob_sdy, 
    number_subset,
    size_subset,
    est_clean_set,
    asymvec,
    ep=0.1,
    alpha)

hidetify documentation built on Aug. 20, 2021, 5:06 p.m.