hidetify: Identify the influential observations in high dimensional...


View source: R/hidetify.R

Description

This function provides two methods for identifying influential observations in high-dimensional regression settings: a single detection technique and a multiple detection technique.

Usage

hidetify(
  predictors, 
  response, 
  nsample=5, 
  ssize=floor(length(response)/2), 
  vtau=c(0.25,0.5,0.75), 
  alpha_shide = 0.05, 
  alpha_swamp = 0.1, 
  alpha_mask = 0.01, 
  alpha_validate = 0.01,
  method = c("single", "multiple")
  )

Arguments

predictors

Matrix with the values of the predictors.

response

Numeric vector of the response variable.

nsample

Number of random subsets. The default is 5.

ssize

Size of the random subsets. The default is half of the initial sample size.

vtau

Numeric vector of asymmetric points. Three points are suggested, such as the quartile levels used by the default, c(0.25, 0.5, 0.75).

alpha_shide

Significance level for the single detection method. The default is set to 0.05.

alpha_swamp

Significance level for the swamping stage. The default is set to 0.1.

alpha_mask

Significance level for the masking stage. The default is set to 0.01.

alpha_validate

Significance level for the validation stage. The default is set to 0.01.

method

The detection method to use. There are two options: "single" or "multiple".
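
The relationship between the arguments can be sketched with a small pre-flight check. The helper check_hidetify_inputs below is hypothetical and not part of the package. It encodes what the argument descriptions state (a predictor matrix, a numeric response, the documented default for ssize) together with three assumptions: that predictors has one row per element of response, that the asymmetric points lie strictly between 0 and 1, and that method is resolved with match.arg.

```r
# Hypothetical helper, not part of the hidetify package
check_hidetify_inputs <- function(predictors, response,
                                  ssize = floor(length(response) / 2),
                                  vtau = c(0.25, 0.5, 0.75),
                                  method = c("single", "multiple")) {
  # Resolve the method to one of the two documented options
  method <- match.arg(method)
  stopifnot(
    is.matrix(predictors),
    is.numeric(response),
    nrow(predictors) == length(response),   # assumed: one row per observation
    ssize >= 1, ssize <= length(response),
    all(vtau > 0 & vtau < 1)                # assumed: points strictly in (0, 1)
  )
  list(n = length(response), p = ncol(predictors),
       ssize = ssize, method = method)
}

set.seed(1)
x_demo <- matrix(rnorm(20 * 50), nrow = 20)  # 20 observations, 50 predictors
y_demo <- rnorm(20)
chk <- check_hidetify_inputs(x_demo, y_demo)
```

Here chk records the sample size, the number of predictors, the default subset size and the resolved method, and a mismatch between the rows of x_demo and the length of y_demo stops with an error.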

Value

A data frame with two variables.

ind

Index of the subjects in the sample.

outlier_ind

Indicator of influential observations: 1 if the observation is influential, 0 otherwise.
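
As a minimal sketch of how the returned data frame might be used, the snippet below builds a mock result with the two documented columns (ind and outlier_ind) and extracts the indices flagged as influential. The object mock_out and its values are invented for illustration; in practice it would be the value returned by hidetify().

```r
# Mock result with the two documented columns; the values are made up
mock_out <- data.frame(
  ind = 1:8,
  outlier_ind = c(1, 0, 0, 1, 0, 0, 0, 1)
)

# Indices of the observations flagged as influential
flagged <- mock_out$ind[mock_out$outlier_ind == 1]

# Indices of the observations not flagged
clean_ind <- mock_out$ind[mock_out$outlier_ind == 0]

# Number of flagged observations
n_flagged <- sum(mock_out$outlier_ind == 1)
```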

Author(s)

Amadou Barry barryhafia@gmail.com

References

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2020). Asymmetric influence measure for high dimensional regression. Communications in Statistics - Theory and Methods.

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2021). An algorithm-based multiple detection influence measure for high dimensional regression using expectile. arXiv:2105.12286 [stat].

Examples

## Simulate a dataset where the first 10 observations are influential
library(MASS)  # provides mvrnorm()
# the vector of asymmetric points
vtau  <- c(0.25,0.5,0.75)

# the parameter of interest
beta_param <- c(3,1.5,0,0,2,rep(0,1000-5))

# the contamination parameter 
gama_param <- c(0,0,1,1,0,rep(1,1000-5))

# Covariance matrix for the predictors distribution: entry (i, j) is 0.5^|i - j|
sigmain <- 0.5^abs(outer(1:1000, 1:1000, "-"))

# set the seed
set.seed(13)

# the predictor matrix
x  <- mvrnorm(100, rep(0, 1000), sigmain)

# the error variable
error_var <- rnorm(100)

# the response variable
y  <- x %*% beta_param + error_var
y <- as.numeric(y)

### Generate influential observations

# the contaminated response variable
youtlier <- y
youtlier[1:10] <- x[1:10,] %*% (beta_param +  1.2*gama_param)  + error_var[1:10]
youtlier <- as.numeric(youtlier)

# number of random subsets
nsample <- 5

# the size of the random subset
ssize <- 100/2

# initial clean set
est_clean_set <- 1:100

# the significance level for the single detection method
alpha_shide <- 0.05

# the significance level for the swamping stage
alpha_swamp <- 0.1

# the significance level for the masking stage
alpha_mask <- 0.01

# the significance level for the validation stage
alpha_validate <- 0.01

# the method of detection
method <- "single"

out <- 
  hidetify(
    x, 
    youtlier, 
    nsample, 
    ssize, 
    vtau, 
    alpha_shide, 
    alpha_swamp, 
    alpha_mask, 
    alpha_validate, 
    method = "single")
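
Because the simulation contaminates observations 1 to 10, the detection result can be compared against that known set. The sketch below uses a hypothetical helper, detection_summary, and a mock data frame standing in for out so that it runs without the package. The mock flags are invented and do not represent actual hidetify output.

```r
# Hypothetical helper: compare flagged observations with the known contaminated set
detection_summary <- function(out, truth) {
  flagged <- out$ind[out$outlier_ind == 1]
  c(
    true_positive  = sum(flagged %in% truth),     # contaminated and flagged
    false_positive = sum(!(flagged %in% truth)),  # flagged but not contaminated
    false_negative = sum(!(truth %in% flagged))   # contaminated but missed
  )
}

# Mock stand-in for `out`; the flags are invented for illustration only
mock_out <- data.frame(ind = 1:100, outlier_ind = 0)
mock_out$outlier_ind[c(1:8, 42)] <- 1

res <- detection_summary(mock_out, truth = 1:10)
```

With the real result, the equivalent call would be detection_summary(out, truth = 1:10).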

hidetify documentation built on Aug. 20, 2021, 5:06 p.m.