shidetify: Single detection asymmetric influential measure for high...

View source: R/shidetify.R

Description

Computes the asymmetric influential measure used to identify influential observations in high-dimensional linear regression, following the single detection approach.

Usage

shidetify(
  x, 
  y, 
  asymvec, 
  alpha
  )

Arguments

x

Matrix of the predictors.

y

Numeric vector of the response variable.

asymvec

Numeric vector of asymmetric points. It is suggested to choose three asymmetric points within the quartiles; the example on this page uses 0.25, 0.5, and 0.75.

alpha

Significance level.

Value

A data frame with two variables:

ind

Index of the observations

outlier_ind

Influential observations indicator: 1 if influential and 0 otherwise
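
As an illustration of how the returned data frame might be used, the sketch below extracts the flagged observations. It does not call shidetify(); the object res is a hypothetical stand-in with made-up values that only mimics the two documented columns, ind and outlier_ind.

```r
# Hypothetical stand-in for the data frame returned by shidetify();
# the values are made up purely for illustration.
res <- data.frame(
  ind         = 1:6,
  outlier_ind = c(1, 0, 0, 1, 0, 0)
)

# Indices of the observations flagged as influential
flagged <- res$ind[res$outlier_ind == 1]

# Number of flagged observations
n_flagged <- sum(res$outlier_ind == 1)
```

The same two lines should apply unchanged to an actual result, assuming it has the column names listed above.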

Author(s)

Amadou Barry barryhafia@gmail.com

References

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2020). Asymmetric influence measure for high dimensional regression. Communications in Statistics - Theory and Methods.

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2021). An algorithm-based multiple detection influence measure for high dimensional regression using expectile. arXiv:2105.12286 [stat].

Examples

## Simulate a dataset in which the first 10 observations are influential
library(MASS)      # provides mvrnorm()
library(hidetify)  # provides shidetify()

# the vector of asymmetric points
asymvec  <- c(0.25,0.5,0.75)

# the parameter of interest
beta_param <- c(3,1.5,0,0,2,rep(0,1000-5))

# the contamination parameter 
gama_param <- c(0,0,1,1,0,rep(1,1000-5))

# Covariance matrix for the predictor distribution
sigmain <- diag(rep(1,1000))
for (i in 1:1000)
{
  for (j in i:1000) 
  {
    sigmain[i,j] <- 0.5^(abs(j-i))
    sigmain[j,i] <- sigmain[i,j]
  }
}

# set the seed
set.seed(13)

# the predictor matrix
x  <- mvrnorm(100, rep(0, 1000), sigmain)

# the error variable
error_var <- rnorm(100)

# the response variable
y  <- x %*% beta_param + error_var
y <- as.numeric(y)

### Generate influential observations
# the contaminated response variable
youtlier <- y
youtlier[1:10] <- x[1:10,] %*% (beta_param +  1.2*gama_param)  + error_var[1:10]
youtlier <- as.numeric(youtlier)

# the significance level 
alpha <- 0.05

df_single_influential <- 
  shidetify(
    x, 
    youtlier, 
    asymvec, 
    alpha)
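
Because the simulation contaminates observations 1 to 10, the output can be checked against that known set. The sketch below shows one way to do this. detection_summary() is a hypothetical helper, not part of hidetify, and mock_res is a made-up result standing in for the output of shidetify(), so the block runs without the package.

```r
# Hypothetical helper (not part of hidetify): compares flagged observations
# with the indices known to be contaminated in a simulation.
detection_summary <- function(res, true_idx) {
  flagged <- res$ind[res$outlier_ind == 1]
  c(
    true_positives  = sum(flagged %in% true_idx),
    false_positives = sum(!(flagged %in% true_idx)),
    missed          = sum(!(true_idx %in% flagged))
  )
}

# Made-up result for 20 observations, standing in for shidetify() output
mock_res <- data.frame(
  ind         = 1:20,
  outlier_ind = as.integer(1:20 %in% c(1:8, 15))
)

summary_counts <- detection_summary(mock_res, true_idx = 1:10)
```

With the objects from the example above, the corresponding call would be detection_summary(df_single_influential, true_idx = 1:10), assuming the result carries the documented columns ind and outlier_ind.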

hidetify documentation built on Aug. 20, 2021, 5:06 p.m.