# ease_swamping: Compute the min of the min of a sequence of asymmetric...

In hidetify: Identify Influential Observations in High Dimension

## Description

This function is part of the algorithm that identifies multiple influential observations in high-dimensional linear regression. It computes the min of the min of the asymmetric influence measure to ease the swamping effect, in which non-influential observations are wrongly flagged as influential.
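To make the "min of the min" wording concrete, here is a minimal sketch of one plausible reading. It assumes the influence measure is available for every observation, random subset, and asymmetric point, and that the inner minimum runs over subsets and the outer minimum over asymmetric points. The array `influence` and its random values are hypothetical; this is not the package's internal code.

```r
# Hypothetical influence values (NOT produced by hidetify):
# dim 1 = observations, dim 2 = random subsets, dim 3 = asymmetric points.
set.seed(42)
n_obs <- 6
n_subsets <- 4
n_asym <- 3
influence <- array(
  abs(rnorm(n_obs * n_subsets * n_asym)),
  dim = c(n_obs, n_subsets, n_asym)
)

# Inner min: for each observation and asymmetric point, take the smallest
# value across the random subsets (assumed order of aggregation).
min_over_subsets <- apply(influence, c(1, 3), min)  # n_obs x n_asym matrix

# Outer min: for each observation, take the smallest value across the
# asymmetric points.
min_of_min <- apply(min_over_subsets, 1, min)       # vector of length n_obs
```

Under this reading, an observation keeps a large aggregated value only if it looks influential in every subset and at every asymmetric point, which is what makes the approach conservative.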

## Usage

```r
ease_swamping(
  x,
  y,
  xquant,
  yquant,
  inv_rob_sdx,
  rob_sdy,
  number_subset,
  size_subset,
  est_clean_set,
  asymvec,
  ep = 0.1,
  alpha
)
```

## Arguments

| Argument | Description |
| --- | --- |
| `x` | Matrix with the values of the predictors. |
| `y` | Numeric vector of the response variable. |
| `xquant` | Matrix with the quantiles of the predictors. |
| `yquant` | Numeric vector of the quantiles of the response variable. |
| `inv_rob_sdx` | Numeric vector of the inverse of the median absolute deviation of the predictors. |
| `rob_sdy` | Median absolute deviation of the response variable. |
| `number_subset` | Number of random subsets. |
| `size_subset` | Size of the random subsets. The default is half of the initial sample size. |
| `est_clean_set` | The subject id of the estimated clean subset. The default is the initial sample. |
| `asymvec` | Numeric vector of the asymmetric values. It is suggested to choose 3 asymmetric points within the quartile. |
| `ep` | Threshold value to ensure that the estimated clean set is not empty. The default value is 0.1. |
| `alpha` | Significance level. |
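The precomputed arguments can be derived from `x` and `y` with base R. The sketch below uses a small simulated dataset (the dimensions and seed are arbitrary assumptions) and only checks the shapes implied by the argument descriptions; it does not call `ease_swamping` itself.

```r
# Small simulated dataset; sizes are chosen for illustration only.
set.seed(1)
n <- 20
p <- 5
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rnorm(n)

# Asymmetric points (here the three quartiles).
asymvec <- c(0.25, 0.5, 0.75)

# One row per asymmetric point, one column per predictor.
xquant <- apply(x, 2, quantile, probs = asymvec)

# One quantile of the response per asymmetric point.
yquant <- quantile(y, probs = asymvec)

# Inverse MAD of each predictor; note that stats::mad() applies the
# consistency constant 1.4826 by default.
inv_rob_sdx <- 1 / apply(x, 2, mad)

# MAD of the response (a single number).
rob_sdy <- mad(y)

dim(xquant)
length(yquant)
length(inv_rob_sdx)
```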

## Value

An index vector identifying the estimated non-influential observations using a conservative approach.
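As a usage sketch, the returned indices can be turned into a list of flagged observations or a logical mask. The vector `est_clean` below is a made-up stand-in for the function's output, and the sketch assumes the indices refer to rows of `x`.

```r
n <- 100

# Hypothetical return value: rows kept as non-influential (made-up numbers).
est_clean <- setdiff(seq_len(n), c(1:8, 57))

# Observations not in the clean set are the ones flagged as influential.
flagged <- setdiff(seq_len(n), est_clean)

# Logical mask, convenient for subsetting x and y to the clean rows.
is_clean <- seq_len(n) %in% est_clean

flagged
sum(is_clean)
```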

## References

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2020). Asymmetric influence measure for high dimensional regression. Communications in Statistics - Theory and Methods.

Barry, A., Bhagwat, N., Misic, B., Poline, J.-B., and Greenwood, C. M. T. (2021). An algorithm-based multiple detection influence measure for high dimensional regression using expectile. arXiv:2105.12286 [stat].

## Examples

```r
## Simulate a dataset where the first 10 observations are influential
library(MASS)      # for mvrnorm()
library(hidetify)  # for ease_swamping()

# the vector of asymmetric points
asymvec <- c(0.25, 0.5, 0.75)

# the parameter of interest
beta_param <- c(3, 1.5, 0, 0, 2, rep(0, 1000 - 5))

# the contamination parameter
gama_param <- c(0, 0, 1, 1, 0, rep(1, 1000 - 5))

# covariance matrix for the predictors distribution
sigmain <- diag(rep(1, 1000))
for (i in 1:1000) {
  for (j in i:1000) {
    sigmain[i, j] <- 0.5^(abs(j - i))
    sigmain[j, i] <- sigmain[i, j]
  }
}

# set the seed
set.seed(13)

# the predictor matrix
x <- mvrnorm(100, rep(0, 1000), sigmain)

# the error variable
error_var <- rnorm(100)

# the response variable
y <- x %*% beta_param + error_var
y <- as.numeric(y)

### Generate influential observations

# the contaminated response variable
youtlier <- y
youtlier[1:10] <- x[1:10, ] %*% (beta_param + 1.2 * gama_param) + error_var[1:10]
youtlier <- as.numeric(youtlier)

# the quantiles of the predictors
xquant <- apply(x, 2, quantile, asymvec)

# the quantiles of the contaminated response variable
yquant <- quantile(youtlier, asymvec)

# the inverse of the MAD of the predictors
inv_rob_sdx <- 1 / apply(x, 2, mad)

# the MAD of the contaminated response variable
rob_sdy <- mad(youtlier)

# the number of random subsets
number_subset <- 5

# the size of the random subsets
size_subset <- 100 / 2

# the initial clean set
est_clean_set <- 1:100

# the significance level
alpha <- 0.05

# the function to run
est_clean_set_ease_swamping <- ease_swamping(
  x,
  youtlier,
  xquant,
  yquant,
  inv_rob_sdx,
  rob_sdy,
  number_subset,
  size_subset,
  est_clean_set,
  asymvec,
  ep = 0.1,
  alpha
)
```

hidetify documentation built on Aug. 20, 2021, 5:06 p.m.