rcpp_mask_swamp_stat: Arma Masking and Swamping Statistics

In hidetify: Identify Influential Observations in High Dimension

Description

`rcpp_mask_swamp_stat` computes, for each observation, the `min` of the `min` statistics and the `max` of the `sum` statistics of the asymmetric influence measure. These two values are the influence scores of the observation in the swamping and masking steps, respectively.

Usage

```
rcpp_mask_swamp_stat(
  x,
  y,
  xquant,
  yquant,
  inv_rob_sdx,
  rob_sdy,
  number_subset,
  size_subset,
  est_clean_set,
  asymvec
)
```

Arguments

| Argument | Description |
| --- | --- |
| `x` | a matrix of elements. |
| `y` | a vector of elements. |
| `xquant` | quantiles of the columns of `x` stacked in the matrix `xquant`. |
| `yquant` | quantiles vector of the vector `y`. |
| `inv_rob_sdx` | inverse of the median absolute deviation of the matrix `x`. |
| `rob_sdy` | median absolute deviation of the vector `y`. |
| `number_subset` | number of random subsets. |
| `size_subset` | size of random subsets. |
| `est_clean_set` | estimated cleaned set. |
| `asymvec` | vector of asymmetric points or percentiles. |
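The shapes these arguments take can be checked on a small toy problem with base R only. This is a sketch that mirrors the preprocessing in the Examples section; the dimensions (`n = 20`, `p = 5`) are arbitrary assumptions, and `rcpp_mask_swamp_stat` itself is not called here.

```r
# Toy dimensions (assumed for illustration only)
set.seed(1)
n <- 20
p <- 5
asymvec <- c(0.25, 0.5, 0.75)

x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- rnorm(n)

# One row per asymmetric point, one column per predictor
xquant <- apply(x, 2, quantile, asymvec)
# One quantile per asymmetric point
yquant <- quantile(y, asymvec)
# One inverse MAD per predictor column
inv_rob_sdx <- 1 / apply(x, 2, mad)
# A single MAD for the response
rob_sdy <- mad(y)

dim(xquant)          # length(asymvec) rows, p columns
length(yquant)       # length(asymvec)
length(inv_rob_sdx)  # p
length(rob_sdy)      # 1
```

Computing the quantiles and scales once, outside the call, is the pattern the Examples section follows as well.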

Value

A matrix with one row per observation. The first column holds the `min` of the `min` statistics and the second column holds the `max` of the `sum` statistics.
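As a minimal sketch of how the two columns might be pulled apart after a call, the snippet below uses a small placeholder matrix in place of real output. The numeric values are invented for illustration, and the column labels `min_min` and `max_sum` are hypothetical names added here; whether the returned matrix carries column names is not stated in this page.

```r
# Placeholder standing in for the returned matrix: 3 observations, 2 columns.
# These values are NOT real output of rcpp_mask_swamp_stat.
out <- matrix(c(0.1, 0.4, 0.2,   # column 1: min of the min statistics
                1.5, 0.9, 2.3),  # column 2: max of the sum statistics
              ncol = 2)

# Hypothetical labels, added only to make the indexing below readable
colnames(out) <- c("min_min", "max_sum")

# Per-observation scores for the swamping and masking steps
swamp_stat <- out[, "min_min"]
mask_stat  <- out[, "max_sum"]
```

Indexing by position (`out[, 1]` and `out[, 2]`) works equally well and does not rely on any labels.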

Examples

```
## Not run: 
## Simulate a dataset where the first 10 observations are influential
require("MASS")

# the vector of asymmetric points
asymvec <- c(0.25, 0.5, 0.75)
# the parameter of interest
beta_param <- c(3, 1.5, 0, 0, 2, rep(0, 1000 - 5))
# the contamination parameter
gama_param <- c(0, 0, 1, 1, 0, rep(1, 1000 - 5))
# covariance matrix for the predictor distribution
sigmain <- diag(rep(1, 1000))
for (i in 1:1000) {
  for (j in i:1000) {
    sigmain[i, j] <- 0.5^(abs(j - i))
    sigmain[j, i] <- sigmain[i, j]
  }
}
# set the seed
set.seed(13)
# the predictor matrix
x <- mvrnorm(100, rep(0, 1000), sigmain)
# the error variable
error_var <- rnorm(100)
# the response variable
y <- x %*% beta_param + error_var
y <- as.numeric(y)

### Generate influential observations
# the contaminated response variable
youtlier <- y
youtlier[1:10] <- x[1:10, ] %*% (beta_param + 1.2 * gama_param) + error_var[1:10]
youtlier <- as.numeric(youtlier)

# the quantiles of the predictors
xquant <- apply(x, 2, quantile, asymvec)
# the quantiles of the contaminated response variable
yquant <- quantile(youtlier, asymvec)
# the inverse of the MAD of the predictors
inv_rob_sdx <- 1 / apply(x, 2, mad)
# the MAD of the contaminated response variable
rob_sdy <- mad(youtlier)
# the number of random subsets
number_subset <- 5
# the size of random subsets
size_subset <- 100 / 2
# the initial clean set
est_clean_set <- 1:100

out <- rcpp_mask_swamp_stat(
  x, y, xquant, yquant,
  inv_rob_sdx, rob_sdy,
  number_subset, size_subset,
  est_clean_set, asymvec
)

## End(Not run)
```

