# scovq: Supervised scatter matrix based on quantiles

Package: ICS (Tools for Exploring Multivariate Data via ICS/ICA)

## Description

Computes a supervised scatter matrix: the weighted covariance matrix of `x`, with weights 1/(`q2-q1`) for observations whose `y` lies between its lower (`q1`) and upper (`q2`) quantiles and 0 otherwise (or the reverse, depending on `pos`).

## Usage

```r
scovq(x, y, q1 = 0, q2 = 0.5, pos = TRUE, type = 7,
      method = "unbiased", na.action = na.fail, check = TRUE)
```

## Arguments

| Argument | Description |
| --- | --- |
| `x` | numeric data matrix with at least two columns. |
| `y` | numeric vector specifying the dependent variable. |
| `q1` | probability (between 0 and 1) for the lower quantile of `y`, with 0 <= `q1` < `q2`. See Details. |
| `q2` | probability (between 0 and 1) for the upper quantile of `y`, with `q1` < `q2` <= 1. See Details. |
| `pos` | logical. If TRUE, the weights are 1/(`q2-q1`) if `y` is between the `q1`- and `q2`-quantiles and 0 otherwise. If FALSE, the weights are 0 if `y` is between the `q1`- and `q2`-quantiles and 1/(`1-q2+q1`) otherwise. |
| `type` | passed on to function `quantile`. |
| `method` | passed on to function `cov.wt`. |
| `na.action` | a function which indicates what should happen when the data contain `NA`s. The default is to fail. |
| `check` | logical. Whether the input should be checked for consistency. If the check is not needed, setting it to FALSE might save some time. |
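To make the effect of `pos` concrete, here is a minimal base-R sketch of the two weighting schemes on a toy response. It is an illustration, not the package code: the variable names are made up, and the strict inequalities at the quantile boundaries are an assumption taken from the formula in Details.

```r
# Toy response and quantile levels (illustrative values only)
y  <- 1:9
q1 <- 0.25
q2 <- 0.75

# Sample quantiles of y; `type` is the argument forwarded to quantile()
qs <- quantile(y, probs = c(q1, q2), type = 7, names = FALSE)

# Assumption: "between" is read as a strict inequality
inside <- y > qs[1] & y < qs[2]

# pos = TRUE: weight the observations inside the quantile band
w_pos <- ifelse(inside, 1 / (q2 - q1), 0)

# pos = FALSE: weight the observations outside the quantile band
w_neg <- ifelse(inside, 0, 1 / (1 - q2 + q1))

w_pos  # 0 0 0 2 2 2 0 0 0
w_neg  # 2 2 2 0 0 0 2 2 2
```

With `pos = FALSE` the scatter is driven by the observations in both tails of `y`, which is the setting used in the example for this function.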

## Details

The weights for this supervised scatter matrix for `pos=TRUE` are

$$w(y) = \frac{I(q_1\text{-quantile} < y < q_2\text{-quantile})}{q_2 - q_1}.$$

Then `scovq` is calculated as

$$\mathrm{scovq} = \sum w(y)\,(x - \bar{x}_w)'(x - \bar{x}_w),$$

where $\bar{x}_w = \sum w(y)\,x$.
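The calculation can be sketched by hand with `quantile` and `cov.wt` from base R. This is only a rough illustration under stated assumptions, not the implementation in ICS: the helper name `scovq_sketch` is hypothetical, it covers only the `pos = TRUE` case, the boundaries are treated as strict inequalities, and the weights are explicitly rescaled to sum to one before the weighted covariance is computed.

```r
# Hypothetical helper for illustration; not the ICS implementation.
scovq_sketch <- function(x, y, q1 = 0, q2 = 0.5, type = 7,
                         method = "unbiased") {
  # Lower and upper sample quantiles of the response
  qs <- quantile(y, probs = c(q1, q2), type = type, names = FALSE)
  # Indicator weights scaled by 1/(q2 - q1), as in the pos = TRUE formula
  w <- as.numeric(y > qs[1] & y < qs[2]) / (q2 - q1)
  # Assumption: rescale so that the weights sum to one
  w <- w / sum(w)
  # Weighted covariance of x around the weighted mean
  cov.wt(x, wt = w, method = method)$cov
}

# Simulated data (illustrative only)
set.seed(1)
x <- matrix(rnorm(200 * 3), 200, 3)
y <- x[, 1]^2 + rnorm(200, sd = 0.5)

S <- scovq_sketch(x, y, q1 = 0.25, q2 = 0.75)
dim(S)
```

Because of the rescaling, the constant 1/(`q2-q1`) only matters up to scale in this sketch; what determines the result is which observations receive non-zero weight.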

To see how this function can be used in the context of supervised invariant coordinate selection see the example below.

## Value

A matrix.

## Author(s)

Klaus Nordhausen

## References

Liski, E., Nordhausen, K. and Oja, H. (2014), Supervised invariant coordinate selection, Statistics: A Journal of Theoretical and Applied Statistics, 48, 711–731. <doi:10.1080/02331888.2013.800067>.

## See Also

`cov.wt` and `ics`

## Examples

```r
library(ICS)

# Creating some data

# The number of explanatory variables
p <- 10
# The number of observations
n <- 400
# The error standard deviation
sigma <- 0.5
# The explanatory variables
X <- matrix(rnorm(p*n),n,p)
# The error term
epsilon <- rnorm(n, sd = sigma)
# The response
y <- X[,1]^2 + X[,2]^2*epsilon

# SICS with ics

X.centered <- sweep(X,2,colMeans(X),"-")
SICS <- ics(X.centered, S1=cov, S2=scovq,
            S2args=list(y=y, q1=0.25, q2=0.75, pos=FALSE),
            stdKurt=FALSE, stdB="Z")

# Assuming it is known that k=2, the two directions
# of interest are chosen as:

k <- 2
KURTS <- SICS@gKurt
KURTS.max <- ifelse(KURTS >= 1, KURTS, 1/KURTS)
ordKM <- order(KURTS.max, decreasing = TRUE)
indKM <- ordKM[1:k]

# The two variables of interest
Zk <- ics.components(SICS)[,indKM]

# The corresponding transformation matrix
Bk <- coef(SICS)[indKM,]

# The corresponding projection matrix
Pk <- t(Bk) %*% solve(Bk %*% t(Bk)) %*% Bk

# Visualization
pairs(cbind(y,Zk))

# Checking the subspace difference

# True projection
B0 <- rbind(rep(c(1,0),c(1,p-1)),rep(c(0,1,0),c(1,1,p-2)))
P0 <- t(B0) %*% solve(B0 %*% t(B0)) %*% B0

# Crone and Crosby subspace distance measure, should be small
k - sum(diag(P0 %*% Pk))
```
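
The quantity in the last line of the example, `k - sum(diag(P0 %*% Pk))`, compares two `k`-dimensional subspaces through their projection matrices. The following base-R sketch, with hypothetical helper names `proj` and `subspace_dist` and hand-picked toy matrices, shows the two extreme cases: a value of 0 when the row spaces coincide and a value of `k` when they are orthogonal.

```r
# Hypothetical helpers for illustration only.
# Orthogonal projection onto the row space of B
proj <- function(B) t(B) %*% solve(B %*% t(B)) %*% B

# k minus the trace of the product of the two projections
subspace_dist <- function(B1, B2) {
  nrow(B1) - sum(diag(proj(B1) %*% proj(B2)))
}

# Reference subspace spanned by the first two coordinate axes
B0     <- rbind(c(1, 0, 0, 0), c(0, 1, 0, 0))
# Different basis, same row space
B_same <- rbind(c(2, 0, 0, 0), c(1, 3, 0, 0))
# Row space orthogonal to that of B0
B_orth <- rbind(c(0, 0, 1, 0), c(0, 0, 0, 1))

d_same <- subspace_dist(B0, B_same)  # 0 up to rounding
d_orth <- subspace_dist(B0, B_orth)  # 2, i.e. k
```

Since the measure depends only on the projections, it does not change when the rows of `Bk` are rescaled or recombined, which is why the example can compare `Bk` with `B0` directly.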