# scovq: Supervised scatter matrix based on quantiles

In ICS: Tools for Exploring Multivariate Data via ICS/ICA

## Supervised scatter matrix based on quantiles

### Description

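Computes a supervised scatter matrix: the weighted covariance matrix of x, with weights 1/(q2-q1) for observations whose y lies between the lower (q1) and upper (q2) quantiles and 0 otherwise. With pos = FALSE the roles are reversed: observations inside the quantile range get weight 0 and all others get weight 1/(1-q2+q1).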

### Usage

scovq(x, y, q1 = 0, q2 = 0.5, pos = TRUE, type = 7,
method = "unbiased", na.action = na.fail,
check = TRUE)

### Arguments

| Argument | Description |
|---|---|
| x | numeric data matrix with at least two columns. |
| y | numeric vector specifying the dependent variable. |
| q1 | percentage for the lower quantile of y, with 0 <= q1 < q2. See Details. |
| q2 | percentage for the upper quantile of y, with q1 < q2 <= 1. See Details. |
| pos | logical. If TRUE, the weights are 1/(q2-q1) if y is between the q1- and q2-quantiles and 0 otherwise. If FALSE, the weights are 0 if y is between the q1- and q2-quantiles and 1/(1-q2+q1) otherwise. |
| type | passed on to function quantile. |
| method | passed on to function cov.wt. |
| na.action | a function which indicates what should happen when the data contain NAs. Default is to fail. |
| check | logical. Whether the input should be checked for consistency. If the check is not needed, setting it to FALSE might save some time. |
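The weighting rule described for pos can be sketched in base R. The helper name scovq_weights_sketch is hypothetical and not part of ICS, and the strict inequalities at the quantile bounds follow the formula in the Details section; how the package itself treats observations exactly on a bound is not stated here, so treat that as an assumption.

```r
# Hypothetical helper (not part of ICS): the two-valued weights described for `pos`.
scovq_weights_sketch <- function(y, q1 = 0, q2 = 0.5, pos = TRUE, type = 7) {
  stopifnot(is.numeric(y), q1 >= 0, q1 < q2, q2 <= 1)
  # Lower and upper sample quantiles of y
  qs <- quantile(y, probs = c(q1, q2), type = type, names = FALSE)
  # Assumption: strict inequalities, as written in the Details formula
  inside <- y > qs[1] & y < qs[2]
  if (pos) {
    ifelse(inside, 1 / (q2 - q1), 0)
  } else {
    ifelse(inside, 0, 1 / (1 - q2 + q1))
  }
}

y <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
# Weights concentrated on the middle of the y distribution
w_pos <- scovq_weights_sketch(y, q1 = 0.25, q2 = 0.75, pos = TRUE)
# Weights concentrated on the two tails of the y distribution
w_neg <- scovq_weights_sketch(y, q1 = 0.25, q2 = 0.75, pos = FALSE)
```

With pos = TRUE only observations strictly inside the quantile range receive a nonzero weight; with pos = FALSE the nonzero weights move to the observations outside that range.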

### Details

The weights for this supervised scatter matrix for pos=TRUE are

$$w(y) = \frac{I(q_1\text{-quantile} < y < q_2\text{-quantile})}{q_2 - q_1}.$$

Then scovq is calculated as

$$\mathrm{scovq} = \sum w(y)\,(x-\bar{x}_w)'(x-\bar{x}_w),$$

where $\bar{x}_w = \sum w(y)\, x$.
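As a rough check of the formula, the weighted mean and weighted scatter can be computed by hand and compared with cov.wt. This is a sketch under stated assumptions, not the ICS implementation: the simulated data are illustrative, the weights are rescaled to sum to one (cov.wt applies the same rescaling internally), and method = "ML" is used because it matches the formula without a further correction factor.

```r
# Sketch under stated assumptions; not the ICS implementation.
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 3), n, 3)
y <- x[, 1]^2 + rnorm(n, sd = 0.5)

q1 <- 0.25
q2 <- 0.75
qs <- quantile(y, probs = c(q1, q2), type = 7, names = FALSE)

# Indicator weights for pos = TRUE, with the strict inequalities of the formula
w <- as.numeric(y > qs[1] & y < qs[2]) / (q2 - q1)
# Assumption: rescale so that the weights sum to one
w <- w / sum(w)

# Weighted mean: x_bar_w = sum_i w_i x_i
x_bar_w <- colSums(w * x)
# Weighted scatter: sum_i w_i (x_i - x_bar_w)(x_i - x_bar_w)'
xc <- sweep(x, 2, x_bar_w, "-")
scatter_by_hand <- crossprod(sqrt(w) * xc)

# The same quantity from cov.wt with the maximum-likelihood normalisation
scatter_cov_wt <- cov.wt(x, wt = w, method = "ML")$cov
```

The two matrices agree up to floating-point error. With method = "unbiased", which is the default listed under Usage, cov.wt additionally divides by 1 - sum(w^2).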

To see how this function can be used in the context of supervised invariant coordinate selection see the example below.

### Value

A matrix.

### Author(s)

Klaus Nordhausen

### References

Liski, E., Nordhausen, K. and Oja, H. (2014), Supervised invariant coordinate selection, Statistics: A Journal of Theoretical and Applied Statistics, 48, 711–731. <doi:10.1080/02331888.2013.800067>.

### See Also

cov.wt and ics

### Examples

library(ICS)

# Creating some data

# The number of explanatory variables
p <- 10
# The number of observations
n <- 400
# The error standard deviation
sigma <- 0.5
# The explanatory variables
X <- matrix(rnorm(p*n),n,p)
# The error term
epsilon <- rnorm(n, sd = sigma)
# The response
y <- X[,1]^2 + X[,2]^2*epsilon

# SICS with ics

X.centered <- sweep(X,2,colMeans(X),"-")
SICS <- ics(X.centered, S1=cov, S2=scovq, S2args=list(y=y, q1=0.25,
q2=0.75, pos=FALSE), stdKurt=FALSE, stdB="Z")

# Assuming it is known that k=2, the two directions
# of interest are chosen as:

k <- 2
KURTS <- SICS@gKurt
KURTS.max <- ifelse(KURTS >= 1, KURTS, 1/KURTS)
ordKM <- order(KURTS.max, decreasing = TRUE)

indKM <- ordKM[1:k]

# The two variables of interest
Zk <- ics.components(SICS)[,indKM]

# The corresponding transformation matrix
Bk <- coef(SICS)[indKM,]

# The corresponding projection matrix
Pk <- t(Bk) %*% solve(Bk %*% t(Bk)) %*% Bk

# Visualization
pairs(cbind(y,Zk))

# checking the subspace difference

# true projection

B0 <- rbind(rep(c(1,0),c(1,p-1)),rep(c(0,1,0),c(1,1,p-2)))
P0 <- t(B0) %*% solve(B0 %*% t(B0)) %*% B0

# Crone and Crosby subspace distance measure, should be small
k - sum(diag(P0 %*% Pk))
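The subspace comparison at the end of the example can be wrapped in small helpers and checked on cases where the answer is known. The names proj_matrix and subspace_dist are hypothetical and not part of ICS; the sketch assumes both direction matrices have k linearly independent rows.

```r
# Hypothetical helpers (not part of ICS).
proj_matrix <- function(B) {
  # Orthogonal projection onto the row space of B (rows are directions)
  t(B) %*% solve(B %*% t(B)) %*% B
}

subspace_dist <- function(B1, B2) {
  stopifnot(nrow(B1) == nrow(B2), ncol(B1) == ncol(B2))
  k <- nrow(B1)
  # k minus the trace of the product of the two projection matrices
  k - sum(diag(proj_matrix(B1) %*% proj_matrix(B2)))
}

# Reference subspace spanned by the first two coordinate axes in 4 dimensions
B0 <- rbind(c(1, 0, 0, 0), c(0, 1, 0, 0))
# A different basis of the same subspace
B_same <- rbind(c(2, 1, 0, 0), c(1, -1, 0, 0))
# A subspace orthogonal to the reference
B_orth <- rbind(c(0, 0, 1, 0), c(0, 0, 0, 1))

d_same <- subspace_dist(B0, B_same)
d_orth <- subspace_dist(B0, B_orth)
```

The measure is close to 0 when the two bases span the same subspace and reaches k when the subspaces are orthogonal, which gives a scale for reading the value computed in the example.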

ICS documentation built on Sept. 21, 2023, 9:07 a.m.