Supervised scatter matrix based on quantiles

Description

Function for a supervised scatter matrix that is the weighted covariance matrix of x with weights 1/(q2-q1) if y is between the lower (q1) and upper (q2) quantile and 0 otherwise (or vice versa).

Usage

1
2
3
scovq(x, y, q1 = 0, q2 = 0.5, pos = TRUE, type = 7, 
      method = "unbiased", na.action = na.fail, 
      check = TRUE)

Arguments

x

numeric data matrix with at least two columns.

y

numerical vector specifying the dependent variable.

q1

percentage for lower quantile of y. With 0 <= q1 < q2. See details.

q2

percentage for upper quantile of y. With q1 < q2 <= 1. See details.

pos

logical. If TRUE then the weights are 1/(q2-q1) if y is between the q1- and q2- quantiles and 0 othervise. If FALSE then the weights are 0 if y between q1- and q2-quantiles and 1/(1-q2+q1) otherwise.

type

passed on to function quantile.

method

passed on to function cov.wt.

na.action

a function which indicates what should happen when the data contain 'NA's. Default is to fail.

check

logical. Checks if the input should be checked for consistency. If not needed setting it to FALSE might save some time.

Details

The weights for this supervised scatter matrix for pos=TRUE are w(y) = I(q1-quantile < y < q2-quantile)/(q2-q1). Then scovq is calculated as

scovq = ∑ w(y) (x-x_w_bar)'(x-x_w_bar).

where x_w_bar = sum w(y)x.

To see how this function can be used in the context of supervised invariant coordinate selection see the example below.

Value

a matrix.

Author(s)

Klaus Nordhausen

References

Liski, E., Nordhausen, K. and Oja, H. (2014), Supervised invariant coordinate selection, Statistics: A Journal of Theoretical and Applied Statistics, 48, 711–731. <doi:10.1080/02331888.2013.800067>.

See Also

cov.wt and ics

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
# Creating some data

# The number of explaining variables
p <- 10
# The number of observations
n <- 400
# The error variance
sigma <- 0.5
# The explaining variables 
X <- matrix(rnorm(p*n),n,p)
# The error term
epsilon <- rnorm(n, sd = sigma)
# The response
y <- X[,1]^2 + X[,2]^2*epsilon


# SICS with ics

X.centered <- sweep(X,2,colMeans(X),"-")
SICS <- ics(X.centered, S1=cov, S2=scovq, S2args=list(y=y, q1=0.25, 
        q2=0.75, pos=FALSE), stdKurt=FALSE, stdB="Z")

# Assuming it is known that k=2, then the two directions 
# of interest are choosen as:

k <- 2
KURTS <- SICS@gKurt 
KURTS.max <- ifelse(KURTS >= 1, KURTS, 1/KURTS)
ordKM <- order(KURTS.max, decreasing = TRUE)

indKM <- ordKM[1:k]

# The two variables of interest
Zk <- ics.components(SICS)[,indKM]

# The correspondings transformation matrix
Bk <- coef(SICS)[indKM,]

# The corresponding projection matrix
Pk <- t(Bk) %*% solve(Bk %*% t(Bk)) %*% Bk

# Visualization
pairs(cbind(y,Zk))

# checking the subspace difference

# true projection

B0 <- rbind(rep(c(1,0),c(1,p-1)),rep(c(0,1,0),c(1,1,p-2)))
P0 <- t(B0) %*% solve(B0 %*% t(B0)) %*% B0

# crone and crosby subspace distance measure, should be small
k - sum(diag(P0 %*% Pk))