This function preprocesses the design matrix by removing columns that contain NAs or are all zero. It also standardizes non-binary columns to have mean zero and variance one.
It returns a list containing the following objects:
The filtered design matrix, which can be used in a variable selection procedure. Binary columns are moved to the end of the design matrix.
The gene names, read from the column names of the filtered design matrix.
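The steps described above can be illustrated with a minimal base-R sketch. This is not the package's implementation of PreProcess: the helper name `preprocess_sketch`, the list element name `gene_names`, and the rule that a column counts as binary when it holds only 0s and 1s are all assumptions made for illustration. Constant non-binary columns (zero variance) are not handled here.

```r
# Hypothetical helper (NOT the package's PreProcess): a base-R sketch of the
# documented steps, under the assumptions stated above.
preprocess_sketch <- function(X) {
  X <- as.matrix(X)

  # 1. Drop columns that contain at least one NA.
  X <- X[, colSums(is.na(X)) == 0, drop = FALSE]

  # 2. Drop columns whose entries are all zero.
  X <- X[, colSums(X != 0) > 0, drop = FALSE]

  # 3. Flag binary columns (assumption: only the values 0 and 1 occur).
  is_binary <- apply(X, 2, function(col) all(col %in% c(0, 1)))

  # 4. Standardize non-binary columns to mean zero and variance one.
  if (any(!is_binary)) {
    X[, !is_binary] <- scale(X[, !is_binary, drop = FALSE])
  }

  # 5. Move binary columns to the end of the matrix.
  X <- cbind(X[, !is_binary, drop = FALSE], X[, is_binary, drop = FALSE])

  list(X = X, gene_names = colnames(X))
}

# Small usage example with one column of each kind.
set.seed(1)
X <- cbind(flag = rep(c(0, 1), 5),   # binary column
           zero = rep(0, 10),        # all-zero column, should be dropped
           expr = rnorm(10, 5, 3),   # continuous column, should be standardized
           bad  = rnorm(10))         # will receive an NA, should be dropped
X[2, "bad"] <- NA

out <- preprocess_sketch(X)
colnames(out$X)  # "expr" "flag"
```

In this sketch the filtering is done before standardization so that the column means and variances are never computed on columns that are about to be discarded.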
### Constructing a synthetic design matrix for the purpose of preprocessing
### imposing columns with different scales
n <- 40
p1 <- 50
p2 <- 150
p <- p1 + p2
X1 <- matrix(rnorm(n*p1, 1, 2), ncol = p1)
X2 <- matrix(rnorm(n*p2), ncol = p2)
X <- cbind(X1, X2)

### putting NA elements in the matrix
X[3,85] <- NA
X[25,85] <- NA
X[35,43] <- NA
X[15,128] <- NA
colnames(X) <- paste("gene_",c(1:p),sep="")

### Running the function. Note the intercept column that is added as the
### first column in the "logistic" family
Xout <- PreProcess(X)
dim(Xout$X) == (p + 1) ## 1 is added because intercept column is included
## This is FALSE because of the removal of columns with NA elements