Thin the rows (or columns) of a large matrix or big.matrix to reduce the size of the dataset while retaining important information. The subset size can be given either as a proportion of the original size or as a new number of rows/columns, and there are four methods for choosing the subset. Simple uniform or random selection can be specified. The other methods look at the correlation structure of a subset of the data to derive non-arbitrary selections, using correlation, PCA, or association with a phenotype or some other categorical variable. Each of the four methods has its own function in this package, which you can consult for more information; this function is merely a wrapper that selects one of the four.
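The two simplest selection modes can be illustrated in base R. This is only a sketch, not the package's implementation: `resolve.keep` and `pick.index` are hypothetical helpers, and the rule used here (values above 2 are counts, anything else is a proportion) is an assumption based on the description of 'keep'.

```r
# Hypothetical helper: turn 'keep' into a subset size for n rows/columns.
# Assumption: values > 2 are counts, anything else is a proportion.
resolve.keep <- function(keep, n) {
  size <- if (keep > 2) round(keep) else round(keep * n)
  max(1, min(n, size))
}

# Hypothetical helper: indices of the rows/columns to retain.
pick.index <- function(n, keep, random = TRUE) {
  size <- resolve.keep(keep, n)
  if (random) {
    sort(sample.int(n, size))                    # random subset, original order kept
  } else {
    unique(round(seq(1, n, length.out = size)))  # evenly spaced, reproducible
  }
}

# Thin an ordinary matrix with 10,000 rows to 10% of its rows
m <- matrix(rnorm(10000 * 5), nrow = 10000)
idx <- pick.index(nrow(m), keep = 0.1, random = FALSE)
sub <- m[idx, , drop = FALSE]
dim(sub)
```

Passing `keep = 1000` instead of `keep = 0.1` gives the same subset size in this sketch, mirroring the two ways of specifying the size described above.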
bigMat | a big.matrix object, or any argument accepted by get.big.matrix(), which includes paths to description files or even a standard matrix object.
keep | numeric, by default a proportion (decimal) of the original number of rows/columns to choose for the subset. If an integer greater than 2 is given instead, it is taken as the size of the desired subset; e.g., for a dataset with 10,000 rows where you want a subset of 1,000 rows, you could set 'keep' to either 0.1 or 1000.
how | character, the method to use for subset selection. Only the first two characters are required, and they are not case sensitive. Options are: 'uniform': evenly spaced selection when random=FALSE, or random selection otherwise, see uniform.select(); 'correlation': most correlated subset when hi.cor=TRUE, least correlated otherwise, see subcor.select(); 'pca': most representative variables of the principal components of a subset, see subpc.select(); 'association': subset most correlated with the phenotype if least=FALSE, or least correlated otherwise, see select.least.assoc().
dir | directory containing the filebacked.big.matrix, same as 'dir' for get.big.matrix.
rows | logical, whether to choose a subset of rows (TRUE) or columns (FALSE). rows is always TRUE when using 'association' methods.
random | logical, whether to use random selections and subsets (TRUE), or uniform selections that should give the same result each time for the same dataset (FALSE).
hi.cor | logical, if using 'correlation' methods, whether to choose the most correlated (TRUE) or least correlated (FALSE) subset.
least | logical, if using 'association' methods, whether to choose the variables least associated (TRUE) or most associated (FALSE) with the phenotype.
pref | character, a prefix for big.matrix backing files generated by this selection.
verbose | logical, whether to display more information about processing.
ret.obj | logical, whether to return the result as a big.matrix object (TRUE), or as a reference to the binary file containing the big.matrix.descriptor object (FALSE); either can be read with get.big.matrix() or prv.big.matrix().
... | other arguments to be passed to uniform.select, subpc.select, subcor.select, or select.least.assoc.
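The two-character, case-insensitive matching described for 'how' could look roughly like the following. This is a minimal sketch only: `match.how` is a hypothetical helper written for illustration, not a function from the package, and the real wrapper may resolve the method differently.

```r
# Hypothetical sketch: map 'how' to the name of a selection function,
# using only its first two characters and ignoring case.
match.how <- function(how) {
  methods <- c(un = "uniform.select",
               co = "subcor.select",
               pc = "subpc.select",
               as = "select.least.assoc")
  key <- substr(tolower(how), 1, 2)
  if (!key %in% names(methods)) stop("unrecognised 'how': ", how)
  unname(methods[key])
}

match.how("PCA")    # "subpc.select"
match.how("cor")    # "subcor.select"
match.how("assoc")  # "select.least.assoc"
```

Under this scheme "PCA", "pca" and "pc" all select the same method, which is why the examples can pass abbreviations such as "cor" or "assoc".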
A smaller big.matrix with fewer rows and/or columns than the original matrix or, when ret.obj=FALSE, a reference to the file containing its descriptor.
Nicholas Cooper
uniform.select, subpc.select, subcor.select, select.least.assoc, big.select, get.big.matrix
orig.dir <- getwd(); setwd(tempdir()); # move to temporary dir
if(file.exists("thin.bck")) { unlink(c("thin.bck","thin.dsc")) }
bmat <- generate.test.matrix(5,big.matrix=TRUE)
prv.big.matrix(bmat)
# make 5% random selection:
lmat <- thin(bmat, pref="th2")
prv.big.matrix(lmat)
# make 10% most orthogonal selection (lowest correlations):
lmat <- thin(bmat,.10,"cor",hi.cor=FALSE, pref="th3")
prv.big.matrix(lmat)
# make 10% most representative selection:
lmat <- thin(bmat,.10,"PCA",ret.obj=FALSE, pref="th4") # return file name instead of object
print(lmat)
prv.big.matrix(lmat)
# make 25% selection most correlated to phenotype
# create random phenotype variable
pheno <- rep(1,ncol(bmat)); pheno[which(runif(ncol(bmat))<.5)] <- 2
lmat <- thin(bmat,.25,"assoc",phenotype=pheno,least=FALSE,verbose=TRUE, pref="th5")
prv.big.matrix(lmat)
# tidy up temporary files:
rm(lmat)
unlink(c("thin.bck","thin.dsc","thin.RData",paste0("th",2:5)))
setwd(orig.dir)