big.select: Select a subset of a big.matrix


View source: R/bigpca.R

Description

Select a subset of a big.matrix using indexes for rows and columns. Essentially a wrapper for bigmemory::deepcopy, but with slightly more flexible parameters. bigMat can be supplied in any form accepted by get.big.matrix(); row and column selections can be vectors of indexes, vectors of names, or names of files listing them. The default is to process using deepcopy, but bypassing the bigmemory native methods is faster when the matrix is small relative to available RAM. Backing file names are managed for you: you need only supply a prefix, or you can accept the default and gain filebacked functionality without choosing any filename parameters.
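The two processing routes mentioned above can be sketched with bigmemory alone. This is a minimal illustration under stated assumptions, not the internals of big.select: the toy matrix `m` and the names `sub_big` and `sub_ram` are made up for the example, and no backing files are created.

```r
library(bigmemory)

# Toy data: a small in-memory big.matrix (illustrative only)
m  <- matrix(as.numeric(1:50), nrow = 10, ncol = 5)
bm <- as.big.matrix(m)

rows <- c(1, 2, 8)
cols <- 2:4

# Route 1: deepcopy copies the selected rows/columns into a new big.matrix,
# so the result never has to be held as an ordinary R matrix
sub_big <- deepcopy(bm, cols = cols, rows = rows)

# Route 2: standard indexing returns the subset as a regular R matrix in RAM
sub_ram <- bm[rows, cols]

# Both routes select the same values
all.equal(sub_big[, ], sub_ram)
```

Route 2 is only practical when the selected subset fits comfortably in memory, which is the trade-off the deepC argument exposes.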

Usage

big.select(bigMat, select.rows = NULL, select.cols = NULL, dir = getwd(),
  deepC = TRUE, pref = "sel", delete.existing = FALSE, verbose = FALSE)

Arguments

bigMat

a big.matrix, matrix or any object accepted by get.big.matrix()

select.rows

selection of rows of bigMat; can be numeric indexes, a logical vector, rownames, or the name of a file containing names. If a file name is used here, a file name must also be used for select.cols (the two forms cannot be mixed).

select.cols

selection of columns of bigMat; can be numeric indexes, a logical vector, colnames, or the name of a file containing names.

dir

the directory containing the bigMat backing file (e.g., as a parameter for get.big.matrix()).

deepC

logical, whether to use bigmemory::deepcopy, which is somewhat slow but scalable. The alternative (FALSE) is standard indexing, which converts the result to a regular matrix object; this is fast, but only feasible for matrices small enough to fit in memory.

pref

character, prefix for the big.matrix backingfile and descriptorfile, and optionally an R binary file containing a big.matrix.descriptor object pointing to the big.matrix result.

delete.existing

logical. If a big.matrix already exists with the same name as implied by the current 'pref' and 'dir' arguments, the default behaviour (FALSE) is to return an error. To overwrite any existing big.matrix file(s) of the same name(s), set this parameter to TRUE.

verbose

logical, whether to display extra information about processing and progress.
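The selection forms accepted by select.rows and select.cols (numbers, logical, names, or a file of names) all identify the same kind of thing: a set of positions along one dimension. The base-R sketch below shows how each form could be reduced to integer indexes. The helper `resolve_selection` is hypothetical and is not part of bigpca; the file format (one name per line) is an assumption that matches the writeLines() calls in the Examples.

```r
# Hypothetical helper (not part of bigpca): reduce any selection form
# to integer indexes against a vector of dimension names.
resolve_selection <- function(sel, all_names) {
  if (is.null(sel)) return(seq_along(all_names))   # NULL: keep everything
  if (is.logical(sel)) return(which(sel))          # logical mask
  if (is.numeric(sel)) return(as.integer(sel))     # numeric positions
  if (is.character(sel) && length(sel) == 1 && file.exists(sel)) {
    sel <- readLines(sel)                          # file with one name per line
  }
  match(sel, all_names)                            # names, in the order given
}

# Toy matrix with dimnames (illustrative only)
m <- matrix(rnorm(20), nrow = 4,
            dimnames = list(paste0("r", 1:4), paste0("c", 1:5)))

# A temporary file listing two row names
tmp <- tempfile(fileext = ".txt")
writeLines(c("r1", "r3"), tmp)

by_number  <- resolve_selection(c(1, 3), rownames(m))
by_logical <- resolve_selection(c(TRUE, FALSE, TRUE, FALSE), rownames(m))
by_name    <- resolve_selection(c("r1", "r3"), rownames(m))
by_file    <- resolve_selection(tmp, rownames(m))
unlink(tmp)

# Name-based selections keep the order in which names were supplied
reordered <- resolve_selection(c("r3", "r1"), rownames(m))
```

Here all four forms resolve to the same two rows, while `reordered` returns them in the reverse order, mirroring the "in order" behaviour described under Value.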

Value

A big.matrix containing the selected rows and columns, in the order specified.

Author(s)

Nicholas Cooper

Examples

library(bigpca)
orig.dir <- getwd(); setwd(tempdir()) # move to temporary dir
# remove any leftover backing/descriptor files from a previous run
if(file.exists("sel.bck")) { unlink(c("sel.bck","sel.dsc")) }
bmat <- generate.test.matrix(5, big.matrix=TRUE)
# take a subset of the big.matrix without using deepcopy
sel <- big.select(bmat, c(1,2,8), c(2:10),
 deepC=FALSE, verbose=TRUE, delete.existing=TRUE)
prv.big.matrix(sel)
# now select the same subset using row/column names from text files
writeLines(rownames(bmat)[c(1,2,8)], con="bigrowstemp.txt")
writeLines(colnames(bmat)[c(2:10)], con="bigcolstemp.txt")
sel <- big.select(bmat, "bigrowstemp.txt", "bigcolstemp.txt", delete.existing=TRUE, pref="sel2")
prv.big.matrix(sel)
rm(bmat)
rm(sel)
# clean up the temporary text files and the files written by both selections
unlink(c("bigcolstemp.txt","bigrowstemp.txt",
 "sel.bck","sel.dsc","sel.RData","sel2.bck","sel2.dsc","sel2.RData"))
setwd(orig.dir) # reset working dir to original

Example output

Loading required package: reader
Loading required package: NCmisc

Attaching package: 'reader'

The following objects are masked from 'package:NCmisc':

    cat.path, get.ext, rmv.ext

Loading required package: bigmemory
Loading required package: biganalytics
Loading required package: foreach
Loading required package: biglm
Loading required package: DBI
Warning messages:
1: replacing previous import 'reader::cat.path' by 'NCmisc::cat.path' when loading 'bigpca' 
2: replacing previous import 'reader::get.ext' by 'NCmisc::get.ext' when loading 'bigpca' 
3: replacing previous import 'reader::rmv.ext' by 'NCmisc::rmv.ext' when loading 'bigpca' 
sel.dsc 
  "sel" 
 attached matrix with dims: 100,1000 
 calculating selections for rows
 selected 9 listed samples and 3 variables

Reordering Variables and Samples...

INDEXES SUMMARY
3 row indexes range is from 1 to 8 
-->, 1, 2, 8
9 col indexes range is from 2 to 10 
-->, 2, 3, 4, 5, 6, 7

 raw big.matrix summary before selection/ordering:

Big matrix with: 100 rows, 1000 columns
 - data type: numeric 
 - not filebacked
              colnames 
Row# rownames  ID84931  ID29567  .....   ID65470 
   1  rs69562   0.9558  -0.0459   ...    -0.2925 
   2  rs72509   -2.003   0.1262   ...     0.2302 
   3  rs75847  -0.8905   0.8413   ...      1.488 
  ..     ....      ...      ...   ...        ... 
 100  rs89359   0.3583    0.284   ...     0.8124 

 running reorder in system memory
 adding colnames
 adding rownames
 converting matrix to big.matrix
 matrix descr saved as standard description file: sel.dsc 
 created big.matrix description file: sel.dsc 
 created big.matrix backing file: sel.bck 
 created big.matrix binary description file: sel.RData 

Big matrix; 'sel.RData', with: 3 rows, 9 columns
 - data type: numeric 

              colnames 
Row# rownames  ID29567  ID58125  .....   ID67596 
   1  rs69562  -0.0459    1.009   ...      0.582 
   2  rs72509   0.1262  -0.5086   ...    -0.4029 
   3  rs48865   0.0045  -0.3963   ...    -0.2059 

sel2.dsc 
  "sel2" 

Big matrix; 'sel2.RData', with: 3 rows, 9 columns
 - data type: numeric 

              colnames 
Row# rownames  ID29567  ID58125  .....   ID67596 
   1  rs69562  -0.0459    1.009   ...      0.582 
   2  rs72509   0.1262  -0.5086   ...    -0.4029 
   3  rs48865   0.0045  -0.3963   ...    -0.2059 

bigpca documentation built on Nov. 22, 2017, 1:02 a.m.