generate.test.matrix: Generate a test matrix of random data

View source: R/bigpca.R

Description

Generates a test matrix of easily specified size and type. Options allow automated row and column names (which might resemble labels for a SNP analysis) and a choice of return format: matrix, data.frame or big.matrix. You can specify the randomisation function (e.g., rnorm, runif), as well as the parameters determining the matrix size. The method used to generate big.matrix objects is scalable, so very large matrices for simulation can be generated, limited only by disk space and not by RAM.

Usage

generate.test.matrix(size = 5, row.exp = 2, rand = rnorm,
  dimnames = TRUE, data.frame = FALSE, big.matrix = FALSE,
  file.name = NULL, tracker = TRUE)

Arguments

size

10^size is the total number of datapoints simulated. Sizes of 6 or less are fairly quick to generate, while 7 takes a few seconds; 8 will take under a minute, 9 around ten minutes, and 10 perhaps over an hour. Values are coerced to integers in the range 2:10.

row.exp

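similar to 'nrow' when creating a matrix, except that this is an exponent, giving 10^row.exp rows; the remaining datapoints form the columns (e.g., size = 5 and row.exp = 2 give 100 rows and 1000 columns).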

rand

a function that must return 'n' values when rand(n) is called, e.g., rnorm, runif or numeric

dimnames

logical, whether to generate some row and column names

data.frame

logical, whether to return as a data.frame (FALSE means return a matrix)

big.matrix

logical, whether to return as a big.matrix (overrides data.frame). If a file.name is given, the big.matrix will be filebacked and this function returns a list containing the big.matrix together with the description and backing filenames.

file.name

if a character string, the result is written to a tab file instead of being returned as an object, and the filename is returned; overrides data.frame. Alternatively, if big.matrix=TRUE, this provides the basename for a filebacked big.matrix.

tracker

logical, whether to display a progress bar for large matrices (size>7) where progress will be slow
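
The contract for rand can be illustrated with a custom generator. The sketch below is an illustration under stated assumptions, not part of bigpca: rgeno is a hypothetical function that draws genotype-like codes 0, 1 and 2, and the call to generate.test.matrix is assumed to accept it in the same way as the default rnorm. That call is only attempted if bigpca is installed.

```r
# Hypothetical generator (not part of bigpca): returns exactly n values,
# which is the contract 'rand' is documented to satisfy.
rgeno <- function(n) sample(0:2, n, replace = TRUE)

# Check the contract in base R before passing the function on.
stopifnot(length(rgeno(10)) == 10)

# Assumed usage: pass the generator itself, as with the default rnorm.
if (requireNamespace("bigpca", quietly = TRUE)) {
  mat <- bigpca::generate.test.matrix(size = 4, row.exp = 2, rand = rgeno)
  print(dim(mat))
}
```

Any function of a single argument n that returns a vector of length n should fit the same pattern; generators that need extra parameters can be wrapped, e.g. function(n) rnorm(n, mean = 10).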

Value

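Returns a random matrix of data for testing/simulation; this can be a data.frame or big.matrix if those options are selected. If file.name is given, the return value is instead the filename or, with big.matrix=TRUE, a list holding the big.matrix and its description and backing filenames.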
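
The dimensions of the returned object follow from size and row.exp. As a minimal sketch in base R, the hypothetical helper expected.dims (not part of bigpca) reproduces that arithmetic; it ignores the coercion of size to the range 2:10.

```r
# Hypothetical helper (not part of bigpca): dimensions implied by the
# documented meaning of 'size' and 'row.exp'.
expected.dims <- function(size = 5, row.exp = 2) {
  n.row <- 10^row.exp            # rows: 10^row.exp
  n.col <- 10^(size - row.exp)   # columns: total datapoints / rows
  c(rows = n.row, cols = n.col)
}

expected.dims(5, 2)  # 100 rows, 1000 columns
expected.dims(5, 3)  # 1000 rows, 100 columns
```

These two cases match the dimensions reported in the example output for generate.test.matrix(5) and generate.test.matrix(5, 3, ...).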

Author(s)

Nicholas Cooper

Examples

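library(bigpca)  # also loads NCmisc, which provides prv() and headl()
orig.dir <- getwd(); setwd(tempdir())  # move to temporary dir
# in-memory matrix: 10^5 values in 10^2 rows
mat <- generate.test.matrix(5); prv(mat)
# filebacked big.matrix: 10^5 values in 10^3 rows
lst <- generate.test.matrix(5, 3, big.matrix = TRUE, file.name = "bigtest")
mat <- lst[[1]]; prv(mat); headl(lst[2:3])
unlink(unlist(lst[2:3]))  # delete the description and backing files
setwd(orig.dir)  # reset working dir to original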

Example output

Loading required package: reader
Loading required package: NCmisc

Attaching package: 'reader'

The following objects are masked from 'package:NCmisc':

    cat.path, get.ext, rmv.ext

Loading required package: bigmemory
Loading required package: biganalytics
Loading required package: foreach
Loading required package: biglm
Loading required package: DBI
Warning messages:
1: replacing previous import 'reader::cat.path' by 'NCmisc::cat.path' when loading 'bigpca' 
2: replacing previous import 'reader::get.ext' by 'NCmisc::get.ext' when loading 'bigpca' 
3: replacing previous import 'reader::rmv.ext' by 'NCmisc::rmv.ext' when loading 'bigpca' 
mat (matrix, 100*1000)

              colnames 
Row# rownames  ID77704  ID42795  .....   ID36382 
   1  rs85708  -1.9362   1.2038   ...    -0.3019 
   2  rs56907   0.6302   0.0483   ...    -0.1212 
   3  rs63958  -0.7912   -0.186   ...    -0.8593 
  ..     ....      ...      ...   ...        ... 
 100  rs88733  -0.5982  -1.1615   ...    -2.9896 


Big matrix; 'mat', with: 1000 rows, 100 columns
 - data type: numeric 

              colnames 
Row# rownames   ID2513   ID6348  .....    ID9112 
   1 rs741921   0.2811   0.6853   ...    -0.7875 
   2 rs556508   1.2744   0.1488   ...     0.3164 
   3 rs405535    1.169   0.0163   ...     0.1688 
  ..     ....      ...      ...   ...        ... 
1000 rs966751   -0.817  -0.7666   ...     0.7756 


$descr:
[1] "bigtest.dsc"
$bck:
[1] "bigtest.bck"

bigpca documentation built on Nov. 22, 2017, 1:02 a.m.