ArtificialData: create artificial data for testing


Description

This function quickly generates a test data set that can be used with most of the Join functions.

Usage

ArtificialData(fakeDataDir = "~/fakeData2/", joinKey = letters[1:20],
    numFiles = 4, N = rep(15, numFiles), SORT = 1, GZIP = 0,
    sep = c(" ", ",", "\t", "|")[1], prefix = "file", suffix = ".txt",
    daten = month.abb, NCOL = rep(3, numFiles), chunkSize = 1000,
    verbose = 0)

Arguments

fakeDataDir

directory to put the data

joinKey

set of join keys to choose from (must contain more elements than N); this column will be the key for the join

numFiles

number of files to split the data across

N

number of rows in each file created, e.g. N = c(15,20,10,30)

SORT

should the join key be sorted?

GZIP

should the created data files be gzipped?

sep

column delimiter; defaults to a single space (" ")

prefix

file name prefix

suffix

file name suffix

daten

data to sample from

NCOL

number of data columns per file

chunkSize

number of lines to write to the file at once

verbose

level of verbosity

Value

invisibly returns the data and the file names
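
To make the interplay of the arguments concrete, here is a minimal base R sketch of a generator that mimics the documented behaviour. It is not the MultiJoin implementation: the helper name make_fake_files, the file naming scheme (prefix, file index, suffix), the ".gz" extension, and the shape of the returned list (elements data and files) are all assumptions made for illustration. chunkSize and verbose are left out.

```r
# Hypothetical helper, NOT the MultiJoin implementation: it only mimics the
# documented arguments with base R. File naming and the structure of the
# returned list are assumptions.
make_fake_files <- function(dir = file.path(tempdir(), "fakeData"),
                            joinKey = letters[1:20], numFiles = 4,
                            N = rep(15, numFiles), SORT = 1, GZIP = 0,
                            sep = " ", prefix = "file", suffix = ".txt",
                            daten = month.abb, NCOL = rep(3, numFiles)) {
  # mirror the documented constraint: joinKey has to be longer than N
  stopifnot(length(N) == numFiles, length(NCOL) == numFiles,
            all(N < length(joinKey)))
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  data <- vector("list", numFiles)
  files <- character(numFiles)
  for (i in seq_len(numFiles)) {
    # draw N[i] distinct keys so each key occurs at most once per file
    key <- joinKey[sample.int(length(joinKey), N[i])]
    if (SORT) key <- sort(key)
    # fill NCOL[i] data columns with values sampled from daten
    vals <- matrix(sample(daten, N[i] * NCOL[i], replace = TRUE), nrow = N[i])
    data[[i]] <- data.frame(key = key, vals, stringsAsFactors = FALSE)
    files[i] <- file.path(dir, paste0(prefix, i, suffix, if (GZIP) ".gz" else ""))
    # write through a connection so plain and gzipped output share one code path
    con <- if (GZIP) gzfile(files[i], "w") else file(files[i], "w")
    write.table(data[[i]], con, sep = sep, quote = FALSE,
                row.names = FALSE, col.names = FALSE)
    close(con)
  }
  # invisible(): nothing is printed unless the result is assigned or inspected
  invisible(list(data = data, files = files))
}

# usage: four files with six rows each, keys drawn from 0:9
ret <- make_fake_files(joinKey = 0:9, N = rep(6, 4))
basename(ret$files)
str(ret$data[[1]])
```

Because the result is returned invisibly, calling the helper on its own prints nothing; assign the result (as with ret above) to inspect the generated data frames and file paths.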

Author(s)

"Markus Loecher, Berlin School of Economics and Law (BSEL)" <markus.loecher@gmail.com>

Examples

if (0){
  ArtificialData("fakeData2", verbose = 1)
  ArtificialData("fakeData2", joinKey = 1:2000, N = rep(1500, 4), verbose = 0)

  ret = ArtificialData(fakeDataDir = "/tmp/fakeData")
  ret = ArtificialData(fakeDataDir = "./fakeData", joinKey = letters[1:10], numFiles = 6, N = rep(5, 6))
  ret = ArtificialData(SORT = 1, GZIP = 1)

  ret = ArtificialData(fakeDataDir = "fakeData", joinKey = 0:9, N = rep(6, 4), verbose = 1)
  # on allegro:
  ret = ArtificialData(fakeDataDir = "./fakeData", joinKey = letters, numFiles = 10,
                       N = rep(18, 10), NCOL = rep(5, 10))
}

MultiJoin documentation built on May 1, 2019, 7:32 p.m.