generateTestData: generateTestData

View source: R/generateTestData.R

Description

Randomly generates test data set(s) of size n. For a detailed application, see the vignette.

Usage

generateTestData(n = 500, ...)

Arguments

n

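number of observations for the test data (default: 500); for panel data, a list of vectors, each giving a number of observations followed by the time points at which they are included (see Examples)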

...

further named (artificial) variables to add to the data set

Details

The generated data set(s) contain at least eight variables:

sex

sex of respondent (M male, F female)

firstname

first name of respondent

mothername

first name of respondent's mother

lastname

last name of respondent

email

e-mail address of respondent

birthplace

birthplace of respondent

birthday

birthday of respondent

code

code generated from the respondent's data: 2nd and 3rd letter of the mother's first name, day of birth, 3rd and 4th letter of the birth city, and 1st letter of the first name

t

character value giving the time points at which the observation is included
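The rule for the code variable can be sketched in base R as follows. This is only an illustration: makeCode is a hypothetical helper, not a function of the package, and both the upper-casing and the two-digit day format are assumptions, since the documentation does not state how the parts are formatted.

```r
# Hypothetical helper (not part of findMatch): assembles a code from the
# parts listed for the 'code' variable.
makeCode <- function(firstname, mothername, birthplace, birthday) {
  day <- format(as.Date(birthday), "%d")      # day of birth, two digits (assumed format)
  toupper(paste0(
    substr(mothername, 2, 3),                 # 2nd and 3rd letter of mother's first name
    day,
    substr(birthplace, 3, 4),                 # 3rd and 4th letter of birth city
    substr(firstname, 1, 1)                   # 1st letter of first name
  ))                                          # upper-casing is an assumption
}

makeCode("Anna", "Maria", "Berlin", "1990-05-07")
# "AR07RLA"
```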

The basic data files used for the generation can be found in the extdata directory of the package:

firstname_female.txt

taken from Wiktionary

firstname_male.txt

taken from Wiktionary

lastname.txt

taken from genealogy.net Wiki

email_provider.txt

taken from T. Brian Jones's free_email_provider_domains.txt

staedte.txt

taken from Wikipedia list of cities and towns in Germany

Each further named parameter is either a function of the form FUN(n) that generates n values, or a vector v of values from which n values are drawn by sample(v, size=n, replace=TRUE). You may override the internal functions (sex, ...) to generate your own values.
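The two accepted forms can be illustrated with a small base R sketch. drawValues is a hypothetical helper written for this illustration; it is not the package's internal code, and the package may resolve its arguments differently.

```r
# Hypothetical helper (not the package's internal code): turns one
# "further named parameter" into n values.
drawValues <- function(spec, n) {
  if (is.function(spec)) {
    spec(n)                                   # FUN(n): the function generates n values
  } else {
    sample(spec, size = n, replace = TRUE)    # vector: draw n values with replacement
  }
}

set.seed(1)
# function form, as in the 'points' example
points <- drawValues(function(n) sample(20, size = n, replace = TRUE), 5)
# vector form
group <- drawValues(c("control", "treatment"), 5)
```

With the package loaded, the vector form would presumably be passed as generateTestData(25, group = c("control", "treatment")). Note that base R's sample() treats a single number v as 1:v, so a one-element numeric vector may not behave like a constant.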

Value

a data frame

Examples

# create a single data set
d <- generateTestData(25)
str(d)
# create a single data set with an additional 'points' variable
d <- generateTestData(25, points = function(n) sample(20, size = n, replace = TRUE))
str(d)
# generate panel data with two time points:
# 20 obs only in t1, 
# 10 in t1 and t2 and 
# 15 only in t2
n <- list(c(20, 1), c(10, 1, 2), c(15, 2))
d <- generateTestData(n)
str(d)
# generate panel data with three time points:
# 6 obs only in t1, 
# 5 in t1 and t2, 
# 8 in t1 and t3, 
# 7 in t1, t2 and t3
# 4 only in t2
# 3 in t2 and t3
# 2 only in t3
n <- list(c(6, 1), c(5, 1, 2), c(8, 1, 3), c(7, 1, 2, 3), c(4, 2), c(3, 2, 3), c(2, 3))
d <- generateTestData(n)
str(d)
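As a cross-check of a panel specification, the following base R sketch tallies how many respondents the three-time-point list implies overall and per time point. It assumes, based on the comments in the examples, that the first entry of each vector is a count and the remaining entries are time points; the variable names are illustrative only.

```r
n <- list(c(6, 1), c(5, 1, 2), c(8, 1, 3), c(7, 1, 2, 3), c(4, 2), c(3, 2, 3), c(2, 3))

counts <- sapply(n, function(x) x[1])         # first entry: number of respondents
total  <- sum(counts)                         # respondents overall

# respondents included at each time point; x[-1] drops the count entry
perT <- sapply(1:3, function(tp) {
  sum(counts[sapply(n, function(x) tp %in% x[-1])])
})
names(perT) <- paste0("t", 1:3)

total
# 35
perT
# t1 t2 t3
# 26 19 20
```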

sigbertklinke/findMatch documentation built on July 12, 2019, 9:22 a.m.