genNumeric: Generate a dataframe of numeric variables with known...

View source: R/generators.R

Description

Quickly generate random numeric data with known properties

Usage

genNumeric(n, k, rho, seed, pattern, ...)

Arguments

n

Number of rows of data to be generated

k

Number of columns to be generated

rho

Correlation coefficient between pairs of variables

seed

A numeric vector of length n used as the base from which the correlated variables are generated

pattern

List of attributes for columns of data in the data frame created

na.rm

A logical indicating whether to exclude missing values when fitting the distribution or to fail when missing values are present

Details

pattern lets the user specify a list with three elements: dist, rho, and names. Each element should have length k. dist currently supports norm (normal), binom (binomial), chisq (chi-squared), pois (Poisson), unif (uniform), weibull (Weibull), and gamma (gamma) distributions. rho should hold numeric values between -1 and 1, each giving the correlation coefficient between that variable and the first column of the data frame. names should hold character strings giving the column names of the resulting data frame.
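The constraints above can be checked before calling genNumeric. The following is a minimal sketch in base R; check_pattern is a hypothetical helper written for illustration and is not part of datasynthR.

```r
# Hypothetical helper (not part of datasynthR): checks a pattern list
# against the constraints described in Details.
check_pattern <- function(pattern, k) {
  supported <- c("norm", "binom", "chisq", "pois", "unif", "weibull", "gamma")
  stopifnot(
    is.list(pattern),
    all(c("dist", "rho", "names") %in% names(pattern)),
    length(pattern$dist) == k,      # one distribution per column
    length(pattern$rho) == k,       # one correlation per column
    length(pattern$names) == k,     # one name per column
    all(pattern$dist %in% supported),
    is.numeric(pattern$rho),
    all(pattern$rho >= -1 & pattern$rho <= 1),
    is.character(pattern$names)
  )
  invisible(TRUE)
}

struc <- list(dist = c("norm", "pois", "unif"),
              rho = c(0.2, -0.5, 0.5),
              names = c("super", "cool", "data"))
check_pattern(struc, k = 3)
```

An invalid pattern (for example, an unsupported distribution name or a rho outside [-1, 1]) makes stopifnot raise an error naming the failed condition.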

Value

An R data frame of n rows and k columns with distributions specified in pattern. If pattern is not specified then variables are normally distributed with sequential bivariate correlations equal to rho.
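To illustrate what "sequential bivariate correlations equal to rho" means for the default normal case, here is a sketch using a standard textbook construction in base R. It is an assumption made for illustration, not necessarily how genNumeric is implemented; gen_sequential is a hypothetical name.

```r
# Sketch only: each column is built from the previous one so that
# adjacent columns have population correlation rho.
gen_sequential <- function(n, k, rho) {
  out <- matrix(NA_real_, nrow = n, ncol = k)
  out[, 1] <- rnorm(n)
  for (j in seq_len(k)[-1]) {
    # rho * previous column plus independent noise scaled to keep unit variance
    out[, j] <- rho * out[, j - 1] + sqrt(1 - rho^2) * rnorm(n)
  }
  as.data.frame(out)
}

set.seed(42)
dat <- gen_sequential(10000, 3, rho = 0.3)
cor(dat[, 1], dat[, 2])  # should be close to 0.3
cor(dat[, 2], dat[, 3])  # should be close to 0.3
```

Under this construction only adjacent columns are correlated at rho; columns further apart are correlated more weakly (rho squared for columns two apart).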

Note

For small n, the realized correlation can deviate substantially from the requested rho.

Currently, depending on the distribution the correlation is built against, the realized correlation should have the correct sign but will not always have the correct magnitude.
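The effect of sample size can be seen with a small simulation. This sketch uses base R only and a hypothetical helper, sample_cor, that draws a pair of correlated normal variables; it does not call genNumeric.

```r
# Sketch: compare the spread of the sample correlation for small and large n.
sample_cor <- function(n, rho) {
  x <- rnorm(n)
  y <- rho * x + sqrt(1 - rho^2) * rnorm(n)  # population correlation is rho
  cor(x, y)
}

set.seed(1)
small <- replicate(500, sample_cor(20, rho = 0.3))
large <- replicate(500, sample_cor(2000, rho = 0.3))
sd(small)  # spread of realized correlations at n = 20
sd(large)  # much tighter spread at n = 2000
```

Checking cor() on the generated data, as in the Examples, is a simple way to confirm how close a particular draw came to the target.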

Author(s)

Jared E. Knowles

Examples

dat1 <- genNumeric(1000, 3, rho=0.3)
cor(dat1[, 1], dat1[, 2])
cor(dat1[, 2], dat1[, 3])
# Specify a pattern
struc <- list(dist=c("norm", "pois", "unif"), rho=c(0.2, -0.5, 0.5),
              names=c("super", "cool", "data"))
dat2 <- genNumeric(1000, pattern=struc)
cor(dat2[, 1], dat2[, 2])
cor(dat2[, 1], dat2[, 3])

jknowles/datasynthR documentation built on May 19, 2019, 11:42 a.m.