(Multiply) complete dataset based on marginal properties of each column

Description

Completes a dataset based on the marginal properties of each column. The completion may be multiple: one row of the input can give rise to several completed rows.

Usage

rCatsAndCntInDfr(dfr, maxFullNACatCols = 6, howManyIfTooMany = 1000, weightsName = "weights", orgriName = "orgri", reweightPerRow = FALSE, verbosity = 0, ...)
rCatsInDfr(dfr, maxFullNACatCols = 6, howManyIfTooMany = 1000, onlyCategorical = FALSE, weightsName = "weights", orgriName = "orgri", reweightPerRow = FALSE, verbosity = 0, ...)

Arguments

dfr

data.frame or numdfr to complete

maxFullNACatCols, howManyIfTooMany

If a row of dfr contains more than maxFullNACatCols (default: 6) NA values, not all combinations are generated. Instead, a more or less random subset of size howManyIfTooMany (default: 1000) is drawn.

onlyCategorical

If TRUE, rCatsInDfr returns only the categorical columns. Defaults to FALSE.

weightsName

If not NULL, an extra column with this name is added to the return value. It holds a weight such that all rows originating from the same row of dfr have a total weight of 1 (subject to reweightPerRow). Defaults to "weights".

orgriName

If not NULL, an extra column with this name is added to the return value. It holds the row number in dfr from which each returned row originates. Defaults to "orgri".

reweightPerRow

If weights are returned, then for rows with more than maxFullNACatCols NA values, the weights (which are originally relative to all possible combinations) are rescaled so that they sum to 1. Defaults to FALSE.

verbosity

The higher this value, the more progress and debug information is displayed (note: in R for Windows, turn off buffered output).

...

Ignored for now
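
To illustrate why maxFullNACatCols and howManyIfTooMany exist, here is a minimal base-R sketch, not the package's implementation. The data frame toy and the helper completionsForRow are hypothetical names introduced only for this illustration. The sketch enumerates every completion of the missing categorical values in one row; the number of candidate rows is the product of the level counts of the missing columns, so it grows quickly with the number of NA values.

```r
# Toy data with two factor columns (illustrative only, not from the package)
toy <- data.frame(
  colour = factor(c("red", "blue", NA, "red"),
                  levels = c("red", "blue", "green")),
  size   = factor(c("S", NA, NA, "L"), levels = c("S", "M", "L"))
)

# Hypothetical helper: all completions of the NA factor cells in row ri
completionsForRow <- function(dfr, ri) {
  isNACol <- vapply(dfr, function(col) is.na(col[ri]), logical(1))
  naCols <- names(dfr)[isNACol]
  if (length(naCols) == 0) return(dfr[ri, , drop = FALSE])
  # One list entry per missing column, holding all of its levels
  lvls <- lapply(dfr[naCols], levels)
  combos <- expand.grid(lvls, stringsAsFactors = FALSE)
  # Repeat the original row once per combination, then fill in the gaps
  res <- dfr[rep(ri, nrow(combos)), , drop = FALSE]
  for (cn in naCols) {
    res[[cn]] <- factor(combos[[cn]], levels = levels(dfr[[cn]]))
  }
  rownames(res) <- NULL
  res
}

row3 <- completionsForRow(toy, 3)
nrow(row3)  # 3 colour levels times 3 size levels
```

With more than a handful of missing categorical values per row, the full enumeration becomes too large to store, which is the situation the random subset is meant to handle.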

Details

The 'random subset' is created by drawing the missing categorical values based on their marginal probability in dfr.
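
Drawing from marginal probabilities can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the vector colour is toy data, and the marginal probabilities are assumed to be the observed relative frequencies of each level.

```r
# Toy factor column with missing values (illustrative data)
colour <- factor(c("red", "blue", NA, "red", NA, "green", "red"),
                 levels = c("red", "blue", "green"))

# Marginal probabilities from the observed values only (table() drops NA)
marg <- prop.table(table(colour))

# Draw one replacement per missing value according to those probabilities
set.seed(42)
nMissing <- sum(is.na(colour))
draws <- sample(names(marg), size = nMissing, replace = TRUE,
                prob = as.vector(marg))

# Fill the gaps; observed values are left untouched
filled <- colour
filled[is.na(filled)] <- draws
```

Each column is handled on its own here, so any dependence between columns is ignored at this stage.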

Missing continuous values are simply filled in with the mean.
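
A minimal base-R sketch of mean filling, assuming the mean is taken over the observed values of the same column (the vector x is toy data, and this is not the package's code):

```r
# Toy continuous column with missing values (illustrative data)
x <- c(5.1, NA, 4.7, 6.3, NA)

# Replace each NA with the mean of the observed values (an assumption here)
x.filled <- x
x.filled[is.na(x.filled)] <- mean(x, na.rm = TRUE)
```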

Value

Object of the same class as dfr. If onlyCategorical is TRUE, it contains only the categorical columns. Otherwise it has largely the same structure as dfr, though it may contain two extra columns, named by weightsName and orgriName.
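
The role of the two extra columns can be sketched with a hand-made data frame. The object completed and all of its values are hypothetical; only the column names follow the defaults "orgri" and "weights". The sketch checks the total weight per original row and then rescales within each original row, which is the idea behind reweightPerRow = TRUE.

```r
# Hypothetical completed data: original row 2 gave three rows, and
# original row 3 stands for a row whose combinations were subsampled
completed <- data.frame(
  colour  = c("red", "red", "blue", "green", "blue"),
  orgri   = c(1, 2, 2, 2, 3),
  weights = c(1, 0.5, 0.3, 0.2, 0.3)
)

# Total weight per original row; the subsampled row falls short of 1
totals <- tapply(completed$weights, completed$orgri, sum)

# Rescale within each original row so every group sums to 1
groupSums <- ave(completed$weights, completed$orgri, FUN = sum)
completed$weights <- completed$weights / groupSums
```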

Author(s)

Nick Sabbe (nick.sabbe@ugent.be)

See Also

GLoMo-package, NumDfr

Examples

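# Introduce NA values into iris (0.1 is presumably the fraction set to NA)
iris.md <- randomNA(iris, 0.1)
# Convert the data.frame to a numdfr
iris.md.nd <- numdfr(iris.md)
# Complete the data; orgriName = NULL omits the origin row number column
iris.nd.rnd <- rCatsAndCntInDfr(iris.md.nd, orgriName = NULL, verbosity = 1)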