Predict from GLoMo model with conditional rejection

Description

The method predict.GLoMo can sample values to fill in the missing entries of a dataset, but it does so with only the GLoMo model in mind. predict.conditional additionally takes a validation function that can reject sampled rows based on other criteria. The non-allrows version does so for 1 row at a time.

Usage

## S3 method for class 'allrows.GLoMo'
predict.conditional(object, nobs = 1, dfr, forrows = seq(nrow(dfr)), validateFunction = validateFunction.default, guiddata = NULL, otherData = NULL, initialSuccessRateGuess = 0.5, verbosity = 0, minimumSuccessRate=0.001,...)
## S3 method for class 'GLoMo'
predict.conditional(object, nobs=1, dfr, forrows, validateFunction=validateFunction.default, guiddata=NULL, otherData=NULL, initialSuccessRateGuess=0.5, verbosity=0, minimumSuccessRate=0.001,...)
validateFunction.acceptall(attempts, otherData, forrow, verbosity = 0)
validateFunction.useprob(attempts, otherData, forrow, verbosity = 0)
validateFunction.default(attempts, otherData, forrow, verbosity = 0)

Arguments

object

GLoMo object

nobs

number of observations to sample. Can be a single integer or (for predict.conditional.allrows.GLoMo) a vector of the same length as forrows.

dfr

data.frame or numdfr to sample observations for

forrows, forrow

Index or indices of the row(s) of dfr to sample for. forrows is used by the predict.conditional methods, forrow by the validateFunctions.

validateFunction

After the standard sampling of predict.GLoMo, the rows and otherData are fed to this function that should return the row indices of the accepted rows. The default is validateFunction.default, which randomly accepts about half of the sampled rows.

guiddata

see getGuidData. If not provided, it is calculated.

otherData

Passed on to validateFunction. Typically contains an item per row in dfr

initialSuccessRateGuess

Initial guess of the fraction of sampled rows that will be accepted. It is used to oversample with predict.GLoMo, so that enough rows remain after the expected rejections. Default is 0.5.

verbosity

The higher this value, the more levels of progress and debug information are displayed (note: in R for Windows, turn off buffered output).

minimumSuccessRate

To prevent conditional prediction from running 'forever' because all observations are simply unlikely, you can pass along a minimum success rate (between 0 and 1): if the attained success rate drops below this value, one more attempt is made and, if need be, predictions are accepted at random to obtain enough of them.

...

Ignored for now

attempts

data.frame or numdfr with unconditionally sampled data.
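
The interplay of nobs, initialSuccessRateGuess and minimumSuccessRate described above can be illustrated with a small base-R sketch. This is not GLoMo's implementation: sampleUntilEnough, sampler and accept are hypothetical stand-ins (sampler plays the role of predict.GLoMo, accept that of a validateFunction), and the way the success rate is updated between attempts is an assumption.

```r
# Sketch only: hypothetical helper, not part of GLoMo.
# sampler(n) returns n candidate values; accept(x) returns the indices of
# the accepted candidates (like a validateFunction does for rows).
sampleUntilEnough <- function(nobs, sampler, accept,
                              initialSuccessRateGuess = 0.5,
                              minimumSuccessRate = 0.001) {
  stopifnot(minimumSuccessRate > 0)  # assumption: avoids dividing by zero
  kept <- numeric(0)
  tried <- 0
  rate <- initialSuccessRateGuess
  lastAttempt <- FALSE
  repeat {
    needed <- nobs - length(kept)
    # Oversample: draw more candidates than needed, scaled by the rate
    candidates <- sampler(ceiling(needed / rate))
    tried <- tried + length(candidates)
    kept <- c(kept, candidates[accept(candidates)])
    if (length(kept) >= nobs) break
    if (lastAttempt) {
      # Still short after the final attempt: accept candidates at random
      needed <- nobs - length(kept)
      filler <- candidates[sample.int(length(candidates), needed, replace = TRUE)]
      kept <- c(kept, filler)
      break
    }
    attained <- length(kept) / tried
    # Below the minimum success rate: allow exactly one more attempt
    lastAttempt <- attained < minimumSuccessRate
    rate <- max(attained, minimumSuccessRate)
  }
  kept[seq_len(nobs)]
}

# Usage: keep only even numbers out of 1..n
evens <- sampleUntilEnough(4, sampler = function(n) seq_len(n),
                           accept = function(x) which(x %% 2 == 0))
```

Even when accept never accepts anything, the loop ends after the extra attempt, because the remaining observations are then filled in at random.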

Details

This function is mostly provided with the MCMC of EMLasso in mind (i.e. rejecting based on a glmnet fit and the matching true outcome for each row in dfr).

Typically, other validateFunctions will have to be created for this to make sense. It is then up to the creator/user to make sure otherData is consistent with what this specific validateFunction expects.

The signature of a validateFunction can easily be read from validateFunction.default (and is not repeated here to avoid maintenance issues).

Specifically, validateFunction.acceptall accepts all rows, validateFunction.useprob expects a passed along probability per row in otherData and rejects with this probability, while validateFunction.default does the same, but always with probability 0.5.
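
As an illustration of what a custom validateFunction could look like, here is a minimal sketch that follows the signature shown under Usage. validateFunction.threshold is a hypothetical name and not part of GLoMo. The sketch assumes that otherData is a numeric vector holding one threshold per row of dfr, that forrow is the index of the dfr row being sampled for, and it is demonstrated on a plain data.frame rather than on real predict.GLoMo output.

```r
# Hypothetical validator (not part of GLoMo): accept a sampled row only if
# its first column exceeds a per-row threshold stored in otherData.
# Assumption: otherData has one numeric threshold per row of dfr.
validateFunction.threshold <- function(attempts, otherData, forrow, verbosity = 0) {
  threshold <- otherData[forrow]
  # Row indices of the accepted rows, as a validateFunction should return
  accepted <- which(attempts[, 1] > threshold)
  if (verbosity > 0) {
    cat("row", forrow, ": accepted", length(accepted), "of", nrow(attempts), "\n")
  }
  accepted
}

# Stand-alone check on a plain data.frame standing in for sampled data
attempts <- data.frame(x = c(0.2, 1.5, 3.1, 0.9), y = c("a", "b", "a", "b"))
acceptedRows <- validateFunction.threshold(attempts, otherData = c(1, 2), forrow = 1)
```

Such a function would be passed as the validateFunction argument, together with a matching otherData; keeping the two consistent is the caller's responsibility.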

Value

List with two items

predicted

data.frame or numdfr (depending on the dfr that was used in the original call to GLoMo) that holds the sampled data

glomorowsused

vector that holds 1 item per row in predicted, holding which rowindex in glomo was used (i.e. which cell was sampled)

Note

The non-allrows version works only for 1 row at a time.

Author(s)

Nick Sabbe (nick.sabbe@ugent.be)

See Also

GLoMo-package, NumDfr, predict

Examples

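# Introduce missing values into iris and convert to a numdfr
iris.md<-randomNA(iris, 0.1)
iris.md.nd<-numdfr(iris.md)
# Prepare the data for GLoMo; the result carries the row weights
iris.nd.rnd<-rCatsAndCntInDfr(iris.md.nd, orgriName=NULL, verbosity=1)
iris.weights<-iris.nd.rnd$weights
iris.nd.rnd<-iris.nd.rnd[,1:5]
# Fit the GLoMo model
iris.glomo<-GLoMo(iris.nd.rnd, weights=iris.weights, verbosity=1)
# Conditionally sample with nobs=5, using the default validateFunction
iris.pred.cond<-predict.conditional.allrows.GLoMo(iris.glomo, nobs=5, dfr=iris.md.nd, verbosity=10)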