predict.conditional: Predict from a GLoMo model with conditional rejection
In GLoMo: Naive General Location Model


View source: R/GLoMo.r

The method predict.GLoMo can sample values to fill in missing values in a dataset, but it does so with only the GLoMo in mind. This function additionally lets you provide a function that may reject sampled data based on other criteria. The non-allrows version does this for one row at a time.
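The sample-then-reject idea can be illustrated with a generic rejection-sampling loop in base R. This is a minimal sketch, not the package's implementation: `sampleConditional`, `sampleFun` and `acceptPositive` are hypothetical names, and `sampleFun` merely stands in for `predict.GLoMo`.

```r
# Sketch only: rejection sampling for a single row of dfr.
# sampleFun(n) is a hypothetical stand-in for predict.GLoMo and must return
# n candidate rows as a data.frame; validateFun follows the validateFunction
# signature (attempts, otherData, forrow, verbosity) and returns the indices
# of the accepted rows.
sampleConditional <- function(nobs, sampleFun, validateFun, otherData, forrow,
                              successRateGuess = 0.5, maxRounds = 100) {
  accepted <- NULL
  for (i in seq_len(maxRounds)) {
    needed <- nobs - NROW(accepted)
    if (needed <= 0) break
    # Oversample: expect only a fraction successRateGuess to be accepted
    attempts <- sampleFun(ceiling(needed / successRateGuess))
    keep <- validateFun(attempts, otherData, forrow, verbosity = 0)
    accepted <- rbind(accepted, attempts[keep, , drop = FALSE])
  }
  # At most nobs rows (fewer only if maxRounds ran out)
  head(accepted, nobs)
}

# Usage with toy stand-ins: candidates are standard normal draws and
# only positive values are accepted.
set.seed(42)
sampleFun <- function(n) data.frame(x = rnorm(n))
acceptPositive <- function(attempts, otherData, forrow, verbosity = 0) {
  which(attempts$x > 0)
}
res <- sampleConditional(5, sampleFun, acceptPositive, otherData = NULL, forrow = 1)
```

The loop keeps drawing until `nobs` rows have been accepted; the real function additionally has a safeguard (`minimumSuccessRate`) for when acceptance is very rare.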

## S3 method for class 'allrows.GLoMo'
predict.conditional(object, nobs = 1, dfr, forrows = seq(nrow(dfr)), validateFunction = validateFunction.default, guiddata = NULL, otherData = NULL, initialSuccessRateGuess = 0.5, verbosity = 0, minimumSuccessRate = 0.001, ...)
## S3 method for class 'GLoMo'
predict.conditional(object, nobs = 1, dfr, forrows, validateFunction = validateFunction.default, guiddata = NULL, otherData = NULL, initialSuccessRateGuess = 0.5, verbosity = 0, minimumSuccessRate = 0.001, ...)
validateFunction.acceptall(attempts, otherData, forrow, verbosity = 0)
validateFunction.useprob(attempts, otherData, forrow, verbosity = 0)
validateFunction.default(attempts, otherData, forrow, verbosity = 0)

`object`	`GLoMo` object
`nobs`	number of observations to sample. Can be a single integer or (for `predict.conditional.allrows.GLoMo`) a vector of the same length as `forrows`.
`dfr`	`data.frame` or `numdfr` to sample observations for
`forrows, forrow`	Which row(s) of `dfr` should be considered
`validateFunction`	After the standard sampling of `predict.GLoMo`, the sampled rows and `otherData` are passed to this function, which should return the row indices of the accepted rows. The default is `validateFunction.default`, which randomly accepts about half of the sampled rows.
`guiddata`	see `getGuidData`. If not provided, it is calculated.
`otherData`	Passed on to `validateFunction`. Typically contains one item per row in `dfr`.
`initialSuccessRateGuess`	Initial guess of the fraction of sampled rows that will be accepted. It is used to oversample with `predict.GLoMo`, i.e. to draw more rows than needed to compensate for the ones the user expects to be rejected. Default is 0.5.
`verbosity`	The higher this value, the more levels of progress and debug information are displayed (note: in R for Windows, turn off buffered output).
`minimumSuccessRate`	Prevents conditional prediction from running 'forever' when all observations are simply unlikely to be accepted. Pass a minimum success rate (between 0 and 1): if the attained success rate drops below it, one more attempt is made and, if needed, predictions are then accepted randomly until there are enough of them.
`...`	Ignored for now
`attempts`	`data.frame` or `numdfr` with unconditionally sampled data.
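The two numeric tuning arguments can be illustrated with small helper functions. This is a sketch under assumptions, not GLoMo's code: `nToDraw` and `topUpRandomly` are hypothetical names, and the top-up only mimics the "accept randomly" fallback described for `minimumSuccessRate`.

```r
# Sketch only, not GLoMo's implementation.
# (1) initialSuccessRateGuess: how many rows to draw so that, on average,
#     `needed` of them survive validation.
nToDraw <- function(needed, successRate) ceiling(needed / successRate)

# (2) minimumSuccessRate fallback: when too few rows were accepted, top up
#     the accepted indices with randomly chosen rejected ones.
topUpRandomly <- function(nAttempts, acceptedIdx, needed) {
  rejectedIdx <- setdiff(seq_len(nAttempts), acceptedIdx)
  shortfall <- max(0, needed - length(acceptedIdx))
  nExtra <- min(shortfall, length(rejectedIdx))
  extra <- rejectedIdx[sample.int(length(rejectedIdx), nExtra)]
  c(acceptedIdx, extra)
}

nToDraw(5, 0.5)  # 10: with the default guess, draw twice as many rows
set.seed(1)
idx <- topUpRandomly(nAttempts = 20, acceptedIdx = c(3L, 7L), needed = 5)
```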

This function is mostly provided with the MCMC of EMLasso in mind (i.e. rejecting based on a glmnet fit and the matching true outcome for each row in dfr).

Typically, other validateFunctions will have to be created for this to make sense. It is then up to the creator/user to make sure otherData is consistent with what this specific validateFunction expects.

The signature of a validateFunction can easily be inferred from validateFunction.default (it is not repeated here to avoid maintenance issues).
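A custom validateFunction in the spirit of the EMLasso use case might look as follows. This is a hypothetical sketch: it swaps in base R `glm` for glmnet, and the layout of `otherData` (a list with a fitted binomial model `fit` and the observed 0/1 outcomes `y`, one per row of `dfr`) is an assumption that the caller must keep consistent.

```r
# Hypothetical validateFunction (not part of GLoMo): accept each sampled row
# with the probability that the fitted model assigns to the outcome actually
# observed for row `forrow` of dfr.
# Assumes otherData = list(fit = <binomial glm>, y = <0/1 outcome per row>).
validateByOutcome <- function(attempts, otherData, forrow, verbosity = 0) {
  # P(outcome = 1) for every sampled candidate row
  p1 <- unname(predict(otherData$fit, newdata = attempts, type = "response"))
  yobs <- otherData$y[forrow]
  # Probability of the outcome that was really observed for this row
  pobs <- if (yobs == 1) p1 else 1 - p1
  if (verbosity > 0) cat("mean acceptance probability:", mean(pobs), "\n")
  # Return the indices of the accepted rows
  which(runif(length(pobs)) < pobs)
}

# Toy usage: a logistic model on one covariate and 50 candidate rows
set.seed(2)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(2 * d$x))
fit <- glm(y ~ x, family = binomial, data = d)
attempts <- data.frame(x = rnorm(50))
acc <- validateByOutcome(attempts, list(fit = fit, y = d$y), forrow = 1)
```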

Specifically, validateFunction.acceptall accepts all rows; validateFunction.useprob expects a probability per row to be passed along in otherData and rejects each sampled row with that probability; validateFunction.default does the same, but always with probability 0.5.
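The behaviour of the three built-in validateFunctions can be mimicked as follows. These are illustrative sketches with hypothetical names, not the package's own definitions; in particular, `otherData` for the useprob variant is assumed to be a plain numeric vector with one rejection probability per row of `dfr`.

```r
# Sketches of the described behaviour (not the GLoMo source).
acceptAllSketch <- function(attempts, otherData, forrow, verbosity = 0) {
  seq_len(NROW(attempts))  # every sampled row is accepted
}
useProbSketch <- function(attempts, otherData, forrow, verbosity = 0) {
  pReject <- otherData[forrow]  # assumed: numeric vector, one entry per row
  # Keep a row when its uniform draw is at least pReject,
  # i.e. reject each row with probability pReject
  which(runif(NROW(attempts)) >= pReject)
}
defaultSketch <- function(attempts, otherData, forrow, verbosity = 0) {
  useProbSketch(attempts, 0.5, 1, verbosity)  # fixed probability 0.5
}

set.seed(3)
attempts <- data.frame(x = rnorm(1000))
nAll  <- length(acceptAllSketch(attempts, NULL, 1))               # all 1000
nProb <- length(useProbSketch(attempts, c(0.9, 0.1), forrow = 2))  # expected about 900
nDef  <- length(defaultSketch(attempts, NULL, 1))                  # expected about 500
```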

A list with two items:

`predicted`	`data.frame` or `numdfr` (depending on the `dfr` that was used in the original call to `GLoMo`) that holds the sampled data
`glomorowsused`	vector with one item per row in `predicted`, indicating which row index in `glomo` was used (i.e. which cell was sampled)
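Because `glomorowsused` runs parallel to the rows of `predicted`, the two items can be combined with ordinary indexing. The object below is a hand-made mock with the documented structure (the values are invented for illustration, not output of GLoMo).

```r
# Mock result with the documented two-item structure (invented values).
res <- list(
  predicted = data.frame(
    Sepal.Length = c(5.1, 4.9, 6.3, 5.8),
    Species = factor(c("setosa", "setosa", "virginica", "virginica"))
  ),
  glomorowsused = c(1L, 1L, 7L, 9L)
)
# One entry in glomorowsused per sampled row
stopifnot(length(res$glomorowsused) == nrow(res$predicted))
# How often each GLoMo row (cell) was sampled
table(res$glomorowsused)
# The sampled rows that came from GLoMo row 1
fromCell1 <- res$predicted[res$glomorowsused == 1L, , drop = FALSE]
```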

The non-allrows version works only for one row at a time.

Nick Sabbe (nick.sabbe@ugent.be)

GLoMo-package, NumDfr, predict

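# Copy of iris with roughly 10% random missing values, converted to a numdfr
iris.md <- randomNA(iris, 0.1)
iris.md.nd <- numdfr(iris.md)
# Fill in the missing values randomly; the result also holds a weights column
iris.nd.rnd <- rCatsAndCntInDfr(iris.md.nd, orgriName = NULL, verbosity = 1)
# Extract the weights, then keep only the five original iris columns
iris.weights <- iris.nd.rnd$weights
iris.nd.rnd <- iris.nd.rnd[, 1:5]
# Fit the GLoMo, then sample 5 observations per row with conditional rejection
iris.glomo <- GLoMo(iris.nd.rnd, weights = iris.weights, verbosity = 1)
iris.pred.cond <- predict.conditional.allrows.GLoMo(iris.glomo, nobs = 5, dfr = iris.md.nd, verbosity = 10)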