GLoMo: Fit naive General Location Model
In GLoMo: Naive General Location Model


View source: R/GLoMo.r

Fit a naive General Location Model, only supporting cells that are present in the data.

GLoMo(dfr, weights = rep(1, dim(dfr)[1]), uniqueIdentifiersPerRow = NULL, separator = ",", pooledCov = TRUE, verbosity = 0, ...)

`dfr`	`data.frame` or `numdfr` to fit the model to
`weights`	vector of weights attributed to each row in `dfr`
`uniqueIdentifiersPerRow`	List of uids (see `categoricalUniqueIdentifiers`) for each row of `dfr`. If not provided, it is calculated.
`separator`	Only relevant if `uniqueIdentifiersPerRow` was not provided. This parameter is then passed on to `categoricalUniqueIdentifiers`.
`pooledCov`	If `TRUE`, a single covariance matrix, pooled over all cells, is used. This matters because often many, if not all, cells contain only one observation.
`verbosity`	The higher this value, the more levels of progress and debug information are displayed (note: in R for Windows, turn off buffered output).
`...`	Ignored for now

Finds all 'cells' defined by the combinations of categorical variables that occur in `dfr`, and estimates their (weighted) probabilities. It then computes the mean of every continuous variable per cell, together with a covariance matrix of the continuous variables. This covariance matrix is typically pooled over the cells, although in theory it should be estimated separately for each cell (unpooled).
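The estimation steps above can be sketched in base R. This is a minimal sketch under stated assumptions, not the package's implementation: the function name `naive_glomo_sketch` is hypothetical, it accepts only a plain `data.frame` (not a `numdfr`) whose non-factor columns are all numeric, it builds uids by pasting factor levels with `separator` (the exact format produced by `categoricalUniqueIdentifiers` may differ), it uses weighted cell means, and it implements only the pooled covariance, with the maximum-likelihood divisor `sum(weights)`.

```r
# Hypothetical base-R sketch of a naive General Location Model fit.
# NOT the GLoMo package's code: plain data.frame only, at least one factor
# and one numeric column required, pooled covariance only.
naive_glomo_sketch <- function(dfr, weights = rep(1, nrow(dfr)), separator = ",") {
  stopifnot(is.data.frame(dfr), all(complete.cases(dfr)),
            length(weights) == nrow(dfr))
  isFactor <- vapply(dfr, is.factor, logical(1))
  factorCols <- which(isFactor)
  contCols <- which(!isFactor)
  stopifnot(length(factorCols) >= 1, length(contCols) >= 1)

  # One identifier per row: the row's factor levels pasted together
  rowUid <- do.call(paste, c(unname(lapply(dfr[factorCols], as.character)),
                             sep = separator))
  uid <- unique(rowUid)          # only cells that occur in the data
  cellOf <- match(rowUid, uid)   # cell index (1..K) of every row

  # Weighted probability of each cell (rowsum orders groups 1..K, like uid)
  cellWeight <- as.vector(rowsum(weights, cellOf))
  pihat <- cellWeight / sum(weights)

  # Weighted mean of every continuous column within each cell (K x p)
  X <- as.matrix(dfr[contCols])
  means <- rowsum(X * weights, cellOf) / cellWeight

  # Pooled (homoscedastic) covariance of the within-cell residuals
  resid <- X - means[cellOf, , drop = FALSE]
  omegahat <- crossprod(resid, resid * weights) / sum(weights)

  # One row per cell: its factor levels plus its continuous means,
  # columns in the original order, rows in the order of uid
  cellTable <- dfr[match(seq_along(uid), cellOf), , drop = FALSE]
  cellTable[contCols] <- as.data.frame(means)
  rownames(cellTable) <- NULL

  list(uid = uid, pihat = pihat, omegahat = omegahat,
       uniqueFactorCombinationsAndContinuousMeans = cellTable,
       factorCols = factorCols, guidSeparator = separator,
       invomega = solve(omegahat))
}

fit <- naive_glomo_sketch(iris)
```

On `iris`, which has one factor (`Species`) and four measurements, this sketch finds three cells with equal probabilities and a single 4-by-4 pooled covariance matrix shared by all of them.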

List of class "GLoMo" with the following items (here, "dataset" means either a data.frame or a numdfr object):

`uid`	character vector: each item is a unique identifier of a cell. These identifiers get longer as the dataset has more factor columns.
`pihat`	probability of each cell (numerical vector)
`omegahat`	named covariance matrix of the continuous columns within cells (note: homoscedastic, i.e. the same for all cells)
`orgdatadim`	dimensions of the dataset used to create this
`uniqueFactorCombinationsAndContinuousMeans`	for each uid, the matching factor levels plus the means of the continuous columns in that cell. Note: the column order is the same as in the original dataset, and the row order matches that of uid.
`factorCols`	named vector of column indices of the factor columns
`guidSeparator`	character used as separator in creating the uids.
`invomega`	inverse of omegahat, often used for prediction
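One place where invomega naturally appears is the general location model density: for an observation in cell c with continuous part y, log p = log pihat[c] - (p/2) log(2 pi) - (1/2) log det(omegahat) - (1/2) (y - mu_c)' invomega (y - mu_c), where mu_c holds the cell means. The sketch below evaluates this from the documented fields of a returned object. The function `glomo_log_density_sketch` is hypothetical and is not the package's `predict` method; it assumes the continuous columns of the cell table are exactly those not listed in factorCols and, following the naive model, gives cells absent from uid zero probability.

```r
# Hypothetical sketch (not the package's predict method): joint log-density
# of one observation under a fitted general location model, computed from
# the documented fields of a "GLoMo" object.
glomo_log_density_sketch <- function(fit, cellUid, y) {
  # uid order matches pihat and the rows of the cell table
  row <- match(cellUid, fit$uid)
  # naive model: cells not seen in the data have probability 0
  if (is.na(row)) return(-Inf)
  tab <- fit$uniqueFactorCombinationsAndContinuousMeans
  # assumption: every column not in factorCols is continuous
  contCols <- setdiff(seq_len(ncol(tab)), fit$factorCols)
  mu <- unlist(tab[row, contCols])
  d <- y - mu
  p <- length(d)
  logdet <- as.numeric(determinant(fit$omegahat, logarithm = TRUE)$modulus)
  quad <- drop(t(d) %*% fit$invomega %*% d)  # Mahalanobis term via invomega
  log(fit$pihat[row]) - 0.5 * (p * log(2 * pi) + logdet + quad)
}

# Tiny hand-built object: one factor column f and one continuous column x
mockFit <- list(
  uid = c("a", "b"), pihat = c(0.5, 0.5),
  omegahat = matrix(1), invomega = matrix(1),
  uniqueFactorCombinationsAndContinuousMeans =
    data.frame(f = factor(c("a", "b")), x = c(0, 1)),
  factorCols = c(f = 1L))
glomo_log_density_sketch(mockFit, "b", 1)
```

Precomputing invomega once avoids re-inverting omegahat for every observation, which is presumably why it is stored alongside omegahat.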

`dfr` must not contain any NA values.

Furthermore, in the return value, the order of the columns (e.g. in omegahat and uniqueFactorCombinationsAndContinuousMeans) is the same as in the original dfr.

The lengths of uid and pihat equal the number of rows of uniqueFactorCombinationsAndContinuousMeans (i.e. the number of unique 'cells' in dfr). Their order also matches: the first item of uid corresponds to the first row of uniqueFactorCombinationsAndContinuousMeans, and so on.

The number of rows (and columns) of omegahat plus the number of items in factorCols equals the total number of columns in dfr, and thus also in uniqueFactorCombinationsAndContinuousMeans.
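These layout guarantees can be checked on a returned object. The helper `check_glomo_shape_sketch` is hypothetical; it assumes a plain `data.frame` as `dfr`, reads the last guarantee as "omegahat covers the continuous columns and factorCols the factor columns, so together they account for every column of dfr", and additionally assumes that the entries of pihat sum to 1.

```r
# Hypothetical helper: check the documented layout of a "GLoMo"-like list
# against the plain data.frame it was fitted on. Stops with an error if a
# guarantee is violated, otherwise returns TRUE invisibly.
check_glomo_shape_sketch <- function(fit, dfr) {
  tab <- fit$uniqueFactorCombinationsAndContinuousMeans
  nCells <- nrow(tab)
  stopifnot(
    length(fit$uid) == nCells,                 # one uid per cell
    length(fit$pihat) == nCells,               # one probability per cell
    isTRUE(all.equal(sum(fit$pihat), 1)),      # assumption: probabilities sum to 1
    identical(colnames(tab), colnames(dfr)),   # same columns, same order as dfr
    nrow(fit$omegahat) == ncol(fit$omegahat),  # square covariance matrix
    # continuous columns (omegahat) + factor columns (factorCols) = all columns
    nrow(fit$omegahat) + length(fit$factorCols) == ncol(dfr)
  )
  invisible(TRUE)
}

# Hand-built example: cell "a" holds x = 0, 2 and cell "b" holds x = 1
dfr <- data.frame(f = factor(c("a", "b", "a")), x = c(0, 1, 2))
mockFit <- list(
  uid = c("a", "b"), pihat = c(2, 1) / 3, omegahat = matrix(2 / 3),
  uniqueFactorCombinationsAndContinuousMeans =
    data.frame(f = factor(c("a", "b")), x = c(1, 1)),
  factorCols = c(f = 1L))
check_glomo_shape_sketch(mockFit, dfr)
```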

Nick Sabbe (nick.sabbe@ugent.be)

"Statistical Analysis with Missing Values"

GLoMo-package, NumDfr, predict, GLoMo-class

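library(GLoMo)
# Introduce some random missing values into iris
iris.md<-randomNA(iris, 0.1)
iris.md.nd<-numdfr(iris.md)
# GLoMo needs data without NAs, so fill them in (the result also carries row weights)
iris.nd.rnd<-rCatsAndCntInDfr(iris.md.nd, orgriName=NULL, verbosity=1)
iris.weights<-iris.nd.rnd$weights
# Keep only the 5 original iris columns before fitting
iris.nd.rnd<-iris.nd.rnd[,1:5]
iris.glomo<-GLoMo(iris.nd.rnd, weights=iris.weights, verbosity=1)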