Fit naive General Location Model

Description

Fit a naive General Location Model, only supporting cells that are present in the data.

Usage

GLoMo(dfr, weights = rep(1, dim(dfr)[1]), uniqueIdentifiersPerRow = NULL, separator = ",", pooledCov = TRUE, verbosity = 0, ...)

Arguments

dfr

data.frame or numdfr to fit the model to

weights

vector of weights attributed to each row in dfr

uniqueIdentifiersPerRow

List of uids (see categoricalUniqueIdentifiers) for each row of dfr. If not provided, it is calculated.

separator

Only relevant if uniqueIdentifiersPerRow was not provided. This parameter is then passed on to categoricalUniqueIdentifiers.

pooledCov

If TRUE, the pooled covariance is used. This matters because often some, if not all, cells contain only one observation, from which no per-cell covariance can be estimated.

verbosity

The higher this value, the more levels of progress and debug information are displayed (note: in R for Windows, turn off buffered output).

...

Ignored for now

Details

Finds all 'cells' defined by the combinations of categorical variables in dfr and computes their (weighted) probabilities. It then computes the mean of every continuous variable within each cell, and a covariance matrix (typically pooled, although theoretically it should be unpooled).
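The steps described here can be sketched in base R. This is an illustration only, not the package's implementation: the uid format (factor levels pasted with the separator) and the pooled-covariance denominator (total weight, maximum-likelihood style) are assumptions, and all variable names are made up for the example.

```r
# Base-R sketch of the fitting steps (NOT the package's code).
# Assumption: iris stands in for dfr; Species is its only categorical column.
dfr <- iris
w <- rep(1, nrow(dfr))                       # row weights, as in the default

is_cat <- vapply(dfr, is.factor, logical(1))
cont <- as.matrix(dfr[, !is_cat, drop = FALSE])

# One identifier per row: factor levels pasted together with a separator.
# (Assumed format; the real uids come from categoricalUniqueIdentifiers.)
cats <- unname(as.list(dfr[, is_cat, drop = FALSE]))
row_uid <- do.call(paste, c(cats, sep = ","))

uid <- unique(row_uid)                       # only cells present in the data
cell <- match(row_uid, uid)                  # cell index of every row

# Weighted probability of each cell.
cell_w <- rowsum(w, cell)[, 1]
pihat <- unname(cell_w / sum(w))

# Weighted mean of every continuous column within each cell.
mu <- rowsum(cont * w, cell) / cell_w

# Pooled covariance: weighted residuals around the cell means, all cells
# combined. Dividing by sum(w) is an assumption made for this sketch.
resid <- cont - mu[cell, , drop = FALSE]
omega_pooled <- crossprod(resid * sqrt(w)) / sum(w)
```

Pooling means that a cell with a single observation contributes a zero residual rather than an undefined per-cell covariance, which is why the pooled estimate remains usable when cells are sparse.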

Value

List of class "GLoMo". (NOTE: where I write dataset, this could either be a data.frame or numdfr object):

uid

character vector: each item is a unique identifier of a cell. These get longer with more factor columns in the dataset

pihat

probability of each cell (numerical vector)

omegahat

named covariance matrix of the continuous columns within cells (note: homoscedastic)

orgdatadim

dimensions of the dataset used to create this

uniqueFactorCombinationsAndContinuousMeans

for each uid, the matching factor levels + the means in that cell for the continuous columns. Note: the column and row order is the same as the column order in the original dataset

factorCols

named vector of column indices of the factor columns

guidSeparator

character used as separator in creating the uids.

invomega

inverse of omegahat — often used for prediction

Note

The dfr passed must not contain any NA values!

Furthermore, in the return value, the order of the columns (e.g. in omegahat and uniqueFactorCombinationsAndContinuousMeans) is the same as in the original dfr.

The length of uid and pihat is the same as the number of rows in uniqueFactorCombinationsAndContinuousMeans (i.e. the number of unique 'cells' in dfr). Their order also matches (i.e. first item of uid matches the first row of uniqueFactorCombinationsAndContinuousMeans etc.)

The number of columns/rows in omegahat and the number of items in factorCols is also the total number of columns in dfr and thus also in uniqueFactorCombinationsAndContinuousMeans.
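Because dfr must be free of NA values, it can help to check this before fitting. The following is a minimal sketch for a plain data.frame; check_no_na is a hypothetical helper and not part of the package, and whether is.na behaves the same way on a numdfr object is not verified here.

```r
# Sketch: verify that a data.frame is NA-free before fitting.
# `check_no_na` is a hypothetical helper, not part of the package.
check_no_na <- function(dfr) {
  n_missing <- colSums(is.na(dfr))           # NA count per column
  if (any(n_missing > 0)) {
    stop("dfr contains NA values in column(s): ",
         paste(names(n_missing)[n_missing > 0], collapse = ", "))
  }
  invisible(TRUE)
}

check_no_na(iris)  # passes silently: iris has no missing values
```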

Author(s)

Nick Sabbe (nick.sabbe@ugent.be)

References

"Statistical Analysis with Missing Values"

See Also

GLoMo-package, NumDfr, predict, GLoMo-class

Examples

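# Introduce random NA values into iris (see randomNA for the second argument)
iris.md <- randomNA(iris, 0.1)
iris.md.nd <- numdfr(iris.md)
# GLoMo does not accept NA values, so the missing entries are filled in first
iris.nd.rnd <- rCatsAndCntInDfr(iris.md.nd, orgriName = NULL, verbosity = 1)
# Keep the row weights, then restrict to the five original iris columns
iris.weights <- iris.nd.rnd$weights
iris.nd.rnd <- iris.nd.rnd[, 1:5]
# Fit the model on the completed data, using the weights
iris.glomo <- GLoMo(iris.nd.rnd, weights = iris.weights, verbosity = 1)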