pimformula: Convert formula to pim formula


Description

Convert formula to pim formula (incorporating L/R and poset)

Usage

pimformula(formula, data, interpretation = c("difference", "regular",
  "marginal", "symmetric"), verbosity = 0, leftsuffix = "_L",
  rightsuffix = "_R", extra.variables = character(), lhs = c("PO", "<",
  "<="), rhsreplacers = list(F = Freplacetext, O = Oreplacetext, L =
  Lreplacetext, R = Rreplacetext), lhsreplacer = LHSreplacetext,
  interactions.difference = (interpretation != "marginal"),
  extra.nicenames = data.frame(org = character(), nice = character(),
  stringsAsFactors = FALSE))

pim.fit.prep(formula, data, blocking.variables = character(),
  poset = t(combn(nrow(data), 2)), leftsuffix = "_L", rightsuffix = "_R",
  interpretation = c("difference", "regular", "marginal", "symmetric"),
  na.action = na.fail, lhs = c("PO", "<", "<="), verbosity = 0,
  nicenames = TRUE, interactions.difference = (interpretation !=
  "marginal"), extra.nicenames = data.frame(org = character(), nice =
  character(), stringsAsFactors = FALSE), check.symmetric = TRUE, link,
  threshold = 1e-06, weights = NULL,
  pseudoweights = pseudoweights.default)

pseudoweights.default(poset, weights)

Arguments

formula

Original formula

data

Context in which formula is to be interpreted

interpretation

If "marginal" (not the default), parts of the formula are converted to imply marginal pim modeling (see e.g. Mainreplacetext). If "difference", the design matrix of the PIM is the difference of the design matrices of each part of the pseudo-observations. The default option is "regular", which will interpret unaltered columns as differences. A further option is "symmetric", which works the same as "regular" but enforces the symmetry condition by making the sign switch when the order is changed (typically, this is achieved by subtracting the inverse for each dummy).

verbosity

The higher this value, the more levels of progress and debug information are displayed (note: in R for Windows, turn off buffered output)

leftsuffix, rightsuffix

Suffixes that will be added to the 'left' and 'right' observation's column name in the pseudo-observation. Note: no checking is done that these suffixes are safe, so the wrong suffixes may lead to unexpected behaviour.

extra.variables

Character vector of column names you want to force present in the pseudo-observations

lhs

"PO", "<" or "<=": Inequality used for the left-hand side of the formula. The default ("PO") is the normal probabilistic index.

rhsreplacers

List of functions (see Lreplacetext and others) that will be used to process the right hand side of the formula. Each function should have the same signature as Lreplacetext.

lhsreplacer

Function like LHSreplacetext that will be used to reformat the left hand side of the formula

interactions.difference

If TRUE (note that the default is interpretation!="marginal"), interaction terms will be interpreted as the differences of the one-sided interaction terms (if this is possible at all). This is unsupported if interpretation is "marginal". Some special interaction terms with calculated columns may lead to unexpected behaviour.

extra.nicenames

Should be a data.frame containing two character columns: org and nice. For each "constructed" column name, provide a nicer one that will make the results more readable. You may also use parts of constructed column names. Note: make sure to use stringsAsFactors=FALSE when creating the data.frame.

blocking.variables

Character vector holding column names that hold blocking variables.

poset

Matrix of two columns indicating what the original observation number is for the left and right real observation in the pseudo-observation.

na.action

Defaults to na.fail: handles missing data in data.

nicenames

Defaults to TRUE: try to make the column names more readable.

check.symmetric

Defaults to TRUE: if the model does not support the symmetry condition, a warning is displayed.

link, threshold

See pim: only needed to check the symmetry condition.

weights

Defaults to NULL: vector of weights for every row of data.

pseudoweights

Defaults to pseudoweights.default: function that can convert weights by observation to weights per pseudo-observation. Should have the same signature and outcome as pseudoweights.default.
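The default poset shown under Usage, t(combn(nrow(data), 2)), can be inspected with base R alone. The sketch below builds it for a small hypothetical data.frame (toy) and pairs the rows using the default suffixes. The pairing step is only an illustration of what a pseudo-observation table could look like; it is not taken from the package's own code.

```r
# Toy data with four observations (hypothetical, for illustration only)
toy <- data.frame(y = c(3.1, 2.4, 5.0, 4.2), x = c(1, 2, 3, 4))

# Default poset from the Usage section: every pair i < j, one pair per row
poset <- t(combn(nrow(toy), 2))
colnames(poset) <- c("left", "right")  # labels added here for readability only

# Assumed layout of the pseudo-observations: left and right rows side by side,
# with the default suffixes "_L" and "_R" appended to the column names
left  <- toy[poset[, "left"], , drop = FALSE]
right <- toy[poset[, "right"], , drop = FALSE]
names(left)  <- paste0(names(left), "_L")
names(right) <- paste0(names(right), "_R")
pseudo <- cbind(left, right)
rownames(pseudo) <- NULL

print(poset)
print(pseudo)
```

With four observations this gives choose(4, 2) = 6 pseudo-observations. Passing a different two-column matrix as poset restricts or reorders the pairs.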

Details

Main function, doing the actual work. The idea is to convert the formula to text and replace 4(+) kinds of "spiced" variables:

O(var) gets replaced with I(var_R<var_L) (see Oreplacetext for exact formulation)

F(var) gets replaced with Sum I(var_R=i)I(var_L=j) (see Freplacetext for exact formulation)

L(var) gets replaced with var_L

R(var) gets replaced with var_R

var not in any of the above cases gets replaced by either var_R-var_L (interpretation!="marginal") or by var_L (interpretation=="marginal")
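To make the text-replacement idea concrete, here is a minimal base R sketch that handles only the L(var) and R(var) cases. The helper suffix_sides is hypothetical and is not how Lreplacetext or Rreplacetext are implemented; it assumes var is a plain column name and ignores O(), F() and unmarked variables.

```r
# Hypothetical helper: rewrite L(var) and R(var) in the text of a formula
suffix_sides <- function(formula, leftsuffix = "_L", rightsuffix = "_R") {
  # Convert the formula to a single string
  txt <- paste(deparse(formula), collapse = " ")
  # A simple pattern for a column name (letters, digits, dots, underscores)
  name <- "([A-Za-z.][A-Za-z0-9._]*)"
  # L(var) becomes var_L, R(var) becomes var_R
  txt <- gsub(paste0("\\bL\\(", name, "\\)"), paste0("\\1", leftsuffix),
              txt, perl = TRUE)
  txt <- gsub(paste0("\\bR\\(", name, "\\)"), paste0("\\1", rightsuffix),
              txt, perl = TRUE)
  as.formula(txt)
}

res <- suffix_sides(out ~ I(R(Sepal.Length) - L(Sepal.Length)))
print(res)
```

The result refers to Sepal.Length_R and Sepal.Length_L, which are the suffixed columns expected in the pseudo-observation data.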

Some sanity checks are performed, but probably not all that are needed.

pseudoweights.default is the default for the pseudoweights parameter of pim.fit.prep. It simply multiplies the weights of the observations that make up each pseudo-observation to obtain the weight of that pseudo-observation.
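A replacement for pseudoweights.default needs the same (poset, weights) signature. The sketch below, with the hypothetical name product_pseudoweights, mimics the documented behaviour under the assumption that "multiplying the weights" means taking the product of the left and right observation weights for each row of poset.

```r
# Hypothetical stand-in with the same signature as pseudoweights.default
product_pseudoweights <- function(poset, weights) {
  # No observation weights means no pseudo-weights
  if (is.null(weights)) return(NULL)
  # One weight per pseudo-observation: left weight times right weight
  weights[poset[, 1]] * weights[poset[, 2]]
}

poset <- t(combn(3, 2))   # pairs (1,2), (1,3), (2,3)
w <- c(1, 2, 0.5)         # one weight per original observation
pw <- product_pseudoweights(poset, w)
print(pw)
```

A function of this shape could be passed as the pseudoweights argument of pim.fit.prep, for instance to combine the two weights in another way.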

Value

For pimformula: an object of class "pimformula". The items in this object are:

newformula

The formula containing all suffixed variable names

left.variables

data.frame containing one row for each variable used in the "left" observations, and two columns: org and fixed, containing the original name and the suffixed name of each variable.

right.variables

data.frame containing one row for each variable used in the "right" observations, and two columns: org and fixed, containing the original name and the suffixed name of each variable.

names

Character vector holding the names for each individual term in the right hand side of the formula. Note: currently this is in no way cleaned up!

full.colnames

Character vector holding the constructed parts in the formula. Should have the same length as nice.colnames

nice.colnames

Character vector holding nicer names for the constructed parts in the formula. Should have the same length as full.colnames

For pim.fit.prep: an object of class "pimfitdata". The items in this object are:

X

The design matrix in pseudo-observation space

Y

The pseudo-observations

poset

Matrix of two columns indicating what the original observation number is for the left and right real observation in the pseudo-observation. Note: in some cases this is not the passed-in poset, e.g. when blocks were present.

intercept

Holds TRUE if the formula contains an intercept.

pimformula

Result of pimformula function.

original.colnames

If nicenames was TRUE, this will hold the column names before "nicing up".

weights

Vector of weights for every item in Y or NULL if no weights are to be applied.

For pseudoweights.default: a vector holding one "pseudo-weight", i.e. a weight per pseudo-observation. May be NULL if the incoming weight was as well.
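As an illustration of the "difference" interpretation, where the design matrix is the difference of the design matrices of each part of the pseudo-observations, the following base R sketch builds such a matrix by hand. It is not the package's implementation: the data (toy), the right-minus-left sign (taken from the var_R-var_L rule in Details) and the choice to drop the intercept column are assumptions made for this example.

```r
# Hypothetical data: one numeric and one factor predictor
toy <- data.frame(x = c(1, 2, 4), g = factor(c("a", "b", "a")))

# Default poset: pairs (1,2), (1,3), (2,3)
poset <- t(combn(nrow(toy), 2))

# Ordinary design matrix per observation; the intercept column is dropped
# here because it would cancel to zero in the difference
mm <- model.matrix(~ x + g, data = toy)[, -1, drop = FALSE]

# Design matrix in pseudo-observation space: right row minus left row
X <- mm[poset[, 2], , drop = FALSE] - mm[poset[, 1], , drop = FALSE]
print(X)
```

Each row of X corresponds to one row of poset, so nrow(X) equals the number of pseudo-observations.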

Note

TODO: Should probably disallow using intercept in some cases. Also have to consider whether passing in contrasts is relevant/possible.

See Also

Lreplacetext


Examples

# Setup: a random two-level outcome and an ordered copy of Species
set.seed(1)
iris$out <- factor(sample(2, nrow(iris), replace = TRUE))
iris$xord <- as.ordered(iris$Species)

# pimformula: convert formulas to pim formulas
pimformula(out ~ Sepal.Length, data = iris)
pimformula(out ~ I((R(Sepal.Length) - L(Sepal.Length)) / sqrt(R(Sepal.Length) * L(Sepal.Length))),
  data = iris, interpretation = "regular")
pimformula(out ~ O(xord), data = iris, interpretation = "regular")
pimformula(out ~ F(Species), data = iris, interpretation = "regular")

# pim.fit.prep: same setup and formulas, now building the pseudo-observation data
pim.fit.prep(out ~ Sepal.Length, data = iris)
pim.fit.prep(out ~ I((R(Sepal.Length) - L(Sepal.Length)) / sqrt(R(Sepal.Length) * L(Sepal.Length))),
  data = iris, interpretation = "regular")
pim.fit.prep(out ~ O(xord), data = iris, interpretation = "regular")
pim.fit.prep(out ~ F(Species), data = iris, interpretation = "regular")

pimold documentation built on May 2, 2019, 5:50 p.m.