pimformula: Convert formula to pim formula
In pimold: Probabilistic Index Models


Convert formula to pim formula (incorporating L/R and poset)

pimformula(formula, data, interpretation = c("difference", "regular",
  "marginal", "symmetric"), verbosity = 0, leftsuffix = "_L",
  rightsuffix = "_R", extra.variables = character(), lhs = c("PO", "<",
  "<="), rhsreplacers = list(F = Freplacetext, O = Oreplacetext, L =
  Lreplacetext, R = Rreplacetext), lhsreplacer = LHSreplacetext,
  interactions.difference = (interpretation != "marginal"),
  extra.nicenames = data.frame(org = character(), nice = character(),
  stringsAsFactors = FALSE))

pim.fit.prep(formula, data, blocking.variables = character(),
  poset = t(combn(nrow(data), 2)), leftsuffix = "_L", rightsuffix = "_R",
  interpretation = c("difference", "regular", "marginal", "symmetric"),
  na.action = na.fail, lhs = c("PO", "<", "<="), verbosity = 0,
  nicenames = TRUE, interactions.difference = (interpretation !=
  "marginal"), extra.nicenames = data.frame(org = character(), nice =
  character(), stringsAsFactors = FALSE), check.symmetric = TRUE, link,
  threshold = 1e-06, weights = NULL,
  pseudoweights = pseudoweights.default)

pseudoweights.default(poset, weights)

`formula`	Original formula
`data`	Context where the formula `formula` is to be interpreted
`interpretation`	If `"marginal"` (not the default), parts of the formula are converted to imply marginal pim modeling (see e.g. `Mainreplacetext`). If it is `"difference"`, then the design matrix of the PIM is the difference of the design matrices of each part of the pseudo-observations. The default option is `"regular"`, which will interpret unaltered columns as differences. A new option is `"symmetric"`, which works the same as `"regular"`, but will enforce the symmetry condition by making the sign switch when the order is changed (typically, this is achieved by subtracting the inverse for each dummy).
`verbosity`	The higher this value, the more levels of progress and debug information are displayed (note: in R for Windows, turn off buffered output)
`leftsuffix, rightsuffix`	Suffixes that will be added to the 'left' and 'right' observation's column name in the pseudo-observation. Note: no checking is done that these suffixes are safe, so the wrong suffixes may lead to unexpected behaviour.
`extra.variables`	Character vector of column names you want to force present in the pseudo-observations
`lhs`	`"PO"`, `"<"` or `"<="`: Inequality used for the left-hand side of the formula. The default (`"PO"`) is the normal probabilistic index.
`rhsreplacers`	List of functions (see `Lreplacetext` and others) that will be used to process the right hand side of the formula. Each function should have the same signature as `Lreplacetext`.
`lhsreplacer`	Function like `LHSreplacetext` that will be used to reformat the left hand side of the formula
`interactions.difference`	If `TRUE` (note that the default is `interpretation!="marginal"`), interaction terms will be interpreted as the differences of the one-sided interaction terms (if this is possible at all). This is unsupported if `interpretation` is `"marginal"`. Some special interaction terms with calculated columns may lead to unexpected behaviour.
`extra.nicenames`	Should be a `data.frame` containing two character columns: `org` and `nice`. For each "constructed" column name, provide a nicer one that will make the results more readable. You may also use parts of constructed column names. Note: make sure to use `stringsAsFactors=FALSE` when creating the `data.frame`.
`blocking.variables`	Character vector holding column names that hold blocking variables.
`poset`	Matrix of two columns indicating what the original observation number is for the left and right real observation in the pseudo-observation.
`na.action`	Defaults to `na.fail`: handles missing data in `data`.
`nicenames`	Defaults to `TRUE`: try to make the column names more readable.
`check.symmetric`	Defaults to `TRUE`: if the model does not support the symmetry condition, a warning is displayed.
`link, threshold`	See `pim`: only needed to check the symmetry condition.
`weights`	Defaults to `NULL`: vector of weights for every row of `data`.
`pseudoweights`	Defaults to `pseudoweights.default`: function that can convert weights by observation to weights per pseudo-observation. Should have the same signature and outcome as `pseudoweights.default`.
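
As a rough illustration of the `poset` argument, the default `t(combn(nrow(data), 2))` from the usage above enumerates every unordered pair of rows exactly once, with the lower row number in the first (left) column. The sketch below uses base R only; `toy` is a hypothetical data frame introduced for this example and is not part of the package.

```r
# Hypothetical toy data, introduced only for this illustration
toy <- data.frame(y = c(3, 1, 2, 5), x = c(0.5, 1.5, 1.0, 2.0))

# Default poset from the usage section: all pairs (i, j) with i < j
poset <- t(combn(nrow(toy), 2))
colnames(poset) <- c("left", "right")  # labels added here for readability only

nrow(poset)  # number of pseudo-observations, choose(4, 2)
poset        # one row per pair: left row number, right row number
```

Each row of this matrix corresponds to one pseudo-observation, so the number of pseudo-observations grows quadratically with the number of rows in `data`.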

Main function, doing the actual work. The idea is to convert the formula to text and replace four (or more) kinds of "spiced" variables:

`O(var)`	gets replaced with `I(var_R<var_L)` (see `Oreplacetext` for the exact formulation)
`F(var)`	gets replaced with Sum `I(var_R=i)I(var_L=j)` (see `Freplacetext` for the exact formulation)
`L(var)`	gets replaced with `var_L`
`R(var)`	gets replaced with `var_R`
`var`	not in any of the above cases gets replaced by either `var_R-var_L` (`interpretation!="marginal"`) or by `var_L` (`interpretation=="marginal"`)
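
To make the replacements more concrete, the following base R sketch builds the suffixed left and right columns by hand for a tiny data set and then computes the right-minus-left difference of a plain variable and the `O()` indicator as quoted above. It is an illustration under stated assumptions, not the package's implementation: `toy`, `pseudo`, `x_diff` and `g_O` are hypothetical names, and the exact expressions the package generates are defined by `Oreplacetext` and the other replacer functions.

```r
# Hypothetical toy data with one numeric and one ordered variable
toy <- data.frame(
  x = c(0.5, 1.5, 1.0),
  g = ordered(c("lo", "hi", "mid"), levels = c("lo", "mid", "hi"))
)
poset <- t(combn(nrow(toy), 2))  # pairs (1,2), (1,3), (2,3)

# Left and right observation of each pair, with the default suffixes
left  <- toy[poset[, 1], , drop = FALSE]
right <- toy[poset[, 2], , drop = FALSE]
pseudo <- data.frame(x_L = left$x, x_R = right$x, g_L = left$g, g_R = right$g)

# Plain variable under a non-marginal interpretation: right minus left
pseudo$x_diff <- pseudo$x_R - pseudo$x_L

# O(g): indicator I(g_R < g_L), following the description in the text
pseudo$g_O <- as.numeric(pseudo$g_R < pseudo$g_L)

pseudo
```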

Some sanity checks are performed, but they are not guaranteed to be exhaustive.

`pseudoweights.default` is the default for the `pseudoweights` parameter of `pim.fit.prep`. It simply multiplies the weights of the two observations that make up each pseudo-observation to obtain the weight of that pseudo-observation.
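
A minimal sketch of that multiplication rule, assuming `poset` holds left row numbers in its first column and right row numbers in its second. The function `pseudoweights_sketch` is a hypothetical stand-in written for this illustration; it is not the source of `pseudoweights.default`.

```r
# Hypothetical stand-in for illustration; not the package's source code
pseudoweights_sketch <- function(poset, weights) {
  if (is.null(weights)) return(NULL)         # no weights in, no weights out
  weights[poset[, 1]] * weights[poset[, 2]]  # product of left and right weight
}

poset <- t(combn(3, 2))  # pairs (1,2), (1,3), (2,3)
w <- c(1, 2, 0.5)        # one weight per original observation

pw <- pseudoweights_sketch(poset, w)
pw
```

A replacement passed as `pseudoweights` would, per the argument description, need the same two-argument signature and return one value per row of `poset` (or `NULL`).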

For pimformula: an object of class "pimformula". The items in this object are:

`newformula`	The formula containing all suffixed variable names
`left.variables`	`data.frame` containing one row for each variable that pertains to the "left" observations, and two columns: `org` and `fixed`, containing the original name and the suffixed name of each variable.
`right.variables`	`data.frame` containing one row for each variable that pertains to the "right" observations, and two columns: `org` and `fixed`, containing the original name and the suffixed name of each variable.
`names`	Character vector holding the names for each individual term in the right hand side of the formula. Note: currently this is in no way cleaned up!
`full.colnames`	Character vector holding the constructed parts in the formula. Should have the same length as `nice.colnames`.
`nice.colnames`	Character vector holding nicer names for the constructed parts in the formula. Should have the same length as `full.colnames`.

For pim.fit.prep: an object of class "pimfitdata". The items in this object are:

`X`	The design matrix in pseudo-observation space
`Y`	The pseudo-observations
`poset`	Matrix of two columns indicating what the original observation number is for the left and right real observation in the pseudo-observation. Note: in some cases this is not the passed-in `poset`, e.g. when blocks were present.
`intercept`	Holds `TRUE` if the formula contains an intercept.
`pimformula`	Result of `pimformula` function.
`original.colnames`	If `nicenames` was `TRUE`, this will hold the column names before "nicing up".
`weights`	Vector of weights for every item in `Y` or `NULL` if no weights are to be applied.
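
For intuition about `Y`, the sketch below computes pseudo-observations by hand for the three `lhs` choices. It assumes the usual probabilistic index convention in which the left outcome is compared with the right outcome and ties count one half under `"PO"`; both the direction of the comparison and the tie handling are assumptions of this example rather than something stated on this page, and `y`, `po`, `lt` and `leq` are hypothetical names.

```r
# Hypothetical outcome vector with one tie (observations 1 and 3)
y <- c(3, 1, 3, 5)
poset <- t(combn(length(y), 2))

y_L <- y[poset[, 1]]  # outcome of the left observation in each pair
y_R <- y[poset[, 2]]  # outcome of the right observation in each pair

po  <- (y_L < y_R) + 0.5 * (y_L == y_R)  # assumed "PO": ties count one half
lt  <- as.numeric(y_L < y_R)             # assumed "<": strict inequality
leq <- as.numeric(y_L <= y_R)            # assumed "<=": ties count fully

cbind(poset, y_L, y_R, po, lt, leq)
```

The three variants differ only on tied pairs, so they coincide whenever the outcome has no ties.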

For pseudoweights.default: a vector holding one "pseudo-weight", i.e. a weight per pseudo-observation. May be NULL if the incoming weight was as well.

TODO: Should probably disallow using intercept in some cases. Also have to consider whether passing in contrasts is relevant/possible.

Lreplacetext

# Examples for pimformula
set.seed(1)
iris$out <- factor(sample(2, nrow(iris), replace=TRUE))  # random two-level outcome
iris$xord <- as.ordered(iris$Species)                    # ordered copy of Species, used with O()
pimformula(out~Sepal.Length, data=iris)
pimformula(out~I((R(Sepal.Length) - L(Sepal.Length))/sqrt(R(Sepal.Length) * L(Sepal.Length)) ), data=iris, interpretation="regular")
pimformula(out~O(xord), data=iris, interpretation="regular")
pimformula(out~F(Species), data=iris, interpretation="regular")

# Examples for pim.fit.prep (same setup, repeated so this part runs on its own)
set.seed(1)
iris$out <- factor(sample(2, nrow(iris), replace=TRUE))
iris$xord <- as.ordered(iris$Species)
pim.fit.prep(out~Sepal.Length, data=iris)
pim.fit.prep(out~I((R(Sepal.Length) - L(Sepal.Length))/sqrt(R(Sepal.Length) * L(Sepal.Length)) ), data=iris, interpretation="regular")
pim.fit.prep(out~O(xord), data=iris, interpretation="regular")
pim.fit.prep(out~F(Species), data=iris, interpretation="regular")