pim: Fitting a Probabilistic Index Model

Description Usage Arguments Details Value The enhanced formula interface See Also Examples

View source: R/pim.R

Description

This function fits a probabilistic index model, also known as PIM. It can be used to fit standard PIMs, as well as many different flavours of models that can be reformulated as a pim. The most general models are implemented, but the flexible formula interface allows you to specify a wide variety of different models.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
pim(
  formula,
  data,
  link = c("logit", "probit", "identity"),
  compare = if (model == "marginal") "all" else "unique",
  model = c("difference", "marginal", "regular", "customized"),
  na.action = getOption("na.action"),
  weights = NULL,
  keep.data = FALSE,
  ...
)

Arguments

formula

An object of class formula (or one that can be coerced to that class): A symbolic description of the model to be fitted. The details of model specification are given under 'Details'.

data

an optional data frame, list or environment that contains the variables in the model. Objects that can be coerced by as.data.frame can be used too.

link

a character vector with a single value that determines the used link function. Possible values are "logit", "probit" and "identity". The default is "logit".

compare

a character vector with a single value that describes how the model compares observations. It can take the values "unique" or "all". Alternatively you can pass a matrix with two columns. Each row represents the rownumbers in the original data frame that should be compared to eachother. See Details.

model

a single character value with possible values "difference" (the default), "marginal", "regular" or "customized". If the formula indicates a customized model (by the use of L() or R()), this parameter is set automatically to "customized". Currently, only the options "difference", "marginal" and "customized" are implemented.

na.action

the name of a function which indicates what should happen when the data contains NAs. The default is set by the na.action setting of options, and is na.fail when unset.

weights

Currently not implemented.

keep.data

a logical value indicating whether the model matrix should be saved in the object. Defaults to FALSE. See Details.

...

extra parameters sent to pim.fit

Details

PIMs are based on a set of pseudo-observations constructed from the comparison between a range of possible combinations of 2 observations. We call the set of pseudo observations poset in the context of this package.

By default, this poset takes every unique combination of 2 observations (compare = "unique"). You can either use a character value, or use a matrix or list to identify the set of observation pairs that have to be used as pseudo-observations. Note that the matrix and list should be either nameless, or have the (col)names 'L' and 'R'. If any other names are used, these are ignored and the first column/element is considered to be 'L'. See also new.pim.poset.

It's possible to store the model matrix and psuedo responses in the resulting object. By default this is not done (keep.data = FALSE) as this is less burden on the memory and the pim.formula object contains all information to reconstruct both the model matrix and the pseudo responses. If either the model matrix or the pseudo responses are needed for further calculations, setting keep.data to TRUE might reduce calculation time for these further calculations.

Value

An object of class pim. See pim-class for more information.

The enhanced formula interface

In case you want to fit a standard PIM, you can specify the model in mostly the same way as for lm. There's one important difference: a PIM has by default no intercept. To add an intercept, use + 1 in the formula.

Next to this, you can use the functions L and R in a formula to indicate which part of the poset you refer to. Remember a poset is essentially a matrix-like object with indices refering to the pseudo-observations. Using L() and R() you can define exactly how the pseudo-observations fit in the model. Keep in mind that any calculation done with these functions, has to be wrapped in a call to I(), just like you would do in any other formula interface.

You don't have to specify the model though. If you choose the option model = 'difference', every variable in the formula will be interpreted as I(R(x) - L(x)). If you use the option model = 'marginal', every variable will be interpreted as R(X).

If you don't specify any special function (i.e. L, R, P or PO), the lefthand side of the formula is defined as PO(y). The function PO calculates pseudo observations; it is 1 if the value of the dependent variable for the observation from the L-poset is smaller than, 0 if it is larger than and 0.5 if it is equal to the value for value from the R-poset (see also PO)

See Also

pim-class for more information on the returned object, pim.fit for more information on the fitting itself, pim-getters, coef, confint, vcov etc for how to extract information like coefficients, variance-covariance matrix, ..., summary for some tests on the coefficients.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
data('FEVData')
# The most basic way to use the function
Model <- pim(FEV~ Smoke*Sex , data=FEVData)

# A model with intercept
# The argument xscalm is passed to nleqslv via pim.fit and estimator.nleqslv
# By constructing the estimator functions wisely, you can control most of
# the fitting process from the pim() function.
data('EngelData')
Model2 <- pim(foodexp ~ income + 1, data=EngelData,
   compare="all",
   xscalm = 'auto')

# A marginal model
# It makes sense to use the identity link in combination with the
# score estimator for the variance-covariance matrix
data('DysData')
Model3 <- pim(SPC_D2 ~ out, data = DysData,
  model = 'marginal', link = 'identity',
  vcov.estim = score.vcov)

# A Model using logical comparisons, this is also possible!
# Model the chance that both observations have a different
# outcome in function of whether they had a different Chemo treatment
Model6 <- pim(P(L(out) != R(out)) ~ I(L(Chemo) != R(Chemo)),
   data=DysData,
   compare="all")

# Implementation of the friedman test in the context of a pim
# warpbreaks data where we consider tension as a block
# To do so, you provide the argument compare with a custom
# set of comparisons
data(warpbreaks)
wb <- aggregate(warpbreaks$breaks,
                by = list(w = warpbreaks$wool,
                          t = warpbreaks$tension),
                FUN = mean)
comp <- expand.grid(1:nrow(wb), 1:nrow(wb))
comp <- comp[wb$t[comp[,1]] == wb$t[comp[,2]],] # only compare within blocks
m <- pim(x ~ w, data = wb, compare = comp, link = "identity",  vcov.estim = score.vcov)
summary(m)
friedman.test(x ~ w | t, data = wb)
## Not run: 
# This illustrates how a standard model is actually built in a pim contex
Model4 <- pim(PO(L(Height),R(Height)) ~ I(R(Age) - L(Age)) + I(R(Sex) - L(Sex)),
data=FEVData,
estim = "estimator.BB")
# is the same as
Model5 <- pim(Height ~ Age + Sex, data = FEVData, estim = "estimator.BB")
summary(Model4)
summary(Model5)

## End(Not run)

Example output

Loading pim version 2.0.2.
  If you want to try out the code from the original publications
  on probabilistic index models, please install the package 'pimold'
  from R-Forge. You can use following command:
  install.packages('pimold', repos = 'http://R-Forge.R-project.org')


pim.summary of following model : 
 x ~ w
Type:  difference 
Link:  identity 


   Estimate Std. Error z value Pr(>|z|)
wB  -0.1667     0.2887  -0.577    0.564

Null hypothesis: b = 0 

	Friedman rank sum test

data:  x and w and t
Friedman chi-squared = 0.33333, df = 1, p-value = 0.5637

  Successful convergence.
  Successful convergence.
pim.summary of following model : 
 PO(L(Height), R(Height)) ~ I(R(Age) - L(Age)) + I(R(Sex) - L(Sex))
Type:  difference 
Link:  logit 


                   Estimate Std. Error z value Pr(>|z|)    
I(R(Age) - L(Age))  0.58317    0.03325  17.541  < 2e-16 ***
I(R(Sex) - L(Sex))  0.58043    0.10250   5.663 1.49e-08 ***
---
Signif. codes:  0***0.001**0.01*0.05.’ 0.1 ‘ ’ 1

Null hypothesis: b = 0 
pim.summary of following model : 
 Height ~ Age + Sex
Type:  difference 
Link:  logit 


    Estimate Std. Error z value Pr(>|z|)    
Age  0.58317    0.03325  17.541  < 2e-16 ***
Sex  0.58043    0.10250   5.663 1.49e-08 ***
---
Signif. codes:  0***0.001**0.01*0.05.’ 0.1 ‘ ’ 1

Null hypothesis: b = 0 

pim documentation built on March 26, 2020, 7:57 p.m.