To carry out a search partition analysis (SPAN)


| Argument | Description |
|---|---|
| `formula` | A formula of the standard form |
| `data` | A data frame containing the variables in the formula. |
| `weight` | A frequency weight attached to each row of `data`. The default, `NA`, gives unit weight to each data row. |
| `cc` | Indicates complete-case analysis (default `FALSE`). If `TRUE`, a row of data is deleted if any one attribute is missing. Otherwise a case is deleted only if an attribute in a Boolean combination is missing, as evaluated during a search. |
| `makepos` | If `TRUE`, and an attribute is found to be negative, the direction of ... |
| `beta` | Parameter controlling the degree of complexity penalising; zero gives no complexity penalising. `NA` (the default) or a negative value makes beta be determined automatically as 0.03 times the initial gradient of the complexity hull. |
| `size` | Defines the upper allowable size parameters of the disjunctive normal form used in the initial iteration of a search. It is a list of length ... |
| `gamma` | Parameter controlling the balance of observations in ... |
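To make the two `cc` policies concrete, here is a small base-R illustration of which rows each would keep when a single Boolean combination, `x1 & x2`, is evaluated. It is only a sketch of the behaviour described for `cc`, not code taken from spanr, and the toy data are made up.

```r
## Toy data with scattered missing values (hypothetical)
d <- data.frame(
  x1 = c(1, 0, 1, NA, 1),
  x2 = c(1, 1, NA, 0, 0),
  x3 = c(NA, 1, 1, 1, 0)
)

## cc = TRUE: a row is dropped if any attribute is missing
keep_cc <- complete.cases(d)

## cc = FALSE: a row is dropped only if an attribute used in the combination
## being evaluated is missing; for x1 & x2, a missing x3 does not matter
used <- c("x1", "x2")
keep_combo <- complete.cases(d[, used])

## Evaluate the combination on the rows retained under cc = FALSE
A <- with(d[keep_combo, ], x1 == 1 & x2 == 1)

sum(keep_cc)     # 2 rows kept
sum(keep_combo)  # 3 rows kept
```

Under `cc = FALSE` the set of usable rows can therefore change from one candidate combination to the next during a search.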

A function to search for an optimal Boolean combination partition. Optimization is with respect to the reduction in mean square error of `y` obtained by splitting the data into the partition *(A,!A)* or, if `y` is a survival object, with respect to the log-rank chi-square for survival differences between *A* and *!A*.

The Boolean expression for *A* is output in disjunctive normal form, *A = g_1 | g_2 | g_3 | ...*, and the Boolean expression for the complement *!A* is also output in disjunctive normal form, *!A = h_1 | h_2 | h_3 | ...*. Each element of the disjunctive forms, *g_i* of *A* or *h_i* of *!A*, represents a subgroup. Subgroups are returned as data frames.
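The two split criteria can be sketched in a few lines of R. The helpers `mse_reduction()` and `logrank_chisq()` are hypothetical names, not part of spanr. Here the MSE reduction is assumed to be the total (weighted) MSE minus the pooled within-group MSE after splitting into *(A, !A)*, and the log-rank statistic is taken from `survival::survdiff()`; how spanr scales or penalises these quantities internally is not shown.

```r
library(survival)

## Hypothetical helper: reduction in mean square error of y from splitting
## into (A, !A), with frequency weights w (unit weight by default)
mse_reduction <- function(y, A, w = rep(1, length(y))) {
  stopifnot(any(A), any(!A))  # both groups must be non-empty
  wmse <- function(v, wt) sum(wt * (v - weighted.mean(v, wt))^2) / sum(wt)
  total <- wmse(y, w)
  within <- (sum(w[A]) * wmse(y[A], w[A]) +
             sum(w[!A]) * wmse(y[!A], w[!A])) / sum(w)
  total - within
}

## Hypothetical helper: log-rank chi-square for survival differences
## between A and !A
logrank_chisq <- function(time, event, A) {
  group <- factor(A)
  survdiff(Surv(time, event) ~ group)$chisq
}

## A perfect split of y removes all of its MSE
mse_reduction(c(1, 1, 3, 3), c(TRUE, TRUE, FALSE, FALSE))  # 1
```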

If variables `x, u, v, w, ...` of the formula are not coded binary, a pre-analysis is done to establish an optimal cut of each such variable. This is again done with respect to reduction in MSE, or log-rank chi-square for a survival formula, over values of the variable. If the variable is numeric, a dichotomy is made by above/below a cut, the possible cuts being the unique values of the variable if there are 20 or fewer, and otherwise at 20 equally spaced intervals. If it is a factor, the candidate dichotomies are formed according to each value of the factor.
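A sketch of this pre-analysis, assuming the MSE criterion and reading "20 equally spaced intervals" as the 19 interior boundaries of 20 equal-width intervals over the range of the variable; a factor level is compared against all other levels. `best_cut()` is a hypothetical helper, not a spanr function.

```r
## Hypothetical helper: choose the dichotomy of a non-binary predictor x
## that gives the largest reduction in MSE of y
best_cut <- function(x, y, max_unique = 20, n_intervals = 20) {
  gain <- function(A) {
    ss <- function(v) sum((v - mean(v))^2)
    (ss(y) - ss(y[A]) - ss(y[!A])) / length(y)
  }
  if (is.factor(x)) {
    ## one candidate per factor level: level versus all other levels
    cands <- levels(x)
    splits <- lapply(cands, function(l) x == l)
  } else {
    u <- sort(unique(x))
    cands <- if (length(u) <= max_unique) {
      u  # every unique value is a candidate cut
    } else {
      ## interior boundaries of n_intervals equal-width intervals (assumption)
      seq(min(x), max(x), length.out = n_intervals + 1)[-c(1, n_intervals + 1)]
    }
    splits <- lapply(cands, function(cut) x > cut)  # above versus at/below
  }
  ## discard cuts that leave one side empty
  ok <- vapply(splits, function(A) any(A) && any(!A), logical(1))
  gains <- vapply(splits[ok], gain, numeric(1))
  list(cut = cands[ok][which.max(gains)], gain = max(gains))
}

best_cut(c(1, 2, 3, 10, 11, 12), c(0, 0, 0, 5, 5, 5))$cut  # 3
```

For a survival formula the same loop would apply with a log-rank statistic in place of `gain()`.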

A `spanr` object with components:

`A`

Data frame with the same number of rows as the input data, giving a binary indicator of belonging to *A*.

`g`

Data frame with the same number of rows as the input data, whose columns indicate belonging to each of the subgroups of *A*.

`h`

Data frame with the same number of rows as the input data, whose columns indicate belonging to each of the subgroups of *!A*.
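A toy illustration of how the three indicators relate, for the hypothetical combination *A = (x1 & x2) | x3*, whose complement in disjunctive normal form is *!A = (!x1 & !x3) | (!x2 & !x3)*. The column names and layout are assumptions rather than spanr's exact output; the point is that a row belongs to *A* exactly when at least one `g` column is 1, and to *!A* exactly when at least one `h` column is 1.

```r
## Hypothetical data for three binary attributes
d <- data.frame(x1 = c(1, 1, 0, 0, 1),
                x2 = c(1, 0, 1, 0, 0),
                x3 = c(0, 0, 0, 1, 1))

## Subgroups of A = (x1 & x2) | x3
g <- with(d, data.frame(g1 = as.integer(x1 & x2),
                        g2 = as.integer(x3 == 1)))
## Subgroups of !A = (!x1 & !x3) | (!x2 & !x3)
h <- with(d, data.frame(h1 = as.integer(!x1 & !x3),
                        h2 = as.integer(!x2 & !x3)))

## A row is in A when it falls in at least one g subgroup
A <- data.frame(A = as.integer(rowSums(g) > 0))

## ...and in !A exactly when it falls in at least one h subgroup
stopifnot(all((rowSums(h) > 0) == (A$A == 0)))
A$A  # 1 0 0 1 1
```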

Roger Marshall <rj.marshall@auckland.ac.nz>, The University of Auckland, New Zealand

```r
library(spanr)

## 1. Simulate Bernoulli binary predictors x1, x2, ..., x10 and outcome y.
## For (x1 & x2 & x3) | (x1 & x4) | (x1 & x9), y ~ N(11, sd = 0.5); otherwise y ~ N(10, sd = 0.5).
set.seed(1)
x <- matrix(data = rbinom(10000, 1, 0.5), nrow = 1000, ncol = 10)
colnames(x) <- paste0("x", 1:10)
P <- ifelse((x[, 1] & x[, 2] & x[, 3]) | (x[, 1] & x[, 4]) | (x[, 1] & x[, 9]), 1, 0)
y <- ifelse(P == 1, rnorm(1000, 11, 0.5), rnorm(1000, 10, 0.5))
d <- data.frame(cbind(y, x))
sp <- spanr(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
            data = d, size = c(1, 2, 2), beta = NA)

## 2. Survival analysis of the pbc data (event: death, status == 2)
library(survival)
data(pbc)
sp <- with(pbc, spanr(formula = Surv(time, status == 2) ~ trt + age + sex + ascites +
                        hepato + spiders + edema + bili + chol + albumin +
                        copper + ast + trig + platelet + protime + stage,
                      beta = NA, cc = TRUE, gamma = 1))
test <- cbind(pbc, sp$A)
## Kaplan-Meier curves of A versus !A
km <- survfit(Surv(test$time, test$status == 2) ~ test$A)
plot(km, col = c(1, 2))
```
