# To carry out a search partition analysis (SPAN)

### Description

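Carries out a search partition analysis (SPAN): a search for the Boolean combination of attributes that best partitions the data into two groups *(A,!A)*.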

### Usage


### Arguments

`formula`

A formula of the standard form `y ~ x + u + v + w + ...`, where the outcome `y` is either numeric or a survival (`Surv`) object.

`data`

A data frame containing the variables in the formula.

`weight`

A frequency weight attached to each row of `data`. The default, `NA`, gives each row unit weight.

`cc`

Logical; whether to carry out a complete-case analysis (default `FALSE`). If `TRUE`, a row of data is deleted if any one attribute is missing. Otherwise a row is deleted only if an attribute in the Boolean combination currently being evaluated during the search is missing.

`makepos`

If `TRUE`, and an attribute is found to be negative, the direction of ...

`beta`

Parameter controlling the degree of complexity penalising; zero gives no complexity penalising. `NA` (the default) or a negative value sets `beta` automatically to 0.03 times the initial gradient of the complexity hull.

`size`

Defines the upper allowable size parameters of the disjunctive normal form used in the initial iteration of the search. It is a list of length ...

`gamma`

Parameter controlling the balance of observations in ...
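The row-handling arguments `weight` and `cc` can be illustrated without calling `spanr` at all. The sketch below uses hypothetical toy data and only base R; it shows what a frequency weight means (a weighted mean equals the mean of the data with each row repeated `weight` times) and which rows survive under `cc = TRUE` versus the per-combination deletion described for `cc = FALSE`. The treatment of a combination such as `x1 & x2` is our reading of the description above, not `spanr` internals.

```r
## Hypothetical toy data with frequency weights and missing attributes
d <- data.frame(
  y  = c(1.2, 0.8, 1.5, 1.1, 0.9),
  x1 = c(1, NA, 0, 1, 1),
  x2 = c(0, 1, NA, 1, 0),
  x3 = c(1, 1, 1, NA, 0),
  w  = c(2, 1, 3, 1, 1)
)

## weight: a frequency weight of k means the row stands for k identical
## observations, so the weighted mean equals the mean of the expanded data.
wmean <- weighted.mean(d$y, d$w)
emean <- mean(rep(d$y, times = d$w))

## cc = TRUE: a row is dropped if any attribute is missing.
keep_cc <- complete.cases(d[, c("x1", "x2", "x3")])

## cc = FALSE (assumed reading): while evaluating a combination such as
## x1 & x2, a row is dropped only if x1 or x2 is missing.
keep_x1_x2 <- complete.cases(d[, c("x1", "x2")])

print(c(weighted = wmean, expanded = emean))
print(rbind(cc_true = keep_cc, cc_false_x1_x2 = keep_x1_x2))
```

Row 4 is lost under complete-case deletion because `x3` is missing, but it is still usable when only `x1` and `x2` are involved.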

### Details

A function to search for an optimal Boolean combination partition. Optimization is with respect to the
reduction in mean square error (MSE) of `y` achieved by splitting the data into the partition *(A,!A)* or, if `y`
is a survival object, with respect to the log-rank chi-square statistic for survival differences between *A* and *!A*.
The Boolean expression for *A* is output in disjunctive normal form, *A = g_1 | g_2 | g_3 | ...*, and
the Boolean expression for the complement *!A* is also output in disjunctive normal form,
*!A = h_1 | h_2 | h_3 | ...*. Each element of the disjunctive forms, *g_i* of *A* or *h_i* of *!A*,
represents a subgroup. Subgroups are returned as data frames.
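To make the disjunctive normal form and the MSE criterion concrete, here is a minimal base-R sketch. The helpers `eval_conjunction`, `eval_dnf` and `mse_reduction` are hypothetical and not part of `spanr`. Each subgroup *g_i* is a conjunction (AND) of binary attributes, *A* is their disjunction (OR), and a candidate partition is scored as the MSE of `y` about its overall mean minus the MSE about the two group means; this exact definition of "reduction in MSE" is an assumption.

```r
## Hypothetical helpers, not part of the spanr package.

## A subgroup g_i is a conjunction (AND) of binary attributes, given as a
## character vector of column names; A is the disjunction (OR) of subgroups.
eval_conjunction <- function(data, vars) {
  Reduce(`&`, lapply(vars, function(v) data[[v]] == 1))
}
eval_dnf <- function(data, subgroups) {
  Reduce(`|`, lapply(subgroups, function(g) eval_conjunction(data, g)))
}

## Assumed definition: MSE about the overall mean minus MSE about the
## group means of A and !A.
mse_reduction <- function(y, A) {
  fitted <- ifelse(A, mean(y[A]), mean(y[!A]))
  mean((y - mean(y))^2) - mean((y - fitted)^2)
}

## Simulated data in the spirit of Example 1.
set.seed(1)
n <- 1000
x <- as.data.frame(matrix(rbinom(10 * n, 1, 0.5), nrow = n,
                          dimnames = list(NULL, paste0("x", 1:10))))
## True partition: A = (x1 x2 x3) | (x1 x4) | (x1 x9)
A_true <- eval_dnf(x, list(c("x1", "x2", "x3"), c("x1", "x4"), c("x1", "x9")))
y <- ifelse(A_true, rnorm(n, 11, 0.5), rnorm(n, 10, 0.5))

## A simpler candidate partition, A = x1, scores lower.
A_simple <- eval_dnf(x, list("x1"))
print(c(true = mse_reduction(y, A_true), simple = mse_reduction(y, A_simple)))
```

For a survival outcome the corresponding score would be the log-rank chi-square, which the **survival** package returns as the `chisq` component of `survdiff(Surv(time, status) ~ A)`.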

If the variables `x, u, v, w, ...` of the formula are not coded as binary, a pre-analysis is done to establish
an optimal cut of each variable. This is done, again, with respect to reduction in MSE (or the log-rank statistic for a survival formula)
over values of the variable. If a variable is numeric,
a dichotomy is made by above/below a cut, the candidate cuts being the unique values of the variable if there are 20 or fewer,
and otherwise at 20 equally spaced intervals. If it is a factor, a dichotomy is formed according to each value of the factor.
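The pre-analysis can be sketched as a one-dimensional search. In this hypothetical base-R sketch (the functions `mse_score`, `best_cut` and `best_level` are illustrative, not `spanr` code), a numeric variable is split as `x > cut`; candidate cuts are its unique values when there are 20 or fewer, and otherwise the interior boundaries of 20 equal-width intervals over its range, which is our reading of "20 equally spaced intervals". For a factor we assume each dichotomy is one level versus all the others.

```r
## Score a dichotomy A by the reduction in MSE of y (assumed definition).
mse_score <- function(y, A) {
  if (all(A) || !any(A)) return(NA_real_)  # not a genuine split
  fitted <- ifelse(A, mean(y[A]), mean(y[!A]))
  mean((y - mean(y))^2) - mean((y - fitted)^2)
}

## Numeric variable: unique values as cuts if there are n_max or fewer,
## otherwise the interior boundaries of n_max equal-width intervals.
best_cut <- function(x, y, n_max = 20) {
  u <- sort(unique(x))
  cuts <- if (length(u) <= n_max) u else
    seq(min(x), max(x), length.out = n_max + 1)[2:n_max]
  scores <- vapply(cuts, function(cut) mse_score(y, x > cut), numeric(1))
  list(cut = cuts[which.max(scores)], reduction = max(scores, na.rm = TRUE))
}

## Factor: one dichotomy per level, that level versus all others (assumed).
best_level <- function(f, y) {
  lv <- levels(factor(f))
  scores <- vapply(lv, function(l) mse_score(y, f == l), numeric(1))
  list(level = lv[which.max(scores)], reduction = max(scores, na.rm = TRUE))
}

set.seed(2)
age <- round(runif(500, 20, 80))  # more than 20 unique values
y <- ifelse(age > 50, rnorm(500, 11, 0.5), rnorm(500, 10, 0.5))
res_num <- best_cut(age, y)

grp <- factor(sample(c("a", "b", "c"), 500, replace = TRUE))
y2 <- ifelse(grp == "c", rnorm(500, 11, 0.5), rnorm(500, 10, 0.5))
res_fac <- best_level(grp, y2)

print(res_num)
print(res_fac)
```

Cuts that put every observation on one side are scored `NA` and ignored by `which.max`, so the largest unique value is never chosen as a cut.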

### Value

An object `spanr` with attributes:

`A`

A data frame with the same number of rows as the input data, giving a binary indicator of membership of *A*.

`g`

A data frame with the same number of rows as the input data, whose columns indicate membership of each of the
subgroups of *A*.

`h`

A data frame with the same number of rows as the input data, whose columns indicate membership of each of the
subgroups of *!A*.
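Because *A = g_1 | g_2 | ...*, a row belongs to *A* exactly when it belongs to at least one subgroup of *A*, and likewise for *!A* and its subgroups. The sketch below shows this relationship on hypothetical 0/1 indicator data frames shaped like `g` and `h` (toy values, not real `spanr` output). It assumes every row is classified; with `cc = FALSE` and missing attributes, some rows might fall in neither group.

```r
## Hypothetical 0/1 indicators: two subgroups of A and two subgroups of !A.
g <- data.frame(g1 = c(1, 0, 1, 0, 0), g2 = c(0, 0, 1, 1, 0))
h <- data.frame(h1 = c(0, 1, 0, 0, 1), h2 = c(0, 1, 0, 0, 0))

## A row is in A when it is in any subgroup of A; subgroups may overlap
## (row 3 is in both g1 and g2).
A <- as.integer(rowSums(g) > 0)
not_A <- as.integer(rowSums(h) > 0)

## Every classified row falls in exactly one of A and !A.
stopifnot(all(A + not_A == 1))
print(data.frame(A, not_A, n_subgroups_of_A = rowSums(g)))
```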

### Author(s)

Roger Marshall <rj.marshall@auckland.ac.nz>, The University of Auckland, New Zealand

### Examples

```
## 1. Simulate Bernoulli binary predictors x1, x2, ..., x10 and an outcome y.
## For (x1 x2 x3) | (x1 x4) | (x1 x9), y is normal with mean 11 and SD 0.5;
## otherwise it is normal with mean 10 and SD 0.5.
set.seed(1)
x <- matrix(data = rbinom(10000, 1, 0.5), nrow = 1000, ncol = 10)
colnames(x) <- paste0("x", 1:10)
P <- ifelse((x[, 1] & x[, 2] & x[, 3]) | (x[, 1] & x[, 4]) | (x[, 1] & x[, 9]), 1, 0)
y <- ifelse(P == 1, rnorm(1000, 11, 0.5), rnorm(1000, 10, 0.5))
d <- data.frame(y, x)
sp <- spanr(formula = y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10,
            data = d, size = c(1, 2, 2), beta = NA)

## 2. Survival analysis of the pbc data
library(survival)
data(pbc, package = "survival")
sp <- with(pbc, spanr(formula = Surv(time, status == 2) ~ trt + age + sex + ascites
                      + hepato + spiders + edema + bili + chol + albumin
                      + copper + ast + trig + platelet + protime + stage,
                      beta = NA, cc = TRUE, gamma = 1))
test <- cbind(pbc, sp$A)
## Kaplan-Meier curves of A versus !A
km <- survfit(Surv(time, status == 2) ~ A, data = test)
plot(km, col = c(1, 2))
```