# splitMedian: Discretise continuous data in multiple granularities

In ecpc: Flexible Co-Data Learning for High-Dimensional Prediction

## Description

Discretise continuous co-data by making groups of covariates of various sizes. The first group contains all covariates. Each group is then recursively split in two at the median co-data value, until a user-specified minimum group size is reached. The resulting groups are used for adaptive discretisation of continuous co-data.
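The recursive splitting described above can be illustrated with a minimal base R sketch. The helper `median_split_sketch` and its argument `min_group_size` are hypothetical names introduced here for illustration and are not part of ecpc. The stopping rule (stop as soon as a split would create a group smaller than the minimum) is an assumption and may differ from what `splitMedian` does internally.

```r
# Hypothetical helper, NOT part of ecpc: recursive median split in base R.
# Assumes `values` contains no missing values.
median_split_sketch <- function(values, index = seq_along(values),
                                min_group_size = 5) {
  groups <- list(index)              # the current group is always kept
  lower <- values <= median(values)  # TRUE for covariates at or below the median
  # Assumed stopping rule: split only if both halves are large enough
  if (sum(lower) >= min_group_size && sum(!lower) >= min_group_size) {
    groups <- c(
      groups,
      median_split_sketch(values[lower], index[lower], min_group_size),
      median_split_sketch(values[!lower], index[!lower], min_group_size)
    )
  }
  groups
}

codata <- seq(0, 1, length.out = 20)  # continuous co-data for 20 covariates
groups <- median_split_sketch(codata, min_group_size = 5)
lengths(groups)  # group sizes, from the full group down to the finest level
```

Every level of granularity is kept in the returned list, so coarse and fine groups overlap; this overlap is what makes the list usable as a hierarchy of candidate discretisations.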

## Usage

```r
splitMedian(values, index=NULL, depth=NULL, minGroupSize = 50, first = TRUE,
            split = c("both","lower","higher"))
```

## Arguments

| Argument | Description |
|---|---|
| `values` | Vector with the continuous co-data values to be discretised. |
| `index` | Index of the covariates corresponding to the values supplied. Useful if part of the continuous co-data is missing and only the non-missing part should be discretised. |
| `depth` | (optional): if given, a discretisation is returned with `depth` levels of granularity. |
| `minGroupSize` | Minimum group size that each group of covariates should have. |
| `split` | `"both"`, `"lower"` or `"higher"`: should both split groups of covariates be further split, or only the group of covariates that corresponds to the lower or higher continuous co-data group? |
| `first` | Do not change, recursion help variable. |
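To see why `index` matters when part of the co-data is missing, the following base R sketch (toy values chosen for illustration, no ecpc calls) contrasts positions within the non-missing subset with the original covariate indices. Supplying the original indices is what lets the returned groups refer to the right covariates.

```r
# Toy co-data for six covariates; covariates 2 and 5 have missing co-data
values <- c(0.9, NA, 0.1, 0.5, NA, 0.3)
observed <- which(!is.na(values))  # original covariate indices: 1, 3, 4, 6

# One median split on the non-missing values only
lower <- values[observed] <= median(values[observed])

which(lower)     # positions within the subset: 2, 4
observed[lower]  # original covariate indices: 3, 6
```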

## Value

A list of groups of covariates, which may be used as a group set in `ecpc`.

## See Also

Use `obtainHierarchy` to obtain a group set on group level defining the hierarchy for adaptive discretisation of continuous co-data.

## Examples

```r
library(ecpc)

cont.codata <- seq(0,1,length.out=20) #continuous co-data
#full tree with minimum group size 5
groupset1 <- splitMedian(values=cont.codata,minGroupSize=5)
#only split at lower continuous co-data group
groupset2 <- splitMedian(values=cont.codata,split="lower",minGroupSize=5)

part <- sample(1:length(cont.codata),15) #discretise only for a part of the continuous co-data
cont.codata[-part] <- NaN #suppose rest is missing
#make group set of non-missing values
groupset3 <- splitMedian(values=cont.codata[part],index=part,minGroupSize=5)
groupset3 <- c(groupset3,list(which(is.nan(cont.codata)))) #add missing data group
```

ecpc documentation built on May 3, 2021, 9:08 a.m.