# survivalFS: Logic Feature Selection for Survival Data

## Description

Identification of interactions of binary variables associated with survival time using logic regression.

## Usage

```
## Default S3 method:
survivalFS(x, y, B = 20, replace = FALSE, sub.frac = 0.632,
    score = c("DPO", "Conc", "Brier", "PL"), addMatImp = TRUE,
    adjusted = FALSE, neighbor = NULL, ensemble = FALSE, rand = NULL, ...)

## S3 method for class 'formula'
survivalFS(formula, data, recdom = TRUE, ...)

## S3 method for class 'logicBagg'
survivalFS(x, score = c("DPO", "Conc", "Brier", "PL"), adjusted = FALSE,
    neighbor = NULL, ensemble = FALSE, addMatImp = TRUE, rand = NULL, ...)
```
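A minimal call to the default method could look like the following sketch. The simulated data, the variable names (`SNP1`, ..., `SNP10`), the interaction between `SNP1` and `SNP2`, and the small value of `B` are assumptions made purely for illustration; they are not taken from the package documentation.

```r
library(logicFS)   # provides survivalFS()
library(survival)  # provides Surv()

set.seed(123)
n <- 200  # number of observations (illustrative)
p <- 10   # number of binary variables (illustrative)

# Simulated binary predictors: one row per observation, one column per variable
x <- matrix(rbinom(n * p, 1, 0.5), nrow = n,
            dimnames = list(NULL, paste0("SNP", 1:p)))

# Simulated right-censored survival times; the hazard is raised when
# SNP1 and SNP2 are both 1 (a purely illustrative interaction)
rate <- ifelse(x[, "SNP1"] == 1 & x[, "SNP2"] == 1, 0.3, 0.1)
time <- rexp(n, rate)
cens <- rexp(n, 0.05)
y <- Surv(pmin(time, cens), as.integer(time <= cens))

# Small B to keep the run short; rand makes the run reproducible
fit <- survivalFS(x, y, B = 5, rand = 123)
fit
```

In practice a larger `B` (the default is 20) would usually be preferable; the value above only keeps the example fast.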

## Arguments

`x` a matrix consisting of 0's and 1's. Alternatively, `x` can be an object of class `logicBagg`, i.e. the output of `logic.bagging`. If `x` is a matrix, each column must correspond to a binary variable and each row to an observation. Missing values are not allowed.

`y` a vector of class `Surv` specifying the right-censored survival time for all observations represented in `x`. Missing values are not allowed in `y`. This vector can, e.g., be generated using the function `Surv` from the `R` package `survival`.

`B` an integer specifying the number of iterations.

`replace` should the cases be sampled with replacement? If `TRUE`, a bootstrap sample of size `length(y)` is drawn from the `length(y)` observations in each of the `B` iterations. If `FALSE`, `ceiling(sub.frac * length(y))` of the observations are drawn without replacement in each iteration.

`sub.frac` a proportion specifying the fraction of the observations that are used in each iteration to build a classification rule if `replace = FALSE`. Ignored if `replace = TRUE`.

`score` a character string naming the score that should be used in the computation of the importance measure for a survival time analysis. By default, the distance between predicted outcomes (`score = "DPO"`) proposed by Tietz et al. (2018) is used to determine the importance of the variables. Alternatively, Harrell's C-Index (`"Conc"`), the Brier score (`"Brier"`), or the predictive partial log-likelihood (`"PL"`) can be used.

`addMatImp` should the matrix containing the improvements due to the prime implicants in each of the iterations be added to the output if `ensemble = FALSE`? (For each of the prime implicants, the importance is computed as the average over the `B` improvements.) If `ensemble = TRUE` and `addMatImp = TRUE`, the respective score of the full model is added to the output instead of an improvement matrix.

`adjusted` logical specifying whether the measures should be adjusted for noise. Often, the interaction actually associated with the response is not found exactly in some iterations of logic bagging; instead, an interaction is identified that additionally contains one (or, rarely, more) noise SNPs. If `adjusted` is set to `TRUE`, the values of the importance measure are corrected for this behaviour.

`neighbor` a list consisting of character vectors specifying SNPs that are in LD. If specified, every SNP needs to occur exactly once in this list, and the importance measures are adjusted for LD by considering the SNPs within an LD block as exchangeable.

`ensemble` in the case of a survival outcome, should `ensemble` importance measures (as, e.g., in `randomSurvivalSRC`) be used? If `FALSE`, importance measures analogous to the ones in the logicFS analysis of other outcomes are used (see Tietz et al., 2018).

`rand` numeric value. If specified, the random number generator will be set into a reproducible state.

`formula` an object of class `formula` describing the model that should be fitted.

`data` a data frame containing the variables in the model. Each row of `data` must correspond to an observation, and each column to a binary variable (coded by 0 and 1) or a factor (for details, see `recdom`), except for the column comprising the response. Missing values are not allowed in `data`. The response must be an object of class `Surv`.

`recdom` a logical value or a logical vector of length `ncol(data)` indicating whether a SNP should be transformed into two binary dummy variables coding for a recessive and a dominant effect. If `recdom` is the single value `TRUE`, all factors/variables with three levels are coded by two dummy variables as described in `make.snp.dummy`. Each level of each of the other factors (including factors specifying a SNP that shows only two genotypes) is coded by one indicator variable. If `recdom` is the single value `FALSE`, each level of each factor is coded by an indicator variable. If `recdom` is a logical vector, all factors corresponding to an entry of `recdom` that is `TRUE` are assumed to be SNPs and are transformed into two binary variables as described above. All variables corresponding to entries of `recdom` that are `TRUE` (no matter whether `recdom` is a vector or a single value) must be coded either by the integers 1 (homozygous reference genotype), 2 (heterozygous), and 3 (homozygous variant), or by the number of minor alleles, i.e. 0, 1, and 2. The two coding schemes must not be mixed: it is not allowed that some SNPs are coded by 1, 2, and 3 and others by 0, 1, and 2.

`...` further arguments of `logicFS`. Ignored if `x` is an object of class `logicBagg`.
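The sampling rule behind `replace` and `sub.frac`, and the dominant/recessive coding referred to under `recdom`, can be sketched in base R as follows. This is a hand-written illustration, not the package's internal code: the variable names are hypothetical, and the dummy definitions follow the conventional genetic meaning of "dominant" (at least one variant allele) and "recessive" (two variant alleles); `make.snp.dummy` remains the authoritative reference for the actual coding.

```r
# Illustration only: base R, not the internal code of survivalFS().
set.seed(1)

n <- 100            # number of observations, i.e. length(y)
sub.frac <- 0.632   # default subsampling fraction

# replace = FALSE: draw ceiling(sub.frac * n) observations without replacement
n.sub <- ceiling(sub.frac * n)
idx.sub <- sample(n, n.sub, replace = FALSE)

# replace = TRUE: draw a bootstrap sample of size n with replacement
idx.boot <- sample(n, n, replace = TRUE)

# A SNP coded 1 (homozygous reference), 2 (heterozygous), 3 (homozygous variant)
snp <- c(1, 2, 3, 2, 1, 3)

# Assumed conventional definitions of the two dummy variables:
snp.dom <- as.integer(snp >= 2)  # dominant: at least one variant allele
snp.rec <- as.integer(snp == 3)  # recessive: two variant alleles

cbind(snp, snp.dom, snp.rec)
```

With `n = 100` and the default `sub.frac = 0.632`, each iteration without replacement uses `ceiling(63.2) = 64` observations.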

## Value

An object of class `logicFS` containing

`primes` the prime implicants,

`vim` the importance of the prime implicants,

`prop` the proportion of logic regression models containing the prime implicants (or the neighbors of the prime implicants, if `neighbor != NULL`; or the extended primes of the prime implicants, if `adjusted = TRUE`; or the extended primes of the neighbors of the prime implicants, if `neighbor != NULL` and `adjusted = TRUE`),

`type` the type of model (1: classification, 2: linear regression, 3: logistic regression, 4: Cox regression),

`param` further parameters (if `addInfo = TRUE`),

`mat.imp` either the matrix containing the improvements if `addMatImp = TRUE` and `ensemble = FALSE`, or the respective score of the full model if `addMatImp = TRUE` and `ensemble = TRUE`, or `NULL` if `addMatImp = FALSE`,

`measure` the name of the importance measure used,

`neighbor` the value of `neighbor`,

`useN` the value of `useN`,

`threshold` `NULL`,

`mu` `NULL`.
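The relation between the improvement matrix and the importance values (the importance of a prime implicant is the average over its `B` improvements) can be illustrated with a toy matrix. The numbers, the prime implicant labels, and the orientation of the matrix (rows for prime implicants, columns for iterations) are assumptions made for this sketch, which covers only the unadjusted case with `ensemble = FALSE`.

```r
# Hypothetical improvement matrix: 3 prime implicants (rows) x 4 iterations (columns)
mat.imp <- matrix(c(0.2, 0.4, 0.3, 0.1,
                    0.0, 0.1, 0.0, 0.1,
                    0.5, 0.5, 0.6, 0.4),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("SNP1 & !SNP2", "SNP3", "SNP4 & SNP5"),
                                  paste0("iter", 1:4)))

# Importance of each prime implicant: average improvement over the iterations
vim <- rowMeans(mat.imp)

# Rank the prime implicants from most to least important
sort(vim, decreasing = TRUE)
```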

## Author(s)

Tobias Tietz, tobias.tietz@hhu.de

## References

Tietz, T., Selinski, S., Golka, K., Hengstler, J.G., Gripp, S., Ickstadt, K., Ruczinski, I., Schwender, H. (2018). Identification of Interactions of Binary Variables Associated with Survival Time Using survivalFS. Submitted.

## See Also

`logicFS`, `logic.bagging`