Various linear models are fitted to the training samples using the LARS method. The models differ in the number of features, and each is validated on the validating samples. Each feature is also assigned a score based on how often LASSO includes that feature in the models.
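The feature score can be roughly illustrated with the sketch below. It counts how often each feature appears among the feature sets chosen by a series of models, where the input is shaped like the `selected.features.sequence` component of the return value. The helper `inclusion_frequency` and the toy sets are hypothetical, and a plain inclusion frequency is only a simplified stand-in. FeaLect's actual scoring formula is described in the reference and may weight features differently.

```r
# Hypothetical helper (not FeaLect's published formula): score each feature by
# the fraction of models whose selected feature set includes it.
inclusion_frequency <- function(selected_sets, all_features) {
  counts <- setNames(numeric(length(all_features)), all_features)
  for (s in selected_sets) {
    # Count each feature at most once per model, ignoring unknown names
    s <- intersect(unique(s), all_features)
    counts[s] <- counts[s] + 1
  }
  counts / length(selected_sets)  # fraction of models including each feature
}

# Toy example: three models of growing size along a regularization path
sets <- list(c("g1"), c("g1", "g3"), c("g1", "g3", "g2"))
inclusion_frequency(sets, c("g1", "g2", "g3", "g4"))
```

In this toy case, all three models select `g1` and none selects `g4`. They therefore receive scores of 1 and 0, respectively.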

```
train.doctor(F_, L_, training.samples, validating.samples, considered.features,
             maximum.features.num, balance = TRUE, return_linear.models = TRUE,
             report.fitting.failure = FALSE)
```

| Argument | Description |
|---|---|
| `F_` | The feature matrix; each column is a feature. |
| `L_` | The vector of labels, named according to the rows of `F_`. |
| `training.samples` | The names of the rows of `F_` to be used as training samples. |
| `validating.samples` | The names of the rows of `F_` to be used as validating samples. |
| `considered.features` | The names of the columns of `F_` that determine the features of interest. |
| `maximum.features.num` | Up to this number of features are allowed to contribute to each linear model. |
| `balance` | If `TRUE`, the samples are balanced by oversampling to give the same number of positives and negatives before the linear model is fitted. |
| `return_linear.models` | Whether the fitted linear models are returned. The models are memory intensive, so when there are many of them (e.g., more than 1000), setting this to `FALSE` can prevent running out of memory. |
| `report.fitting.failure` | If `TRUE`, any failure in fitting the linear or logistic models is printed. |
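The sketch below illustrates the effect of `balance = TRUE`. It assumes 0/1 labels named by sample, as in `L_`, and resamples the minority class with replacement until both classes are the same size. The helper `oversample_balance` is hypothetical and is not part of FeaLect. The package's own balancing step may differ in detail.

```r
# Hypothetical helper (not part of FeaLect): balance two classes by
# oversampling the minority class with replacement.
# `labels` is assumed to be a 0/1 vector named by sample, like `L_`.
oversample_balance <- function(labels) {
  pos <- names(labels)[labels == 1]
  neg <- names(labels)[labels == 0]
  if (length(pos) == 0 || length(neg) == 0) stop("both classes are required")
  if (length(pos) < length(neg)) {
    pos <- c(pos, sample(pos, length(neg) - length(pos), replace = TRUE))
  } else if (length(neg) < length(pos)) {
    neg <- c(neg, sample(neg, length(pos) - length(neg), replace = TRUE))
  }
  c(pos, neg)  # sample names; repeated names are the oversampled rows
}

set.seed(1)  # the extra rows are drawn at random
y <- c(a = 1, b = 0, c = 0, d = 0, e = 1)
idx <- oversample_balance(y)
table(y[idx])  # 3 of each class
```

Returning sample names rather than a new matrix keeps the sketch compatible with the name-based `training.samples` convention. The rows of `F_` can then be selected with `F_[idx, ]`.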

See the reference for more details.

Returns a list with the following components:

| Component | Description |
|---|---|
| `linear.models` | The fitted models computed by `lars()`. |
| `best.number.of.features` | The number of features giving the best accuracy. |
| `probabilities` | The best computed logistic scores. |
| `accuracy` | The best F-measure. |
| `best.logistic.cof` | The logistic coefficients giving the best accuracy. |
| `contribution.to.feature.scores` | A vector to be added to the total feature scores. |
| `contribution.to.feature.scores.frequency` | A vector to be added to the total frequency of features. |
| `training.samples` | The input names of the rows of `F_` used as training samples. |
| `validating.samples` | The input names of the rows of `F_` used as validating samples. |
| `precision` | The ratio of true positives to predicted positives. |
| `recall` | The ratio of true positives to actual positives. |
| `selected.features.sequence` | A list of the feature sets selected in the different models. |
| `global.errors` | A vector of the global errors of the linear fits. |
| `features.with.best.global.error` | A vector of names of features that perform well in terms of the global error of the linear fits. |
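The `precision`, `recall` and `accuracy` (F-measure) components can be illustrated with a short sketch. It assumes 0/1 labels and 0/1 predictions on the validating samples. The predictions are made up for illustration. Choosing the number of features with the highest F-measure is only an assumed reading of `best.number.of.features`.

```r
# Sketch of the reported metrics for 0/1 labels and 0/1 predictions.
# The F-measure is the harmonic mean of precision and recall.
classification_metrics <- function(truth, predicted) {
  tp <- sum(predicted == 1 & truth == 1)
  precision <- if (sum(predicted == 1) > 0) tp / sum(predicted == 1) else 0
  recall    <- if (sum(truth == 1) > 0) tp / sum(truth == 1) else 0
  f <- if (precision + recall > 0) 2 * precision * recall / (precision + recall) else 0
  c(precision = precision, recall = recall, f.measure = f)
}

# Hypothetical predictions on the validating samples from models with 1, 2 and 3 features
truth <- c(1, 0, 1, 1, 0)
preds <- list(c(1, 1, 0, 0, 0), c(1, 0, 1, 0, 0), c(1, 0, 1, 1, 1))
f <- sapply(preds, function(p) classification_metrics(truth, p)[["f.measure"]])
best.k <- which.max(f)  # number of features with the highest F-measure
```

The three candidate models reach F-measures of 0.4, 0.8 and 6/7 (about 0.857), so `best.k` is 3.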

Logistic regression is also performed on top of the fitted linear models.
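The note can be illustrated with a minimal sketch on simulated data, assuming 0/1 labels. A linear model is fitted on the training samples. A logistic regression of the labels on the linear score then converts the scores of the validating samples into probabilities. The data, split and variable names are hypothetical, and `lm()` stands in for the LARS fit. This is not FeaLect's internal code and need not match `compute.logistic.score`.

```r
# Minimal sketch (assumption, not FeaLect's internals): logistic regression
# on top of a linear fit, using simulated data with 0/1 labels.
set.seed(42)
n <- 40
d <- data.frame(f1 = rnorm(n), f2 = rnorm(n))
d$y <- as.numeric(d$f1 + rnorm(n) > 0)          # simulated 0/1 labels
train <- 1:30
valid <- 31:40

lin <- lm(y ~ f1 + f2, data = d[train, ])         # linear fit on training samples
s.train <- predict(lin, newdata = d[train, ])     # linear scores
s.valid <- predict(lin, newdata = d[valid, ])

# Logistic regression of the training labels on the linear score
logit <- glm(y ~ s, family = binomial,
             data = data.frame(y = d$y[train], s = s.train))
prob <- predict(logit, newdata = data.frame(s = s.valid), type = "response")
round(prob, 2)
```

Fitting the logistic model on the one-dimensional score calibrates the linear output into probabilities without refitting on all of the features.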

Habil Zare

"Statistical Analysis of Overfitting Features", manuscript in preparation.

`FeaLect`, `train.doctor`, `doctor.validate`, `random.subset`, `compute.balanced`, `compute.logistic.score`, `ignore.redundant`, `input.check.FeaLect`

```
library(FeaLect)
data(mcl_sll)
F <- as.matrix(mcl_sll[ ,-1])  # The feature matrix
L <- as.numeric(mcl_sll[ ,1])  # The labels
names(L) <- rownames(F)
message(dim(F)[1], " samples and ", dim(F)[2], " features.")

all.samples <- rownames(F)
ts <- all.samples[5:10]        # training samples
vs <- all.samples[c(1, 22)]    # validating samples
doctor <- train.doctor(F_ = F, L_ = L, training.samples = ts, validating.samples = vs,
                       considered.features = colnames(F), maximum.features.num = 10)
```
