
Various linear models are fitted to the training samples using the lars method. The models differ in the number of features, and each is validated on the validating samples. A score is also assigned to each feature based on the tendency of LASSO to include that feature in the models.

```
train.doctor(F_, L_, training.samples, validating.samples, considered.features,
             maximum.features.num, balance = TRUE, return_linear.models = TRUE,
             report.fitting.failure = FALSE)
```

`F_` |
The feature matrix; each column is a feature. |

`L_` |
The vector of labels, named according to the rows of `F_`. |

`training.samples` |
The names of the rows of `F_` that should be considered as training samples. |

`validating.samples` |
The names of the rows of `F_` that should be considered as validating samples. |

`considered.features` |
The names of the columns of `F_` that determine the features of interest. |

`maximum.features.num` |
Up to this number of features are allowed to contribute to each linear model. |

`balance` |
If TRUE, the cases are balanced by oversampling, so that there are equally many positives and negatives, before the linear model is fitted. |

`return_linear.models` |
Whether to return the fitted linear models. The models are memory intensive, so if there are more than 1000 of them, it may be preferable not to return them, to avoid running out of memory. |

`report.fitting.failure` |
If TRUE, any failure in fitting the linear or logistic models will be printed. |
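
The `balance` argument above describes oversampling until positives and negatives are equal in number. The following is a minimal base-R sketch of that idea, not the package's own code: `balance_by_oversampling` and `L_toy` are hypothetical names introduced here, and labels are assumed to be coded as 0 and 1.

```r
# Hypothetical helper, not part of FeaLect: oversample the minority class
# (with replacement) until both classes have the same number of samples.
# Assumes a named label vector coded as 0 (negative) and 1 (positive).
balance_by_oversampling <- function(labels) {
  stopifnot(!is.null(names(labels)), all(labels %in% c(0, 1)))
  pos <- names(labels)[labels == 1]
  neg <- names(labels)[labels == 0]
  stopifnot(length(pos) > 0, length(neg) > 0)
  if (length(pos) < length(neg)) {
    # Draw extra positives by index to avoid sample()'s length-1 pitfall.
    extra <- pos[sample.int(length(pos), length(neg) - length(pos), replace = TRUE)]
  } else {
    extra <- neg[sample.int(length(neg), length(pos) - length(neg), replace = TRUE)]
  }
  c(names(labels), extra)   # original samples plus the duplicated ones
}

set.seed(1)
L_toy <- c(s1 = 1, s2 = 0, s3 = 0, s4 = 0, s5 = 1, s6 = 0)  # made-up labels
balanced <- balance_by_oversampling(L_toy)
table(L_toy[balanced])   # both classes now have the same count
```

Returning sample names rather than a new matrix keeps the sketch compatible with the name-based indexing used by `training.samples`.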

See the reference for more details.

Returns a list of:

`linear.models` |
The result of model fitting computed by lars(). |

`best.number.of.features` |
The number of features in the model with the best accuracy. |

`probabilities` |
The best computed logistic score. |

`accuracy` |
The best F-measure. |

`best.logistic.cof` |
The logistic coefficients of the model with the best accuracy. |

`contribution.to.feature.scores` |
This vector should be added to the total feature scores. |

`contribution.to.feature.scores.frequency` |
This vector should be added to the total frequency of features. |

`training.samples` |
The input `training.samples`: the names of the rows of `F_` that were considered as training samples. |

`validating.samples` |
The input `validating.samples`: the names of the rows of `F_` that were considered as validating samples. |

`precision` |
The ratio of the number of true positives to the number of predicted positives. |

`recall` |
The ratio of the number of true positives to the number of real positives. |

`selected.features.sequence` |
A list of sets of features which are selected in different models. |

`global.errors` |
A vector of the global errors of the linear fits. |

`features.with.best.global.error` |
A vector of names of good features in terms of global error of linear fits. |
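
The `precision`, `recall` and `accuracy` entries above can be illustrated with a small worked example. This is a sketch on made-up labels, not output from the package. It assumes 0/1 coding and takes the F-measure to be the usual harmonic mean of precision and recall; whether `train.doctor` uses exactly this variant is an assumption.

```r
# Toy data, made up for illustration: 1 = positive, 0 = negative.
truth     <- c(1, 1, 1, 0, 0, 0, 0, 1)
predicted <- c(1, 0, 1, 0, 1, 0, 0, 1)

tp <- sum(predicted == 1 & truth == 1)   # true positives

precision <- tp / sum(predicted == 1)    # true positives / predicted positives
recall    <- tp / sum(truth == 1)        # true positives / real positives

# F-measure as the harmonic mean of precision and recall (assumed variant).
f_measure <- 2 * precision * recall / (precision + recall)

c(precision = precision, recall = recall, f_measure = f_measure)
```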

Logistic regression is also performed on top of the fitted linear models.
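
To make the two-stage idea concrete, here is a rough base-R sketch of fitting a logistic regression on the score produced by a linear fit. It is only an illustration under stated assumptions: `lm()` stands in for the lars fit, the data are simulated, and all variable names are hypothetical rather than taken from the package.

```r
set.seed(42)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
# Simulated 0/1 labels that depend noisily on the two features.
y  <- as.numeric(x1 + 0.5 * x2 + rnorm(n, sd = 1) > 0)

# Stage 1: a linear fit to the labels (lm() is a stand-in for lars here).
linear_fit   <- lm(y ~ x1 + x2)
linear_score <- fitted(linear_fit)

# Stage 2: logistic regression on top of the linear score, which maps
# the unbounded score to a probability between 0 and 1.
logistic_fit  <- glm(y ~ linear_score, family = binomial)
probabilities <- fitted(logistic_fit)

range(probabilities)
```

The second stage is what turns a raw linear score into something that can be thresholded and compared with the labels.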

Habil Zare

"Statistical Analysis of Overfitting Features", manuscript in preparation.

`FeaLect`, `train.doctor`, `doctor.validate`, `random.subset`, `compute.balanced`, `compute.logistic.score`, `ignore.redundant`, `input.check.FeaLect`

```
library(FeaLect)
data(mcl_sll)
F <- as.matrix(mcl_sll[ ,-1])   # The feature matrix
L <- as.numeric(mcl_sll[ ,1])   # The labels
names(L) <- rownames(F)
message(dim(F)[1], " samples and ", dim(F)[2], " features.")
all.samples <- rownames(F); ts <- all.samples[5:10]; vs <- all.samples[c(1,22)]
doctor <- train.doctor(F_=F, L_=L, training.samples=ts, validating.samples=vs,
                       considered.features=colnames(F), maximum.features.num=10)
```
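
The returned `contribution.to.feature.scores` and `contribution.to.feature.scores.frequency` vectors are described as quantities to be added to running totals. A minimal sketch of that accumulation follows. The two list elements in `runs` are made-up stand-ins for results of separate `train.doctor()` calls, and the feature names and values are invented for illustration.

```r
features <- c("f1", "f2", "f3")   # hypothetical feature names

# Running totals, initialised to zero for every feature.
total.scores    <- setNames(numeric(length(features)), features)
total.frequency <- setNames(numeric(length(features)), features)

# Made-up stand-ins for the results of two train.doctor() runs.
runs <- list(
  list(contribution.to.feature.scores           = c(f1 = 1,   f2 = 0,    f3 = 0.5),
       contribution.to.feature.scores.frequency = c(f1 = 1,   f2 = 0,    f3 = 1)),
  list(contribution.to.feature.scores           = c(f1 = 0.5, f2 = 0.25, f3 = 0),
       contribution.to.feature.scores.frequency = c(f1 = 1,   f2 = 1,    f3 = 0))
)

# Add each run's contribution to the totals, matching features by name.
for (run in runs) {
  total.scores    <- total.scores +
    run$contribution.to.feature.scores[features]
  total.frequency <- total.frequency +
    run$contribution.to.feature.scores.frequency[features]
}

total.scores
total.frequency
```

Indexing by `features` before adding guards against the contribution vectors arriving in a different order.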
