View source: R/test_features.R
Performs feature selection on positioned n-gram data using Fisher's permutation test.
test_features(
  target,
  features,
  criterion = "ig",
  adjust = "BH",
  threshold = 1,
  quick = TRUE,
  times = 1e+05
)
target |
features |
criterion | criterion used in the permutation test. See Details for the list of possible criteria.
adjust | name of the p-value adjustment method. See
threshold |
quick |
times | number of times the procedure should be repeated. Ignored if
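To illustrate what a repetition count such as times controls, a plain permutation test for a single binary feature can be sketched in base R. The helper names perm_p_value and freq_diff, and the frequency-difference criterion itself, are assumptions made for this sketch; they are not the package's implementation or one of its criteria.

```r
# Hypothetical helper, not part of the package: permutation p-value for one
# binary feature, given a criterion function of (target, feature).
perm_p_value <- function(target, feature, criterion, times = 1000) {
  observed <- criterion(target, feature)
  # Recompute the criterion after shuffling the target labels `times` times
  permuted <- replicate(times, criterion(sample(target), feature))
  # Share of permutations at least as extreme as the observed value;
  # the +1 terms count the observed labelling itself and avoid p = 0
  (sum(permuted >= observed) + 1) / (times + 1)
}

# Illustrative criterion (an assumption): absolute difference in feature
# frequency between the two target classes
freq_diff <- function(target, feature) {
  abs(mean(feature[target == 1]) - mean(feature[target == 0]))
}

set.seed(1)
target <- rep(c(1, 0), each = 50)
informative <- c(rbinom(50, 1, 0.9), rbinom(50, 1, 0.1))
noise <- rbinom(100, 1, 0.5)

p_informative <- perm_p_value(target, informative, freq_diff)
p_noise <- perm_p_value(target, noise, freq_diff)
```

A larger number of repetitions gives a finer resolution of the p-value at the cost of run time, since the smallest value this sketch can return is 1 / (times + 1).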
Since the procedure involves multiple testing, it is advisable to use one
of the available p-value adjustment methods. A method can be selected directly by
specifying the adjust parameter.
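The effect of such an adjustment can be seen on a handful of raw p-values. This sketch assumes that adjust accepts the method names of stats::p.adjust, which the default "BH" suggests; the p-values are made up for illustration.

```r
# Made-up raw p-values, as if five features had been tested
p_raw <- c(0.001, 0.012, 0.030, 0.200, 0.750)

# Benjamini-Hochberg adjustment (the method named by adjust = "BH")
p_bh <- p.adjust(p_raw, method = "BH")

# Bonferroni adjustment for comparison; it is more conservative
p_bonf <- p.adjust(p_raw, method = "bonferroni")

# Side-by-side view of raw and adjusted values
data.frame(raw = p_raw, BH = p_bh, bonferroni = p_bonf)

# All method names understood by p.adjust
p.adjust.methods
```

Adjusted p-values are never smaller than the raw ones, so fewer features pass a fixed significance cutoff after adjustment.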
Available criteria:
Information Gain: calc_ig.
Kullback-Leibler divergence: calc_kl.
Chi-squared-based measure: calc_cs.
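For orientation, the textbook definition of information gain for a binary feature and a binary target can be written in a few lines of base R. The helpers entropy and info_gain are hypothetical names for this sketch, and the use of natural logarithms is an assumption; calc_ig may differ in units or in how it handles edge cases.

```r
# Shannon entropy of a discrete vector, in nats (hypothetical helper)
entropy <- function(x) {
  p <- as.vector(table(x)) / length(x)  # table() only lists observed values
  -sum(p * log(p))
}

# Information gain: H(target) - H(target | feature)
info_gain <- function(target, feature) {
  # Entropy of the target within each level of the feature
  cond <- sapply(split(target, feature), entropy)
  # Relative frequency of each feature level, in the same (sorted) order
  weights <- as.vector(table(feature)) / length(feature)
  entropy(target) - sum(weights * cond)
}

target <- c(1, 1, 1, 1, 0, 0, 0, 0)
perfect <- target                       # feature identical to the target
unrelated <- c(1, 0, 1, 0, 1, 0, 1, 0)  # feature independent of the target

ig_perfect <- info_gain(target, perfect)
ig_unrelated <- info_gain(target, unrelated)
```

A feature that reproduces the target recovers the full entropy of the target, while a feature independent of the target has an information gain of zero.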
an object of class feature_test.
Both target and features must be binary, i.e. contain only 0 and 1 values.
Features occurring too often or too rarely are considered uninformative and may be removed using the threshold parameter.
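Both requirements can be checked by hand before running the test. The sketch below uses toy data, and the exact filtering rule (keep features seen at least threshold times and at most nrow(features) - threshold times) is an assumption about how threshold is applied, not a statement of the package's behaviour.

```r
# Toy binary data: three features observed on eight sequences
features <- cbind(
  rare     = c(1, 0, 0, 0, 0, 0, 0, 0),
  balanced = c(1, 1, 0, 0, 1, 0, 1, 0),
  frequent = c(1, 1, 1, 1, 1, 1, 1, 0)
)
target <- c(1, 1, 1, 1, 0, 0, 0, 0)

# Both inputs should contain only 0 and 1
is_binary <- function(x) all(x %in% c(0, 1))
stopifnot(is_binary(target), is_binary(features))

# Assumed rule: drop features occurring fewer than `threshold` times
# or more than nrow(features) - threshold times
threshold <- 2
occurrences <- colSums(features)
keep <- occurrences >= threshold & occurrences <= nrow(features) - threshold
kept_features <- features[, keep, drop = FALSE]

colnames(kept_features)
```

Count data that is not yet binary would fail the is_binary check and would first need to be converted, for example with binarize.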
Radivojac P, Obradovic Z, Dunker AK, Vucetic S, Feature selection filters based on the permutation test in Machine Learning: ECML 2004, 15th European Conference on Machine Learning, Springer, 2004.
binarize - binarizes input data.
calc_criterion - computes selected criterion.
distr_crit - distribution of criterion used in QuiPT.
summary.feature_test - summary of results.
cut.feature_test - aggregates test results in groups based on feature's p-value.
# significant feature
tar_feat1 <- create_feature_target(10, 390, 0, 600)
# significant feature
tar_feat2 <- create_feature_target(9, 391, 1, 599)
# insignificant feature
tar_feat3 <- create_feature_target(198, 202, 300, 300)
test_res <- test_features(tar_feat1[, 1], cbind(tar_feat1[, 2], tar_feat2[, 2],
tar_feat3[, 2]))
summary(test_res)
cut(test_res)
# real data example
# we will analyze only a subsample of a dataset to make analysis quicker
ids <- c(1L:100, 701L:800)
deg_seqs <- degenerate(human_cleave[ids, 1L:9],
list(`a` = c(1, 6, 8, 10, 11, 18),
`b` = c(2, 5, 13, 14, 16, 17, 19, 20),
`c` = c(3, 4, 7, 9, 12, 15)))
# positioned n-grams example
bigrams_pos <- count_ngrams(deg_seqs, 2, letters[1L:3], pos = TRUE)
test_features(human_cleave[ids, 10], bigrams_pos)
# unpositioned n-grams example, binarization required
bigrams_notpos <- count_ngrams(deg_seqs, 2, letters[1L:3], pos = FALSE)
test_features(human_cleave[ids, 10], binarize(bigrams_notpos))