View source: R/kmer_random_forest.R
Kmer selection is performed in one step (if kmer candidates are supplied) or in two steps. In the two-step case, the first step uses odds ratios and p-values from a Fisher test (see 'kmer_freq') to identify kmers that are significantly associated with mutation probability. These kmers, together with trinucleotide patterns, are then incorporated in a random forest model. The mean decrease in Gini from this forest, together with the odds ratios and p-values from the Fisher test, can be used to estimate kmer importance.
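The preselection idea can be illustrated with a minimal base R sketch. It assumes a 2x2 table of made-up counts for a single kmer and a simple "p-value below cutoff" rule; the counts, the variable names and the exact rule are illustrative assumptions and are not taken from the 'kmer_freq' implementation.

```r
# Hypothetical counts for one kmer (illustrative numbers, not real data).
# Rows: regions with / without the kmer. Columns: mutated / unmutated.
counts <- matrix(
  c(30,  70,
    10, 190),
  nrow = 2, byrow = TRUE,
  dimnames = list(kmer = c("present", "absent"),
                  site = c("mutated", "unmutated"))
)

# Fisher's exact test from the stats package (attached by default in R)
ft <- fisher.test(counts)
odds_ratio <- unname(ft$estimate)  # conditional maximum likelihood estimate
p_value <- ft$p.value

# Assumed selection rule: keep the kmer if the p-value is below the cutoff
pval_cutoff <- 0.001
is_candidate <- p_value < pval_cutoff

print(c(odds_ratio = odds_ratio, p_value = p_value))
print(is_candidate)
```

In this toy table the kmer is enriched among mutated regions, so the estimated odds ratio is above 1 and the kmer passes the cutoff. How the package ranks kmers to honor "n_keep" is not shown here.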
kmer_random_forest(
  dataset,
  ks = 5,
  kmers = NULL,
  pval_cutoff = 0.001,
  n_keep = 80,
  maxnodes = 20,
  cores = NULL,
  n_trees = 720,
  include_fit = FALSE
)
dataset |
GRanges object with a 'sequence.pyr' column containing the sequence region and a 'mut.pyr' column containing mutations |
ks |
Int. Size of kmers to be used in the model |
kmers |
Character vector of candidate kmers (optional). Note that the arguments "ks", "pval_cutoff" and "n_keep" are ignored if candidates are supplied |
pval_cutoff |
Numeric. P-value cutoff for the Fisher test used in kmer preselection |
n_keep |
Positive Int. Number of kmers to include after preselection |
maxnodes |
Parameter controlling the depth of the decision trees in the random forest |
cores |
Number of cores to use for parallelization |
n_trees |
Number of trees in forest |
include_fit |
Bool. Include resulting fit from random forest training |
A list containing: (1) MeanDecreaseGini information on kmers, (2) a 3D array of p-values and odds ratios of kmers and, optionally, (3) the random forest fit (if include_fit = TRUE)