View source: R/classification_models.R
Performs binary classification (binding / not binding) of repertoire sequencing data. Classification can be done at either the clonotype level or the single-cell level.
classify_data(
  features,
  unique.sequences = "cdr3s_aa",
  encoding = "onehot",
  to.use = c("cdr3s_aa", "cdr3s_nt", "aa_sequence_HC", "aa_sequence_LC"),
  cv = 5
)
features
List of dataframes containing the extracted features. This is the output of the load_data function.
unique.sequences
Names of the sequence columns used to decide which entries are kept as unique. Default is "cdr3s_aa", as shown in the usage. c("aa_sequence_HC", "aa_sequence_LC") keeps every cell with a unique combination of heavy and light chain sequences. Other options include "clonotype_id" and "aa_sequence_HC".
encoding
Character indicating which encoding strategy to use. Options are "onehot", "kmer" and "protr". To choose the k-mer size, include it in the string, e.g. encoding = "5mer" for size 5. If only "kmer" is given, the size defaults to 3. The overall default is "onehot".
to.use
Character vector indicating which features to use. If not supplied, all the features will be used.
cv
Numeric indicating the number of folds used in cross-validation. Default is 5.
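To make the "onehot" and "kmer" options more concrete, here is a minimal base R sketch of what such encodings typically look like for a single amino acid sequence. The helpers one_hot_encode, kmer_counts and kmer_size are hypothetical names written for illustration; they are not the package's internal code, and the way classify_data actually parses a string such as "5mer" is an assumption.

```r
# Hypothetical helpers for illustration only; not the package's internals.
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# One-hot encoding: one row per sequence position, one column per amino acid.
one_hot_encode <- function(sequence, alphabet = amino_acids) {
  residues <- strsplit(sequence, "")[[1]]
  encoded <- outer(residues, alphabet, "==") * 1L
  dimnames(encoded) <- list(NULL, alphabet)
  encoded
}

# k-mer encoding: count every overlapping substring of length k.
kmer_counts <- function(sequence, k = 3) {
  n <- nchar(sequence)
  if (n < k) return(table(character(0)))
  starts <- seq_len(n - k + 1)
  table(substring(sequence, starts, starts + k - 1))
}

# Assumed parsing of the encoding string: "5mer" gives 5, plain "kmer" gives 3.
kmer_size <- function(encoding, default = 3L) {
  size <- suppressWarnings(as.integer(sub("mer$", "", encoding)))
  if (is.na(size)) default else size
}

cdr3 <- "CARDYW"
dim(one_hot_encode(cdr3))                      # 6 positions by 20 amino acids
kmer_counts(cdr3, k = kmer_size("kmer"))       # CAR, ARD, RDY, DYW, once each
```

One-hot encoding preserves position but yields a matrix whose size depends on sequence length, whereas k-mer counts give a fixed vocabulary at the cost of positional information.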
This function plots AUC scores for each model and feature importance for XGBoost, and returns the predicted labels (binding / not binding).
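For readers unfamiliar with the AUC score reported here, the following base R sketch computes it from predicted scores using the rank-based (Mann-Whitney) formula, and shows one common way to assign samples to cross-validation folds. The function auc_from_scores, the toy labels and scores, and the fold assignment are illustrative assumptions; the source does not say how classify_data computes AUC or splits folds.

```r
# Rank-based AUC: probability that a random positive outranks a random negative.
# Hypothetical helper, not taken from the package.
auc_from_scores <- function(labels, scores) {
  positives <- labels == 1
  n_pos <- sum(positives)
  n_neg <- sum(!positives)
  ranks <- rank(scores)  # average ranks, so tied scores count as half
  (sum(ranks[positives]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: 1 = binding, 0 = not binding.
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)
auc_from_scores(labels, scores)  # 8/9, about 0.889

# Assumed fold assignment for cv = 5: shuffle balanced fold labels.
set.seed(1)
cv <- 5
folds <- sample(rep(seq_len(cv), length.out = 23))
table(folds)  # fold sizes differ by at most one
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation of binders from non-binders.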
## Not run:
check_classify_data <- classify_data(features = output.load_data, to.use = NULL, unique.sequences = c("aa_sequence_HC", "aa_sequence_LC"), encoding = "onehot")
## End(Not run)
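The example above passes unique.sequences = c("aa_sequence_HC", "aa_sequence_LC"), which keeps one entry per unique heavy and light chain combination. A rough sketch of that kind of filtering in base R follows. The data frame, its barcode column and the keep_unique helper are made up for illustration, and keeping the first occurrence of each combination is an assumption rather than documented behavior.

```r
# Toy per-cell table; column "barcode" and all values are hypothetical.
cells <- data.frame(
  barcode        = c("c1", "c2", "c3", "c4"),
  aa_sequence_HC = c("EVQL", "EVQL", "QVQL", "EVQL"),
  aa_sequence_LC = c("DIQM", "DIQM", "DIQM", "EIVL"),
  stringsAsFactors = FALSE
)

# Keep the first row for each unique combination of the chosen columns.
keep_unique <- function(data, unique.sequences) {
  key <- data[, unique.sequences, drop = FALSE]
  data[!duplicated(key), , drop = FALSE]
}

# Heavy and light chain together: c2 duplicates c1 and is dropped.
keep_unique(cells, c("aa_sequence_HC", "aa_sequence_LC"))$barcode

# Heavy chain only: c2 and c4 share the heavy chain of c1.
keep_unique(cells, "aa_sequence_HC")$barcode
```

Using fewer columns as the key collapses more cells together, so the choice of unique.sequences changes how many training examples remain.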