Bindpred: package for antibody specificity and affinity prediction

View source: R/classification_models.R

Performs binary classification (binding / not binding) of repertoire sequencing data. This can be done either at the clonotype level or at the single-cell level.

classify_data(
  features,
  unique.sequences = "cdr3s_aa",
  encoding = "onehot",
  to.use = c("cdr3s_aa", "cdr3s_nt", "aa_sequence_HC", "aa_sequence_LC"),
  cv = 5
)

`features`	List of data frames containing the extracted features. This is the output of the load_data function.
`unique.sequences`	Names of the sequences to be kept. The default shown in the usage above is "cdr3s_aa". Setting c("aa_sequence_HC", "aa_sequence_LC") keeps every cell with a unique combination of heavy and light chain sequences. Other options include "clonotype_id" and "aa_sequence_HC".
`encoding`	Character string indicating which encoding strategy to use. Options are "onehot", "kmer" and "protr". To set the k-mer size, prefix "mer" with the size, e.g. encoding = "5mer" for size 5. If only "kmer" is given, the size defaults to 3. The overall default is "onehot".
`to.use`	Character vector indicating which features to use. If not supplied, all the features will be used.
`cv`	Numeric indicating the number of folds used in cross validation. Default is 5.
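
The `unique.sequences` and `encoding` arguments can be illustrated with a small base R sketch. This is not the package's implementation: the toy data frame `cells` and the helpers `keep_unique`, `parse_encoding`, `kmer_counts` and `one_hot` are hypothetical names introduced only for illustration, and the real output of load_data may be structured differently.

```r
# Hypothetical toy data; the column names mirror the documented options,
# but this is NOT the structure returned by load_data.
cells <- data.frame(
  clonotype_id   = c("c1", "c1", "c2", "c3"),
  cdr3s_aa       = c("CARDY", "CARDY", "CARDY", "CTRGW"),
  aa_sequence_HC = c("QVQL", "QVQL", "QVQL", "EVQL"),
  aa_sequence_LC = c("DIQM", "DIVM", "DIQM", "DIQM"),
  stringsAsFactors = FALSE
)

# Keep the first row for each unique combination of the chosen columns.
keep_unique <- function(df, unique.sequences) {
  df[!duplicated(df[, unique.sequences, drop = FALSE]), , drop = FALSE]
}

by_cdr3  <- keep_unique(cells, "cdr3s_aa")
by_chain <- keep_unique(cells, c("aa_sequence_HC", "aa_sequence_LC"))

# Parse an encoding string such as "onehot", "protr", "kmer" or "5mer".
parse_encoding <- function(encoding) {
  if (encoding %in% c("onehot", "protr")) {
    return(list(type = encoding, k = NA_integer_))
  }
  if (encoding == "kmer") {
    return(list(type = "kmer", k = 3L))  # documented default size
  }
  if (grepl("^[0-9]+mer$", encoding)) {
    return(list(type = "kmer", k = as.integer(sub("mer$", "", encoding))))
  }
  stop("Unknown encoding: ", encoding)
}

# Count overlapping k-mers in a single sequence.
kmer_counts <- function(sequence, k = 3L) {
  n <- nchar(sequence)
  if (n < k) return(table(character(0)))
  starts <- seq_len(n - k + 1L)
  table(substring(sequence, starts, starts + k - 1L))
}

# One-hot encode a single sequence:
# rows are positions, columns are the 20 standard amino acids.
one_hot <- function(sequence) {
  aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
  residues <- strsplit(sequence, "")[[1]]
  m <- outer(residues, aa, "==") * 1L
  dimnames(m) <- list(residues, aa)
  m
}

spec <- parse_encoding("5mer")
kmer_counts("CARDYW", k = spec$k)
one_hot("CARDY")
```

In this toy data, deduplicating on "cdr3s_aa" collapses cells that share a CDR3 even when their light chains differ, whereas the heavy/light chain combination keeps them apart. This is one way to picture the choice the argument offers.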

This function plots the AUC score of each model and the feature importance for XGBoost, and returns the predicted labels (Binding / not Binding).
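
To make the AUC score concrete, the following base R sketch computes it from ranks (the Mann-Whitney formulation) and turns scores into the two labels. It is an illustration under stated assumptions, not the code used by classify_data: the helper `auc_rank`, the toy `labels` and `scores`, and the 0.5 cut-off are all hypothetical.

```r
# Rank-based AUC: the probability that a randomly chosen positive scores
# higher than a randomly chosen negative (ties count as one half).
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)  # average ranks handle ties
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical true labels (1 = binding) and model scores.
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)

auc <- auc_rank(labels, scores)

# Assumed cut-off of 0.5 to obtain the predicted labels.
predicted <- ifelse(scores >= 0.5, "Binding", "not Binding")
```

An AUC of 0.5 corresponds to random ranking and 1 to perfect separation, so per-model AUC values can be compared directly across models.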

## Not run: 
check_classify_data <- classify_data(
  features = output.load_data,
  to.use = NULL,
  unique.sequences = c("aa_sequence_HC", "aa_sequence_LC"),
  encoding = "onehot"
)

## End(Not run)