classify_data: Classifies the repertoire sequencing data binarily (binding /...


View source: R/classification_models.R

Description

Performs binary classification (binding / not binding) of repertoire sequencing data. Classification can be done at either the clonotype level or the single-cell level.

Usage

classify_data(
  features,
  unique.sequences = "cdr3s_aa",
  encoding = "onehot",
  to.use = c("cdr3s_aa", "cdr3s_nt", "aa_sequence_HC", "aa_sequence_LC"),
  cv = 5
)

Arguments

features

List of data frames containing the extracted features. This is the output of the load_data function.

unique.sequences

Names of the sequence columns used to keep unique entries. Default is "cdr3s_aa", as shown in Usage. Other options include "clonotype_id", "aa_sequence_HC", and c("aa_sequence_HC", "aa_sequence_LC"), which keeps every cell with a unique combination of heavy and light chain sequences.
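To illustrate what keeping unique sequences means, here is a minimal sketch in base R. It is not the package's implementation: the toy data frame and the helper keep_unique are hypothetical and only mirror the column names listed above.

```r
# Toy data frame; column names mirror the options above, values are made up.
cells <- data.frame(
  clonotype_id   = c("c1", "c1", "c2", "c3"),
  aa_sequence_HC = c("QVQL", "QVQL", "EVQL", "QVQL"),
  aa_sequence_LC = c("DIQM", "DIQM", "DIQM", "EIVL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first row for each unique combination of `cols`.
keep_unique <- function(df, cols) {
  df[!duplicated(df[, cols, drop = FALSE]), , drop = FALSE]
}

# Unique heavy/light chain pairs versus unique heavy chains only.
by_pair <- keep_unique(cells, c("aa_sequence_HC", "aa_sequence_LC"))
by_hc   <- keep_unique(cells, "aa_sequence_HC")
```

In this toy example, filtering on the heavy/light pair keeps more rows than filtering on the heavy chain alone, because two cells share a heavy chain but differ in their light chain.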

encoding

Character indicating which encoding strategy to use. Options are "onehot", "kmer", and "protr". To set the k-mer size, prefix it to "mer", e.g. encoding = "5mer" for size 5. If only "kmer" is given, the size defaults to 3. The default encoding is "onehot".
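As a rough illustration of the "onehot" and "kmer" options, the sketch below encodes a single amino-acid sequence in base R. The helpers one_hot, kmer_counts, and kmer_size are hypothetical and are not taken from the package; the 20-letter alphabet and the parsing of strings such as "5mer" are assumptions based on the description above.

```r
# Standard 20 amino acids, used as the one-hot alphabet (an assumption).
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Hypothetical helper: one row per sequence position, one column per residue.
# Assumes the sequence contains only the 20 standard residues.
one_hot <- function(seq) {
  residues <- strsplit(seq, "")[[1]]
  mat <- matrix(0L, nrow = length(residues), ncol = length(amino_acids),
                dimnames = list(NULL, amino_acids))
  mat[cbind(seq_along(residues), match(residues, amino_acids))] <- 1L
  mat
}

# Hypothetical helper: counts of overlapping k-mers.
# Assumes the sequence is at least k residues long.
kmer_counts <- function(seq, k = 3) {
  starts <- seq_len(nchar(seq) - k + 1)
  table(substring(seq, starts, starts + k - 1))
}

# Hypothetical parser: "5mer" gives 5, plain "kmer" falls back to 3.
kmer_size <- function(encoding) {
  size <- suppressWarnings(as.integer(sub("mer$", "", encoding)))
  if (is.na(size)) 3L else size
}

oh <- one_hot("CARDY")
km <- kmer_counts("CARDYCAR", k = kmer_size("kmer"))
```

One-hot encoding preserves residue positions but produces matrices whose size depends on sequence length, whereas k-mer counts give a position-independent summary.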

to.use

Character vector indicating which features to use. If not supplied, all features are used.

cv

Numeric indicating the number of folds used in cross-validation. Default is 5.
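For readers unfamiliar with k-fold cross-validation, the following base R sketch shows one common way to split observations into cv folds. It is an assumption about the general technique, not the package's code; make_folds is a hypothetical helper and the sample size of 23 is arbitrary.

```r
# Hypothetical helper: randomly assign each of n observations to one of `cv` folds.
make_folds <- function(n, cv = 5) {
  sample(rep(seq_len(cv), length.out = n))
}

set.seed(1)  # for a reproducible toy split
folds <- make_folds(23, cv = 5)

# For each fold, the held-out indices form the test set; the rest are used for training.
splits <- lapply(seq_len(5), function(i) {
  list(train = which(folds != i), test = which(folds == i))
})
```

Each observation appears in exactly one test set, so every sequence is predicted once by a model that did not see it during training.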

Value

This function plots the AUC score for each model and the feature importance for XGBoost, and returns the predicted labels (binding / not binding).
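As a reminder of what an AUC score measures, the sketch below computes it from labels and predicted scores using the Mann-Whitney rank formulation. This is a generic illustration with made-up values; the helper auc is hypothetical, and how the package itself computes AUC is not shown in this documentation.

```r
# Hypothetical helper: AUC via the Mann-Whitney rank statistic.
# `labels` is 1 for binding and 0 for not binding; `scores` are predicted scores.
auc <- function(labels, scores) {
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  (sum(rank(scores)[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up labels and scores for illustration only.
labels <- c(1, 1, 0, 1, 0, 0)
scores <- c(0.9, 0.8, 0.7, 0.6, 0.3, 0.1)
toy_auc <- auc(labels, scores)
```

The result is the fraction of (binding, not binding) pairs in which the binding example receives the higher score, so 0.5 corresponds to random ranking and 1 to perfect separation.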

Examples

## Not run: 
check_classify_data <- classify_data(features = output.load_data, to.use = NULL, unique.sequences = c("aa_sequence_HC", "aa_sequence_LC"), encoding = "onehot")

## End(Not run)

rodamian/Bindpred documentation built on July 29, 2021, 7:29 p.m.