encode_features: Encodes repertoire features with different methods.
In rodamian/Bindpred: package for antibody specificity and affinity prediction


View source: R/encode_features.R

Encodes repertoire features with different methods. Gene usage features are always one-hot encoded, irrespective of the encoding method chosen. Positions of insertions, deletions and mutations are included unless specified otherwise. Sequence features can be encoded using several methods (see `encoding`).
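The two kinds of encoding described above can be illustrated with a minimal base-R sketch. It is not the Bindpred implementation: the input vectors and the helpers `onehot_genes()`, `kmer_counts()` and `parse_k()` are hypothetical, and `parse_k()` only mimics the documented "5mer"/"kmer" naming convention of the `encoding` argument.

```r
# Minimal sketch, not Bindpred's code. All names below are hypothetical.

# Hypothetical per-cell inputs: one V gene call and one CDR3 amino acid sequence.
v_genes <- c("IGHV1-2", "IGHV3-23", "IGHV1-2")
cdr3s   <- c("CARDY", "CASSY", "CARDF")

# One-hot encoding of gene usage: one 0/1 column per observed gene.
onehot_genes <- function(genes) {
  lv <- sort(unique(genes))
  m <- outer(genes, lv, "==") * 1L  # logical matrix -> integer 0/1
  colnames(m) <- lv
  m
}

# Mimics the documented convention: "5mer" -> 5, plain "kmer" -> 3.
parse_k <- function(encoding) {
  if (encoding == "kmer") return(3L)
  as.integer(sub("mer$", "", encoding))
}

# Counts overlapping k-mers per sequence; one column per k-mer seen in any sequence.
kmer_counts <- function(seqs, k = 3L) {
  kmers <- lapply(seqs, function(s) {
    n <- nchar(s)
    if (n < k) character(0) else substring(s, seq_len(n - k + 1L), k:n)
  })
  vocab <- sort(unique(unlist(kmers)))
  m <- matrix(0L, nrow = length(seqs), ncol = length(vocab),
              dimnames = list(NULL, vocab))
  for (i in seq_along(kmers)) {
    m[i, ] <- as.integer(table(factor(kmers[[i]], levels = vocab)))
  }
  m
}

gene_mat <- onehot_genes(v_genes)
kmer_mat <- kmer_counts(cdr3s, k = parse_k("kmer"))
print(gene_mat)
print(kmer_mat)
```

In this sketch the gene and k-mer columns are built only from values observed in the input; an encoder used for prediction would usually fix the vocabulary (for example all 8000 possible tripeptides) so that training and test data share the same columns.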

encode_features(
  features,
  encoding = "onehot",
  unique.sequences = "cdr3s_aa",
  to.use = c("cdr3s_aa", "cdr3s_nt", "aa_sequence_HC", "aa_sequence_LC"),
  filter.corr = 0.9,
  filter.unique.values = 4
)

`features`	List of data frames containing the extracted features, i.e. the output of the load_data function.
`encoding`	Character indicating which encoding strategy to use. Options are "onehot" (the default), "kmer", "protr" and "protr.cdr3". For k-mer encoding, the size is given in the string itself, e.g. encoding = "5mer" for 5-mers; plain "kmer" uses a size of 3.
`unique.sequences`	Level at which unique sequences are filtered. Default is "cdr3s_aa", as shown in the usage signature. Setting c("aa_sequence_HC", "aa_sequence_LC") keeps every cell with a unique combination of heavy- and light-chain amino acid sequences. Other options include "clonotype_id" and "aa_sequence_HC".
`to.use`	Character vector indicating which features to use. If NULL (as in the example), all features are used.
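`filter.corr`	Numeric correlation cutoff, default 0.9; presumably features correlated above this value are removed.
`filter.unique.values`	Numeric indicating the minimum number of unique values a feature must have to be kept. Important when calculating the tripeptide composition for short sequences. Default is 4.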
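The two filtering arguments can be sketched in base R as follows. Beyond what the text states, this assumes that features are numeric matrix columns, that `filter.unique.values` drops columns with too few distinct values, and that `filter.corr` greedily drops a column whose absolute Pearson correlation with an already kept column exceeds the cutoff. `filter_features()` is a hypothetical helper, not part of Bindpred.

```r
# Hypothetical helper illustrating the two filters; Bindpred's logic may differ.
filter_features <- function(mat, filter.unique.values = 4, filter.corr = 0.9) {
  # Drop near-constant columns, e.g. sparse tripeptide counts on short sequences.
  n_unique <- apply(mat, 2, function(col) length(unique(col)))
  mat <- mat[, n_unique >= filter.unique.values, drop = FALSE]
  if (ncol(mat) < 2) return(mat)

  # Assumed semantics: keep a column only if its absolute correlation with
  # every earlier kept column is at most filter.corr.
  cm <- abs(cor(mat))
  keep <- rep(TRUE, ncol(mat))
  for (j in 2:ncol(mat)) {
    earlier <- which(keep[seq_len(j - 1)])
    if (any(cm[earlier, j] > filter.corr, na.rm = TRUE)) keep[j] <- FALSE
  }
  mat[, keep, drop = FALSE]
}

# Toy data: b is a rescaled copy of a, d has only 2 distinct values.
a <- 1:5
toy <- cbind(a = a, b = a * 2, c = c(5, 1, 4, 2, 3), d = c(1, 1, 1, 2, 2))
filtered <- filter_features(toy)
print(colnames(filtered))  # [1] "a" "c"
```

The greedy pass keeps the first column of each highly correlated group, so in this sketch the result depends on column order.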

Returns the encoded features.

## Not run: 
# output.load_data: the list of data frames returned by load_data()
check_encode_features <- encode_features(features = output.load_data,
                                         encoding = "onehot",
                                         unique.sequences = c("aa_sequence_HC", "aa_sequence_LC"),
                                         to.use = NULL)

## End(Not run)