encode_features: Encodes repertoire features with different methods. Gene...


View source: R/encode_features.R

Description

Encodes repertoire features with different methods. Gene usage features are one-hot encoded regardless of the chosen encoding method. Positions of insertions, deletions, and mutations are included unless specified otherwise. Sequence features can be encoded with several methods.
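As a rough illustration of what one-hot encoding of gene usage means, the base R sketch below turns a categorical gene column into one 0/1 indicator column per distinct gene. The data frame `cells`, its column `v_gene`, and the helper `one_hot()` are hypothetical and are not part of this package; this is not the package's own implementation.

```r
# Hypothetical per-cell gene usage data; names and values are illustrative only.
cells <- data.frame(
  barcode = c("c1", "c2", "c3", "c4"),
  v_gene  = c("IGHV1-2", "IGHV3-23", "IGHV1-2", "IGHV4-34"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one 0/1 indicator column per distinct value of x.
one_hot <- function(x) {
  lv <- sort(unique(x))
  m <- outer(x, lv, "==") * 1L  # logical matrix coerced to integer 0/1
  colnames(m) <- lv
  m
}

gene_onehot <- one_hot(cells$v_gene)
```

Each row has exactly one 1, in the column of that cell's gene. Base R's `model.matrix(~ v_gene - 1, data = cells)` gives an equivalent dummy coding for a single factor.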

Usage

encode_features(
  features,
  encoding = "onehot",
  unique.sequences = "cdr3s_aa",
  to.use = c("cdr3s_aa", "cdr3s_nt", "aa_sequence_HC", "aa_sequence_LC"),
  filter.corr = 0.9,
  filter.unique.values = 4
)

Arguments

features

List of data frames containing the extracted features; this is the output of the load_data function.

encoding

Character indicating which encoding strategy to use. Options are "onehot", "kmer", "protr" and "protr.cdr3". For k-mer encoding, the k-mer size is set in the string itself: for example, encoding = "5mer" uses a k-mer size of 5, while "kmer" alone uses the default size of 3. The overall default is "onehot".
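The "kmer" / "5mer" convention can be pictured with the minimal base R sketch below, which parses the k-mer size from the encoding string and counts overlapping k-mers into a sequence-by-k-mer matrix. The helpers `parse_k()`, `count_kmers()` and `kmer_matrix()` are hypothetical illustrations, assuming overlapping k-mers counted over a shared vocabulary; the package itself may build its k-mer features differently.

```r
# Hypothetical helper: "kmer" means k = 3 (per the documentation); "5mer" means k = 5.
parse_k <- function(encoding) {
  if (encoding == "kmer") return(3L)
  if (grepl("^[0-9]+mer$", encoding)) return(as.integer(sub("mer$", "", encoding)))
  stop("not a k-mer encoding: ", encoding)
}

# Hypothetical helper: table of overlapping k-mers in one sequence.
count_kmers <- function(seq, k) {
  n <- nchar(seq) - k + 1
  if (n < 1) return(table(character(0)))
  table(substring(seq, 1:n, k:nchar(seq)))
}

# Hypothetical helper: rows = sequences, columns = all k-mers seen in any sequence.
kmer_matrix <- function(seqs, k) {
  counts <- lapply(seqs, count_kmers, k = k)
  vocab <- sort(unique(unlist(lapply(counts, names))))
  m <- matrix(0L, nrow = length(seqs), ncol = length(vocab),
              dimnames = list(seqs, vocab))
  for (i in seq_along(counts)) {
    m[i, names(counts[[i]])] <- as.integer(counts[[i]])
  }
  m
}

m <- kmer_matrix(c("CASSL", "CASSG"), parse_k("kmer"))
```

Here both toy sequences share "CAS" and "ASS" and differ in their final 3-mer ("SSL" vs "SSG").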

unique.sequences

Level at which unique sequences are filtered. The default, as shown in Usage, is "cdr3s_aa". Setting c("aa_sequence_HC", "aa_sequence_LC") keeps every cell with a unique combination of heavy- and light-chain amino acid sequences. Other options include "clonotype_id" and "aa_sequence_HC".
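Filtering at a given level amounts to keeping one cell per unique value (or combination of values) of the chosen columns. The base R sketch below shows that idea with `duplicated()`; the toy data frame and the helper `keep_unique()` are hypothetical, and keeping the first cell of each group is an assumption rather than documented package behaviour.

```r
# Hypothetical per-cell table; column names mirror the option names above.
cells <- data.frame(
  barcode        = c("c1", "c2", "c3", "c4"),
  aa_sequence_HC = c("EVQLV", "EVQLV", "QVQLQ", "EVQLV"),
  aa_sequence_LC = c("DIQMT", "DIQMT", "DIQMT", "EIVLT"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first row for each unique combination of cols.
keep_unique <- function(df, cols) {
  df[!duplicated(df[, cols, drop = FALSE]), , drop = FALSE]
}

unique_pairs <- keep_unique(cells, c("aa_sequence_HC", "aa_sequence_LC"))
unique_hc    <- keep_unique(cells, "aa_sequence_HC")
```

Filtering on the heavy/light pair keeps c1, c3 and c4 (c2 repeats c1's pair), while filtering on the heavy chain alone keeps only c1 and c3.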

to.use

Character vector indicating which features to use. If not supplied, all features will be used.

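filter.corr

Numeric correlation threshold used when filtering features. Default is 0.9.

filter.unique.values

Numeric indicating the minimum number of unique values per feature. This matters when calculating the tripeptide composition for short sequences. Default is 4.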
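Feature filters of this kind (dropping features with too few distinct values, and dropping one feature from each highly correlated pair) can be sketched in base R as below. The helpers `filter_unique()` and `filter_correlated()`, the toy `feats` table and the greedy "drop the later column" rule are all assumptions for illustration; the package's actual filtering logic may differ.

```r
# Hypothetical numeric features (rows = cells, columns = features).
feats <- data.frame(
  a = c(1, 2, 3, 4, 5, 6),
  b = c(2, 4, 6, 8, 10, 12),  # perfectly correlated with a
  c = c(0, 0, 0, 1, 0, 0),    # only 2 distinct values
  d = c(5, 3, 6, 1, 4, 2)
)

# Hypothetical helper: drop columns with fewer than min_unique distinct values.
filter_unique <- function(df, min_unique) {
  keep <- vapply(df, function(col) length(unique(col)) >= min_unique, logical(1))
  df[, keep, drop = FALSE]
}

# Hypothetical helper: for each pair with |correlation| > cutoff, drop the later column.
filter_correlated <- function(df, cutoff) {
  cm <- abs(cor(df))
  cols <- colnames(df)
  drop <- character(0)
  for (i in seq_along(cols)) {
    if (cols[i] %in% drop) next
    for (j in seq_along(cols)) {
      if (j > i && !(cols[j] %in% drop) && cm[i, j] > cutoff) {
        drop <- c(drop, cols[j])
      }
    }
  }
  df[, setdiff(cols, drop), drop = FALSE]
}

filtered <- filter_correlated(filter_unique(feats, min_unique = 4), cutoff = 0.9)
```

Column c is removed for having only two distinct values and b for correlating perfectly with a, leaving a and d. Applying the unique-value filter first also removes constant columns, which would otherwise give NA correlations.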

Value

Returns the encoded features.

Examples

## Not run: 
check_encode_features <- encode_features(
  features = output.load_data,
  encoding = "onehot",
  unique.sequences = c("aa_sequence_HC", "aa_sequence_LC"),
  to.use = NULL
)

## End(Not run)

rodamian/Bindpred documentation built on July 29, 2021, 7:29 p.m.