de_novo_TCRs | R Documentation |
De novo generation of CDR3 sequences based on GLIPH or GLIPH2 clusters. Artificial sequences are simulated from the position-specific abundance of amino acids in the CDR3 region of the sequences in a GLIPH or GLIPH2 cluster, as established in Glanville et al.
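To illustrate the idea of position-specific abundance, the following is a minimal base R sketch, not the package's implementation. It assumes a toy cluster of made-up sequences that all share one length; the names `cluster_seqs`, `pwm` and `simulate_cdr3` are hypothetical. The function documented here additionally models the CDR3 length distribution and keeps one matrix per length (see the returned values), which this sketch leaves out.

```r
# Toy cluster of CDR3 sequences (made-up data, all of length 12).
cluster_seqs <- c("CASSLGQAYEQF", "CASSLGQGYEQF", "CASSPGQAYEQF", "CASSLGTAYEQF")

# The 20 standard amino acids in one-letter code.
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Character matrix: one row per sequence, one column per position.
residues <- do.call(rbind, strsplit(cluster_seqs, ""))

# Position-specific relative frequencies:
# rows are positions (from the N-terminus), columns are amino acids.
pwm <- t(vapply(seq_len(ncol(residues)), function(pos) {
  as.numeric(table(factor(residues[, pos], levels = amino_acids))) / nrow(residues)
}, numeric(length(amino_acids))))
colnames(pwm) <- amino_acids

# Hypothetical helper: draw each position independently from its
# observed amino acid distribution and paste the draws together.
simulate_cdr3 <- function(pwm, n) {
  draws <- apply(pwm, 1, function(p) {
    sample(colnames(pwm), n, replace = TRUE, prob = p)
  })
  apply(matrix(draws, nrow = n), 1, paste, collapse = "")
}

set.seed(1)
simulated <- simulate_cdr3(pwm, n = 5)
print(simulated)
```

Positions that are invariant in the toy cluster (for example the leading `CASS`) are reproduced in every simulated sequence, while variable positions are sampled in proportion to their observed frequencies.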
de_novo_TCRs(
  convergence_group_tag,
  result_folder = "",
  clustering_output = NULL,
  refdb_beta = "gliph_reference",
  normalization = FALSE,
  accept_sequences_with_C_F_start_end = TRUE,
  sims = 1e+05,
  num_tops = 1000,
  min_length = 10,
  make_figure = FALSE,
  n_cores = 1
)
convergence_group_tag |
character. Tag of the convergence group that shall be used for prediction. |
result_folder |
character. By default |
clustering_output |
list. By default |
refdb_beta |
character or data frame. By default
|
normalization |
logical. By default |
accept_sequences_with_C_F_start_end |
logical. This logical flag
if |
sims |
numeric. By default 1e+05 (100,000). Number of de novo CDR3 sequences to generate. |
num_tops |
numeric. By default 1000. The |
min_length |
Numeric value determining the number of N-terminal positions used for scoring. By default it is set to 10. |
make_figure |
Logical value whether a graph of the |
n_cores |
numeric. Number of cores to use, by default 1. In case of |
If result_folder is specified, this function writes one file to that folder (named convergence_group_tag followed by _de_novo.txt) containing the num_tops best scoring generated sequences and their corresponding scores.
A list containing the same data and additional information is also returned, with the following elements:
$de_novo_sequences A data frame containing the num_tops best scoring generated sequences and their corresponding scores.
$sample_sequences_scores A data frame containing the sequences of the used convergence group and their corresponding scores.
$cdr3_length_probability A data frame with each considered CDR3 length and its probability of occurrence in the convergence group. The CDR3 length distribution of the generated sequences resembles this distribution.
$PWM_Scoring A data frame containing the positional weight matrix used for scoring. The columns represent the different amino acids and the rows represent the position relative to the N-terminus.
$PWM_Prediction A list of data frames containing, for each considered CDR3 length, the positional weight matrix used for generating new sequences. The columns represent the different amino acids and the rows represent the position relative to the N-terminus.
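The role of a scoring matrix restricted to the first min_length N-terminal positions can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's scoring formula: the toy sequences are made up, the names `scoring_pwm` and `score_cdr3` are hypothetical, and the score used here (a sum of log frequencies with a small pseudocount) is one common choice that may differ from what de_novo_TCRs computes.

```r
# The 20 standard amino acids in one-letter code.
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

# Toy convergence group (made-up data); lengths may differ because only
# the first min_length positions enter the scoring matrix.
cluster_seqs <- c("CASSLGQAYEQF", "CASSLGQGYEQF", "CASSPGQAYEQF", "CASSLGTAYEQFF")
min_length <- 10

# Scoring matrix: rows are N-terminal positions, columns are amino acids.
prefixes <- do.call(rbind, strsplit(substr(cluster_seqs, 1, min_length), ""))
scoring_pwm <- t(vapply(seq_len(min_length), function(pos) {
  as.numeric(table(factor(prefixes[, pos], levels = amino_acids))) / nrow(prefixes)
}, numeric(length(amino_acids))))
colnames(scoring_pwm) <- amino_acids

# Hypothetical scoring helper: sum of log frequencies over the scored
# positions; the pseudocount (an assumption) avoids log(0).
score_cdr3 <- function(seqs, scoring_pwm, pseudocount = 1e-3) {
  n_pos <- nrow(scoring_pwm)
  vapply(seqs, function(s) {
    aa <- strsplit(substr(s, 1, n_pos), "")[[1]]
    idx <- cbind(seq_along(aa), match(aa, colnames(scoring_pwm)))
    sum(log(scoring_pwm[idx] + pseudocount))
  }, numeric(1))
}

candidates <- c("CASSLGQAYEQF", "CWWWWWWWWWWF")
scores <- score_cdr3(candidates, scoring_pwm)

# Keep the best scoring candidates, analogous in spirit to num_tops.
num_tops <- 1
top <- candidates[order(scores, decreasing = TRUE)][seq_len(num_tops)]
print(top)
```

A candidate that matches the frequent residues of the group at each scored position receives a higher score than one that does not, which is the property a ranking by score relies on.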
Glanville, Jacob, et al. "Identifying specificity groups in the T cell receptor repertoire." Nature 547.7661 (2017): 94.
https://github.com/immunoengineer/gliph
utils::data("gliph_input_data")
res <- turbo_gliph(cdr3_sequences = gliph_input_data[base::seq_len(200),],
                   sim_depth = 100,
                   n_cores = 1)
new_seqs <- de_novo_TCRs(convergence_group_tag = res$cluster_properties$tag[1],
                         clustering_output = res,
                         sims = 10000,
                         make_figure = TRUE,
                         n_cores = 1)