searchSeq: Search for a sequence

View source: R/searchSeq.R

searchSeq    R Documentation

Search for a sequence

Description

Search for one or more amino acid or junction CDR3 sequences in a study tibble.

Usage

searchSeq(
  study_table,
  sequence,
  seq_type = "junction",
  edit_distance = 0,
  match = "global"
)

findSeq(sequence, query_list, edit_distance, seq_type, match)

Arguments

study_table

A tibble generated by the LymphoSeq2 functions readImmunoSeq() or productiveSeq(). The columns "duplicate_frequency" and "duplicate_count" are required, along with either "junction_aa" or "junction".

sequence

A character vector of one or more amino acid or junction CDR3 sequences to search.

seq_type

A character vector specifying the type of sequences to be searched. Available options are "junction_aa" or "junction".

edit_distance

An integer giving the maximum edit distance allowed between the searched sequence and a target sequence; targets at a distance less than or equal to this value are reported. See details below.

match

A string indicating the type of sequence matching to perform. Acceptable values are "global" and "partial". See details below.

query_list

List of productive CDR3 nucleotide or amino acid sequences

Details

An exact partial match means the searched sequence is contained within the target sequence. An exact global match means the searched sequence is identical to the target sequence.
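
The difference between the two match types can be sketched in base R. This is only an illustration and not the LymphoSeq2 implementation; the targets vector is a hypothetical stand-in for the sequence column of a study tibble.

```r
# Hypothetical target sequences (stand-in for a "junction_aa" column).
targets <- c("CASSPVSNEQFF", "CASSQEVPPYQAFF", "CASSPVS")
query <- "CASSPVS"

# Exact global match: the query must be identical to the whole target.
global_hits <- targets[targets == query]

# Exact partial match: the query must occur somewhere inside the target.
# fixed = TRUE treats the query as a literal string, not a regular expression.
partial_hits <- targets[grepl(query, targets, fixed = TRUE)]

print(global_hits)
print(partial_hits)
```

Every global hit is also a partial hit, so a partial search returns at least as many targets as a global search for the same query.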

Edit distance is a way of quantifying how dissimilar two sequences are to one another by counting the minimum number of operations required to transform one sequence into the other. For example, an edit distance of 0 means the sequences are identical, and an edit distance of 1 indicates that the sequences differ by a single amino acid or nucleotide.
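
As a minimal sketch of the idea, base R's utils::adist() computes a generalized Levenshtein distance (insertions, deletions, and substitutions). Whether searchSeq() uses this function internally is not stated here, so treat it as an assumption; the candidates vector is hypothetical, and "CASSPVSNEQYF" is a made-up variant for illustration.

```r
query <- "CASSPVSNEQFF"

# Hypothetical candidates: an identical sequence, a single-substitution
# variant (F -> Y at position 11), and an unrelated, longer sequence.
candidates <- c("CASSPVSNEQFF", "CASSPVSNEQYF", "CASSQEVPPYQAFF")

# adist() returns a 1 x 3 matrix; flatten it to a plain vector.
d <- as.vector(utils::adist(query, candidates))

# Keep candidates whose distance is less than or equal to the threshold.
threshold <- 1
within_threshold <- candidates[d <= threshold]

print(d)
print(within_threshold)
```

With a threshold of 1 the first two candidates are kept; the third differs in length by two residues, so its distance is at least 2 and it is dropped.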

Value

searchSeq() returns the rows of the study tibble for every instance where the searched sequence(s) appeared.

findSeq() returns a tibble of sequences that differ from the input sequence by no more than the edit distance threshold provided.

Functions

  • findSeq(): Find all sequences in the query list that are within the edit distance threshold.

Examples

file_path <- system.file("extdata", "TCRB_sequencing",
 package = "LymphoSeq2")
study_table <- LymphoSeq2::readImmunoSeq(path = file_path, threads = 1)
study_table <- LymphoSeq2::topSeqs(study_table, top = 100)
aa1 <- "CASSPVSNEQFF"
aa2 <- "CASSQEVPPYQAFF"
LymphoSeq2::searchSeq(
  study_table = study_table,
  sequence = aa1,
  seq_type = "junction_aa",
  edit_distance = 0,
  match = "global"
)
LymphoSeq2::searchSeq(
  study_table = study_table,
  sequence = c(aa1, aa2),
  seq_type = "junction_aa",
  edit_distance = 0,
  match = "global"
)
LymphoSeq2::searchSeq(
  study_table = study_table,
  sequence = aa1,
  seq_type = "junction_aa",
  edit_distance = 1,
  match = "global"
)
nt <- "CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA"
LymphoSeq2::searchSeq(
  study_table = study_table,
  sequence = nt,
  seq_type = "junction",
  edit_distance = 3,
  match = "global"
)
LymphoSeq2::searchSeq(
  study_table = study_table,
  sequence = "CASSPVS",
  seq_type = "junction_aa",
  edit_distance = 0,
  match = "partial"
)
LymphoSeq2::searchSeq(
  study_table = study_table,
  sequence = nt,
  seq_type = "junction",
  edit_distance = 0,
  match = "global"
)

shashidhar22/LymphoSeq2 documentation built on Jan. 16, 2024, 4:29 a.m.