searchSeq: Search for a sequence


View source: R/searchSeq.R

Description

Search for one or more amino acid or nucleotide CDR3 sequences in a list of data frames.

Usage

searchSeq(list, sequence, type = "aminoAcid", match = "global",
  editDistance = 0)

Arguments

list

A list of data frames generated by the LymphoSeq functions readImmunoSeq or productiveSeq. "aminoAcid" or "nucleotide", "frequencyCount", and "count" are required columns.

sequence

A character vector of one or more amino acid or nucleotide CDR3 sequences to search.

type

A character vector specifying the type of sequence(s) to be searched. Available options are "aminoAcid" or "nucleotide".

match

A character vector specifying whether an exact partial or exact global match of the searched sequence(s) is desired. Available options are "partial" and "global".

editDistance

An integer giving the maximum edit distance allowed between the searched sequence and a found sequence. Sequences whose edit distance is less than or equal to this value are returned. See details below.

Details

An exact partial match means the searched sequence is contained within the target sequence. An exact global match means the searched sequence is identical to the target sequence.
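The difference between the two match modes can be illustrated with base R alone. This is a minimal sketch on made-up target sequences and is not a description of how searchSeq is implemented internally; the variable names are hypothetical.

```r
# Toy target sequences (hypothetical, for illustration only)
targets <- c("CASSPVSNEQFF", "CASSQEVPPYQAFF", "CASSPVSGEQFF")
query <- "CASSPVS"

# Partial match: the query occurs anywhere inside the target
partial_hits <- grepl(query, targets, fixed = TRUE)

# Global match: the query is identical to the whole target
global_hits <- targets == query

print(targets[partial_hits])  # targets that contain the query
print(targets[global_hits])   # targets equal to the query (none here)
```

With a short query such as "CASSPVS", a partial search can return several longer CDR3 sequences, while a global search returns a row only when the full sequence is supplied.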

Edit distance is a way of quantifying how dissimilar two sequences are to one another by counting the minimum number of operations required to transform one sequence into the other. For example, an edit distance of 0 means the sequences are identical and an edit distance of 1 indicates that the sequences differ by a single amino acid or nucleotide.
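As a rough illustration of edit distance, base R's utils::adist computes the generalized Levenshtein distance between strings. Whether searchSeq uses this exact function is an assumption not confirmed by this page; the candidate sequences and variable names below are hypothetical.

```r
query <- "CASSPVSNEQFF"

# Hypothetical candidates: identical, one substitution (N -> G), unrelated
candidates <- c("CASSPVSNEQFF", "CASSPVSGEQFF", "CASSQEVPPYQAFF")

# adist() returns a matrix; flatten it to one distance per candidate
distances <- as.vector(utils::adist(query, candidates))

# Keep candidates whose distance is at most the chosen threshold
max_distance <- 1
hits <- candidates[distances <= max_distance]

print(data.frame(candidate = candidates, editDistance = distances))
print(hits)
```

Raising the threshold widens the search: with a threshold of 0 only the identical sequence is kept, while a threshold of 1 also keeps the sequence that differs by a single residue.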

Value

Returns the rows for every instance in the list of data frames where the searched sequence(s) appeared.
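The shape of the returned value, with matching rows gathered from every data frame and labelled by sample, can be sketched in base R. The toy list, its values, and the helper logic below are hypothetical and only mimic the idea; they are not the package's code.

```r
# Hypothetical toy input: a named list of data frames with the required columns
toy_list <- list(
  sampleA = data.frame(aminoAcid = c("CASSPVSNEQFF", "CASSLGQGYEQYF"),
                       count = c(10, 5),
                       frequencyCount = c(0.5, 0.25),
                       stringsAsFactors = FALSE),
  sampleB = data.frame(aminoAcid = c("CASSQEVPPYQAFF", "CASSPVSNEQFF"),
                       count = c(7, 3),
                       frequencyCount = c(0.7, 0.3),
                       stringsAsFactors = FALSE)
)
query <- "CASSPVSNEQFF"

# Keep the matching rows of each data frame and tag them with the sample name
hits <- lapply(names(toy_list), function(s) {
  df <- toy_list[[s]]
  rows <- df[df$aminoAcid == query, , drop = FALSE]
  if (nrow(rows) == 0) return(NULL)
  cbind(sample = s, rows, stringsAsFactors = FALSE)
})

# Stack the per-sample results into a single data frame
result <- do.call(rbind, hits)
rownames(result) <- NULL
print(result)
```

Each row of the combined result corresponds to one appearance of the searched sequence in one sample, which is why the same sequence can appear on several rows.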

Examples

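# Load the package that provides readImmunoSeq() and searchSeq()
library(LymphoSeq)

# Import the example TCRB sequencing files shipped with the package
file.path <- system.file("extdata", "TCRB_sequencing", package = "LymphoSeq")

file.list <- readImmunoSeq(path = file.path)

aa1 <- "CASSPVSNEQFF"

aa2 <- "CASSQEVPPYQAFF"

# Exact global match for a single amino acid sequence
searchSeq(list = file.list, sequence = aa1, type = "aminoAcid", 
   match = "global", editDistance = 0)

# Exact global match for several amino acid sequences at once
searchSeq(list = file.list, sequence = c(aa1, aa2), 
   type = "aminoAcid", match = "global", editDistance = 0)

# Allow an edit distance of up to 1 from the searched amino acid sequence
searchSeq(list = file.list, sequence = aa1, type = "aminoAcid", editDistance = 1)

nt <- "CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA"

# Allow an edit distance of up to 3 from the searched nucleotide sequence
searchSeq(list = file.list, sequence = nt, type = "nucleotide", editDistance = 3)

# Exact partial match: the searched sequence is contained within the target
searchSeq(list = file.list, sequence = "CASSPVS", type = "aminoAcid", 
   match = "partial", editDistance = 0)

# Exact match on the full nucleotide sequence
searchSeq(list = file.list, sequence = nt, type = "nucleotide", editDistance = 0)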

Example output

Loading required package: LymphoSeqDB
          sample    aminoAcid
1 TRB_Unsorted_0 CASSPVSNEQFF
2    TRB_CD8_949 CASSPVSNEQFF
                                                                               nucleotide
1 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGCCCCGTGAGCAATGAGCAGTTCTTCGGGCCA
2 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA
  count frequencyCount estimatedNumberGenomes vFamilyName dFamilyName
1   822   0.0373755775                    822     TCRBV28     TCRBD02
2     2   0.0000964982                      2     TCRBV28     TCRBD02
  jFamilyName  vGeneName  dGeneName  jGeneName
1     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
2     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
          sample      aminoAcid
1 TRB_Unsorted_0   CASSPVSNEQFF
2 TRB_Unsorted_0 CASSQEVPPYQAFF
3    TRB_CD8_949   CASSPVSNEQFF
                                                                               nucleotide
1 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGCCCCGTGAGCAATGAGCAGTTCTTCGGGCCA
2 ATCAATTCCCTGGAGCTTGGTGACTCTGCTGTGTATTTCTGTGCCAGCAGCCAAGAAGTTCCGCCTTACCAAGCTTTCTTTGGACAA
3 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA
  count frequencyCount estimatedNumberGenomes vFamilyName dFamilyName
1   822   0.0373755775                    822     TCRBV28     TCRBD02
2   797   0.0363529676                    797     TCRBV03            
3     2   0.0000964982                      2     TCRBV28     TCRBD02
  jFamilyName  vGeneName  dGeneName  jGeneName
1     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
2     TCRBJ01 unresolved unresolved TCRBJ01-01
3     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
          sample foundSequnece searchSequnece editDistance
1 TRB_Unsorted_0  CASSPVSNEQFF   CASSPVSNEQFF            0
2    TRB_CD8_949  CASSPVSNEQFF   CASSPVSNEQFF            0
                                                                               nucleotide
1 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGCCCCGTGAGCAATGAGCAGTTCTTCGGGCCA
2 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA
  count frequencyCount estimatedNumberGenomes vFamilyName dFamilyName
1   822   0.0373755775                    822     TCRBV28     TCRBD02
2     2   0.0000964982                      2     TCRBV28     TCRBD02
  jFamilyName  vGeneName  dGeneName  jGeneName
1     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
2     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
          sample
1 TRB_Unsorted_0
2    TRB_CD8_949
                                                                            foundSequnece
1 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGCCCCGTGAGCAATGAGCAGTTCTTCGGGCCA
2 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA
                                                                           searchSequnece
1 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA
2 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA
  editDistance    aminoAcid count frequencyCount estimatedNumberGenomes
1            3 CASSPVSNEQFF   822   0.0373755775                    822
2            0 CASSPVSNEQFF     2   0.0000964982                      2
  vFamilyName dFamilyName jFamilyName  vGeneName  dGeneName  jGeneName
1     TCRBV28     TCRBD02     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
2     TCRBV28     TCRBD02     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
          sample    aminoAcid
1 TRB_Unsorted_0 CASSPVSNEQFF
2    TRB_CD8_949 CASSPVSNEQFF
                                                                               nucleotide
1 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGCCCCGTGAGCAATGAGCAGTTCTTCGGGCCA
2 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA
  count frequencyCount estimatedNumberGenomes vFamilyName dFamilyName
1   822   0.0373755775                    822     TCRBV28     TCRBD02
2     2   0.0000964982                      2     TCRBV28     TCRBD02
  jFamilyName  vGeneName  dGeneName  jGeneName
1     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
2     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01
       sample    aminoAcid
1 TRB_CD8_949 CASSPVSNEQFF
                                                                               nucleotide
1 CTGATTCTGGAGTCCGCCAGCACCAACCAGACATCTATGTACCTCTGTGCCAGCAGTCCGGTAAGCAATGAGCAGTTCTTCGGGCCA
  count frequencyCount estimatedNumberGenomes vFamilyName dFamilyName
1     2    9.64982e-05                      2     TCRBV28     TCRBD02
  jFamilyName  vGeneName  dGeneName  jGeneName
1     TCRBJ02 TCRBV28-01 TCRBD02-01 TCRBJ02-01

LymphoSeq documentation built on Nov. 8, 2020, 8:09 p.m.