bc_cure_cluster: Merges barcodes by edit distance

Description

bc_cure_cluster clusters barcodes by edit distance and merges barcodes with similar sequences. This function is only applicable to a BarcodeObj object with a cleanBc slot.

Usage

bc_cure_cluster(
  barcodeObj,
  dist_thresh = 1,
  dist_method = "hamm",
  merge_method = "greedy",
  count_threshold = 1000,
  dist_costs = list(replace = 1, insert = 1, delete = 1)
)

## S4 method for signature 'BarcodeObj'
bc_cure_cluster(
  barcodeObj,
  dist_thresh = 1,
  dist_method = "hamm",
  merge_method = "greedy",
  count_threshold = 1000,
  dist_costs = list(replace = 1, insert = 1, delete = 1)
)

Arguments

barcodeObj

A BarcodeObj object.

dist_thresh

A single integer, or a vector of integers with one value per sample, specifying the edit distance threshold for merging two similar barcode sequences. If a vector is given, each value applies to one sample, following the sample order in the BarcodeObj object.

dist_method

A character string specifying the distance algorithm used to evaluate barcode similarity. It can be "hamm" for Hamming distance or "leven" for Levenshtein distance.
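To make the difference between the two distance options concrete, here is a minimal base R sketch. It is not the package's implementation: hamming_dist and leven_dist are hypothetical helper names, and returning NA for sequences of different lengths is an assumption made for this illustration only.

```r
# Hamming distance: number of mismatching positions.
# Assumption for this sketch: sequences of unequal length yield NA.
hamming_dist <- function(a, b) {
    if (nchar(a) != nchar(b)) return(NA_integer_)
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Levenshtein distance (substitutions, insertions, deletions),
# computed with base R's utils::adist().
leven_dist <- function(a, b) {
    as.integer(utils::adist(a, b)[1, 1])
}

hamming_dist("AAGAG", "AAAAG")   # one substitution
leven_dist("AAGAG", "AAGAAG")    # one insertion
hamming_dist("AAGAG", "AAGAAG")  # lengths differ, so NA in this sketch
```

Hamming distance only counts substitutions, so it suits fixed-length barcodes; Levenshtein distance also accounts for insertions and deletions.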

merge_method

A character string specifying the algorithm used to cluster and merge barcodes. Currently only "greedy" is available; with this method, the least abundant barcodes are preferentially merged into the most abundant ones.

count_threshold

An integer, the read depth threshold above which a barcode is considered a true barcode. A barcode with a count higher than this threshold will not be merged into a more abundant barcode.
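The interplay between greedy merging and count_threshold can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's actual algorithm: greedy_merge is a hypothetical function, it uses Levenshtein distance from utils::adist() for simplicity, and it fixes the abundance order once before merging.

```r
# Hypothetical greedy merge: visit barcodes from least to most abundant and
# fold each one into the most abundant barcode within dist_thresh, unless its
# own count exceeds count_threshold.
greedy_merge <- function(seqs, counts, dist_thresh = 1, count_threshold = 1000) {
    ord <- order(counts, decreasing = TRUE)
    seqs <- seqs[ord]
    counts <- counts[ord]
    d <- utils::adist(seqs)            # pairwise Levenshtein distances
    alive <- rep(TRUE, length(seqs))
    for (i in rev(seq_along(seqs))) {
        # Barcodes above the threshold are treated as true barcodes
        if (counts[i] > count_threshold) next
        # Candidates: more abundant, not yet merged away, within threshold
        cand <- which(alive & seq_along(seqs) < i & d[i, ] <= dist_thresh)
        if (length(cand) > 0) {
            target <- cand[1]          # the most abundant candidate
            counts[target] <- counts[target] + counts[i]
            alive[i] <- FALSE
        }
    }
    data.frame(seq = seqs[alive], count = counts[alive],
               stringsAsFactors = FALSE)
}

seqs <- c("AAGAG", "AAAAG", "TTTTT", "AAGAA")
counts <- c(60, 30, 20, 5)

# Both neighbours of AAGAG are merged into it
res_all <- greedy_merge(seqs, counts, dist_thresh = 1)

# With count_threshold = 25, AAAAG (count 30) is protected from merging
res_protected <- greedy_merge(seqs, counts, dist_thresh = 1, count_threshold = 25)
```

In the second call only AAGAA is merged, because AAAAG has a count above the threshold and is kept as a separate barcode.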

dist_costs

A list giving the cost of each edit event, applicable when the Levenshtein distance is used. The names of the list must be insert, delete and replace, specifying the weights of insertion, deletion and replacement events respectively. The default cost for each event is 1.
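The effect of event costs can be illustrated with base R's utils::adist(), which accepts per-event costs. This is a sketch only: weighted_leven is a hypothetical helper, and mapping insert, delete and replace onto adist()'s insertions, deletions and substitutions is an assumption about how the weights correspond, not a description of the package internals.

```r
# Hypothetical helper: weighted Levenshtein distance using a dist_costs-style list
weighted_leven <- function(a, b, dist_costs) {
    utils::adist(a, b, costs = c(
        insertions    = dist_costs$insert,
        deletions     = dist_costs$delete,
        substitutions = dist_costs$replace
    ))[1, 1]
}

unit_costs  <- list(replace = 1, insert = 1, delete = 1)
indel_costs <- list(replace = 1, insert = 2, delete = 2)

# One insertion apart: distance 1 with unit costs, 2 when indels cost 2
d_unit  <- weighted_leven("AAGAG", "AAGAAG", unit_costs)
d_indel <- weighted_leven("AAGAG", "AAGAAG", indel_costs)

# One substitution apart: distance 1 under both cost schemes
d_sub <- weighted_leven("AAGAG", "AAAAG", indel_costs)
```

With a threshold of 1, raising the insertion and deletion costs to 2 would therefore keep barcodes that differ by a single indel apart while still merging single-substitution neighbours.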

Value

A BarcodeObj object with the cleanBc slot updated.

Examples

data(bc_obj)

d1 <- data.frame(
    seq = c(
        "ACTTCGATCGATCGAAAAGATCGATCGATC",
        "AATTCGATCGATCGAAGAGATCGATCGATC",
        "CCTTCGATCGATCGAAGAAGATCGATCGATC",
        "TTTTCGATCGATCGAAAAGATCGATCGATC",
        "AAATCGATCGATCGAAGAGATCGATCGATC",
        "CCCTCGATCGATCGAAGAAGATCGATCGATC",
        "GGGTCGATCGATCGAAAAGATCGATCGATC",
        "GGATCGATCGATCGAAGAGATCGATCGATC",
        "ACTTCGATCGATCGAACAAGATCGATCGATC",
        "GGTTCGATCGATCGACGAGATCGATCGATC",
        "GCGTCCATCGATCGAAGAAGATCGATCGATC"
        ),
    freq = c(
        30, 60, 9, 10, 14, 5, 10, 30, 6, 4 , 6
        )
    )

pattern <- "([ACTG]{3})TCGATCGATCGA([ACTG]+)ATCGATCGATC"
bc_obj <- bc_extract(list(test = d1), pattern, sample_name=c("test"), 
    pattern_type=c(UMI=1, barcode=2))

# Remove barcodes with depth < 5
(bc_cured <- bc_cure_depth(bc_obj, depth=5))

# Do the clustering: merge the less abundant barcodes into the more abundant
# ones when the Hamming distance is <= 1
bc_cure_cluster(bc_cured, dist_thresh = 1)

# Levenshtein distance <= 2, with insertions and deletions each costing 2
bc_cure_cluster(bc_cured, dist_thresh = 2, dist_method = "leven",
    dist_costs = list("insert" = 2, "replace" = 1, "delete" = 2))

###

wenjie1991/CellBarocde documentation built on Dec. 23, 2021, 5:11 p.m.