bc_cure_cluster clusters barcodes by edit distance and merges barcodes
with similar sequences. This function is only applicable to a BarcodeObj
object that has a cleanBc slot.
bc_cure_cluster(
  barcodeObj,
  dist_thresh = 1,
  dist_method = "hamm",
  merge_method = "greedy",
  count_threshold = 1000,
  dist_costs = list(replace = 1, insert = 1, delete = 1)
)
## S4 method for signature 'BarcodeObj'
bc_cure_cluster(
  barcodeObj,
  dist_thresh = 1,
  dist_method = "hamm",
  merge_method = "greedy",
  count_threshold = 1000,
  dist_costs = list(replace = 1, insert = 1, delete = 1)
)
barcodeObj |
A BarcodeObj object. |
dist_thresh |
A single integer, or a vector of integers with one value per sample,
specifying the edit distance threshold for merging two similar barcode
sequences. If the input is a vector, each value applies to one sample,
following the sample order in the BarcodeObj object. |
dist_method |
A character string specifying the distance algorithm used to evaluate barcode similarity. It can be "hamm" for Hamming distance or "leven" for Levenshtein distance. |
merge_method |
A character string specifying the algorithm used to cluster and merge barcodes. Currently only "greedy" is available; with this method, the least abundant barcodes are preferentially merged into the most abundant ones. |
count_threshold |
An integer, the read depth threshold above which a barcode is considered a true barcode: a barcode with a count higher than this threshold will not be merged into a more abundant barcode. |
dist_costs |
A list giving the cost of each edit event in the distance algorithm,
applicable when Levenshtein distance is used. The list elements are
named as in the default shown in the usage: replace, insert and delete. |
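The two distance measures named in dist_method, and the effect of dist_costs, can be illustrated in base R. This is a minimal sketch, not the package's implementation: hamming_dist and leven_dist are hypothetical helpers, the strings are toy barcodes, and the mapping of the replace, insert and delete names onto the substitutions, insertions and deletions costs of utils::adist() is an assumption made for illustration.

```r
# Hypothetical helper: Hamming distance counts mismatching positions
# between two sequences of equal length.
hamming_dist <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: weighted Levenshtein distance via utils::adist().
# adist() names its costs insertions/deletions/substitutions; mapping the
# insert/delete/replace names onto them is an assumption of this sketch.
leven_dist <- function(a, b,
                       costs = list(replace = 1, insert = 1, delete = 1)) {
  d <- utils::adist(a, b, costs = c(
    insertions = costs$insert,
    deletions = costs$delete,
    substitutions = costs$replace
  ))
  as.numeric(d[1, 1])
}

# One substitution between two equal-length toy barcodes
hamming_dist("AAAG", "AGAG")   # 1

# One inserted base: unit costs give 1, an insertion cost of 2 gives 2
leven_dist("AGAG", "AGAAG")    # 1
leven_dist("AGAG", "AGAAG",
           costs = list(replace = 1, insert = 2, delete = 2))  # 2
```

Under these assumed costs, a threshold of 2 would admit either one insertion or deletion, or up to two substitutions, whereas Hamming distance cannot compare sequences of different lengths at all.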
A BarcodeObj object with the cleanBc slot updated.
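To give a feel for how a greedy merge could change barcode counts, the sketch below works on a plain named count vector. It is an illustration under stated assumptions rather than the package's algorithm: greedy_merge is a hypothetical helper, it uses Hamming distance only, and the order in which candidates are visited (least abundant first, merged into the most abundant match) is one plausible reading of the merge_method and count_threshold descriptions.

```r
# Hypothetical helper, not part of the package: greedy merging of a named
# count vector, using Hamming distance between equal-length sequences.
greedy_merge <- function(counts, dist_thresh = 1, count_threshold = 1000) {
  hamming <- function(a, b) {
    if (nchar(a) != nchar(b)) return(Inf)  # unequal lengths never match
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
  }
  counts <- sort(counts, decreasing = TRUE)
  # Walk from the least abundant barcode towards the most abundant one
  for (i in rev(seq_along(counts))) {
    # The top barcode has no target; barcodes above the count threshold
    # are treated as true barcodes and left alone
    if (i == 1 || counts[i] > count_threshold) next
    # Try more abundant barcodes, most abundant first
    for (j in seq_len(i - 1)) {
      if (hamming(names(counts)[i], names(counts)[j]) <= dist_thresh) {
        counts[j] <- counts[j] + counts[i]
        counts[i] <- 0
        break
      }
    }
  }
  counts[counts > 0]
}

# Toy counts, invented for illustration
toy <- c(AAAG = 50, AGAG = 104, AAAA = 3, TTTT = 7)

greedy_merge(toy)                        # AGAG = 157, TTTT = 7
greedy_merge(toy, count_threshold = 40)  # AGAG = 104, AAAG = 53, TTTT = 7
```

In the second call, AAAG absorbs AAAA and then exceeds the threshold of 40, so it is kept as a separate barcode instead of being merged into AGAG.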
data(bc_obj)
d1 <- data.frame(
    seq = c(
        "ACTTCGATCGATCGAAAAGATCGATCGATC",
        "AATTCGATCGATCGAAGAGATCGATCGATC",
        "CCTTCGATCGATCGAAGAAGATCGATCGATC",
        "TTTTCGATCGATCGAAAAGATCGATCGATC",
        "AAATCGATCGATCGAAGAGATCGATCGATC",
        "CCCTCGATCGATCGAAGAAGATCGATCGATC",
        "GGGTCGATCGATCGAAAAGATCGATCGATC",
        "GGATCGATCGATCGAAGAGATCGATCGATC",
        "ACTTCGATCGATCGAACAAGATCGATCGATC",
        "GGTTCGATCGATCGACGAGATCGATCGATC",
        "GCGTCCATCGATCGAAGAAGATCGATCGATC"
    ),
    freq = c(
        30, 60, 9, 10, 14, 5, 10, 30, 6, 4, 6
    )
)
pattern <- "([ACTG]{3})TCGATCGATCGA([ACTG]+)ATCGATCGATC"
bc_obj <- bc_extract(list(test = d1), pattern, sample_name=c("test"),
    pattern_type=c(UMI=1, barcode=2))
# Remove barcodes with depth < 5
(bc_cured <- bc_cure_depth(bc_obj, depth=5))
# Do the clustering: merge the less abundant barcodes into the more abundant
# ones when the Hamming distance is <= 1
bc_cure_cluster(bc_cured, dist_thresh = 1)
# Levenshtein distance <= 2, with insertions and deletions costing 2
bc_cure_cluster(bc_cured, dist_thresh = 2, dist_method = "leven",
    dist_costs = list("insert" = 2, "replace" = 1, "delete" = 2))
###