bc_cure_cluster | R Documentation |
bc_cure_cluster
clusters barcodes by edit distance
and removes the minority barcodes that have a similar sequence. This function is only
applicable to a BarcodeObj object with a cleanBc
slot. Among similar barcodes, those
with the smaller read count are removed.
bc_cure_cluster(
  barcodeObj,
  dist_threshold = 1,
  depth_fold_threshold = 1,
  dist_method = "hamm",
  cluster_method = "greedy",
  count_threshold = 1e+09,
  dist_costs = list(replace = 1, insert = 1, delete = 1)
)
## S4 method for signature 'BarcodeObj'
bc_cure_cluster(
  barcodeObj,
  dist_threshold = 1,
  depth_fold_threshold = 1,
  dist_method = "hamm",
  cluster_method = "greedy",
  count_threshold = 1e+07,
  dist_costs = list(replace = 1, insert = 1, delete = 1)
)
barcodeObj |
A BarcodeObj object. |
dist_threshold |
A single integer, or a vector of integers whose length equals the
number of samples, specifying the edit distance threshold for defining two
similar barcode sequences. If the input is a vector, each value in the vector
relates to one sample according to its order in |
depth_fold_threshold |
A single numeric value, or a numeric vector whose
length equals the number of samples, specifying the depth fold-change threshold for
removing a similar minority barcode. The majority barcode should have at
least |
dist_method |
A character string specifying the edit distance used to evaluate barcode similarity. It can be "hamm" for Hamming distance or "leven" for Levenshtein distance. |
cluster_method |
A character string specifying the algorithm used to cluster the barcodes. Currently only "greedy" is available. In this case, the most and the least abundant barcodes are compared, and the least abundant barcode is preferentially removed. |
count_threshold |
An integer, the read depth threshold above which a barcode is considered a true barcode. If a barcode has a count higher than this threshold, it will not be removed, even if it is similar to a more abundant one. Default is 1e9. |
dist_costs |
A list giving the cost of each edit event in the distance algorithm,
applicable when Levenshtein distance is used. The
names of the list elements have to be |
A BarcodeObj object with the cleanBc slot updated.
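The two dist_method options and the effect of dist_costs can be illustrated with base R alone. The sketch below is not the package's implementation: hamming_dist and leven_dist are hypothetical helpers, utils::adist() stands in for the Levenshtein computation, and its cost names (insertions, deletions, substitutions) differ from the names used by dist_costs.

```r
# Hamming distance: number of mismatching positions. This helper is
# hypothetical and, by choice of this sketch, only accepts sequences
# of equal length.
hamming_dist <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Levenshtein distance through base R's utils::adist(), with per-event costs.
# Assumption: bc_cure_cluster may compute the distance differently.
leven_dist <- function(a, b,
                       costs = c(insertions = 1, deletions = 1,
                                 substitutions = 1)) {
  as.numeric(utils::adist(a, b, costs = costs))
}

hamming_dist("AAAG", "AGAG")   # one substitution
leven_dist("AGAG", "AGAAG")    # one insertion at unit cost
# Raising the insertion and deletion costs makes the same pair more distant
leven_dist("AGAG", "AGAAG",
           costs = c(insertions = 2, deletions = 2, substitutions = 1))
```

With unit costs the pair AGAG/AGAAG is one edit apart. With an insertion cost of 2 the same pair is at distance 2, so it would only count as similar under a threshold of 2 or more.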
data(bc_obj)
d1 <- data.frame(
  seq = c(
    "ACTTCGATCGATCGAAAAGATCGATCGATC",
    "AATTCGATCGATCGAAGAGATCGATCGATC",
    "CCTTCGATCGATCGAAGAAGATCGATCGATC",
    "TTTTCGATCGATCGAAAAGATCGATCGATC",
    "AAATCGATCGATCGAAGAGATCGATCGATC",
    "CCCTCGATCGATCGAAGAAGATCGATCGATC",
    "GGGTCGATCGATCGAAAAGATCGATCGATC",
    "GGATCGATCGATCGAAGAGATCGATCGATC",
    "ACTTCGATCGATCGAACAAGATCGATCGATC",
    "GGTTCGATCGATCGACGAGATCGATCGATC",
    "GCGTCCATCGATCGAAGAAGATCGATCGATC"
  ),
  freq = c(
    30, 60, 9, 10, 14, 5, 10, 30, 6, 4, 6
  )
)
pattern <- "([ACTG]{3})TCGATCGATCGA([ACTG]+)ATCGATCGATC"
bc_obj <- bc_extract(list(test = d1), pattern, sample_name = c("test"),
  pattern_type = c(UMI = 1, barcode = 2))
# Remove barcodes with depth < 5
(bc_cured <- bc_cure_depth(bc_obj, depth=5))
# Cluster the barcodes: among barcodes within Hamming distance <= 1,
# remove the less abundant one
bc_cure_cluster(bc_cured, dist_threshold = 1)
# Levenshtein distance <= 2, where an insertion or deletion costs 2
# and a replacement costs 1
bc_cure_cluster(bc_cured, dist_threshold = 2, dist_method = "leven",
  dist_costs = list("insert" = 2, "replace" = 1, "delete" = 2))
###
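The greedy strategy described for cluster_method can be sketched in plain R to show how the three thresholds interact. This is a simplified illustration, not the package's algorithm. The function greedy_cure and its toy input are hypothetical. The sketch assumes that a minority barcode is dropped when a more abundant barcode lies within dist_threshold and has at least depth_fold_threshold times its count, that dropped barcodes are discarded rather than merged, and that utils::adist() (Levenshtein distance) is an acceptable stand-in for the distance.

```r
# Hypothetical greedy curing of one sample; all rules below are assumptions
# of this sketch, not the documented behaviour of bc_cure_cluster.
greedy_cure <- function(seqs, counts, dist_threshold = 1,
                        depth_fold_threshold = 1, count_threshold = 1e9) {
  # Sort from most to least abundant
  ord <- order(counts, decreasing = TRUE)
  seqs <- seqs[ord]
  counts <- counts[ord]
  keep <- rep(TRUE, length(seqs))

  # Visit barcodes from the least abundant upwards
  for (i in rev(seq_along(seqs))) {
    # Barcodes above count_threshold are treated as true and never removed
    if (counts[i] > count_threshold) next
    # Compare against more abundant barcodes, most abundant first
    for (j in seq_len(i - 1)) {
      d <- utils::adist(seqs[i], seqs[j])[1, 1]
      if (d <= dist_threshold &&
          counts[j] >= depth_fold_threshold * counts[i]) {
        keep[i] <- FALSE
        break
      }
    }
  }
  data.frame(seq = seqs[keep], count = counts[keep],
             stringsAsFactors = FALSE)
}

# Toy input (made up for illustration)
toy_seqs <- c("AAAG", "AGAG", "AGAAG", "TTTT")
toy_counts <- c(50, 90, 9, 20)

greedy_cure(toy_seqs, toy_counts)
# A lower count_threshold protects the moderately abundant AAAG
greedy_cure(toy_seqs, toy_counts, count_threshold = 40)
# A higher fold requirement also keeps AAAG (90 is less than 2 * 50)
greedy_cure(toy_seqs, toy_counts, depth_fold_threshold = 2)
```

In this toy run the default settings keep only AGAG and TTTT, while either lowering count_threshold or raising depth_fold_threshold retains AAAG as well.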