seq_correct: Sequence clustering
In wenjie1991/CellBarcode: Cellular DNA Barcode Analysis toolkit

Description

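This function clusters similar sequences (such as UMIs or barcodes) by string distance. It uses the Hamming distance by default, or the Levenshtein distance when `dist_method = 2`. When two sequences are within `dist_threshold` of each other, the sequence with fewer reads is merged into the one with more reads, and only the more abundant sequence is kept. Sequences with more reads than `count_threshold` are never removed.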
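The merging idea can be sketched in base R as follows. This is a minimal illustration, not CellBarcode's implementation. The helpers `hamming()` and `merge_by_distance()` are hypothetical, and only the Hamming distance and `dist_threshold` are modelled. The sketch also rests on three assumptions: sequences are processed from least to most abundant, each is merged into the most abundant kept sequence within the threshold, and the merged reads are pooled into that sequence.

```r
# Minimal sketch of distance-based sequence merging.
# Hypothetical helpers; this is NOT CellBarcode's internal implementation.

# Hamming distance, defined only for equal-length strings
hamming <- function(a, b) {
  if (nchar(a) != nchar(b)) return(Inf)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

merge_by_distance <- function(seq, count, dist_threshold = 1) {
  # Assumption: process sequences from least to most abundant
  ord <- order(count)
  seq <- seq[ord]
  count <- count[ord]
  keep <- rep(TRUE, length(seq))
  from <- character(0)
  to <- character(0)
  for (i in seq_along(seq)) {
    # Candidate parents: still kept and strictly more abundant
    cand <- which(keep & count > count[i])
    is_close <- vapply(cand,
                       function(j) hamming(seq[i], seq[j]) <= dist_threshold,
                       logical(1))
    cand <- cand[is_close]
    if (length(cand) > 0) {
      # Assumption: merge into the most abundant nearby sequence, pooling reads
      best <- cand[which.max(count[cand])]
      count[best] <- count[best] + count[i]
      keep[i] <- FALSE
      from <- c(from, seq[i])
      to <- c(to, seq[best])
    }
  }
  list(
    seq_freq_tab = data.frame(seq = seq[keep], count = count[keep],
                              stringsAsFactors = FALSE),
    link_tab = data.frame(seq = from, majority = to, stringsAsFactors = FALSE)
  )
}

# Toy example: AAAT and AAAC are one mismatch from AAAA and are merged into it;
# CCCC is kept. Result: AAAA with 105 reads, CCCC with 50 reads.
merge_by_distance(c("AAAA", "AAAT", "CCCC", "AAAC"), c(100, 3, 50, 2))
```

Processing the least abundant sequences first means that a low-count error variant is absorbed by its abundant neighbour before that neighbour is itself considered for removal.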

Usage

seq_correct(
  seq,
  count,
  count_threshold,
  dist_threshold,
  depth_fold_threshold = 1,
  dist_method = 1L,
  insert_cost = 1L,
  delete_cost = 1L,
  replace_cost = 1L
)

Arguments

`seq`	A character vector of sequences (barcodes or UMIs).
`count`	An integer vector of read counts, with the same length and order as `seq`.
`count_threshold`	An integer, the read-count threshold for considering a barcode a true barcode: a barcode with a count higher than this threshold will not be removed.
`dist_threshold`	An integer, the distance threshold within which two barcodes are considered related.
`depth_fold_threshold`	A numeric value controlling the fold-change threshold between the major barcodes and the potential contaminating barcodes that need to be removed.
`dist_method`	An integer; if 2, the Levenshtein distance is used, otherwise the Hamming distance is applied.
`insert_cost`	An integer, the insertion cost when the Levenshtein distance is applied.
`delete_cost`	An integer, the deletion cost when the Levenshtein distance is applied.
`replace_cost`	An integer, the substitution cost when the Levenshtein distance is applied.
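The two `dist_method` options, and one possible reading of `count_threshold` and `depth_fold_threshold`, can be sketched in base R. `seq_distance()` and `can_merge()` are hypothetical helpers, not part of CellBarcode. The Levenshtein branch uses base R's `utils::adist()` with per-operation `costs`. The exact comparisons (`<=`, `>=`) in `can_merge()` are an assumption drawn from the argument descriptions above.

```r
# Sketch of the two distance options described for `dist_method`
# (hypothetical helper, not CellBarcode's internal code).
seq_distance <- function(a, b, dist_method = 1L,
                         insert_cost = 1L, delete_cost = 1L, replace_cost = 1L) {
  if (dist_method == 2L) {
    # Weighted Levenshtein distance via base R
    d <- utils::adist(a, b, costs = list(insertions = insert_cost,
                                         deletions = delete_cost,
                                         substitutions = replace_cost))
    return(d[1, 1])
  }
  # Hamming distance: defined only for equal-length strings
  if (nchar(a) != nchar(b)) return(Inf)
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical reading of count_threshold and depth_fold_threshold:
# a minor barcode may be removed only if its count does not exceed
# count_threshold and the major barcode has at least depth_fold_threshold
# times as many reads.
can_merge <- function(minor_count, major_count,
                      count_threshold, depth_fold_threshold = 1) {
  minor_count <= count_threshold &&
    major_count >= depth_fold_threshold * minor_count
}

seq_distance("ACGT", "ACG")                   # Inf: lengths differ (Hamming)
seq_distance("ACGT", "ACG", dist_method = 2L) # 1: one deletion (Levenshtein)
```

With unit costs the Levenshtein option tolerates insertions and deletions, which the Hamming distance cannot express. Raising `replace_cost` above `insert_cost + delete_cost` makes `adist()` express a mismatch as a deletion plus an insertion instead. For example, `seq_distance("ACGT", "ACGA", dist_method = 2L, replace_cost = 3L)` is 2, not 3.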

Details

This function returns the corrected list of sequences (UMIs or barcodes), together with a record of which sequences were merged.

Value

A list with two data.frames:
`seq_freq_tab`	a table of barcodes and their corrected read counts;
`link_tab`	a record of the clustering process, whose first column holds the removed barcode and whose second column holds the majority barcode it was merged into.
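A common next step is to relabel raw barcodes with the barcode they were merged into. The sketch below builds a mock object with the documented shape. The column names `seq` and `majority` and the helper `relabel()` are hypothetical, so `link_tab` is indexed by column position, following the description above. It also assumes that a single lookup suffices, that is, no removed barcode appears as a majority barcode.

```r
# Mock object shaped like the documented return value (hypothetical column names)
res <- list(
  seq_freq_tab = data.frame(seq = c("AAAA", "CCCC"), count = c(105, 50),
                            stringsAsFactors = FALSE),
  link_tab = data.frame(seq = c("AAAC", "AAAT"), majority = c("AAAA", "AAAA"),
                        stringsAsFactors = FALSE)
)

# Map each raw barcode to its majority barcode, or keep it if it was not removed
relabel <- function(raw, link_tab) {
  idx <- match(raw, link_tab[[1]])              # first column: removed barcode
  ifelse(is.na(idx), raw, link_tab[[2]][idx])   # second column: majority barcode
}

relabel(c("AAAC", "CCCC", "AAAT"), res$link_tab)
# [1] "AAAA" "CCCC" "AAAA"
```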

wenjie1991/CellBarcode documentation built on June 1, 2025, 11:17 p.m.