fine_clustering: Perform additional clustering of sequences within groups


View source: R/clustering-methods.R

Description

Perform additional clustering of sequences within groups

Usage

fine_clustering(
  ccdb,
  sequence_key,
  type,
  max_affinity = NULL,
  keep_clustering_details = FALSE,
  ...
)

Arguments

ccdb

A ContigCellDB() object

sequence_key

character: name of the column in contig_tbl that holds the sequence

type

character: either 'AA' (amino acid) or 'DNA'

max_affinity

numeric giving the maximal affinity for the sparse affinity matrix that is constructed. Not currently used.

keep_clustering_details

logical: should the output of fine_cluster_seqs be kept as a list column?

...

Arguments passed on to fine_cluster_seqs

big_memory_brute

attempt to cluster more than 4000 sequences? Clustering is quadratic in the number of sequences, so this will take a long time and might exhaust memory.

method

one of 'substitutionMatrix' or 'levenshtein'

substitution_matrix

a character vector naming a substitution matrix available in Biostrings, or a substitution matrix itself
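To give a sense of why big_memory_brute guards against large inputs, the following back-of-the-envelope R sketch counts the sequence pairs that an all-vs-all comparison must score, and the memory a dense matrix of 8-byte doubles would need. The dense-matrix storage is an assumption made for illustration; fine_cluster_seqs may store distances differently.

```r
# Back-of-the-envelope cost of all-vs-all clustering (illustrative only).
# Assumes a dense n x n matrix of 8-byte doubles; the package's actual
# storage may differ.
n <- c(1000, 4000, 8000)        # number of sequences in one group
pairs <- choose(n, 2)           # distinct pairs to score: n * (n - 1) / 2
dense_mb <- n^2 * 8 / 1e6       # megabytes for a dense double matrix
cost <- data.frame(n = n, pairs = pairs, dense_mb = dense_mb)
print(cost)
```

Doubling n from 4000 to 8000 roughly quadruples both columns: from about 8 million to about 32 million pairs, and from 128 MB to 512 MB.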

Value

ContigCellDB() object with updated contig_tbl and cluster_tbl

Examples

library(dplyr)
data(ccdb_ex)
ccdb_ex_small = ccdb_ex
ccdb_ex_small$cell_tbl = ccdb_ex_small$cell_tbl[1:200,]
ccdb_ex_small = cdhit_ccdb(ccdb_ex_small,
  sequence_key = 'cdr3_nt', type = 'DNA', cluster_name = 'DNA97',
  identity = .965, min_length = 12, G = 1)
ccdb_ex_small = fine_clustering(ccdb_ex_small, sequence_key = 'cdr3_nt', type = 'DNA')

# Canonicalizing with the medoid contig is probably the most common choice
ccdb_medoid = canonicalize_cluster(ccdb_ex_small)

# But there are other possibilities.
# To combine multiple filter conditions with AND, join them with &
ccdb_umi = canonicalize_cluster(ccdb_ex_small,
  contig_filter_args = chain == 'TRA' & length > 500, tie_break_keys = 'umis',
  contig_fields = c('chain', 'length'))
ccdb_umi$cluster_tbl %>% dplyr::select(chain, length) %>% summary()
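A medoid is conventionally the member of a group with the smallest total distance to the other members. The base-R sketch below illustrates that idea using Levenshtein distances from utils::adist on a few made-up CDR3 amino acid sequences with hypothetical group labels. It is not CellaRepertorium's implementation, and the 'substitutionMatrix' method of fine_cluster_seqs would score pairs differently.

```r
# Hypothetical illustration of within-group medoids under Levenshtein distance.
# Base R only (utils::adist); NOT CellaRepertorium's internal code.
seqs <- c("CASSLGQETQYF", "CASSLGQDTQYF", "CASSPGQETQYF",
          "CAVRDGNNKLTF", "CAVRDSNNKLTF")   # made-up sequences
groups <- c(1, 1, 1, 2, 2)                   # hypothetical coarse clusters

group_medoid <- function(s) {
  d <- utils::adist(s)        # pairwise Levenshtein distance matrix
  s[which.min(rowSums(d))]    # smallest total distance; ties go to the first
}

# One medoid per group, returned as a named character array
medoids <- tapply(seqs, groups, group_medoid)
print(medoids)
```

In group 1 the first sequence differs from each of the others by one substitution, while those two differ from each other by two, so it is the medoid. Group 2 is a tie, which which.min breaks in favor of the first member.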

CellaRepertorium documentation built on Nov. 8, 2020, 7:48 p.m.