| gc_cal | R Documentation |
This function screens contigs for regions that contain a pre-defined set of “reference” genes (e.g., photosynthetic genes, viral genes) arranged in a continuous block. Contigs are first coarsely filtered by the minimum number of reference genes they carry, then finely scanned for clusters that satisfy user-defined density and contiguity criteria. Each detected cluster is returned with a unique gene_cluster identifier.
gc_cal(
Data = bin_genes,
in_gene_list = photosynthesis_gene_list,
AllGeneNum = 30,
MinConSeq = 15
)
| Data | A data frame produced by |
| in_gene_list | A character vector of “reference” gene symbols (e.g., |
| AllGeneNum | Integer. Maximum total ORF count (annotated plus hypothetical) that the algorithm is allowed to span when defining a cluster (default: 30). |
| MinConSeq | Integer. Minimum number of reference genes that must be present and consecutive within the candidate cluster (default: 15). Must satisfy |
Coarse filter: Contigs with fewer than MinConSeq reference genes are discarded.
Fine scan: For each remaining contig, the algorithm slides a window that can encompass up to AllGeneNum consecutive ORFs and retains windows that contain at least MinConSeq reference genes in uninterrupted order.
Cluster labelling: Each valid cluster receives a unique ID (genome_contig---1, genome_contig---2, …).
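The three steps above can be illustrated with a minimal base R sketch on a toy contig. This is not the implementation of gc_cal: the column names `contig` and `gene`, the helper `flag_dense_windows`, and the toy gene order are assumptions made for illustration. The sketch also treats "at least MinConSeq reference genes" as a simple count inside the window and does not try to reproduce the package's exact contiguity rule.

```r
# Hypothetical helper (not part of the package): flag ORFs that fall inside
# at least one window of `width` consecutive ORFs holding >= `min_ref`
# reference genes.
flag_dense_windows <- function(is_ref, width, min_ref) {
  n <- length(is_ref)
  in_cluster <- logical(n)
  if (n == 0L) return(in_cluster)
  width <- min(width, n)
  for (start in seq_len(n - width + 1L)) {
    idx <- start:(start + width - 1L)
    if (sum(is_ref[idx]) >= min_ref) in_cluster[idx] <- TRUE
  }
  in_cluster
}

# Toy contig: 10 ORFs in genomic order; "hyp" marks hypothetical proteins.
# Column names are assumed, not taken from the package.
toy <- data.frame(
  contig = "genome1_contig1",
  gene   = c("hyp", "psbA", "psbD", "hyp", "psbC",
             "psbB", "hyp", "hyp", "hyp", "hyp"),
  stringsAsFactors = FALSE
)
ref_genes <- c("psbA", "psbB", "psbC", "psbD")
is_ref <- toy$gene %in% ref_genes

# Coarse filter: the contig must carry at least `min_ref` reference genes.
min_ref <- 4L
passes_coarse <- sum(is_ref) >= min_ref

# Fine scan: windows of up to 5 ORFs with at least 4 reference genes.
in_cluster <- passes_coarse &
  flag_dense_windows(is_ref, width = 5L, min_ref = min_ref)

# Cluster labelling: each run of flagged ORFs gets its own contig---N ID.
run_id <- cumsum(c(in_cluster[1], diff(in_cluster) == 1))
toy$gene_cluster <- ifelse(in_cluster,
                           paste0(toy$contig, "---", run_id),
                           NA_character_)
result <- toy[in_cluster, ]
```

In this toy case only the window covering ORFs 2 to 6 reaches four reference genes, so `result` keeps those five rows, including the hypothetical ORF in the middle of the block.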
A data frame identical in structure to Data but filtered to
contain only those rows that belong to valid clusters. An extra
column gene_cluster (format: genome_contig---N) is added
to uniquely label every cluster.
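Because every row carries a gene_cluster label, the returned data frame can be summarised per cluster with base R. The sketch below uses a hand-made stand-in for the output; apart from gene_cluster and its genome_contig---N format, the column names and values are assumptions.

```r
# Hypothetical stand-in for gc_cal() output; only the gene_cluster column
# and its "---N" suffix format come from the documentation.
clusters <- data.frame(
  gene = c("psbA", "psbD", "psbC", "petA", "petB"),
  gene_cluster = c(rep("genomeA_contig1---1", 3),
                   rep("genomeA_contig1---2", 2)),
  stringsAsFactors = FALSE
)

# Number of rows (ORFs) retained per cluster.
orfs_per_cluster <- table(clusters$gene_cluster)

# Recover the genome_contig label by stripping the trailing "---N" suffix.
clusters$contig_id <- sub("---[0-9]+$", "", clusters$gene_cluster)

# One data frame per cluster, for downstream per-cluster analysis.
by_cluster <- split(clusters, clusters$gene_cluster)
```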