gc_cal: Identify and Extract Gene Clusters from Scaled BLAST Data
In gclink: Gene-Cluster Discovery, Annotation and Visualization

gc_cal

R Documentation

Identify and Extract Gene Clusters from Scaled BLAST Data

Description

This function screens contigs for regions that contain a predefined set of “reference” genes (e.g., photosynthetic genes or viral genes) arranged in a contiguous block. Contigs are first coarsely filtered by the minimum number of reference genes they carry and then scanned finely for clusters that satisfy user-defined density and contiguity criteria. Each detected cluster is labelled with a unique gene_cluster identifier.

Usage

gc_cal(
  Data = bin_genes,
  in_gene_list = photosynthesis_gene_list,
  AllGeneNum = 30,
  MinConSeq = 15
)

Arguments

`Data`	A data frame produced by `orf_extract` (i.e., a scaled BLAST table). Must include the columns `genome_contig`, `gene`, and `orf_position`.
`in_gene_list`	A character vector of “reference” gene symbols (e.g., `photosynthesis_gene_list`) that are expected to appear in the target cluster(s).
`AllGeneNum`	Integer. Maximum total ORF count (annotated plus hypothetical) that the algorithm is allowed to span when defining a cluster (default: 30).
`MinConSeq`	Integer. Minimum number of reference genes that must be present and consecutive within the candidate cluster (default: 15). Must satisfy `1 <= MinConSeq <= AllGeneNum`.
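The argument requirements above can be checked before any clustering is attempted. The sketch below uses a hypothetical helper, `check_gc_cal_inputs()`, which is not part of gclink. It only tests the documented column names, that `in_gene_list` is a character vector, and the `1 <= MinConSeq <= AllGeneNum` constraint; the gene symbols in the toy input are illustrative.

```r
# Hypothetical helper (NOT part of gclink): checks the documented input
# requirements of gc_cal() before any clustering is attempted.
check_gc_cal_inputs <- function(Data, in_gene_list, AllGeneNum = 30, MinConSeq = 15) {
  # Columns that the documentation says Data must include.
  required <- c("genome_contig", "gene", "orf_position")
  missing_cols <- setdiff(required, names(Data))
  if (length(missing_cols) > 0) {
    stop("Data is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.character(in_gene_list) || length(in_gene_list) == 0) {
    stop("in_gene_list must be a non-empty character vector")
  }
  # Documented constraint: 1 <= MinConSeq <= AllGeneNum.
  if (length(AllGeneNum) != 1 || length(MinConSeq) != 1 ||
      MinConSeq < 1 || MinConSeq > AllGeneNum) {
    stop("MinConSeq must satisfy 1 <= MinConSeq <= AllGeneNum")
  }
  invisible(TRUE)
}

# Toy input with the three required columns (illustrative gene symbols).
toy <- data.frame(
  genome_contig = "bin1_contig1",
  gene = c("pufL", "pufM", "hypothetical"),
  orf_position = 1:3,
  stringsAsFactors = FALSE
)
check_gc_cal_inputs(toy, c("pufL", "pufM"), AllGeneNum = 30, MinConSeq = 2)
```

Failing fast on a missing column or an impossible `MinConSeq` gives a clearer error than a silently empty result.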

Details

Coarse filter: Contigs with fewer than `MinConSeq` reference genes are discarded.
Fine scan: For each remaining contig, the algorithm slides a window spanning up to `AllGeneNum` consecutive ORFs and retains windows that contain at least `MinConSeq` reference genes in uninterrupted order.
Cluster labelling: Each valid cluster receives a unique ID (`genome_contig---1`, `genome_contig---2`, …).
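The three steps can be illustrated with a simplified base-R sketch. `gc_cal_sketch()` is a hypothetical function and not gclink's implementation. It assumes that each window starts at a reference ORF and spans `AllGeneNum` consecutive `orf_position` values. A window qualifies when it holds at least `MinConSeq` reference genes; it is then trimmed to its first and last reference gene, and overlapping or adjacent qualifying windows are merged into one cluster. It does not attempt to reproduce how `gc_cal()` enforces “uninterrupted order”.

```r
# Hypothetical, simplified sketch of the documented steps; NOT gclink's code.
gc_cal_sketch <- function(Data, in_gene_list, AllGeneNum = 30, MinConSeq = 15) {
  # Work contig by contig in ORF order.
  Data <- Data[order(Data$genome_contig, Data$orf_position), , drop = FALSE]
  is_ref <- Data$gene %in% in_gene_list

  # Coarse filter: keep contigs that carry at least MinConSeq reference genes.
  ref_counts <- tapply(is_ref, Data$genome_contig, sum)
  keep_contigs <- names(ref_counts)[!is.na(ref_counts) & ref_counts >= MinConSeq]

  out <- list()
  for (contig in keep_contigs) {
    rows <- Data[Data$genome_contig == contig, , drop = FALSE]
    ref_pos <- rows$orf_position[rows$gene %in% in_gene_list]

    # Fine scan: a window starts at each reference ORF and spans AllGeneNum
    # consecutive positions; a window with >= MinConSeq reference genes is
    # trimmed to its first and last reference gene and flagged.
    in_cluster <- rep(FALSE, nrow(rows))
    for (start in ref_pos) {
      hits <- ref_pos[ref_pos >= start & ref_pos <= start + AllGeneNum - 1]
      if (length(hits) >= MinConSeq) {
        in_cluster <- in_cluster |
          (rows$orf_position >= min(hits) & rows$orf_position <= max(hits))
      }
    }
    if (!any(in_cluster)) next

    # Labelling: each run of consecutive flagged rows becomes one cluster,
    # numbered genome_contig---1, genome_contig---2, ...
    run_start <- in_cluster & !c(FALSE, head(in_cluster, -1))
    cluster_idx <- cumsum(run_start)
    res <- rows[in_cluster, , drop = FALSE]
    res$gene_cluster <- paste0(contig, "---", cluster_idx[in_cluster])
    out[[contig]] <- res
  }

  # No cluster found: return an empty table that still has gene_cluster.
  if (length(out) == 0) {
    empty <- Data[0, , drop = FALSE]
    empty$gene_cluster <- character(0)
    return(empty)
  }
  result <- do.call(rbind, out)
  rownames(result) <- NULL
  result
}

# Toy input: contig c1 holds two candidate blocks, c2 too few reference genes.
toy <- data.frame(
  genome_contig = c(rep("c1", 12), rep("c2", 3)),
  gene = c("a", "b", "x", "c", "y", "y", "y", "y", "d", "a", "b", "y",
           "a", "x", "x"),
  orf_position = c(1:12, 1:3),
  stringsAsFactors = FALSE
)
res <- gc_cal_sketch(toy, c("a", "b", "c", "d"), AllGeneNum = 5, MinConSeq = 3)
```

With this toy input, contig c1 yields two clusters: `c1---1` covers ORFs 1 to 4, including the non-reference ORF x, and `c1---2` covers ORFs 9 to 11. Contig c2 is removed by the coarse filter. Trimming each window to its outermost reference genes keeps non-reference ORFs inside a cluster but drops those that only flank it.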

Value

A data frame with the same columns as `Data`, filtered to contain only the rows that belong to valid clusters, plus an extra column `gene_cluster` (format: `genome_contig---N`) that uniquely labels every cluster.
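Because every label has the form `genome_contig---N`, the returned table can be split per cluster and the labels parsed back into their parts. The sketch below uses a hand-built toy data frame that stands in for real `gc_cal()` output (the contig names and genes are invented for illustration), and it uses only base R functions (`sub()`, `split()`, `vapply()`).

```r
# Toy data frame shaped like the documented gc_cal() output; the values are
# illustrative only.
clusters <- data.frame(
  genome_contig = c(rep("bin1_contig7", 3), rep("bin2_contig3", 2)),
  gene = c("pufL", "pufM", "hypothetical", "bchY", "bchZ"),
  orf_position = c(10L, 11L, 12L, 4L, 5L),
  gene_cluster = c(rep("bin1_contig7---1", 3), rep("bin2_contig3---1", 2)),
  stringsAsFactors = FALSE
)

# Recover the contig name (strip the trailing ---N) and the per-contig
# cluster number (everything after the last ---) from each label.
label_contig <- sub("---[0-9]+$", "", clusters$gene_cluster)
label_index <- as.integer(sub("^.*---", "", clusters$gene_cluster))

# One data frame per cluster, and the number of ORFs in each cluster.
per_cluster <- split(clusters, clusters$gene_cluster)
cluster_sizes <- vapply(per_cluster, nrow, integer(1))
```

Anchoring both regular expressions on the final `---` keeps the parsing correct even if a contig name itself contains that separator elsewhere.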

gclink documentation built on Sept. 9, 2025, 5:39 p.m.