pre_gtdb_tk: Preprocess GTDB-Tk Classification Results

pre_gtdb_tk R Documentation

Preprocess GTDB-Tk Classification Results

Description

This function reads and processes the output files from a GTDB-Tk classify workflow. It combines bacterial (bac120) and archaeal (ar53) classification summaries and phylogenetic trees (if available) into a unified format.

Usage

pre_gtdb_tk(classify_dir)

Arguments

classify_dir

A character string specifying the path to the GTDB-Tk classify output directory. This directory should contain files such as gtdbtk.bac120.summary.tsv and gtdbtk.bac120.classify.tree.
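A quick way to confirm that a directory looks like classify output before calling the function is to list its summary files. The sketch below uses base R only; find_gtdb_summaries and the demo directory are hypothetical names introduced for illustration and are not part of pctax.

```r
# Hypothetical helper, not part of pctax: list the GTDB-Tk summary files
# found in a classify output directory.
find_gtdb_summaries <- function(classify_dir) {
  if (!dir.exists(classify_dir)) {
    stop("Directory not found: ", classify_dir)
  }
  files <- list.files(classify_dir, pattern = "\\.summary\\.tsv$",
                      full.names = TRUE)
  if (length(files) == 0) {
    stop("No *.summary.tsv files in: ", classify_dir)
  }
  files
}

# Demonstration on a temporary directory holding empty placeholder files.
demo_dir <- file.path(tempdir(), "gtdbtk_classify_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("gtdbtk.bac120.summary.tsv",
                                  "gtdbtk.ar53.summary.tsv")))
basename(find_gtdb_summaries(demo_dir))
```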

Details

The function performs the following steps:

  1. Checks if the provided directory exists and contains the necessary *.summary.tsv files.

  2. Reads the bacterial backbone tree.

  3. If an archaeal tree file exists, binds it to the bacterial tree.

  4. Reads and combines all *.summary.tsv files in the directory.

  5. Parses the semicolon-separated classification string into separate columns for each taxonomic rank.

  6. Ensures the resulting taxonomy table has standard ranks (Domain, Phylum, Class, Order, Family, Genus, Species).
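
Steps 5 and 6 can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's actual implementation: parse_gtdb_classification is a hypothetical helper, the input strings are made-up examples in GTDB's semicolon-separated format, and rank prefixes such as d__ are kept as-is because the documentation does not say whether pre_gtdb_tk removes them.

```r
# Hypothetical helper, not part of pctax: split GTDB classification strings
# into one column per standard rank.
parse_gtdb_classification <- function(classification) {
  ranks <- c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species")
  parts <- strsplit(classification, ";", fixed = TRUE)
  # Pad (with NA) or truncate every record to exactly seven fields,
  # so that all standard ranks are present as columns.
  mat <- t(vapply(parts, function(p) {
    length(p) <- length(ranks)
    p
  }, character(length(ranks))))
  colnames(mat) <- ranks
  as.data.frame(mat, stringsAsFactors = FALSE)
}

# Illustrative input: one full lineage and one resolved only to phylum.
x <- c(
  "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Escherichia;s__Escherichia coli",
  "d__Archaea;p__Halobacteriota"
)
tax <- parse_gtdb_classification(x)
tax[, c("Domain", "Phylum", "Species")]
```

Padding with NA keeps the column set fixed even when a genome is not classified down to species.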

Value

A list with two components:

gtdb_res

A data frame containing the combined taxonomic classification for all genomes. The classification column is parsed into standard taxonomic ranks (Domain to Species).

tree

A phylogenetic tree (phylo object) combining the bacterial and (if present) archaeal trees.
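
A minimal usage sketch for the returned list is shown below. The directory path is a hypothetical placeholder, and the call is guarded so that it runs only when pctax is installed and the directory exists.

```r
# Hypothetical path; replace with a real GTDB-Tk classify output directory.
classify_dir <- "path/to/gtdbtk_out/classify"

# Run only when pctax is installed and the directory exists.
if (requireNamespace("pctax", quietly = TRUE) && dir.exists(classify_dir)) {
  res <- pctax::pre_gtdb_tk(classify_dir)
  print(head(res$gtdb_res))  # combined taxonomy table
  print(class(res$tree))     # documented above as a phylo object
}
```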


pctax documentation built on Feb. 9, 2026, 9:06 a.m.