| pre_genomad | R Documentation |
This function automatically processes geNomad output files by detecting sample names from the directory structure and optionally integrates CheckV quality assessment results.
pre_genomad(
  genomad_out_dir = "",
  checkV_out_dir = NULL,
  provirus = TRUE,
  filter = TRUE,
  checkV_out_prefix = NULL,
  min_length = 1000,
  min_completeness = 50
)
| genomad_out_dir | Character. Path to the geNomad output directory. This directory should contain sample-specific subdirectories whose names match the pattern "*.contigs_summary". |
| checkV_out_dir | Character. Optional path to the CheckV output directory. If provided, the CheckV quality summary is integrated into the results. Default is NULL. |
| provirus | Logical. Whether to identify and separate provirus sequences. Default is TRUE. |
| filter | Logical. Whether to apply quality filtering to viral sequences. Default is TRUE. |
| checkV_out_prefix | Character. Optional prefix to remove from CheckV contig IDs. Default is NULL. |
| min_length | Numeric. Minimum sequence length for filtering. Default is 1000. |
| min_completeness | Numeric. Minimum CheckV completeness score (percent) for filtering. Default is 50. |
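To make the two thresholds concrete, here is a minimal sketch of how min_length and min_completeness could be applied to a summary table. It is not the package's implementation: the toy data, the column names (seq_name, length, completeness), the inclusive comparisons, and the choice to drop rows with missing completeness are all assumptions made for illustration.

```r
# Toy summary table; column names are hypothetical and chosen for illustration.
virus_summary <- data.frame(
  seq_name     = c("contig_1", "contig_2", "contig_3", "contig_4"),
  length       = c(800, 5200, 15000, 3000),
  completeness = c(90, 35, 72.5, NA),
  stringsAsFactors = FALSE
)

# Thresholds matching the documented defaults.
min_length <- 1000
min_completeness <- 50

# Assumption: a sequence passes only if it meets both thresholds (inclusive)
# and has a non-missing completeness value.
keep <- virus_summary$length >= min_length &
  !is.na(virus_summary$completeness) &
  virus_summary$completeness >= min_completeness

valid_virus <- virus_summary[keep, , drop = FALSE]
print(valid_virus$seq_name)
```

In this toy table only contig_3 satisfies both thresholds; contig_1 is too short, contig_2 is below the completeness cutoff, and contig_4 has no completeness value.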
The function automatically detects sample names by searching for directories with the pattern "*.contigs_summary" within the genomad_out_dir. It then extracts the sample name by removing the ".contigs_summary" suffix.
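The detection rule described above can be sketched in a few lines of base R. The helper name detect_sample_names is hypothetical and not part of the package, and the sketch assumes that only the immediate subdirectories of genomad_out_dir are searched.

```r
# Hypothetical helper illustrating the detection rule; not the package's code.
detect_sample_names <- function(genomad_out_dir) {
  # List immediate subdirectories by name only (assumption: no recursion).
  subdirs <- list.dirs(genomad_out_dir, full.names = FALSE, recursive = FALSE)
  # Keep directories whose names end in ".contigs_summary".
  hits <- subdirs[grepl("\\.contigs_summary$", subdirs)]
  # Strip the suffix to obtain the sample name.
  sub("\\.contigs_summary$", "", hits)
}

# Demonstration on a throwaway directory tree.
demo_dir <- file.path(tempdir(), "genomad_demo")
dir.create(file.path(demo_dir, "sampleA.contigs_summary"),
           recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(demo_dir, "sampleA.contigs_annotate"),
           recursive = TRUE, showWarnings = FALSE)

detected <- detect_sample_names(demo_dir)
print(detected)
```

Anchoring the pattern with `$` keeps directories such as "sampleA.contigs_annotate" from matching.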
An object of class "virus_res" containing four components:
| sample | Detected sample name |
| virus_summary | Integrated data frame with geNomad and optional CheckV results |
| virus_genes | Gene-level annotations from geNomad |
| valid_virus | Filtered high-quality viral sequences |
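As a rough illustration of what integrating geNomad and CheckV results can look like, the sketch below strips a prefix from CheckV contig IDs and left-joins the two tables. It is an assumption-laden example rather than the function's actual logic: the toy data, the prefix "sampleA_", and the column names (seq_name, virus_score, contig_id, completeness) are chosen for illustration only.

```r
# Toy geNomad-style and CheckV-style tables; contents and column names are
# hypothetical.
genomad <- data.frame(
  seq_name    = c("contig_1", "contig_2"),
  virus_score = c(0.98, 0.91),
  stringsAsFactors = FALSE
)
checkv <- data.frame(
  contig_id    = "sampleA_contig_1",
  completeness = 88.4,
  stringsAsFactors = FALSE
)

# Hypothetical prefix carried by the CheckV contig IDs.
checkV_out_prefix <- "sampleA_"

# Remove the prefix literally, and only where it is actually present.
has_prefix <- startsWith(checkv$contig_id, checkV_out_prefix)
checkv$contig_id[has_prefix] <- substring(
  checkv$contig_id[has_prefix],
  nchar(checkV_out_prefix) + 1
)

# Left join: keep every geNomad sequence, attaching CheckV columns when available.
merged <- merge(genomad, checkv,
                by.x = "seq_name", by.y = "contig_id", all.x = TRUE)
print(merged)
```

Sequences without a CheckV record keep their geNomad columns and receive NA in the CheckV columns.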
## Not run:
# Basic usage - sample name will be automatically detected
virus_results <- pre_genomad(genomad_out_dir = "~/Documents/R/Lung_virome/data/genomad_out2/")
# Access the detected sample name
sample_name <- virus_results$sample
print(paste("Detected sample:", sample_name))
## End(Not run)