Description: A manually curated arsenic functional gene database (AsgeneDB) and an R package (Asgene) developed for rapid and accurate metagenomic analysis.
Authors: Xinwei Song, Yongguan Zhu, Yongming Luo, Jianming Xu, Bin Ma*
Arsenic (As) is a toxic metalloid that is widely distributed around the world. To understand the microbial communities involved in arsenic metabolism in the environment, we developed a curated arsenic functional gene database (AsgeneDB) covering five arsenic metabolic pathways (transport, respiration, reduction, oxidation and methylation), 59 arsenic biotransformation functional gene families and 414773 representative sequences. Protein sequences for the As gene families were recruited from multiple public databases, including UniProt, NCBI RefSeq, KEGG, COG, eggNOG, arCOG and KOG. AsgeneDB covers 46 phyla and 1653 genera of bacteria, archaea and fungi. By integrating multiple orthology databases, it supports rapid analysis of the arsenic metabolism and transformation functions of microbial communities with high specificity, comprehensiveness, representativeness and accuracy. AsgeneDB and the associated R package should greatly facilitate the study of arsenic metabolism in microbial communities across environments.
AsgeneDB.fa: Representative sequences in FASTA format, obtained by clustering the curated sequences at 100% sequence identity. This file can be used to search shotgun metagenomes for arsenic genes with BLAST-like tools.
asgene.map: A mapping file that maps sequence IDs to gene names; only sequences belonging to arsenic gene families are included. This file is used to generate arsenic gene profiles from BLAST-like results against the database.
id_gene_tax_pathway_total.csv: Species table of sequences in AsgeneDB.
Columns included:
length.txt: Lengths of the amino acid sequences in AsgeneDB, used to standardize arsenic gene abundance statistics.
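As a rough illustration of how asgene.map can be combined with BLAST-like output to build a gene profile, the sketch below joins a toy hit table to a toy ID-to-gene map and counts hits per gene. The column names (query, subject, seq_id, gene) and all values are assumptions made for this example; the real file layout may differ, and this is not the package's internal code.

```r
# Toy best-hit table, as might be parsed from BLAST-like tabular output
# (hypothetical column names and sequence IDs).
hits <- data.frame(
  query   = c("read1", "read2", "read3", "read4"),
  subject = c("seqA", "seqB", "seqA", "seqX"),
  stringsAsFactors = FALSE
)

# Toy ID-to-gene map standing in for asgene.map
# (assumed two-column layout: sequence ID, gene name).
id2gene <- data.frame(
  seq_id = c("seqA", "seqB", "seqC"),
  gene   = c("arsC", "arsB", "aioA"),
  stringsAsFactors = FALSE
)

# Inner join: hits to sequences that are absent from the map (seqX) are dropped.
annotated <- merge(hits, id2gene, by.x = "subject", by.y = "seq_id")

# Count hits per gene family.
gene_counts <- table(annotated$gene)
print(gene_counts)
```

Because the join is an inner join, hits to sequences that are not in the map do not contribute to the profile, which mirrors the statement that only arsenic gene family sequences are included.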
RStudio
Depends: R ≥ 3.4.0
Imports: dplyr, seqinr
Database searching tools (e.g., USEARCH, BLAST or DIAMOND)
You can install the development version of Asgene from GitHub with:
install.packages("devtools")
devtools::install_github("XinweiSong/Asgene")
Description: The Asgene package performs metagenomic alignment (nucleic acid or protein sequences), subsequent gene family abundance statistics and sample abundance standardization. The required database files are built into the package, so users only need to choose a database search tool (e.g., USEARCH, BLAST or DIAMOND) and supply three parameters (e.g., working path, search parameters of the tool and file type) to run the analysis and write the statistical results automatically. With gene abundance statistics (option: abundance), read counts are normalized to reads per kilobase per million reads (RPKM) to remove differences in sequencing depth and reference sequence length between samples. With functional species statistics (option: taxonomy), the driving species of each arsenic metabolism gene are reported automatically at different taxonomic levels for each sample.
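To make the RPKM normalization concrete, here is a minimal sketch of the standard formula, RPKM = reads * 1e9 / (reference length in bp * total reads). The function name rpkm_sketch and the toy numbers are hypothetical, and converting amino acid lengths to nucleotides by multiplying by 3 is an assumption; the package's exact internal calculation is not shown in this document.

```r
# Minimal RPKM sketch (hypothetical helper, not the Asgene implementation).
# counts:      reads assigned to each gene
# length_aa:   reference length in amino acids (as stored in length.txt)
# total_reads: total number of reads in the sample
rpkm_sketch <- function(counts, length_aa, total_reads) {
  # Assumption: convert amino acid length to nucleotide length (3 nt per residue).
  length_bp <- length_aa * 3
  # Reads per kilobase of reference per million reads in the sample.
  counts * 1e9 / (length_bp * total_reads)
}

# Toy values for two genes in one sample.
counts    <- c(arsC = 20, arsB = 50)
length_aa <- c(arsC = 140, arsB = 430)
rpkm_values <- rpkm_sketch(counts, length_aa, total_reads = 2e6)
print(rpkm_values)
```

Dividing by both reference length and total read count is what allows genes of different lengths and samples of different sequencing depths to be compared.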
This is a basic example which shows you how to use the package:
library(Asgene)
# Arsenic metabolism gene abundance analysis
Asgene(analysis = "abundance", workdir = "./", method = "diamond", toolpath = "./", search_parameters = "-e 1e-4 -p 28 --query-cover 80 --id 50", seqtype = "nucl", filetype = "fasta", PE = TRUE, output = "./")
# Arsenic metabolism taxonomy analysis
Asgene(analysis = "taxonomy", workdir = "./", method = "diamond", toolpath = "./", search_parameters = "-e 1e-4 -p 28 --query-cover 80 --id 50", seqtype = "nucl", filetype = "fasta", PE = TRUE, output = "./")
# Using the example datasets
Asgene(analysis = "abundance", workdir = "./", method = "diamond", toolpath = "./", search_parameters = "-e 1e-4 -p 28 --query-cover 80 --id 50", seqtype = "prot", output = "./", test.data = TRUE)
Asgene(analysis = "taxonomy", workdir = "./", method = "diamond", toolpath = "./", search_parameters = "-e 1e-4 -p 28 --query-cover 80 --id 50", seqtype = "prot", output = "./", test.data = TRUE)
Output of As metabolic gene abundance analysis
Output of As metabolic taxonomy analysis