R/USiCGs.R

#' Universal Single-Copy Genes
#'
#' A list of Universal Single-Copy Genes (USiCGs) for Bacteria and Archaea.
#' They are useful for transforming coverages or TPMs into copy numbers,
#' an alternative normalization that makes functional profiles comparable
#' across samples with different sequencing depths.
#'
#' @docType data
#'
#' @usage data(USiCGs)
#'
#' @format Character vector with the KEGG identifiers for 15 Universal Single Copy Genes.
#'
#' @keywords datasets
#'
#' @references Carr, Shen-Orr & Borenstein (2013).
#' Reconstructing the Genomic Content of Microbiome Taxa through Shotgun Metagenomic Deconvolution.
#' \emph{PLoS Comput. Biol.} \bold{9}:e1003292.
#' (\href{https://pubmed.ncbi.nlm.nih.gov/24146609/}{PubMed}).
#'
#' @source \href{https://pubmed.ncbi.nlm.nih.gov/24146609/}{Carr \emph{et al.}, 2013. Table S1}.
#'
#' @seealso \code{\link{MGOGs}} and \code{\link{MGKOs}} for an alternative set of single-copy genes, and for examples of how to generate copy numbers.
#'
#'
#' @examples
#' data(Hadza)
#' data(USiCGs)
#' ### Let's look at the Universal Single Copy Gene distribution in our samples.
#' KEGG.tpm = Hadza$functions$KEGG$tpm
#' all(USiCGs %in% rownames(KEGG.tpm)) # Are all the USiCGs present in our dataset?
#' # Draw one box per USiCG showing its TPM across samples.
#' # This looks odd in the test dataset because it contains only a small subset
#' # of the metagenomes. In a set of complete metagenomes, USiCGs should have
#' # fairly similar average TPMs and low dispersion across samples.
#' boxplot(t(KEGG.tpm[USiCGs,]), names=USiCGs, ylab="TPM", col="slateblue2")
#'
#' ### Now let's calculate the average copy number of each function.
#' # We do it for KEGG annotations here, but we could also do it for COGs or PFAMs.
#' # Median USiCG coverage in each sample (one value per column).
#' USiCGs.cov = apply(Hadza$functions$KEGG$cov[USiCGs,], 2, median)
#' # Sample-wise division by the median USiCG coverage.
#' KEGG.copynumber = t(t(Hadza$functions$KEGG$cov) / USiCGs.cov)
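#'
#' ### Illustrative sketch of the same normalization on a toy matrix.
#' # All names and values below are hypothetical and not part of the package data:
#' # "K_a", "K_b" and "K_c" stand in for single-copy genes, "K_x" for any other function.
#' toy.cov = matrix(c(10, 12,  8, 30,
#'                    20, 22, 18,  5),
#'                  ncol = 2,
#'                  dimnames = list(c("K_a", "K_b", "K_c", "K_x"), c("S1", "S2")))
#' toy.scg = c("K_a", "K_b", "K_c")
#' # Median coverage of the stand-in single-copy genes in each sample.
#' toy.scg.cov = apply(toy.cov[toy.scg,], 2, median)
#' # Sample-wise division, as done above for the KEGG coverages.
#' toy.copynumber = t(t(toy.cov) / toy.scg.cov)
#' # Sanity check: the median copy number of the stand-in genes should be 1 in every sample.
#' apply(toy.copynumber[toy.scg,], 2, median)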
"USiCGs"


SQMtools documentation built on April 3, 2025, 6:16 p.m.