R/msc.quality.R

Defines functions msc.quality

Documented in msc.quality

#' Check the quality of the assembly
#'
#' The msc.quality function allows you to summarize mapping statistics generated by KOMICS to assess the quality of minicircle assembly and mapping. It focuses on examining the frequencies of various mapped read categories, including read frequency, mapped read frequency, high-quality mapped read frequency, and the proportion of (near-)perfect alignments of CSB3-containing reads.
#'
#' @usage msc.quality(mapstats, groups)
#' @param mapstats a character vector containing the file names of mapping statistics generated by KOMICS. These files provide information about the mapping and quality of the assembled minicircles.
#' @param groups a vector specifying the groups (e.g., species) to which the samples belong.
#' @return 
#'   \item{all}{a table merging the mapping statistics of all samples. The mapping statistics include the number of mapped reads (MR), mapped reads with high quality (MR_HQ), CSB3-containing mapped reads (MR_CSB3), and CSB3-containing mapped reads with high quality (MR_CSB3_HQ).}
#'   \item{proportions}{a list of tables containing the proportions of the mentioned mapping statistics. It provides insights into the relative frequencies and proportions of each category.}
#'   \item{plots}{barplots visualizing the mapping statistics and proportions. These plots help visualize and compare the mapping quality across samples.}
#' @examples
#' data(exData)
#' 
#' ### run function
#' map <- msc.quality(mapstats = system.file("extdata", exData$mapstats, package = "rKOMICS"),
#'                    exData$species)
#' 
#' lapply(map$proportions, mean)$MR_HQ 
#' lapply(map$proportions, mean)$MR_CSB3_HQ
#' 
#' ### visualize results
#' barplot(map$proportions$MR)
#'
#' @importFrom utils read.table read.csv
#' @export

msc.quality <- function(mapstats, groups) {

  
  #############   0   Tests   #############
  
  
  if (sum(file.exists(mapstats))!=length(mapstats)) stop("One or more files doesn't exist")
  if (!is.factor(groups)) stop("groups should be a factor")
  if (length(mapstats) != length(groups)) stop("mapstats and groups are not of equal length")
  
  
  #############   0   Define global functions or variables   #############
  
  
  value <- V2 <- NULL
  
  
  #############   1   Calculate mapping stats   #############

  
  map <- utils::read.table(mapstats[1], sep=':', row.names = 1)
  for (i in 2:length(mapstats)) {
    map[,i] <- utils::read.table(mapstats[i], sep=':')[,2]
  }
  colnames(map) <- gsub(".map.*", "", gsub(".*/", "", mapstats))

  props <- plots <- list()
  props$MR <- apply(map, 2, function(x) x["Number of mapped reads"]/x["Number of reads"]*100)
  props$MR_HQ <- apply(map, 2, function(x) x["Number of reads with mapping quality >= 20"]/x["Number of reads"]*100)
  props$MR_CSB3 <- apply(map, 2, function(x) x["Number of mapped CSB3-containing reads"]/x["Number of CSB3-containing reads"]*100)
  props$MR_CSB3_HQ <- apply(map, 2, function(x) x["Number of CSB3-containing reads with mapping quality >= 20"]/x["Number of CSB3-containing reads"]*100)

  for (i in 1:length(props)) {
    
    melted <- melt(props[[i]])
    melted[,2] <- groups
    
    plots[[i]] <- ggplot(melted, aes(x=rownames(melted), y=value, fill=V2, col=NULL)) + 
      geom_bar(stat = "identity", alpha=0.8, show.legend = F) + 
      facet_grid(. ~ V2, space='free', scales='free') + 
      ylim(0,100) + 
      xlab('') + 
      theme_minimal() + 
      ylab('proportions of reads (%)') + 
      theme(axis.text.x = element_text(angle=90, hjust=1),
            axis.title = element_text(face="bold"))
  }
  
  names(plots) <- names(props)
  
  
  #############   2   Return results    #############

  
  return(list("all" = map,
              "proportions" = props, 
              "plots" = plots))
                
}

Try the rKOMICS package in your browser

Any scripts or data that you put into this service are public.

rKOMICS documentation built on July 9, 2023, 7:46 p.m.