STAR {data-navmenu="Aligning"}

# Call perl one-liner to concatenate all log files in the list
star <- fread(paste0("grep -H '' ",
                            paste(type.list$STAR, collapse=" "),
                            " | perl -F':\\s{2,}|\\s\\|\\t' -lanE'/:$/ ? next : say join qq{\t}, @F'"),
                     header = FALSE)
# Extract lines and get sample names
star <- star[V2 %like% c('of i|Uni|mu|many l|sho|oth')][, V1 := tstrsplit(V1, .Platform$file.sep)[[length(tstrsplit(V1, .Platform$file.sep))]]][, V1 := tstrsplit(V1, '[._]')[[1]]]
# Convert percentage (characters) into decimals
log.percentage <- star[V2 %like% '%'][, ':=' (V2 = toUpperFirstLetter(gsub('% of reads | reads %', '', V2)),
                                                     V3 = as.numeric(sub("%", "", V3, fixed = TRUE)) / 100)]
setnames(log.percentage, c('Sample', 'Type', 'Percentage'))
fig.height <- length(unique(as.vector(t(star[,1])))) * 0.7

##############################################################################################################################
# system.time(
#   log.list[V2 %like% c('Uni')]
# )
#  user  system elapsed
# 0.000   0.000   0.001

# system.time(
#   log.list[V2 %in% c('Uniquely mapped reads number', 'Uniquely mapped reads %')]
# )
#  user  system elapsed
# 0.003   0.000   0.003
##############################################################################################################################

Column

STAR percentage plot

# Make donut chart when there's only one sample
if (nrow(unique(log.percentage[,1])) == 1) {
  log.percentage %>%
  plot_ly(labels = ~Type, values = ~Percentage) %>%
  add_pie(hole = 0.6) %>%
  layout(title = "",  showlegend = TRUE,
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
} else {
  p <- ggplot() +
    geom_bar(data = log.percentage, aes(x = Sample, y = Percentage, fill = Type), stat = 'identity') +
    scale_y_continuous(labels = scales::percent) + coord_flip() +
    get(paste0('scale_fill_',params$theme))()
  save_plot('STAR.tiff', p, base_height = fig.height, base_width = 11, dpi = 300, compression = 'lzw')
  save_plot('STAR.pdf', p, base_height = fig.height, base_width = 11, dpi = 300)
  ggplotly(p) %>% layout(margin = list(b = 60))
}

Column

Description

This section summarized the STAR mapping stats of reads from multiple samples. aligner parameter set in LncPipe results in different kind of summary report, and LncPipeReporter can also automatically determine the aligner from file content, which means that lncPipeReportered can be run seperatedly based a set of aligner ouputs files. Mapping status usually contains reads count that mapped or unmapped, where mapped reads can also divided into discordinate mapped, unique mapped, multiple mapped or mapped with low quality. This kind of information are necessary for evaluated the sequencing and library quality. When multiple samples are involved in analysis, this overview analysis can quickly detect batch effect or outlier samples.

An typical output of STAR log file can be found from following link STAR Log file

STAR log table

fwrite(star, 'STAR.csv')
DT::datatable(head(star, n = 80L), colnames = c('Sample', 'Type', 'Value'), extensions = 'Buttons') %>% DT::formatRound('V3', digits = 2)


bioinformatist/LncPipeReporter documentation built on May 28, 2019, 7:11 p.m.