load_paired_end_qc_data: Load QC metrics data from paired-end RNA-seq processing

View source: R/rnaseq_workflow_qc.R

load_paired_end_qc_data    R Documentation

Load QC metrics data from paired-end RNA-seq processing

Description

Load the QC data files that MultiQC generates for the various stages of paired-end RNA-seq processing in the automated RNA-seq data processing WDL workflow.

Usage

load_paired_end_qc_data(data_dir)

Arguments

data_dir

A string denoting the local path where all the data files are stored.
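
A call might look like the sketch below. The directory path is a placeholder, and the snippet assumes the function is exported from the omixjutsu package; the guard makes it a no-op when the package or the directory is not available.

```r
# Placeholder path (an assumption); point this at your own MultiQC output.
qc_dir <- file.path("path", "to", "multiqc_data")

# Guarded so the snippet does nothing when the package or directory is absent.
if (requireNamespace("omixjutsu", quietly = TRUE) && dir.exists(qc_dir)) {
  qc <- omixjutsu::load_paired_end_qc_data(data_dir = qc_dir)
  print(names(qc))  # names of the imported tables
}
```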

Details

The automated RNA-seq data processing workflow produces several tables of QC metrics, and additional tables can be exported from the MultiQC HTML report. This function imports all of these tables into a single list for further processing in R. It expects the following files in data_dir:

  • multiqc_fastqc.txt: An auto-generated workflow file with general FASTQC statistics.

  • multiqc_hisat2.txt: An auto-generated workflow file with HISAT2 alignment statistics.

  • multiqc_rseqc_bam_stat.txt: An auto-generated workflow file with alignment statistics generated by RSeQC.

  • multiqc_rseqc_read_distribution.txt: An auto-generated workflow file with alignment categorizations to different types of genomic regions (introns, exons, intergenic, etc.).

  • multiqc_salmon.txt: An auto-generated workflow file with mapping statistics from Salmon.

  • multiqc_trimmomatic.txt: An auto-generated workflow file with trimming and read quality filtering statistics using Trimmomatic.

  • rseqc_inner_distance_plot.tsv: Table exported from the MultiQC HTML report with inner distance statistics between pairs of reads.

  • salmon_plot.tsv: Table exported from the MultiQC HTML report with RNA fragment length distribution statistics as estimated by Salmon.

  • fastqc_per_base_sequence_quality_plot.tsv: Table exported from the MultiQC HTML report with per base PHRED score statistics generated by FASTQC.

  • fastqc_per_sequence_quality_scores_plot.tsv: Table exported from the MultiQC HTML report with per sequence PHRED score statistics generated by FASTQC.

  • fastqc_per_sequence_gc_content_plot.tsv: Table exported from the MultiQC HTML report with per sequence GC content statistics generated by FASTQC.

  • fastqc_sequence_duplication_levels_plot.tsv: Table exported from the MultiQC HTML report with sequence duplication statistics generated by FASTQC.

  • fastqc_adapter_content_plot.tsv: Table exported from the MultiQC HTML report with adapter content statistics generated by FASTQC.

  • rseqc_known_junction_saturation_plot.tsv: Table exported from the MultiQC HTML report with junction saturation statistics for known splice junctions.

  • rseqc_novel_junction_saturation_plot.tsv: Table exported from the MultiQC HTML report with junction saturation statistics for novel splice junctions.
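
Because the function expects every file listed above, it can help to check a directory before loading. The following is a minimal sketch in base R; find_missing_qc_files is a hypothetical helper written for illustration and is not part of omixjutsu.

```r
# File names copied from the list above.
expected_qc_files <- c(
  "multiqc_fastqc.txt",
  "multiqc_hisat2.txt",
  "multiqc_rseqc_bam_stat.txt",
  "multiqc_rseqc_read_distribution.txt",
  "multiqc_salmon.txt",
  "multiqc_trimmomatic.txt",
  "rseqc_inner_distance_plot.tsv",
  "salmon_plot.tsv",
  "fastqc_per_base_sequence_quality_plot.tsv",
  "fastqc_per_sequence_quality_scores_plot.tsv",
  "fastqc_per_sequence_gc_content_plot.tsv",
  "fastqc_sequence_duplication_levels_plot.tsv",
  "fastqc_adapter_content_plot.tsv",
  "rseqc_known_junction_saturation_plot.tsv",
  "rseqc_novel_junction_saturation_plot.tsv"
)

# Hypothetical helper (not part of omixjutsu): returns the expected file
# names that are absent from `data_dir`, in the order listed above.
find_missing_qc_files <- function(data_dir, expected = expected_qc_files) {
  expected[!file.exists(file.path(data_dir, expected))]
}
```

Calling find_missing_qc_files(data_dir) returns a zero-length character vector when all files are present, so length(find_missing_qc_files(data_dir)) == 0 can serve as a simple precondition.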

Value

A list with the following elements:

  • fastqc: Data frame imported from multiqc_fastqc.txt.

  • hisat2: Data frame imported from multiqc_hisat2.txt.

  • rseqc_bam: Data frame imported from multiqc_rseqc_bam_stat.txt.

  • rseqc_alignment_category: Data frame imported from multiqc_rseqc_read_distribution.txt.

  • salmon: Data frame imported from multiqc_salmon.txt.

  • trimmomatic: Data frame imported from multiqc_trimmomatic.txt.

  • inner_dist: Data frame imported from rseqc_inner_distance_plot.tsv.

  • frag_length: Data frame imported from salmon_plot.tsv.

  • phred_bp: Data frame imported from fastqc_per_base_sequence_quality_plot.tsv.

  • phred_seq: Data frame imported from fastqc_per_sequence_quality_scores_plot.tsv.

  • gc_content: Data frame imported from fastqc_per_sequence_gc_content_plot.tsv.

  • seq_duplication: Data frame imported from fastqc_sequence_duplication_levels_plot.tsv.

  • adapter_content: Data frame imported from fastqc_adapter_content_plot.tsv.

  • known_junction: Data frame imported from rseqc_known_junction_saturation_plot.tsv.

  • novel_junction: Data frame imported from rseqc_novel_junction_saturation_plot.tsv.
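
The element-to-file mapping above can be written out as a named vector. The sketch below is a hypothetical stand-in, not the omixjutsu implementation: it assumes the files are tab-delimited with a header row and reads whichever ones exist with base R's read.delim.

```r
# Element names and file names as documented in the Value section.
qc_file_map <- c(
  fastqc = "multiqc_fastqc.txt",
  hisat2 = "multiqc_hisat2.txt",
  rseqc_bam = "multiqc_rseqc_bam_stat.txt",
  rseqc_alignment_category = "multiqc_rseqc_read_distribution.txt",
  salmon = "multiqc_salmon.txt",
  trimmomatic = "multiqc_trimmomatic.txt",
  inner_dist = "rseqc_inner_distance_plot.tsv",
  frag_length = "salmon_plot.tsv",
  phred_bp = "fastqc_per_base_sequence_quality_plot.tsv",
  phred_seq = "fastqc_per_sequence_quality_scores_plot.tsv",
  gc_content = "fastqc_per_sequence_gc_content_plot.tsv",
  seq_duplication = "fastqc_sequence_duplication_levels_plot.tsv",
  adapter_content = "fastqc_adapter_content_plot.tsv",
  known_junction = "rseqc_known_junction_saturation_plot.tsv",
  novel_junction = "rseqc_novel_junction_saturation_plot.tsv"
)

# Hypothetical stand-in (NOT the package's code): reads each file that exists
# as a tab-delimited table and returns a named list of data frames.
read_qc_tables <- function(data_dir, file_map = qc_file_map) {
  paths <- file.path(data_dir, file_map)
  names(paths) <- names(file_map)       # file.path() does not keep names
  paths <- paths[file.exists(paths)]    # skip files that are not present
  lapply(paths, read.delim, check.names = FALSE, stringsAsFactors = FALSE)
}
```

Individual tables are then available by name, for example read_qc_tables(data_dir)$salmon, mirroring how the elements of the list returned by load_paired_end_qc_data would be accessed.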


bryancquach/omixjutsu documentation built on Jan. 29, 2023, 3:47 p.m.