RiboseQC_analysis: Perform a Ribo-seQC analysis

View source: R/riboseq_analysis.R

RiboseQC_analysis    R Documentation

Perform a Ribo-seQC analysis

Description

This function loads the annotation created by the prepare_annotation_files function and analyzes one or more BAM files.

Usage

RiboseQC_analysis(annotation_file, bam_files, read_subset = T,
  readlength_choice_method = "max_coverage", chunk_size = 5000000L,
  write_tmp_files = T, dest_names = NA, rescue_all_rls = FALSE,
  fast_mode = T, create_report = T, sample_names = NA,
  report_file = NA, extended_report = F, pdf_plots = T)

Arguments

annotation_file

Full path to the annotation file (*Rannot), or a vector with paths to one annotation file per BAM file.

bam_files

character vector containing the full path to the bam files

read_subset

Select readlengths up to 99 percent of the reads, defaults to TRUE. Must be of length 1 or same length as bam_files.

readlength_choice_method

Method used to subset relevant read lengths (see choose_readlengths function); defaults to "max_coverage". Must be of length 1 or same length as bam_files.

chunk_size

Number of alignments to read at each iteration; defaults to 5000000. Increase it when more RAM is available. Must be between 10000 and 100000000.

write_tmp_files

Should all the results be written to file (in *results_RiboseQC_all)? Defaults to TRUE. Must be of length 1 or same length as bam_files.

dest_names

character vector containing the prefixes to use for the result output files. Defaults to same as bam_files

rescue_all_rls

Assign a cutoff of 12 to read lengths that would otherwise be ignored because of insufficient coverage. Defaults to FALSE. Must be of length 1 or same length as bam_files.

fast_mode

Use only top 500 genes to build profiles? Defaults to TRUE. Must be of length 1 or same length as bam_files.

create_report

Create an html report showing the RiboseQC analysis results. Defaults to TRUE

sample_names

character vector containing the names for each sample analyzed (for the html report). Defaults to "sample1", "sample2" ...

report_file

desired filename for the html report file. Defaults to the first entry of bam_files followed by ".html"

extended_report

creates a large html report including codon occupancy for each read length. Defaults to FALSE

pdf_plots

creates a pdf file for each produced plot. Defaults to TRUE
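
Several arguments above share the same constraints: a length of 1 or the length of bam_files, and a bounded chunk_size. A minimal sketch of checking them before a long run is shown below. The helper check_riboseqc_args is hypothetical and not part of RiboseQC, and the BAM paths are placeholders.

```r
# Hypothetical helper (not part of RiboseQC): checks the documented
# argument constraints before starting an analysis.
check_riboseqc_args <- function(bam_files, chunk_size = 5000000L, ...) {
  per_file <- list(...)          # named arguments that may be given per BAM file
  n <- length(bam_files)
  stopifnot(is.character(bam_files), n >= 1)

  # chunk_size must be a single value within the documented range
  if (length(chunk_size) != 1 || chunk_size < 10000 || chunk_size > 100000000) {
    stop("chunk_size must be a single value between 10000 and 100000000")
  }

  # each remaining argument must have length 1 or one entry per BAM file
  for (arg in names(per_file)) {
    len <- length(per_file[[arg]])
    if (!(len == 1 || len == n)) {
      stop(arg, " must be of length 1 or the same length as bam_files")
    }
  }
  invisible(TRUE)
}

check_riboseqc_args(
  bam_files = c("sampleA.bam", "sampleB.bam"),  # placeholder paths
  chunk_size = 5000000L,
  read_subset = TRUE,
  fast_mode = c(TRUE, FALSE)
)
```

The function itself may perform its own checks; this sketch only restates the constraints listed in the argument descriptions.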

Details

This function loads the different genomic regions created in the prepare_annotation_files step, separating features on the different recognized organelles. Each BAM file is then analyzed in chunks to minimize RAM usage.
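
Chunked reading of a BAM file can be sketched with the Bioconductor packages Rsamtools and GenomicAlignments, as below. This illustrates the general technique and is not necessarily how RiboseQC implements it. The function count_alignments_in_chunks is hypothetical, and the example uses the small BAM file shipped with Rsamtools.

```r
# Sketch only: assumes Rsamtools and GenomicAlignments are installed.
# count_alignments_in_chunks is a hypothetical name, not a RiboseQC function.
count_alignments_in_chunks <- function(bam_path, chunk_size = 5000000L) {
  # yieldSize limits how many alignments are returned per read call
  bam <- Rsamtools::BamFile(bam_path, yieldSize = chunk_size)
  open(bam)
  on.exit(close(bam))

  total <- 0
  repeat {
    chunk <- GenomicAlignments::readGAlignments(bam)
    if (length(chunk) == 0) break    # no alignments left
    total <- total + length(chunk)   # placeholder for the per-chunk analysis
  }
  total
}

if (requireNamespace("Rsamtools", quietly = TRUE) &&
    requireNamespace("GenomicAlignments", quietly = TRUE)) {
  example_bam <- system.file("extdata", "ex1.bam", package = "Rsamtools")
  count_alignments_in_chunks(example_bam, chunk_size = 1000L)
}
```

Only one chunk is held in memory at a time, which is why a larger chunk_size trades RAM for fewer iterations.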
The complete list of analyses and outputs is as follows:

read_stats contains:
rld: read length distribution per organelle.
positions: mapping statistics on different genomic regions.
reads_pos1: 5' end mapping positions for each read, separated by read length.
counts_cds_genes: read mapping statistics on CDS regions of protein coding genes, including gene symbols, counts, RPKM and TPM values.
counts_all_genes: a similar object, but with statistics on all annotated genes.
reads_summary: mapping statistics on different genomic regions, divided by read length and organelle.
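
For readers unfamiliar with the RPKM and TPM values reported per gene, the toy calculation below uses the standard definitions. The counts and CDS lengths are invented, and normalizing by the sum of the counted reads is an assumption; the exact normalization used by RiboseQC is not described here.

```r
# Toy data (invented for illustration)
counts  <- c(geneA = 500, geneB = 1500, geneC = 3000)   # reads per gene
cds_len <- c(geneA = 1000, geneB = 3000, geneC = 1500)  # CDS length in nt

# RPKM: reads per kilobase of CDS per million counted reads
total_reads <- sum(counts)  # assumption: normalize by the counted reads only
rpkm <- counts / (cds_len / 1e3) / (total_reads / 1e6)

# TPM: length-normalized rates rescaled to sum to one million
rate <- counts / cds_len
tpm  <- rate / sum(rate) * 1e6

round(rpkm)
round(tpm)
sum(tpm)  # TPM values sum to 1e6 by construction
```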

profiles_fivepr contains:
five_prime_bins: a DataFrame object (one for each read length and compartment) with signal values over 50 5'UTR bins, 100 CDS bins and 50 3'UTR bins; one representative transcript (reprentative_mostcommon) is selected for each gene.
five_prime_subcodon: a similar structure, but for 25nt downstream of the Transcription Start Site (TSS), 25nt upstream of the start codon, 33nt downstream of the start codon, 33nt in the middle of the ORF, 33nt upstream of the stop codon, 25nt downstream of the stop codon, and 25nt upstream of the Transcription End Site (TES).

selection_cutoffs contains:
results_choice: the calculated cutoffs and selected read lengths, together with data about the different methods.
results_cutoffs: statistics about the calculated cutoffs.
analysis_frame_cutoff: extensive statistics concerning cutoff calculations and read length selection; see calc_cutoffs_from_profiles for more details.

P_sites_stats: contains the list of calculated P_sites, from all reads (P_sites_all), uniquely mapping reads (P_sites_all_uniq), or uniquely mapping reads with mismatches (P_sites_uniq_mm). junctions contains statistics on reads mapping to annotated splice junctions. Coverage of entire reads (not 5' ends or P_sites-transformed) on each strand, for all reads and for uniquely mapping reads, is also calculated.

profiles_P_sites contains:
P_sites_bins: profiles for each organelle and read length around binned transcript locations.
P_sites_subcodon: profiles for each organelle and read length around transcript start/ends and ORF start/ends.
Codon_counts: codon occurrences in the first 11 codons, middle 11 codons, and last 11 codons for each ORF.
P_sites_percodon: P_sites counts on each codon, separated by ORF positions as described above. Values are separated by organelle and read length.
P_sites_percodon_ratio: ratio of P_sites_percodon/Codon_counts, as a measure of P_site occupancy on each codon, divided again by organelle and read length, for different ORF positions.
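
The P_sites_percodon_ratio described above is an element-wise division. The toy example below shows the idea for three codons in one ORF position. The numbers are invented, and setting the ratio to NA where a codon never occurs is a choice made for this sketch, not documented RiboseQC behavior.

```r
# Toy data (invented): codon occurrences and P-site counts per codon
codon_counts     <- c(ATG = 120, GAA = 300, AAA = 0)
p_sites_percodon <- c(ATG = 480, GAA = 150, AAA = 0)

# Occupancy: P-sites observed per occurrence of each codon
ratio <- p_sites_percodon / codon_counts  # 0 / 0 yields NaN
ratio[codon_counts == 0] <- NA            # sketch-specific handling of absent codons

ratio
```

Here ATG has a ratio of 4 and GAA a ratio of 0.5, so ATG carries more P-sites per occurrence than GAA in this toy data.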

sequence_analysis: contains a DataFrame object with the top 50 mapping locations in the genome, with the corresponding DNA sequence, the number of reads mapping (also as a percentage of the total number of reads), and the genomic feature annotation.

summary_P_sites: contains a DataFrame object summarizing the P_sites calculation and read length selection, including statistics on percentage of total reads used.

Value

The function saves a "results_RiboseQC_all" R file, appended to the bam_files path, that includes the complete list of outputs described here. In addition, bigwig files for coverage values and P_sites positions are appended to the bam_files path, together with a summary of P_sites selection statistics, a smaller "results_RiboseQC" R file used for creating a dynamic html report, and a "for_SaTAnn" R object that can be used in the SaTAnn pipeline.

Author(s)

Lorenzo Calviello, calviello.l.bio@gmail.com

See Also

prepare_annotation_files, calc_cutoffs_from_profiles, choose_readlengths, create_html_report.
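
Examples

A hedged sketch of a typical call, using only arguments listed under Usage. All file paths are placeholders, and the call is skipped unless the RiboseQC package and the input files are actually present.

```r
# Placeholder paths; replace with your own annotation and BAM files.
annotation_file <- "/path/to/my_annotation_Rannot"
bam_files <- c("/path/to/sample1.bam", "/path/to/sample2.bam")

if (requireNamespace("RiboseQC", quietly = TRUE) &&
    file.exists(annotation_file) && all(file.exists(bam_files))) {
  RiboseQC::RiboseQC_analysis(
    annotation_file = annotation_file,
    bam_files = bam_files,
    fast_mode = TRUE,                 # use only the top 500 genes for profiles
    create_report = TRUE,
    sample_names = c("sample1", "sample2"),
    report_file = "/path/to/riboseqc_report.html",
    extended_report = FALSE,
    pdf_plots = FALSE                 # skip the per-plot pdf files
  )
}
```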


ohlerlab/RiboseQC documentation built on Aug. 15, 2023, 7:30 a.m.