extractCoverageData: Extract read coverage data from the bigWig files

View source: R/wiggleplotr.R

extractCoverageData    R Documentation

Extract read coverage data from the bigWig files

Description

Does not work on Windows, because rtracklayer cannot read BigWig files on Windows.
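
Because of this limitation, a calling script may want to check the platform before attempting to extract coverage. The following is a minimal base R sketch; the helper name can_read_bigwig is hypothetical and is not part of wiggleplotr or rtracklayer.

```r
# Hypothetical helper (not part of wiggleplotr): report whether the current
# platform is one where bigWig import is expected to work.
can_read_bigwig <- function(os_type = .Platform$OS.type) {
  # .Platform$OS.type is "windows" on Windows and "unix" elsewhere
  !identical(os_type, "windows")
}

if (!can_read_bigwig()) {
  message("bigWig import is not available on Windows; skipping coverage extraction.")
}
```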

Usage

extractCoverageData(
  exons,
  cdss = NULL,
  transcript_annotations = NULL,
  track_data,
  rescale_introns = TRUE,
  new_intron_length = 50,
  flanking_length = c(50, 50),
  plot_fraction = 0.1,
  mean_only = TRUE,
  region_coords = NULL
)

Arguments

exons

List of GRanges objects, each object containing the exons of one transcript. The list must have names that correspond to the transcript_id column in the transcript_annotations data.frame.

cdss

List of GRanges objects, each object containing the coding regions (CDS) of a single transcript. The list must have names that correspond to the transcript_id column in the transcript_annotations data.frame. If cdss is not specified, the exons list will be used for both arguments. (default: NULL)

transcript_annotations

Data frame with at least three columns: transcript_id, gene_name, strand. Used to construct transcript labels. (default: NULL)

track_data

data.frame with the metadata for the bigWig read coverage files. Must contain the following columns:

  • sample_id - unique id for each sample.

  • track_id - if multiple samples (bigWig files) have the same track_id, they will be overlaid on the same plot; track_id is also used as the facet label on the right.

  • bigWig - path to the bigWig file.

  • scaling_factor - normalisation factor for each sample, useful if different samples were sequenced to different depths and the bigWig files have not been normalised for that.

  • colour_group - additional column to group samples into, is used as the colour of the coverage track.
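
The column requirements listed above can be checked before calling the function. The following is a minimal sketch in base R; the helper missing_track_columns and the placeholder file paths are assumptions made for illustration and are not part of wiggleplotr.

```r
# Hypothetical helper (not part of wiggleplotr): list the documented
# track_data columns that are missing from a data frame.
missing_track_columns <- function(track_data) {
  required <- c("sample_id", "track_id", "bigWig", "scaling_factor", "colour_group")
  setdiff(required, colnames(track_data))
}

# Toy metadata with placeholder file paths; colour_group is left out on purpose
toy_track_data <- data.frame(
  sample_id = c("s1", "s2"),
  track_id = c("Naive", "LPS"),
  bigWig = c("s1.bw", "s2.bw"),
  scaling_factor = 1,
  stringsAsFactors = FALSE
)

# Names of any documented columns that are absent from the toy metadata
missing_track_columns(toy_track_data)
```

An empty result means that all documented columns are present.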

rescale_introns

Specifies if the introns should be scaled to fixed length or not. (default: TRUE)

new_intron_length

length (bp) of introns after scaling. (default: 50)
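
To illustrate what scaling introns to a fixed length means, here is a simplified base R sketch for a single transcript with non-overlapping exons. It is not the package's implementation, which also has to handle several transcripts at once; the function name rescale_introns_sketch and the choice to start the new coordinates at 1 are assumptions.

```r
# Illustrative only: shorten every intron of one transcript to a fixed length.
# starts/ends are 1-based inclusive exon coordinates, assumed non-overlapping.
rescale_introns_sketch <- function(starts, ends, new_intron_length = 50) {
  ord <- order(starts)
  starts <- starts[ord]
  ends <- ends[ord]
  widths <- ends - starts + 1
  # Each exon after the first is shifted by the widths of all preceding exons
  # plus one fixed-length intron per preceding gap
  offsets <- cumsum(c(0, head(widths, -1) + new_intron_length))
  new_starts <- 1 + offsets
  new_ends <- new_starts + widths - 1
  data.frame(start = starts, end = ends,
             new_start = new_starts, new_end = new_ends)
}

# Three exons separated by long introns
rescaled <- rescale_introns_sketch(
  starts = c(100, 1000, 5000),
  ends = c(199, 1099, 5049)
)
rescaled
```

Exon widths are preserved, while every gap between consecutive exons becomes exactly new_intron_length base pairs.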

flanking_length

Lengths of the flanking regions upstream and downstream of the gene. (default: c(50,50))

plot_fraction

Size of the random sub-sample of points used to plot coverage (between 0 and 1). Smaller values make plotting significantly faster. (default: 0.1)
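
The effect of plot_fraction can be sketched as random sub-sampling of rows in a coverage table. This is an illustration in base R, not the package's own code; the function name subsample_points and the column names bins and coverage are assumptions.

```r
# Illustrative only: keep a random fraction of coverage points for plotting.
subsample_points <- function(coverage_df, plot_fraction = 0.1) {
  stopifnot(plot_fraction > 0, plot_fraction <= 1)
  n <- nrow(coverage_df)
  n_keep <- ceiling(n * plot_fraction)
  # sample.int draws row indices without replacement; sorting keeps the
  # retained points in their original positional order
  keep <- sort(sample.int(n, size = n_keep))
  coverage_df[keep, , drop = FALSE]
}

set.seed(1)  # only to make the toy example reproducible
toy_coverage <- data.frame(bins = 1:1000, coverage = rpois(1000, lambda = 20))
sub_coverage <- subsample_points(toy_coverage, plot_fraction = 0.25)
nrow(sub_coverage)
```

Fewer retained points mean fewer line segments to draw, which is why smaller fractions plot faster, at the cost of a coarser coverage profile.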

mean_only

Plot only mean coverage within each combination of track_id and colour_group values. Useful for example for plotting mean coverage stratified by genotype (which is specified in the colour_group column) (default: TRUE).
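
The averaging described here can be sketched with base R's aggregate function. The toy table and its column names (bins, coverage) are assumptions for illustration and do not reflect the package's internal data layout.

```r
# Illustrative only: average coverage over samples within each
# combination of track_id, colour_group and position.
toy_samples <- data.frame(
  sample_id = rep(c("s1", "s2", "s3", "s4"), each = 2),
  track_id = rep(c("Naive", "Naive", "LPS", "LPS"), each = 2),
  colour_group = rep(c("Naive", "Naive", "LPS", "LPS"), each = 2),
  bins = rep(1:2, times = 4),
  coverage = c(10, 20, 30, 40, 1, 2, 3, 4)
)

# One row per (track_id, colour_group, bins) with the mean across samples
mean_coverage <- aggregate(
  coverage ~ track_id + colour_group + bins,
  data = toy_samples,
  FUN = mean
)
mean_coverage
```

With the toy data, the two samples in each condition collapse into a single mean coverage value per position.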

region_coords

Start and end coordinates of the region to plot; overrides the flanking_length parameter. (default: NULL)

Value

List containing all of the data required by the plotCoverageData function.

Examples

require("dplyr")
require("GenomicRanges")
# Sample metadata: one row per bigWig file shipped with the package
sample_data = dplyr::tibble(sample_id = c("aipt_A", "aipt_C", "bima_A", "bima_C"), 
    condition = factor(c("Naive", "LPS", "Naive", "LPS"), levels = c("Naive", "LPS")), 
    scaling_factor = 1) %>%
    dplyr::mutate(bigWig = system.file("extdata",  paste0(sample_id, ".str2.bw"), package = "wiggleplotr"))

# Use the condition both as the facet label (track_id) and as the colour group
track_data = dplyr::mutate(sample_data, track_id = condition, colour_group = condition)

selected_transcripts = c("ENST00000438495", "ENST00000392477") # Plot only two transcripts of the gene
## Not run: 
extractCoverageData(ncoa7_exons[selected_transcripts], ncoa7_cdss[selected_transcripts], ncoa7_metadata, track_data)

## End(Not run)


kauralasoo/wiggleplotr documentation built on July 4, 2022, 11:43 a.m.