coverage_matrix: Given a set of regions for a chromosome, compute the coverage...

View source: R/coverage_matrix.R

coverage_matrixR Documentation

Given a set of regions for a chromosome, compute the coverage matrix for a given SRA study.

Description

Given a set of genomic regions as created by expressed_regions, this function computes the coverage matrix for a library size of 40 million 100 bp reads for a given SRA study.

Usage

coverage_matrix(
  project,
  chr,
  regions,
  chunksize = 1000,
  bpparam = NULL,
  outdir = NULL,
  chrlen = NULL,
  verbose = TRUE,
  verboseLoad = verbose,
  scale = TRUE,
  round = FALSE,
  ...
)

Arguments

project

A character vector with one SRA study id.

chr

A character vector with the name of the chromosome.

regions

A GRanges-class object with regions for chr for which to calculate the coverage matrix.

chunksize

A single integer vector defining the chunksize to use for computing the coverage matrix. Regions will be split into different chunks which can be useful when using a parallel instance as defined by bpparam.

bpparam

A BiocParallelParam-class instance which will be used to calculate the coverage matrix in parallel. By default, SerialParam-class will be used.

outdir

The destination directory for the downloaded file(s) that were previously downloaded with download_study. If the files are missing, but outdir is specified, they will get downloaded first. By default outdir is set to NULL which will use the data from the web. We only recommend downloading the full data if you will use it several times.

chrlen

The chromosome length in base pairs. If it's NULL, the chromosome length is extracted from the Rail-RNA runs GitHub repository. Alternatively check the SciServer section on the vignette to see how to access all the recount data via a R Jupyter Notebook.

verbose

If TRUE basic status updates will be printed along the way.

verboseLoad

If TRUE basic status updates for loading the data will be printed.

scale

If TRUE, the coverage counts will be scaled to read counts based on a library size of 40 million reads. Set scale to FALSE if you want the raw coverage counts. The scaling method is by AUC, as in the default option of scale_counts.

round

If TRUE, the counts are rounded to integers. Set to TRUE if you want to match the defaults of scale_counts.

...

Additional arguments passed to download_study when outdir is specified but the required files are missing.

Details

When using outdir = NULL the information will be accessed from the web on the fly. If you encounter internet access problems, it might be best to first download the BigWig files using download_study. This might be the best option if you are accessing all chromosomes for a given project and/or are thinking of using different sets of regions (for example, from different cutoffs applied to expressed_regions). Alternatively check the SciServer section on the vignette to see how to access all the recount data via a R Jupyter Notebook.

If you have bwtool installed, you can use https://github.com/LieberInstitute/recount.bwtool for faster results. Note that you will need to run scale_counts after running coverage_matrix_bwtool().

Value

A RangedSummarizedExperiment-class object with the counts stored in the assays slot.

Author(s)

Leonardo Collado-Torres

See Also

download_study, findRegions, railMatrix

Examples


## Workaround for https://github.com/lawremi/rtracklayer/issues/83
download_study("SRP002001", type = "mean")
download_study("SRP002001", type = "samples")

## Define expressed regions for study DRP002835, chrY
regions <- expressed_regions("SRP002001", "chrY",
    cutoff = 5L,
    maxClusterGap = 3000L,
    outdir = "SRP002001"
)

## Now calculate the coverage matrix for this study
rse <- coverage_matrix("SRP002001", "chrY", regions, outdir = "SRP002001")

## One row per region
identical(length(regions), nrow(rse))

leekgroup/recount documentation built on Dec. 17, 2024, 4:57 p.m.