four_panels: Plot expression coverage in four datasets for candidate probe...


four_panels    R Documentation

Plot expression coverage in four datasets for candidate probe sequences.

Description

four_panels creates four plots for each candidate probe sequence. The first plot (Separation) shows the adjusted read coverage in cytosolic and nuclear RNA from human postmortem cortex. The second plot (Degradation) shows coverage in human cortical samples exposed to room temperature for 0-60 minutes. The third plot (Sorted) shows RNA coverage in nuclei that were sorted by reactivity to an antibody against NeuN, a neuronal marker. NeuN+ samples are enriched for neurons, and NeuN- samples are enriched for non-neurons. The fourth plot (Single Cells) shows the expression coverage in single cells isolated from human temporal lobe.

Usage

four_panels(
  REGION,
  PDF = "four_panels.pdf",
  OUTDIR = tempdir(),
  JUNCTIONS = FALSE,
  COVERAGE = NULL,
  CODING_ONLY = FALSE,
  VERBOSE = TRUE
)

Arguments

REGION

Either a single hg19 genomic region given as the chromosome, start, end, and optionally strand, separated as in 'chr20:10199446-10288068:+', or a character vector of such regions. Must be character. The chromosome name must be preceded by 'chr'.

PDF

The name of the PDF file. Defaults to four_panels.pdf.

OUTDIR

The directory where the PDF will be saved. Defaults to tempdir().

JUNCTIONS

A logical value indicating if the candidate probe sequence spans splice junctions (Default=FALSE).

COVERAGE

The output of brainflowprobes_cov for the input REGION. Defaults to NULL, but it can be pre-computed and saved separately, since this is the step that takes the longest to run. It is also the only step that depends on rtracklayer's functionality for reading BigWig files, which does not run on Windows. The coverage can therefore be computed on a non-Windows machine, saved, and then shared with Windows users.

CODING_ONLY

A logical vector of length 1 specifying whether to subset the Annotated Genes to only the coding genes. That is, whether to subset the genes by whether they have a non-NA CSS value. The Annotated Genes are downloaded with GenomicState::GenomicStateHub().

VERBOSE

A logical value indicating whether to print updates from the process of loading the data from the BigWig files.
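The REGION format described above can be checked before a long run. The following is a minimal sketch in base R, assuming a hypothetical helper parse_region() that is not part of brainflowprobes and does not reflect how the package itself reads REGION; it only illustrates the 'chr:start-end[:strand]' layout.

```r
# Hypothetical helper (not part of brainflowprobes): split a REGION string
# of the form "chr<name>:<start>-<end>[:<strand>]" into its components.
parse_region <- function(region) {
    stopifnot(is.character(region), length(region) == 1)
    parts <- strsplit(region, ":", fixed = TRUE)[[1]]
    if (!length(parts) %in% c(2, 3) || !startsWith(parts[1], "chr")) {
        stop("Expected 'chr<name>:<start>-<end>[:<strand>]', got: ", region)
    }
    coords <- suppressWarnings(
        as.integer(strsplit(parts[2], "-", fixed = TRUE)[[1]])
    )
    if (length(coords) != 2 || anyNA(coords) || coords[1] > coords[2]) {
        stop("Could not read start-end from: ", region)
    }
    # Assumption: a missing strand is recorded as "*"
    strand <- if (length(parts) == 3) parts[3] else "*"
    data.frame(
        chr = parts[1], start = coords[1], end = coords[2],
        strand = strand, width = coords[2] - coords[1] + 1L
    )
}

# One region with a strand and one without
regions <- c("chr20:10199446-10288068:+", "chr18:74690788-74692427")
parsed <- do.call(rbind, lapply(regions, parse_region))
print(parsed)
```

A check like this fails fast on a chromosome name that lacks the 'chr' prefix, before any coverage is read.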

Value

four_panels() first annotates the input candidate probe sequence(s) in REGION using bumphunter::matchGenes(), and then extracts the expression coverage for each sequence from each sample in four different datasets (see the BrainFlow publication for references) using derfinder::getRegionCoverage(). The coverage is normalized to the total mapped reads per sample and to the width of each probe region in kilobases before log2 transformation. Each of the four plots is labeled with its dataset, and the plots are topped by the sequence coordinates, sequence width, and the name of the nearest gene.
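The normalization described above can be sketched as follows. This is an illustration only: the function name normalize_coverage(), the per-million scaling constant, and the pseudocount of 1 are assumptions and are not taken from the package source. All numbers are made up.

```r
# Illustrative only: scale summed region coverage by library size and by
# region width, then log2-transform. The 1e6 constant and the +1 pseudocount
# are assumptions, not the package's documented values.
normalize_coverage <- function(region_cov, total_mapped, width_bp) {
    per_million <- total_mapped / 1e6 # total mapped reads, in millions
    per_kb <- width_bp / 1e3          # region width, in kilobases
    log2(region_cov / per_million / per_kb + 1)
}

# Made-up values for three hypothetical samples over a 1,293 bp region
region_cov <- c(s1 = 5200, s2 = 800, s3 = 0)
total_mapped <- c(s1 = 40e6, s2 = 25e6, s3 = 30e6)
normalized <- normalize_coverage(region_cov, total_mapped, width_bp = 1293)
print(normalized)
```

Dividing by both library size and region width keeps samples of different sequencing depth and probes of different length on a comparable scale.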

A good candidate probe sequence will have several characteristics. In the Separation data, the sequence should be relatively highly expressed in nuclear RNA, at least in your age of interest. The sequence should also show stable expression over the 60 minutes of room temperature exposure in the Degradation data. The sequence should also be expressed in the appropriate NeuN fraction (depending on cell type specificity) in the Sorted dataset, and also be expressed in the right cell type in the Single Cell dataset.

By default, four_panels() saves the plots as four_panels.pdf in a temporary directory. Use PDF and OUTDIR to change the file name and location.

If JUNCTIONS is TRUE, the candidate probe sequence is taken to span splice junctions, and the character vector in REGION should give the coordinates of each exon spanned by the sequence. four_panels() then sums the coverage across the exons and plots that single value for each dataset, instead of creating an independent set of plots for each exon. This avoids deflating the coverage by including lowly expressed intronic sequence in the plots.
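The difference between per-exon and summed coverage can be illustrated with a small matrix. This is a sketch with made-up values and hypothetical sample names; it shows the idea of summing across exons, not the package's internal code.

```r
# Made-up per-exon coverage for three hypothetical samples (rows = exons)
exon_cov <- rbind(
    exon1 = c(s1 = 120, s2 = 40, s3 = 0),
    exon2 = c(s1 = 30, s2 = 10, s3 = 5),
    exon3 = c(s1 = 15, s2 = 5, s3 = 0),
    exon4 = c(s1 = 45, s2 = 20, s3 = 10)
)

# With JUNCTIONS = FALSE each row would be plotted as its own probe.
# With JUNCTIONS = TRUE the idea is one summed value per sample.
summed <- colSums(exon_cov)
print(summed)
```

Because only exonic positions enter the sum, intronic bases between the exons contribute nothing to the plotted value.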

Author(s)

Amanda J Price

Examples


## Here we use the pre-saved example coverage data such that this example
## will run fast!
four_panels("chr20:10286777-10288069:+",
    COVERAGE = four_panels_example_cov
)
## Not run: 
## Without using COVERAGE, this function reads BigWig files from the web
## using rtracklayer and this functionality is not supported on Windows
## machines.
if (.Platform$OS.type != "windows") {
    ## This example takes 10 minutes to run!
    four_panels("chr20:10286777-10288069:+")
}

## These examples will take several minutes to run depending on your
## internet connection
four_panels(c(
    "chr20:10286777-10288069:+",
    "chr18:74690788-74692427:-",
    "chr19:49932861-49933829:-"
))

PENK_exons <- c(
    "chr8:57353587-57354496:-",
    "chr8:57358375-57358515:-",
    "chr8:57358985-57359040:-",
    "chr8:57359128-57359292:-"
)

## General syntax
four_panels(PENK_exons,
    JUNCTIONS = TRUE,
    PDF = "PDF_file.pdf", OUTDIR = "/path/to/directory/"
)

four_panels("chr20:10286777-10288069:+",
    PDF = "PDF_file.pdf", OUTDIR = "/path/to/directory/"
)


## Explore the effect of changing CODING_ONLY
## Check how gene name changes in the title of the plot
## (everything else stays the same)
cov <- brainflowprobes_cov("chr10:135379301-135379311:+")
four_panels("chr10:135379301-135379311:+", COVERAGE = cov)
four_panels("chr10:135379301-135379311:+",
    COVERAGE = cov,
    PDF = "coding_only_four_panels", CODING_ONLY = TRUE
)

## End(Not run)
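The COVERAGE argument allows the slow BigWig step to be run once and reused. A minimal sketch of that workflow follows, assuming a hypothetical helper cached_coverage() and a stand-in for the slow step so that it runs without BigWig access. In real use the compute argument would wrap a call such as brainflowprobes_cov(REGION).

```r
# Hypothetical helper (not part of brainflowprobes): compute an expensive
# object once, save it with saveRDS(), and reuse the saved copy afterwards.
cached_coverage <- function(path, compute) {
    if (file.exists(path)) {
        return(readRDS(path))
    }
    cov <- compute()
    saveRDS(cov, path)
    cov
}

# Stand-in for the slow step; the structure of this list is made up
fake_cov <- function() list(Sep = data.frame(x = 1:3))

cov_file <- file.path(tempdir(), "region_cov.rds")
cov <- cached_coverage(cov_file, fake_cov)

# On any machine that has the saved file, the coverage can be passed on:
# four_panels("chr20:10286777-10288069:+", COVERAGE = readRDS(cov_file))
```

The saved .rds file can be copied to another machine, which fits the case above where the coverage is computed on a non-Windows machine and shared with Windows users.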

LieberInstitute/brainflowprobes documentation built on Dec. 13, 2024, 11:19 p.m.