read_profiling_data: Read .bam files and pre-process them for further analysis
In celalp/ribofootprintR: Fast data extraction and visualization from .bam files for ribosome footprint profiling experiments

Usage Arguments Details Value Author(s) See Also Examples

1	read_profiling_data(ribo, ribo_index, rna = NA, rna_index = NA, genedf, cores = 1, A_site_adjust = T)

`ribo`	sorted and indexed .bam file for ribosome profiling libraries
`ribo_index`	index file for ribosome profiling
`rna`	sorted indexed .bam file for RNA-Seq control, can be ignored defaults to NA
`rna_index`	index file for RNA-Seq control, can be ignored defaults to NA
`genedf`	a data frame with the start and stop coordinated of the ORF. DO NOT USE A GTF OR A GFF FILE. This data frame can be generated through `generate_transcriptome`, function or manually.
`cores`	integer, number of threads to use. See details.
`A_site_adjust`	Whether to adjust frames and start positions to the ribosome A site. See details. Defaults to T

For ribosome profiling libraries the frame and is calculated by using the formula below

mapped nucleotide = 1 based read start - orf start (based on genedf)

the frame is the modulus of

frame = mapped nucleotide - ORF-start + 1 + (read length - 15)

when divided by 3. This is done to make sure that all usable read lengths (18-30) all show predominantly 0 frame when there is good phasing. However this assumes that RNAseI is used for digestion and therefore is not compatible with MNase treated samples. For those samples please set A_site_adjust=F

For RNA-Seq libraries simple coverage is calculated by using the coverage function of Rsamtools.

Currently PSOCK multithreadding is not supported. For Windows set cores to 1. For operating systems that support forking the number of threads that are used can be set by the cores argument. If the cores>1 then the data is processed by mclapply instead of lapply.

A list of lists. The returned value is a list with two elements [[ribo]] and [[rna]] each of which are lists themselves. Within each sub-list each gene is a separate data frame. For profiling the columns are nucleotide # (- if in 5' UTR), frame, codon (- if in 5' UTR), read length and the number of reads that map for that frame and read_length. For RNA-Seq data, just nucleotide and coverage. If there is no RNA-Seq data then the RNA portion of the list is set to NULL

Alper Celik

generate_transcriptome, phasing

##---- Should be DIRECTLY executable !! ----
##-- ==>  Define data, use random,
##--	or do  help(data=index)  for the standard data sets.

## The function is currently defined as
function (ribo, ribo_index, rna = NA, rna_index = NA, genedf,
    cores = 1, A_site_adjust = T)
{
    message("Loading ribo data")
    read_data <- function(data, index) {
        message("read bam file")
        gr <- readGAlignments(file = data, index = index)
        invisible(gc())
        message("Split into individual genes")
        gr <- split(gr, seqnames(gr))
        invisible(gc)
        return(gr)
    }
    get_coverage <- function(gene) {
        covs <- cov_list[[gene]]
        if (is.null(covs)) {
            NULL
        }
        else {
            covs <- as.data.frame(covs)
            b <- genedf[genedf$gene == gene, ]
            coverage <- data.frame(nucleotide = as.integer(1:nrow(covs)),
                coverage = covs$value)
            coverage$nucleotide <- coverage$nucleotide - b$start
            coverage
        }
    }
    get_frames <- function(gr) {
        if (is.null(gr)) {
            gr <- data.frame()
        }
        else {
            b <- genedf[genedf$gene == as.character(gr@seqnames@values),
                ]
            nucleotide <- gr@start - b$start
            if (A_site_adjust) {
                frames <- data.frame(frame = nucleotide - b$start +
                  1 + (width(gr) - 15))%%3
                frames$codon <- floor((start(gr) - b$start +
                  (round(width(gr)/2)))/3) + 1
                frames$width <- width(gr)
            }
            else {
                frames <- data.frame(frame = ((nucleotide)%%3))
                frames$codon <- floor(nucleotide/3)
            }
            frames$width <- width(gr)
            frames$nucleotide <- frames$codon * 3 - 2
            frames <- plyr::count(frames)
            frames <- frames[, c(4, 1, 2, 3, 5)]
            frames
        }
    }
    ribo <- read_data(data = ribo, index = ribo_index)
    message("Calling Frames")
    if (cores > 1) {
        ribo <- mclapply(ribo, get_frames, mc.cleanup = T, mc.cores = cores)
    }
    else {
        ribo <- lapply(ribo, get_frames)
    }
    invisible(gc())
    message("Done")
    if (is.na(rna)) {
        warning("No rna data")
        coverages <- NULL
    }
    else {
        message("Generating coverage list")
        cov_list <- coverage(rna)
        message("Done")
        message("Split into individual genes")
        genes <- genedf$gene
        coverages <- lapply(X = genes, FUN = get_coverage)
        names(coverages) <- genedf$gene
        invisible(gc())
        message("Done")
    }
    invisible(gc)
    data <- list(ribo = ribo, rna = coverages)
    invisible(data)
  }

celalp/ribofootprintR documentation built on May 12, 2019, 12:04 p.m.

celalp/ribofootprintR index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

celalp/ribofootprintR
Fast data extraction and visualization from .bam files for ribosome footprint profiling experiments

read_profiling_data: Read .bam files and pre-process them for further analysis
In celalp/ribofootprintR: Fast data extraction and visualization from .bam files for ribosome footprint profiling experiments

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Related to read_profiling_data in celalp/ribofootprintR...

R Package Documentation

Browse R Packages

We want your feedback!

celalp/ribofootprintR Fast data extraction and visualization from .bam files for ribosome footprint profiling experiments

read_profiling_data: Read .bam files and pre-process them for further analysis In celalp/ribofootprintR: Fast data extraction and visualization from .bam files for ribosome footprint profiling experiments

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Related to read_profiling_data in celalp/ribofootprintR...

R Package Documentation

Browse R Packages

We want your feedback!

celalp/ribofootprintR
Fast data extraction and visualization from .bam files for ribosome footprint profiling experiments

read_profiling_data: Read .bam files and pre-process them for further analysis
In celalp/ribofootprintR: Fast data extraction and visualization from .bam files for ribosome footprint profiling experiments