Usage Arguments Details Value Author(s) See Also Examples
1 | read_profiling_data(ribo, ribo_index, rna = NA, rna_index = NA, genedf, cores = 1, A_site_adjust = T)
|
ribo |
sorted and indexed .bam file for ribosome profiling libraries |
ribo_index |
index file for ribosome profiling |
rna |
sorted indexed .bam file for RNA-Seq control, can be ignored defaults to NA |
rna_index |
index file for RNA-Seq control, can be ignored defaults to NA |
genedf |
a data frame with the start and stop coordinated of the ORF. DO NOT USE A GTF OR A GFF FILE. This data frame can be generated through |
cores |
integer, number of threads to use. See details. |
A_site_adjust |
Whether to adjust frames and start positions to the ribosome A site. See details. Defaults to T |
For ribosome profiling libraries the frame and is calculated by using the formula below
mapped nucleotide = 1 based read start - orf start (based on genedf)
the frame is the modulus of
frame = mapped nucleotide - ORF-start + 1 + (read length - 15)
when divided by 3. This is done to make sure that all usable read lengths (18-30) all show predominantly 0 frame when there is good phasing. However this assumes that RNAseI is used for digestion and therefore is not compatible with MNase treated samples. For those samples please set A_site_adjust=F
For RNA-Seq libraries simple coverage is calculated by using the coverage
function of Rsamtools.
Currently PSOCK multithreadding is not supported. For Windows set cores to 1. For operating systems that support forking the number of threads that are used can be set by the cores argument. If the cores>1 then the data is processed by mclapply instead of lapply.
A list of lists. The returned value is a list with two elements [[ribo]] and [[rna]] each of which are lists themselves. Within each sub-list each gene is a separate data frame. For profiling the columns are nucleotide # (- if in 5' UTR), frame, codon (- if in 5' UTR), read length and the number of reads that map for that frame and read_length. For RNA-Seq data, just nucleotide and coverage. If there is no RNA-Seq data then the RNA portion of the list is set to NULL
Alper Celik
generate_transcriptome
, phasing
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 | ##---- Should be DIRECTLY executable !! ----
##-- ==> Define data, use random,
##-- or do help(data=index) for the standard data sets.
## The function is currently defined as
function (ribo, ribo_index, rna = NA, rna_index = NA, genedf,
cores = 1, A_site_adjust = T)
{
message("Loading ribo data")
read_data <- function(data, index) {
message("read bam file")
gr <- readGAlignments(file = data, index = index)
invisible(gc())
message("Split into individual genes")
gr <- split(gr, seqnames(gr))
invisible(gc)
return(gr)
}
get_coverage <- function(gene) {
covs <- cov_list[[gene]]
if (is.null(covs)) {
NULL
}
else {
covs <- as.data.frame(covs)
b <- genedf[genedf$gene == gene, ]
coverage <- data.frame(nucleotide = as.integer(1:nrow(covs)),
coverage = covs$value)
coverage$nucleotide <- coverage$nucleotide - b$start
coverage
}
}
get_frames <- function(gr) {
if (is.null(gr)) {
gr <- data.frame()
}
else {
b <- genedf[genedf$gene == as.character(gr@seqnames@values),
]
nucleotide <- gr@start - b$start
if (A_site_adjust) {
frames <- data.frame(frame = nucleotide - b$start +
1 + (width(gr) - 15))%%3
frames$codon <- floor((start(gr) - b$start +
(round(width(gr)/2)))/3) + 1
frames$width <- width(gr)
}
else {
frames <- data.frame(frame = ((nucleotide)%%3))
frames$codon <- floor(nucleotide/3)
}
frames$width <- width(gr)
frames$nucleotide <- frames$codon * 3 - 2
frames <- plyr::count(frames)
frames <- frames[, c(4, 1, 2, 3, 5)]
frames
}
}
ribo <- read_data(data = ribo, index = ribo_index)
message("Calling Frames")
if (cores > 1) {
ribo <- mclapply(ribo, get_frames, mc.cleanup = T, mc.cores = cores)
}
else {
ribo <- lapply(ribo, get_frames)
}
invisible(gc())
message("Done")
if (is.na(rna)) {
warning("No rna data")
coverages <- NULL
}
else {
message("Generating coverage list")
cov_list <- coverage(rna)
message("Done")
message("Split into individual genes")
genes <- genedf$gene
coverages <- lapply(X = genes, FUN = get_coverage)
names(coverages) <- genedf$gene
invisible(gc())
message("Done")
}
invisible(gc)
data <- list(ribo = ribo, rna = coverages)
invisible(data)
}
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.